date:20190725

Re: [Qemu-devel] [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation

2019-07-25 Thread Liu, Yi L

Hi Eric,

> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: Wednesday, July 24, 2019 5:33 PM
> To: Liu, Yi L ; David Gibson 
> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free implementation
> 
> Hi Yi, David,
> 
> On 7/24/19 6:57 AM, Liu, Yi L wrote:
> >> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> >> Behalf Of David Gibson
> >> Sent: Tuesday, July 23, 2019 11:58 AM
> >> To: Liu, Yi L 
> >> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >> implementation
> >>
> >> On Mon, Jul 22, 2019 at 07:02:51AM +, Liu, Yi L wrote:
>  From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
>  On Behalf Of David Gibson
>  Sent: Wednesday, July 17, 2019 11:07 AM
>  To: Liu, Yi L 
>  Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
>  implementation
> 
>  On Tue, Jul 16, 2019 at 10:25:55AM +, Liu, Yi L wrote:
> >> From: kvm-ow...@vger.kernel.org
> >> [mailto:kvm-ow...@vger.kernel.org] On
>  Behalf
> >> Of David Gibson
> >> Sent: Monday, July 15, 2019 10:55 AM
> >> To: Liu, Yi L 
> >> Subject: Re: [RFC v1 05/18] vfio/pci: add pasid alloc/free
> >> implementation
> >>
> >> On Fri, Jul 05, 2019 at 07:01:38PM +0800, Liu Yi L wrote:
> >>> This patch adds vfio implementation 
> >>> PCIPASIDOps.alloc_pasid/free_pasid().
> >>> These two functions are used to propagate guest pasid allocation
> >>> and free requests to host via vfio container ioctl.
> >>
> >> As I said in an earlier comment, I think doing this on the device
> >> is conceptually incorrect.  I think we need an explcit notion of
> >> an SVM context (i.e. the namespace in which all the PASIDs live)
> >> - which will IIUC usually be shared amongst multiple devices.
> >> The create and free PASID requests should be on that object.
> >
> > Actually, the allocation is not doing on this device. System wide,
> > it is done on a container. So not sure if it is the API interface
> > gives you a sense that this is done on device.
> 
>  Sorry, I should have been clearer.  I can see that at the VFIO
>  level it is done on the container.  However the function here takes
>  a bus and devfn, so this qemu internal interface is per-device,
>  which doesn't really make sense.
> >>>
> >>> Got it. The reason here is to pass the bus and devfn info, so that
> >>> VFIO can figure out a container for the operation. So far in QEMU,
> >>> there is no good way to connect the vIOMMU emulator and VFIO regards
> >>> to SVM.
> >>
> >> Right, and I think that's an indication that we're not modelling
> >> something in qemu that we should be.
> >>
> >>> hw/pci layer is a choice based on some previous discussion. But yes,
> >>> I agree with you that we may need to have an explicit notion for
> >>> SVM. Do you think it is good to introduce a new abstract layer for
> >>> SVM (may name as SVMContext).
> >>
> >> I think so, yes.
> >>
> >> If nothing else, I expect we'll need this concept if we ever want to
> >> be able to implement SVM for emulated devices (which could be useful
> >> for debugging, even if it's not something you'd do in production).
> >>
> >>> The idea would be that vIOMMU maintain the SVMContext instances and
> >>> expose explicit interface for VFIO to get it. Then VFIO register
> >>> notifiers on to the SVMContext. When vIOMMU emulator wants to do
> >>> PASID alloc/free, it fires the corresponding notifier. After call
> >>> into VFIO, the notifier function itself figure out the container it
> >>> is bound. In this way, it's the duty of vIOMMU emulator to figure
> >>> out a proper notifier to fire. From interface point of view, it is
> >>> no longer per-device.
> >>
> >> Exactly.
> >
> > Cool, let me prepare another version with the ideas. Thanks for your
> > review. :-)
> >
> > Regards,
> > Yi Liu
> >
> >>> Also, it leaves the PASID management details to vIOMMU emulator as
> >>> it can be vendor specific. Does it make sense?
> >>> Also, I'd like to know if you have any other idea on it. That would
> >>> surely be helpful. :-)
> >>>
> > Also, curious on the SVM context
> > concept, do you mean it a per-VM context or a per-SVM usage context?
> > May you elaborate a little more. :-)
> 
>  Sorry, I'm struggling to find a good term for this.  By "context" I
>  mean a namespace containing a bunch of PASID address spaces, those
>  PASIDs are then visible to some group of devices.
> >>>
> >>> I see. May be the SVMContext instance above can include multiple
> >>> PASID address spaces. And again, I think this relationship should be
> >>> maintained in vIOMMU emulator.
> >
> So if I understand we now head towards introducing new notifiers taking a
> "SVMContext" as argument instead of an IOMMUMemoryRegion.

yes, this is the rough idea.
 
> I think we need to be clear about how both abstractions (SVMContext

Re: [Qemu-devel] [PATCH] Fix Guest VM crash due to iSCSI Sense Key error

2019-07-25 Thread John Snow

On 7/7/19 10:55 PM, shaju.abra...@nutanix.com wrote:
> From: Shaju Abraham 
> 
> During the  IDE DMA transfer for a ISCSI target,when libiscsi encounters
> a SENSE KEY error, it sets the task->sense to  the value "COMMAND ABORTED".
> The function iscsi_translate_sense() later translaters this error to 
> -ECANCELED
> and this value is passed to the callback function. In the case of  IDE DMA 
> read
> or write, the callback function returns immediately if the value of the ret
> argument is -ECANCELED.
> Later when ide_cancel_dma_sync() function is invoked  the assertion
> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets 
> terminated.
> Fix the issue by making the value of s->bus->dma->aiocb = NULL when
> -ECANCELED is passed to the callback.
> 
> Signed-off-by: Shaju Abraham 
> ---
>  hw/ide/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 6afadf8..78ea357 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret)
>  bool stay_active = false;
>  
>  if (ret == -ECANCELED) {
> +s->bus->dma->aiocb = NULL;
>  return;
>  }
>  
> 

The part that makes me nervous here is that I can't remember why we do
NO cleanup whatsoever for the ECANCELED case.

commit 0d910cfeaf2076b116b4517166d5deb0fea76394
Author: Fam Zheng 
Date:   Thu Sep 11 13:41:07 2014 +0800

ide/ahci: Check for -ECANCELED in aio callbacks

... This looks like we never expected the aio callbacks to ever get
called with ECANCELED, so we treat this as a QEMU-internal signal.

It looks like we expect these callbacks to do NOTHING in this case; but
I'm not sure where the IDE state machine does its cleanup otherwise.
(The DMA might have been canceled, but the DMA and IDE state machines
still need to exit their loop.)

If you take a look at this patch from 2014 though, there are many other
spots where we have littered ECANCELED checks that might also cause
problems if we're receiving error codes we thought we couldn't get normally.

I am worried this patch papers over something worse.

--js

Re: [Qemu-devel] [QEMU-SECURITY] ide: fix assertion in ide_dma_cb() to prevent qemu DoS from quest

2019-07-25 Thread John Snow

On 7/5/19 10:07 AM, Alexander Popov wrote:
> This assertion was introduced in the commit a718978ed58a in July 2015.
> It implies that the size of successful DMA transfers handled in
> ide_dma_cb() should be multiple of 512 (the size of a sector).
> 
> But guest systems can initiate DMA transfers that don't fit this
> requirement. Let's improve the assertion to prevent qemu DoS from quests.
> 
> PoC for Linux that uses SCSI_IOCTL_SEND_COMMAND to perform such an ATA
> command and crash qemu:
> 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> #define CMD_SIZE 2048
> 
> struct scsi_ioctl_cmd_6 {
>   unsigned int inlen;
>   unsigned int outlen;
>   unsigned char cmd[6];
>   unsigned char data[];
> };
> 
> int main(void)
> {
>   intptr_t fd = 0;
>   struct scsi_ioctl_cmd_6 *cmd = NULL;
> 
>   cmd = malloc(CMD_SIZE);
>   if (!cmd) {
>   perror("[-] malloc");
>   return 1;
>   }
> 
>   memset(cmd, 0, CMD_SIZE);
>   cmd->inlen = 1337;
>   cmd->cmd[0] = READ_6;
> 
>   fd = open("/dev/sg0", O_RDONLY);
>   if (fd == -1) {
>   perror("[-] opening sg");
>   return 1;
>   }
> 
>   printf("[+] sg0 is opened\n");
> 
>   printf("[.] qemu should break here:\n");
>   fflush(stdout);
>   ioctl(fd, SCSI_IOCTL_SEND_COMMAND, cmd);
>   printf("[-] qemu didn't break\n");
> 
>   free(cmd);
> 
>   return 1;
> }
> 
> Signed-off-by: Alexander Popov 
> ---
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 6afadf8..304fe69 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -868,7 +868,7 @@ static void ide_dma_cb(void *opaque, int ret)
>  
>  sector_num = ide_get_sector(s);
>  if (n > 0) {
> -assert(n * 512 == s->sg.size);
> +assert(n == s->sg.size / 512);
>  dma_buf_commit(s, s->sg.size);
>  sector_num += n;
>  ide_set_sector(s, sector_num);
> 

Oh, this is fun.

So you're actually requesting 131072 bytes (256 sectors) but you're
giving it far too short of a PRDT.

But ... the prepare_buf callback got anything at all, so it was happy to
proceed, but the callback chokes over the idea that the SGlist wasn't
formatted correctly -- it can't deal with partial tails.

I think it might be the case that the sglist needs to be allowed to have
an unaligned tail, and then the second trip to the dma_cb when there
isn't any more regions in the PRDT to add to the SGList to transfer at
least a single sector -- but the IDE state machine still has sectors to
transfer -- we need to trigger the short PRD clause.

Papering over it by truncating the tail I think isn't sufficient; there
are other problems this exposes.

As an emergency patch, it might be better to just do this whenever we
see partial tails:

prepared = ...prepare_buf(s->bus->dma, s->io_buffer_size);
if (prepared % 512) {
ide_dma_error(s);
return;
}

I think that prepare_buf does not give unaligned results if you provided
*too many* bytes, so the unaligned return only happens when you starve it.

I can worry about a proper fix for 4.2+.

--js

Re: [Qemu-devel] [PATCH] migration: notify runstate immediately before vcpu stops

2019-07-25 Thread Yan Zhao

On Thu, Jul 25, 2019 at 06:39:07PM +0800, Dr. David Alan Gilbert wrote:
> * Yan Zhao (yan.y.z...@intel.com) wrote:
> > for some devices to do live migration, it is needed to do something
> > immediately before vcpu stops. add a notification here.
> > 
> > Signed-off-by: Yan Zhao 
> > ---
> >  cpus.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/cpus.c b/cpus.c
> > index b09b702..d5d4abe 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -1068,6 +1068,7 @@ static int do_vm_stop(RunState state, bool send_stop)
> >  int ret = 0;
> >  
> >  if (runstate_is_running()) {
> > +vm_state_notify(1, state);
> 
> SO that's quite interesting in that you'll end up getting a
> notificatiion like 'running=true, state=RUN_STATE_SHUTDOWN'
> that might be unexpected by existing callers.
> 
> Have you checked existing callers?  Also does this cause another event
> to be sent on the QMP - if so we need to chekc if this would confuse
> libvirt.
> 
> Dave

hi Dave
yes, this may cause problem for existing handlers as this is an
unexpected condition. like for ide's ide_restart_cb.
So, do you think it's a better that do the notification earlier, before
vm_stop_force_state() in migration.c and call 
notifier_list_notify(_state_notifiers, s)
to notify migration state instead ?

Thanks
Yan



> 
> >  cpu_disable_ticks();
> >  pause_all_vcpus();
> >  runstate_set(state);
> > -- 
> > 2.7.4
> > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [Qemu-riscv] [PATCH-4.2 v1 2/6] target/riscv: Remove strict perm checking for CSR R/W

2019-07-25 Thread Jonathan Behrens

Unless I'm missing something, this is the only place that QEMU checks the
privilege level for read and writes to CSRs. The exact computation used
here won't work with the hypervisor extension, but we also can't just get
rid of privilege checking entirely...

Jonathan

On Thu, Jul 25, 2019 at 2:56 PM Alistair Francis 
wrote:

> The privledge check based on the CSR address mask 0x300 doesn't work
> when using Hypervisor extensions so remove the check
>
> Signed-off-by: Alistair Francis 
> ---
>  target/riscv/csr.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index e0d4586760..af3b762c8b 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -797,9 +797,8 @@ int riscv_csrrw(CPURISCVState *env, int csrno,
> target_ulong *ret_value,
>
>  /* check privileges and return -1 if check fails */
>  #if !defined(CONFIG_USER_ONLY)
> -int csr_priv = get_field(csrno, 0x300);
>  int read_only = get_field(csrno, 0xC00) == 3;
> -if ((write_mask && read_only) || (env->priv < csr_priv)) {
> +if (write_mask && read_only) {
>  return -1;
>  }
>  #endif
> --
> 2.22.0
>
>
>

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin

On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

I think I know what you are thinking of doing:
map /dev/zero so we get a single VMA but all mapped to
a single zero pte?

We could start with that, at least as an experiment.
Further:

- we can limit the amount of fragmentation and simply
  unmap everything if we exceed a specific limit:
  with more than X devices it's no longer a lightweight
  VM anyway :)

- we can implement /dev/ones. in fact, we can implement
  /dev/byteXX for each possible value, the cost will
  be only 1M on a 4k page system.
  it might come in handy for e.g. free page hinting:
  at the moment if guest memory is poisoned
  we can not unmap it, with this trick we can
  map it to /dev/byteXX.

Note that the kvm memory array is still fragmented.
Again, we can fallback on disabling the optimization
if there are too many devices.


-- 
MST

Re: [Qemu-devel] [PATCH v4] qapi: Add InetSocketAddress member keep-alive

2019-07-25 Thread Markus Armbruster

Eric Blake  writes:

> On 7/25/19 10:26 AM, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> It's needed to provide keepalive for nbd client to track server
>>> availability.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> 
>> Reviewed-by: Markus Armbruster 
>
> It looks like this could go in any number of ways (Markus since it
> touches qapi, Dan since it touches sockets, or me since it was written
> for use with NBD); but unless anyone complains, I'm happy to queue this
> through my NBD tree for 4.2.

Go right ahead.

Re: [Qemu-devel] [Qemu-block] [QEMU] [PATCH v5 0/8] Add Qemu to SeaBIOS LCHS interface

2019-07-25 Thread Laszlo Ersek

On 07/25/19 02:50, John Snow wrote:
> 
> 
> On 7/24/19 8:47 PM, John Snow wrote:
>>
>>
>> On 7/19/19 6:10 AM, Sam Eiderman wrote:
>>> Well, this patch introduces 3 command line parameters (“lcyls”, “lheads”, 
>>> “lsecs”)
>>> to “scsi-hd” “ide-hd” and “virtio-pci-blk” so this somehow has something to 
>>> do with
>>> block.
>>>
>>> This patch also adds fw_cfg interface to send these parameters to SeaBIOS.
>>>
>>> "scripts/get_maintainer.pl -f hw/nvram/fw_cfg.c” gives
>>>
>>> "Philippe Mathieu-Daudé"  (supporter:Firmware 
>>> configur...)
>>> Laszlo Ersek  (reviewer:Firmware configur...)
>>> Gerd Hoffmann  (reviewer:Firmware configur…)
>>>
>>> And this was already Reviewed-by Gerd.
>>>
>>> How should I proceed?
>>>
>>> Sam
>>>
>>
>> I feel like it would be up to Gerd as the general SeaBIOS point of contact?
>>
> 
> ...ah, who is offline for vacation.
> 
> We're in freeze right now anyway, so I would think that Gerd and/or
> Kevin can work out who ought to stage this for a PR when the tree opens
> again.
> 

I think the sole patch in the series that modifies "hw/nvram/fw_cfg.c" is

 [Qemu-devel] [QEMU] [PATCH v5 7/8] bootdevice: FW_CFG interface for LCHS values
  http://mid.mail-archive.com/20190626123948.10199-8-shmuel.eiderman@oracle.com

and neither Phil nor myself seem to be CC'd on it (I've found the message in my 
list folder only).

Regarding fw_cfg, I only review Phil's fw_cfg patches (so that whenever he 
posts patches, he can count on my review); other than that, I generally skip 
fw_cfg patches. And, I totally don't have a tree for collecting such patches.

Now, while Phil does:

  T: git https://github.com/philmd/qemu.git fw_cfg-next

I still don't think that tree would be the best for queueing this series, given 
the diffstat:

 bootdevice.c | 148 +---
 hw/block/virtio-blk.c|   6 +
 hw/ide/qdev.c|   7 +-
 hw/nvram/fw_cfg.c|  14 +-
 hw/scsi/scsi-bus.c   |  15 ++
 hw/scsi/scsi-disk.c  |  14 ++
 include/hw/block/block.h |  22 +-
 include/hw/scsi/scsi.h   |   1 +
 include/sysemu/sysemu.h  |   4 +
 tests/Makefile.include   |   2 +-
 tests/hd-geo-test.c  | 582 +++
 11 files changed, 774 insertions(+), 41 deletions(-)

Just my two cents.

Thanks
Laszlo

[Qemu-devel] [PATCH-4.2 v1 4/6] target/riscv: Create function to test if FP is enabled

2019-07-25 Thread Alistair Francis

Let's creaate a function that tests if floating point support is
enabled. We can then protect all floating point operations based on if
they are enabled.

This patch so far doesn't change anything, it's just preparing for the
Hypervisor support for floating point operations.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h|  6 +-
 target/riscv/cpu_helper.c | 10 ++
 target/riscv/csr.c| 19 ++-
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0adb307f32..2dc9b17678 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -255,6 +255,7 @@ void riscv_cpu_do_interrupt(CPUState *cpu);
 int riscv_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
 int riscv_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 bool riscv_cpu_exec_interrupt(CPUState *cs, int interrupt_request);
+bool riscv_cpu_fp_enabled(CPURISCVState *env);
 int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch);
 hwaddr riscv_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 void  riscv_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
@@ -298,7 +299,10 @@ static inline void cpu_get_tb_cpu_state(CPURISCVState 
*env, target_ulong *pc,
 #ifdef CONFIG_USER_ONLY
 *flags = TB_FLAGS_MSTATUS_FS;
 #else
-*flags = cpu_mmu_index(env, 0) | (env->mstatus & MSTATUS_FS);
+*flags = cpu_mmu_index(env, 0);
+if (riscv_cpu_fp_enabled(env)) {
+*flags |= env->mstatus & MSTATUS_FS;
+}
 #endif
 }
 
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f027be7f16..225e407cff 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -71,6 +71,16 @@ bool riscv_cpu_exec_interrupt(CPUState *cs, int 
interrupt_request)
 
 #if !defined(CONFIG_USER_ONLY)
 
+/* Return true is floating point support is currently enabled */
+bool riscv_cpu_fp_enabled(CPURISCVState *env)
+{
+if (env->mstatus & MSTATUS_FS) {
+return true;
+}
+
+return false;
+}
+
 int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t interrupts)
 {
 CPURISCVState *env = >env;
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index af3b762c8b..7b73b73cf7 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,7 +46,7 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 #endif
@@ -108,7 +108,7 @@ static int pmp(CPURISCVState *env, int csrno)
 static int read_fflags(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 #endif
@@ -119,7 +119,7 @@ static int read_fflags(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fflags(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 env->mstatus |= MSTATUS_FS;
@@ -131,7 +131,7 @@ static int write_fflags(CPURISCVState *env, int csrno, 
target_ulong val)
 static int read_frm(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 #endif
@@ -142,7 +142,7 @@ static int read_frm(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_frm(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 env->mstatus |= MSTATUS_FS;
@@ -154,7 +154,7 @@ static int write_frm(CPURISCVState *env, int csrno, 
target_ulong val)
 static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 #endif
@@ -166,7 +166,7 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
 return -1;
 }
 env->mstatus |= MSTATUS_FS;
@@ -307,6 +307,7 @@ static int write_mstatus(CPURISCVState *env, int csrno, 
target_ulong val)
 {
 target_ulong mstatus = env->mstatus;
 target_ulong mask = 0;
+int dirty;
 
 /* flush tlb on mstatus fields that affect VM */
 if (env->priv_ver

[Qemu-devel] [PATCH-4.2 v1 2/6] target/riscv: Remove strict perm checking for CSR R/W

2019-07-25 Thread Alistair Francis

The privledge check based on the CSR address mask 0x300 doesn't work
when using Hypervisor extensions so remove the check

Signed-off-by: Alistair Francis 
---
 target/riscv/csr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index e0d4586760..af3b762c8b 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -797,9 +797,8 @@ int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong 
*ret_value,
 
 /* check privileges and return -1 if check fails */
 #if !defined(CONFIG_USER_ONLY)
-int csr_priv = get_field(csrno, 0x300);
 int read_only = get_field(csrno, 0xC00) == 3;
-if ((write_mask && read_only) || (env->priv < csr_priv)) {
+if (write_mask && read_only) {
 return -1;
 }
 #endif
-- 
2.22.0

[Qemu-devel] [PATCH-4.2 v1 6/6] target/riscv: Fix Floating Point register names

2019-07-25 Thread Alistair Francis

From: Atish Patra 

As per the RISC-V spec, Floating Point registers are named as f0..f31
so lets fix the register names accordingly.

Signed-off-by: Atish Patra 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f8d07bd20a..af1e9b7690 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -40,10 +40,10 @@ const char * const riscv_int_regnames[] = {
 };
 
 const char * const riscv_fpr_regnames[] = {
-  "ft0", "ft1", "ft2",  "ft3",  "ft4", "ft5", "ft6",  "ft7",
-  "fs0", "fs1", "fa0",  "fa1",  "fa2", "fa3", "fa4",  "fa5",
-  "fa6", "fa7", "fs2",  "fs3",  "fs4", "fs5", "fs6",  "fs7",
-  "fs8", "fs9", "fs10", "fs11", "ft8", "ft9", "ft10", "ft11"
+  "f0", "f1", "f2",  "f3",  "f4", "f5", "f6", "f7",
+  "f8", "f9", "f10",  "f11",  "f12", "f13", "f14", "f15",
+  "f16", "f17", "f18",  "f19",  "f20", "f21", "f22", "f23",
+  "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31"
 };
 
 const char * const riscv_excp_names[] = {
-- 
2.22.0

[Qemu-devel] [PATCH-4.2 v1 0/6] RISC-V: Hypervisor prep work part 2

2019-07-25 Thread Alistair Francis

The first four patches are ones that I have pulled out of my original
Hypervisor series at an attempt to reduce the number of patches in the
series.

These four patches all make sense without the Hypervisor series so can
be merged seperatley and will reduce the review burden of the next
version of the patches.

The fifth patch is a prep patch for the new v0.4 Hypervisor spec.

The final patch is unreleated to Hypervisor that I'm just slipping in
here because it seems easier then sending it by itself.

Alistair Francis (5):
  target/riscv: Don't set write permissions on dirty PTEs
  target/riscv: Remove strict perm checking for CSR R/W
  riscv: plic: Remove unused interrupt functions
  target/riscv: Create function to test if FP is enabled
  target/riscv: Update the Hypervisor CSRs to v0.4

Atish Patra (1):
  target/riscv: Fix Floating Point register names

 hw/riscv/sifive_plic.c | 12 
 include/hw/riscv/sifive_plic.h |  3 ---
 target/riscv/cpu.c |  8 
 target/riscv/cpu.h |  6 +-
 target/riscv/cpu_bits.h| 35 +-
 target/riscv/cpu_helper.c  | 16 
 target/riscv/csr.c | 22 ++---
 7 files changed, 50 insertions(+), 52 deletions(-)

-- 
2.22.0

[Qemu-devel] [PATCH-4.2 v1 1/6] target/riscv: Don't set write permissions on dirty PTEs

2019-07-25 Thread Alistair Francis

Setting write permission on dirty PTEs results in userspace inside a
Hypervisor guest (VU) becoming corrupted. This appears to be becuase it
ends up with write permission in the second stage translation in cases
where we aren't doing a store.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index e32b6126af..f027be7f16 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -340,10 +340,8 @@ restart:
 if ((pte & PTE_X)) {
 *prot |= PAGE_EXEC;
 }
-/* add write permission on stores or if the page is already dirty,
-   so that we TLB miss on later writes to update the dirty bit */
-if ((pte & PTE_W) &&
-(access_type == MMU_DATA_STORE || (pte & PTE_D))) {
+/* add write permission on stores */
+if ((pte & PTE_W) && (access_type == MMU_DATA_STORE)) {
 *prot |= PAGE_WRITE;
 }
 return TRANSLATE_SUCCESS;
-- 
2.22.0

[Qemu-devel] [PATCH-4.2 v1 3/6] riscv: plic: Remove unused interrupt functions

2019-07-25 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_plic.c | 12 
 include/hw/riscv/sifive_plic.h |  3 ---
 2 files changed, 15 deletions(-)

diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c
index 0950e89e15..864a1bed42 100644
--- a/hw/riscv/sifive_plic.c
+++ b/hw/riscv/sifive_plic.c
@@ -161,18 +161,6 @@ static void sifive_plic_update(SiFivePLICState *plic)
 }
 }
 
-void sifive_plic_raise_irq(SiFivePLICState *plic, uint32_t irq)
-{
-sifive_plic_set_pending(plic, irq, true);
-sifive_plic_update(plic);
-}
-
-void sifive_plic_lower_irq(SiFivePLICState *plic, uint32_t irq)
-{
-sifive_plic_set_pending(plic, irq, false);
-sifive_plic_update(plic);
-}
-
 static uint32_t sifive_plic_claim(SiFivePLICState *plic, uint32_t addrid)
 {
 int i, j;
diff --git a/include/hw/riscv/sifive_plic.h b/include/hw/riscv/sifive_plic.h
index ce8907f6aa..3b8a623919 100644
--- a/include/hw/riscv/sifive_plic.h
+++ b/include/hw/riscv/sifive_plic.h
@@ -69,9 +69,6 @@ typedef struct SiFivePLICState {
 uint32_t aperture_size;
 } SiFivePLICState;
 
-void sifive_plic_raise_irq(SiFivePLICState *plic, uint32_t irq);
-void sifive_plic_lower_irq(SiFivePLICState *plic, uint32_t irq);
-
 DeviceState *sifive_plic_create(hwaddr addr, char *hart_config,
 uint32_t num_sources, uint32_t num_priorities,
 uint32_t priority_base, uint32_t pending_base,
-- 
2.22.0

[Qemu-devel] [PATCH-4.2 v1 5/6] target/riscv: Update the Hypervisor CSRs to v0.4

2019-07-25 Thread Alistair Francis

Update the Hypervisor CSR addresses to match the v0.4 spec.

Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_bits.h | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 11f971ad5d..97b96c4e19 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -173,6 +173,24 @@
 #define CSR_SPTBR   0x180
 #define CSR_SATP0x180
 
+/* Hpervisor CSRs */
+#define CSR_HSTATUS 0x600
+#define CSR_HEDELEG 0x602
+#define CSR_HIDELEG 0x603
+#define CSR_HCOUNTERNEN 0x606
+#define CSR_HGATP   0x680
+
+#if defined(TARGET_RISCV32)
+#define HGATP_MODE   SATP32_MODE
+#define HGATP_ASID   SATP32_ASID
+#define HGATP_PPNSATP32_PPN
+#endif
+#if defined(TARGET_RISCV64)
+#define HGATP_MODE   SATP64_MODE
+#define HGATP_ASID   SATP64_ASID
+#define HGATP_PPNSATP64_PPN
+#endif
+
 /* Physical Memory Protection */
 #define CSR_PMPCFG0 0x3a0
 #define CSR_PMPCFG1 0x3a1
@@ -206,23 +224,6 @@
 #define CSR_DPC 0x7b1
 #define CSR_DSCRATCH0x7b2
 
-/* Hpervisor CSRs */
-#define CSR_HSTATUS 0xa00
-#define CSR_HEDELEG 0xa02
-#define CSR_HIDELEG 0xa03
-#define CSR_HGATP   0xa80
-
-#if defined(TARGET_RISCV32)
-#define HGATP_MODE   SATP32_MODE
-#define HGATP_ASID   SATP32_ASID
-#define HGATP_PPNSATP32_PPN
-#endif
-#if defined(TARGET_RISCV64)
-#define HGATP_MODE   SATP64_MODE
-#define HGATP_ASID   SATP64_ASID
-#define HGATP_PPNSATP64_PPN
-#endif
-
 /* Performance Counters */
 #define CSR_MHPMCOUNTER30xb03
 #define CSR_MHPMCOUNTER40xb04
-- 
2.22.0

Re: [Qemu-devel] [PATCH for-4.1? 1/2] stellaris_input: Fix vmstate description of buttons field

2019-07-25 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Thu, 25 Jul 2019 at 18:02, Dr. David Alan Gilbert
>  wrote:
> >
> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > gamepad_state::buttons is a pointer to an array of structs,
> > > not an array of structs, so should be declared in the vmstate
> > > with VMSTATE_STRUCT_VARRAY_POINTER_INT32; otherwise we
> > > corrupt memory on incoming migration.
> > >
> > > We bump the vmstate version field as the easiest way to
> > > deal with the migration break, since migration wouldn't have
> > > worked reliably before anyway.
> > >
> > > Signed-off-by: Peter Maydell 
> >
> > OK, it would be great to change num_buttons to uint32_t and make that a
> > UINT32 at some point;  it's hard to press negative buttons.
> 
> Is there much benefit though?

Not much and it's not urgent

> As an aside, I'm surprised also the macro doesn't complain
> that we said the num_buttons field is int32 but it's really "int"...
> arguably a different kind of missing type check.

I was just concerned what would happen if your migration stream
had a negative value in.

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH for-4.2 16/24] target/arm: Add regime_has_2_ranges

2019-07-25 Thread Richard Henderson

On 7/25/19 8:59 AM, Alex Bennée wrote:
> Can you elucidate what we mean by having 2 ranges?

Both positive and negative virtual addresses, with the hole in the middle,
controlled by two translation tables.

In the current (E.a) manual, see D.5.2.1 page D5-2516, in the (sub-)section
titled "About address translation and supported input address ranges".  It is
summarized in table D5-1.


r~

Re: [Qemu-devel] [PATCH for-4.2 15/24] target/arm: Reorganize ARMMMUIdx

2019-07-25 Thread Richard Henderson

On 7/25/19 8:57 AM, Alex Bennée wrote:
> 
> Richard Henderson  writes:
> 
>> Prepare for, but do not yet implement, the EL2&0 regime and the
>> Secure EL2 regime.  Rename all of the a-profile symbols to make
>> the distictions clearer.
> 
> Perhaps a summary of the renames would be useful here? My head is
> spinning a little given the number that we have and not being familiar
> with the naming convention.
> 
>   ARMMMUIdx[_StageN]_[M][S]Enn[_nn]
> 
>   _StageN - stage N only, otherwise assumed 1+2 lookup?

How about "full", since most of the indicies only have a single stage lookup.

>   M - M profile (interleaved with A profile)
>   S - Secure
>   Enn - visible exception level, so E02 is shared EL0 and EL2?
>   _nn - not sure?
> 
> The cpu.h comment is very detailed but doesn't actually give me enough
> information to decode an ARMMMUIdx when I come across it in the code.

I have wondered if it was worth going back and splitting this patch so that we
do exactly one rename at at time.  It would mean 9 of these patches...


r~

Re: [Qemu-devel] [PATCH for-4.1? 2/2] vmstate.h: Type check VMSTATE_STRUCT_VARRAY macros

2019-07-25 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On Thu, 25 Jul 2019 at 18:27, Dr. David Alan Gilbert
>  wrote:
> >
> > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > The VMSTATE_STRUCT_VARRAY_UINT32 macro is intended to handle
> > > migrating a field which is an array of structs, but where instead of
> > > migrating the entire array we only migrate a variable number of
> > > elements of it.
> > >
> > > The VMSTATE_STRUCT_VARRAY_POINTER_UINT32 macro is intended to handle
> > > migrating a field which is of pointer type, and points to a
> > > dynamically allocated array of structs of variable size.
> > >
> > > We weren't actually checking that the field passed to
> > > VMSTATE_STRUCT_VARRAY_UINT32 really is an array, with the result that
> > > accidentally using it where the _POINTER_ macro was intended would
> > > compile but silently corrupt memory on migration.
> > >
> > > Add type-checking that enforces that the field passed in is
> > > really of the right array type. This applies to all the VMSTATE
> > > macros which use flags including VMS_VARRAY_* but not VMS_POINTER.
> > >
> > > Signed-off-by: Peter Maydell 
> >
> > > ---
> > >  include/migration/vmstate.h | 27 +--
> > >  1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > > index ca68584eba4..2df333c3612 100644
> > > --- a/include/migration/vmstate.h
> > > +++ b/include/migration/vmstate.h
> > > @@ -227,8 +227,19 @@ extern const VMStateInfo vmstate_info_bitmap;
> > >  extern const VMStateInfo vmstate_info_qtailq;
> > >
> > >  #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
> > > +/* Check that t2 is an array of t1 of size n */
> > >  #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
> >
> > I'd have to admit I don't understand why that does what you say;
> > I'd expected something to index a t2 pointer with [n].
> 
> Note that this is just a comment describing what the existing
> macro does, as a way to distinguish its job from that of the
> new macro I'm adding.
> 
> What happens here is that t2 is a type like "foo [32]", ie
> it is an array type already. t1 is the base 'foo' type; so the macro
> is checking that t1[n] matches t2, where n is passed in to us
> and must match the declared array size of the field (32 in
> my example). (In C the size of the array is carried around as
> part of its type, and must match on both sides of the expression;
> so if you pass in the name of an array field that's the wrong size the
> type check will fail, which is what we want.)

Ah, OK that makes sense; what it really needs is that example to make
me realise that t2 was already the array.

Dave

> > However, for the rest of it, from migration I'm happy:
> >
> > Reviewed-by: Dr. David Alan Gilbert 
> >
> > given it's just fixing an ARM bug, and given it'll blow up straight away
> > I think it's OK for 4.1; the only risk is if we find a compiler we don't
> > like.
> 
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH for-4.1? 1/2] stellaris_input: Fix vmstate description of buttons field

2019-07-25 Thread Peter Maydell

On Thu, 25 Jul 2019 at 18:02, Dr. David Alan Gilbert
 wrote:
>
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > gamepad_state::buttons is a pointer to an array of structs,
> > not an array of structs, so should be declared in the vmstate
> > with VMSTATE_STRUCT_VARRAY_POINTER_INT32; otherwise we
> > corrupt memory on incoming migration.
> >
> > We bump the vmstate version field as the easiest way to
> > deal with the migration break, since migration wouldn't have
> > worked reliably before anyway.
> >
> > Signed-off-by: Peter Maydell 
>
> OK, it would be great to change num_buttons to uint32_t and make that a
> UINT32 at some point;  it's hard to press negative buttons.

Is there much benefit though?

As an aside, I'm surprised also the macro doesn't complain
that we said the num_buttons field is int32 but it's really "int"...
arguably a different kind of missing type check.

thanks
-- PMM

Re: [Qemu-devel] [Qemu-block] [PATCH v3 0/4] virtio/block: handle zoned backing devices

2019-07-25 Thread John Snow




On 7/23/19 6:19 PM, Dmitry Fomichev wrote:
> Currently, attaching zoned block devices (i.e., storage devices
> compliant to ZAC/ZBC standards) using several virtio methods doesn't
> work properly as zoned devices appear as regular block devices at the
> guest. This may cause unexpected i/o errors and, potentially, some
> data corruption.
> 

Hi, I'm quite uninitiated here, what's a zoned block device? What are
the ZAC/ZBC standards?

I've found this:
https://www.snia.org/sites/default/files/SDC/2016/presentations/smr/DamienLeMoal_ZBC-ZAC_Linux.pdf

It looks like ZAC/ZBC are new commands -- what happens if we just don't
use them, exactly?

> To be more precise, attaching a zoned device via virtio-pci-blk,
> virtio-scsi-pci/scsi-disk or virtio-scsi-pci/scsi-hd demonstrates the
> above behavior. The virtio-scsi-pci/scsi-block method works with a
> recent patch. The virtio-scsi-pci/scsi-generic method also appears to
> handle zoned devices without problems.
> 

What exactly fails, out of curiosity?

Naively, it seems strange to me that you'd have something that presents
itself as a block device but can't be used like one. Usually I expect to
see new features / types of devices used inefficiently when we aren't
aware of a special attribute/property they have, but not create data
corruption.

The only reason I ask is because it seems odd that you need to add a
special flag to e.g. legacy IDE devices that explicitly says they don't
support zoned block devices -- instead of adding flags to virtio devices
that say they explicitly do support that feature set.

--js

> This patch set adds code to check if the backing device that is being
> opened is a zoned Host Managed device. If this is the case, the patch
> prohibits attaching such device for all use cases lacking proper
> zoned support.
> 
> Host Aware zoned block devices are designed to work as regular block
> devices at a guest system that does not support ZBD. Therefore, this
> patch set doesn't prohibit attachment of Host Aware devices.
> 
> Considering that there is still a couple of different working ways
> to attach a ZBD, this patch set provides a reasonable short-term
> solution for this problem. What about long term?
> 
> It appears to be beneficial to add proper ZBD support to virtio-blk.
> In order to support this use case properly, some virtio-blk protocol
> changes will be necessary. They are needed to allow the host code to
> propagate some ZBD properties that are required for virtio guest
> driver to configure the guest block device as ZBD, such as zoned
> device model, zone size and the total number of zones. Further, some
> support needs to be added for REPORT ZONES command as well as for zone
> operations, such as OPEN ZONE, CLOSE ZONE, FINISH ZONE and RESET ZONE.
> 
> These additions to the protocol are relatively straightforward, but
> they need to be approved by the virtio TC and the whole process may
> take some time.
> 
> ZBD support for virtio-scsi-pci/scsi-disk and virtio-scsi-pci/scsi-hd
> does not seem as necessary. Users will be expected to attach zoned
> block devices via virtio-scsi-pci/scsi-block instead.
> 
> This patch set contains some Linux-specific code. This code is
> necessary to obtain Zoned Block Device model value from Linux sysfs.
> 
> History:
> 
> v1 -> v2:
> - rework the code to be permission-based
> - always allow Host Aware devices to be attached
> - add fix for Host Aware attachments aka RCAP output snoop
> 
> v2 -> v3:
> - drop the patch for RCAP output snoop - merged separately
> 
> 
> Dmitry Fomichev (4):
>   block: Add zoned device model property
>   raw: Recognize zoned backing devices
>   block/ide/scsi: Set BLK_PERM_SUPPORT_ZONED
>   raw: Don't open ZBDs if backend can't handle them
> 
>  block.c   | 19 +
>  block/file-posix.c| 88 +--
>  block/raw-format.c|  8 
>  hw/block/block.c  |  8 +++-
>  hw/block/fdc.c|  4 +-
>  hw/block/nvme.c   |  2 +-
>  hw/block/virtio-blk.c |  2 +-
>  hw/block/xen-block.c  |  2 +-
>  hw/ide/qdev.c |  2 +-
>  hw/scsi/scsi-disk.c   | 13 +++---
>  hw/scsi/scsi-generic.c|  2 +-
>  hw/usb/dev-storage.c  |  2 +-
>  include/block/block.h | 21 +-
>  include/block/block_int.h |  4 ++
>  include/hw/block/block.h  |  3 +-
>  15 files changed, 150 insertions(+), 30 deletions(-)
>

Re: [Qemu-devel] [PATCH for-4.1? 2/2] vmstate.h: Type check VMSTATE_STRUCT_VARRAY macros

2019-07-25 Thread Peter Maydell

On Thu, 25 Jul 2019 at 18:27, Dr. David Alan Gilbert
 wrote:
>
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > The VMSTATE_STRUCT_VARRAY_UINT32 macro is intended to handle
> > migrating a field which is an array of structs, but where instead of
> > migrating the entire array we only migrate a variable number of
> > elements of it.
> >
> > The VMSTATE_STRUCT_VARRAY_POINTER_UINT32 macro is intended to handle
> > migrating a field which is of pointer type, and points to a
> > dynamically allocated array of structs of variable size.
> >
> > We weren't actually checking that the field passed to
> > VMSTATE_STRUCT_VARRAY_UINT32 really is an array, with the result that
> > accidentally using it where the _POINTER_ macro was intended would
> > compile but silently corrupt memory on migration.
> >
> > Add type-checking that enforces that the field passed in is
> > really of the right array type. This applies to all the VMSTATE
> > macros which use flags including VMS_VARRAY_* but not VMS_POINTER.
> >
> > Signed-off-by: Peter Maydell 
>
> > ---
> >  include/migration/vmstate.h | 27 +--
> >  1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > index ca68584eba4..2df333c3612 100644
> > --- a/include/migration/vmstate.h
> > +++ b/include/migration/vmstate.h
> > @@ -227,8 +227,19 @@ extern const VMStateInfo vmstate_info_bitmap;
> >  extern const VMStateInfo vmstate_info_qtailq;
> >
> >  #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
> > +/* Check that t2 is an array of t1 of size n */
> >  #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
>
> I'd have to admit I don't understand why that does what you say;
> I'd expected something to index a t2 pointer with [n].

Note that this is just a comment describing what the existing
macro does, as a way to distinguish its job from that of the
new macro I'm adding.

What happens here is that t2 is a type like "foo [32]", ie
it is an array type already. t1 is the base 'foo' type; so the macro
is checking that t1[n] matches t2, where n is passed in to us
and must match the declared array size of the field (32 in
my example). (In C the size of the array is carried around as
part of its type, and must match on both sides of the expression;
so if you pass in the name of an array field that's the wrong size the
type check will fail, which is what we want.)

> However, for the rest of it, from migration I'm happy:
>
> Reviewed-by: Dr. David Alan Gilbert 
>
> given it's just fixing an ARM bug, and given it'll blow up straight away
> I think it's OK for 4.1; the only risk is if we find a compiler we don't
> like.

thanks
-- PMM

Re: [Qemu-devel] Sphinx and docs/index.rst: dead code?

2019-07-25 Thread Peter Maydell

On Thu, 25 Jul 2019 at 18:26, John Snow  wrote:
> On 7/25/19 12:42 PM, Peter Maydell wrote:
> > This means you'll end up building 90% of our documentation twice,
> > which is something I was trying to avoid with the current setup.
> >
>
> Why? Wouldn't it suffice to build just one of the top-level docs just once?
>
> (I guess if you later decided to build the other top-level doc later it
> would duplicate the work, but is that the usual case?)

"make" needs to build the including-for-developers set so
you can refer to it in tree. But what we want for "make install"
is the only-for-users set. We don't want to build stuff during
"make install"; so "make" needs to have already built that set too.

The underlying problem here is that Sphinx doesn't really like
the idea of building documentation but only installing a subset.
So you get to pick which downsides you dislike least when
choosing the workaround. The current setup isn't necessarily
ideal but it seemed good-enough and to more or less match
the five-manuals approach Paolo originally outlined. We can
certainly change it if there's a less-awkward way to do things.

thanks
-- PMM

Re: [Qemu-devel] [RFC 16/19] fuzz: add general fuzzer entrypoints

2019-07-25 Thread Philippe Mathieu-Daudé

Hi Aleksander,

On 7/25/19 5:23 AM, Oleinik, Alexander wrote:
> Defines LLVMFuzzerInitialize and LLVMFuzzerTestOneInput
> 
> Signed-off-by: Alexander Oleinik 
> ---
>  tests/fuzz/fuzz.c | 262 ++
>  tests/fuzz/fuzz.h |  96 +
>  2 files changed, 358 insertions(+)
>  create mode 100644 tests/fuzz/fuzz.c
>  create mode 100644 tests/fuzz/fuzz.h
> 
> diff --git a/tests/fuzz/fuzz.c b/tests/fuzz/fuzz.c
> new file mode 100644
> index 00..0421b9402c
> --- /dev/null
> +++ b/tests/fuzz/fuzz.c
> @@ -0,0 +1,262 @@
> +#include "tests/fuzz/ramfile.h"
> +#include "migration/qemu-file.h"
> +#include "migration/global_state.h"
> +#include "migration/savevm.h"
> +#include "tests/libqtest.h"
> +#include "exec/memory.h"
> +#include "migration/migration.h"
> +#include "fuzz.h"
> +#include "tests/libqos/qgraph.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +QTestState *s;
> +
> +QEMUFile *ramfile;
> +QEMUFile *writefile;
> +ram_disk *rd; 
> +typedef QSLIST_HEAD(, FuzzTarget) FuzzTargetList;
> +
> +FuzzTargetList* fuzz_target_list;
> +
> +uint64_t total_mr_size = 0;
> +uint64_t mr_index = 0;
> +
> +const MemoryRegion* mrs[1000];
> +
> +
> +// Save just the VMStateDescriptors
> +void save_device_state(void)
> +{
> +writefile = qemu_fopen_ram();
> +global_state_store();
> +qemu_save_device_state(writefile);
> +qemu_fflush(writefile);
> +ramfile = qemu_fopen_ro_ram(rd);
> +}
> +
> +// Save the entire vm state including RAM
> +void save_vm_state(void) 
> +{
> +writefile = qemu_fopen_ram();
> +vm_stop(RUN_STATE_SAVE_VM);
> +global_state_store();
> +qemu_savevm_state(writefile, NULL);
> +qemu_fflush(writefile);
> +ramfile = qemu_fopen_ro_ram(rd);
> +}
> +
> +/* Reset state by rebooting */
> +void reboot()
> +{
> +qemu_system_reset(SHUTDOWN_CAUSE_NONE);
> +}
> +
> +/* Restore device state */
> +void load_device_state()
> +{
> +qemu_freopen_ro_ram(ramfile);
> +
> +int ret = qemu_load_device_state(ramfile);
> +if (ret < 0){
> +printf("reset error\n");
> +exit(-1);
> +}
> +}
> +
> +/* Restore full vm state */
> +void load_vm_state()
> +{
> +qemu_freopen_ro_ram(ramfile);
> +
> +vm_stop(RUN_STATE_RESTORE_VM);
> +/* qemu_system_reset(SHUTDOWN_CAUSE_NONE); */
> +
> +int ret = qemu_loadvm_state(ramfile);
> +if (ret < 0){
> +printf("reset error\n");
> +exit(-1);
> +}
> +migration_incoming_state_destroy();
> +vm_start();
> +}
> +
> +void qtest_setup()
> +{
> +s = qtest_init_fuzz(NULL, NULL);
> +global_qtest = s;
> +}
> +
> +void fuzz_add_target(const char* name,
> +const char* description,
> +void(*init_pre_main)(void),
> +void(*init_pre_save)(void),
> +void(*save_state)(void),
> +void(*reset)(void),
> +void(*pre_fuzz)(void),
> +void(*fuzz)(const unsigned char*, size_t),
> +void(*post_fuzz)(void),
> +int* main_argc,
> +char*** main_argv)
> +{
> +
> +FuzzTarget *target;
> +FuzzTarget *tmp;
> +if(!fuzz_target_list)
> +fuzz_target_list = g_new0(FuzzTargetList, 1);
> +
> +QSLIST_FOREACH(tmp, fuzz_target_list, target_list) {
> +if (g_strcmp0(tmp->name->str, name) == 0) {
> +fprintf(stderr, "Error: Fuzz target name %s already in use\n", 
> name);
> +abort();
> +}
> +}
> +target = g_new0(FuzzTarget, 1);
> +target->name = g_string_new(name);
> +target->description = g_string_new(description);
> +target->init_pre_main = init_pre_main;
> +target->init_pre_save = init_pre_save;
> +target->save_state = save_state;
> +target->reset = reset;
> +target->pre_fuzz = pre_fuzz;
> +target->fuzz = fuzz;
> +target->post_fuzz = post_fuzz;
> +target->main_argc = main_argc;
> +target->main_argv = main_argv;
> +QSLIST_INSERT_HEAD(fuzz_target_list, target, target_list);
> +}
> +
> +
> +FuzzTarget* fuzz_get_target(char* name)
> +{
> +FuzzTarget* tmp;
> +if(!fuzz_target_list){
> +fprintf(stderr, "Fuzz target list not initialized");
> +abort();
> +}
> +
> +QSLIST_FOREACH(tmp, fuzz_target_list, target_list) {
> +if (g_strcmp0(tmp->name->str, name) == 0) {
> +break;
> +}
> +}
> +return tmp;
> +}
> +
> +FuzzTarget* fuzz_target;
> +
> +
> +
> +static void usage(void)
> +{
> +printf("Usage: ./fuzz --FUZZ_TARGET [LIBFUZZER ARGUMENTS]\n");
> +printf("where --FUZZ_TARGET is one of:\n");
> +FuzzTarget* tmp;
> +if(!fuzz_target_list){
> +fprintf(stderr, "Fuzz target list not initialized");
> +abort();
> +}
> +QSLIST_FOREACH(tmp, fuzz_target_list, target_list) {
> +QSLIST_FOREACH(tmp, fuzz_target_list, target_list) {
> +printf(" --%s  : %s\n", tmp->name->str, tmp->description->str);
> +}
>

Re: [Qemu-devel] [PATCH v27 5/8] target/avr: Add limited support for USART and 16 bit timer peripherals

2019-07-25 Thread Michael Rolnik

Hi Pavel.

Please see my answers below.

On Thu, Jul 25, 2019 at 1:00 PM Pavel Dovgalyuk  wrote:

> > From: Qemu-devel [mailto:qemu-devel-bounces+patchwork-qemu-
> > devel=patchwork.kernel@nongnu.org] On Behalf Of Michael Rolnik
> > From: Sarah Harris 
> >
> > These were designed to facilitate testing but should provide enough
> function to be useful in
> > other contexts.
>
> USART is very useful for testing, but to which model of AVR is belongs?
> We also started implementation of USART and other devices in our
> internship program,
> using prior version of your patches.
> There were other register addresses for the registers and some of them
> even intersect
> making read/write logic more complex (we looked at Atmega8).
>

This is a question for Sara as she built it. (I think it was ATmega2560)


>
> You also mix the board and the SoC into one file, making hardware-on-chip
> harder to reuse.
>
> I think that the structure can be revised in the following way:
> Board -> SoC -> Devices
>
> Board includes SoC, loads the firmware, and adds some external peripheral
> devices, if needed.
>
> SoC includes embedded peripherals. It dispatches IO memory accesses and
> passes them
> to the devices. In this case you can have different register addresses for
> different SoCs, but
> the embedded device emulation code can be mostly the same for simple
> devices like USART.
>
> > Only a subset of the functions of each peripheral is implemented, mainly
> due to the lack of a
> > standard way to handle electrical connections (like GPIO pins).
>
You are right.

>
> We did not got too much results, you can check for our changes here:
> https://github.com/Dovgalyuk/qemu/tree/avr8
>
> But we can help you in development of better version of the patches and
> split the work
> for making this platform more usable.
>

Gladly.
You are welcome.
My initial intent was to provide CPU only and I hoped that others will
model the devices.

*Richard, Phil & all,*
what would be the right mechanism / procedure to split the work?


>
> Pavel Dovgalyuk
>
>

-- 
Best Regards,
Michael Rolnik

Re: [Qemu-devel] [PATCH for-4.1? 1/2] stellaris_input: Fix vmstate description of buttons field

2019-07-25 Thread Philippe Mathieu-Daudé

On 7/25/19 7:02 PM, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
>> gamepad_state::buttons is a pointer to an array of structs,
>> not an array of structs, so should be declared in the vmstate
>> with VMSTATE_STRUCT_VARRAY_POINTER_INT32; otherwise we
>> corrupt memory on incoming migration.
>>
>> We bump the vmstate version field as the easiest way to
>> deal with the migration break, since migration wouldn't have
>> worked reliably before anyway.
>>
>> Signed-off-by: Peter Maydell 
> 
> OK, it would be great to change num_buttons to uint32_t and make that a
> UINT32 at some point;  it's hard to press negative buttons.

Since the version is incremented, now seems to be a good time.

> 
> Reviewed-by: Dr. David Alan Gilbert 

Reviewed-by: Philippe Mathieu-Daudé 

> 
>> ---
>>  hw/input/stellaris_input.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/input/stellaris_input.c b/hw/input/stellaris_input.c
>> index 20c87d86f40..3a666d61d47 100644
>> --- a/hw/input/stellaris_input.c
>> +++ b/hw/input/stellaris_input.c
>> @@ -60,12 +60,14 @@ static const VMStateDescription vmstate_stellaris_button 
>> = {
>>  
>>  static const VMStateDescription vmstate_stellaris_gamepad = {
>>  .name = "stellaris_gamepad",
>> -.version_id = 1,
>> -.minimum_version_id = 1,
>> +.version_id = 2,
>> +.minimum_version_id = 2,
>>  .fields = (VMStateField[]) {
>>  VMSTATE_INT32(extension, gamepad_state),
>> -VMSTATE_STRUCT_VARRAY_INT32(buttons, gamepad_state, num_buttons, 0,
>> -  vmstate_stellaris_button, gamepad_button),
>> +VMSTATE_STRUCT_VARRAY_POINTER_INT32(buttons, gamepad_state,
>> +num_buttons,
>> +vmstate_stellaris_button,
>> +gamepad_button),
>>  VMSTATE_END_OF_LIST()
>>  }
>>  };
>> -- 
>> 2.20.1
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin

On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> On 25/07/19 17:01, Michael S. Tsirkin wrote:
> >> It would be educational to try to enable ACPI core but disable all
> >> optional features.
> 
> A lot of them are select'ed so it's not easy.
> 
> > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.
> 
> That's what the NEMU guys experimented with.  It's not supported by our
> DSDT since it uses ACPI GPE,

Well there are two GPE blocks in FADT. We could just switch to
these if necesary I think.

> and the reduction in code size is small
> (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> 
> Paolo

Well ACPI is 150k loc I think, right?

linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
 145926 total

So 100k wouldn't be too shabby.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Michael S. Tsirkin

On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

It isn't but that's not what baremetal does. So there's some risk
there ...

Why is all zeroes better? We still need to map it, right?

-- 
MST

Re: [Qemu-devel] [PATCH for-4.1? 2/2] vmstate.h: Type check VMSTATE_STRUCT_VARRAY macros

2019-07-25 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> The VMSTATE_STRUCT_VARRAY_UINT32 macro is intended to handle
> migrating a field which is an array of structs, but where instead of
> migrating the entire array we only migrate a variable number of
> elements of it.
> 
> The VMSTATE_STRUCT_VARRAY_POINTER_UINT32 macro is intended to handle
> migrating a field which is of pointer type, and points to a
> dynamically allocated array of structs of variable size.
> 
> We weren't actually checking that the field passed to
> VMSTATE_STRUCT_VARRAY_UINT32 really is an array, with the result that
> accidentally using it where the _POINTER_ macro was intended would
> compile but silently corrupt memory on migration.
> 
> Add type-checking that enforces that the field passed in is
> really of the right array type. This applies to all the VMSTATE
> macros which use flags including VMS_VARRAY_* but not VMS_POINTER.
> 
> Signed-off-by: Peter Maydell 

> ---
>  include/migration/vmstate.h | 27 +--
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index ca68584eba4..2df333c3612 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -227,8 +227,19 @@ extern const VMStateInfo vmstate_info_bitmap;
>  extern const VMStateInfo vmstate_info_qtailq;
>  
>  #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
> +/* Check that t2 is an array of t1 of size n */
>  #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)

I'd have to admit I don't understand why that does what you say;
I'd expected something to index a t2 pointer with [n].

However, for the rest of it, from migration I'm happy:

Reviewed-by: Dr. David Alan Gilbert 

given it's just fixing an ARM bug, and given it'll blow up straight away
I think it's OK for 4.1; the only risk is if we find a compiler we don't
like.


>  #define type_check_pointer(t1,t2) ((t1**)0 - (t2*)0)
> +/*
> + * type of element 0 of the specified (array) field of the type.
> + * Note that if the field is a pointer then this will return the
> + * pointed-to type rather than complaining.
> + */
> +#define typeof_elt_of_field(type, field) typeof(((type *)0)->field[0])
> +/* Check that field f in struct type t2 is an array of t1, of any size */
> +#define type_check_varray(t1, t2, f) \
> +(type_check(t1, typeof_elt_of_field(t2, f))  \
> + + QEMU_BUILD_BUG_ON_ZERO(!QEMU_IS_ARRAY(((t2 *)0)->f)))
>  
>  #define vmstate_offset_value(_state, _field, _type)  \
>  (offsetof(_state, _field) +  \
> @@ -253,6 +264,10 @@ extern const VMStateInfo vmstate_info_qtailq;
>  vmstate_offset_array(_state, _field, uint8_t,\
>   sizeof(typeof_field(_state, _field)))
>  
> +#define vmstate_offset_varray(_state, _field, _type) \
> +(offsetof(_state, _field) +  \
> + type_check_varray(_type, _state, _field))
> +
>  /* In the macros below, if there is a _version, that means the macro's
>   * field will be processed only if the version being received is >=
>   * the _version specified.  In general, if you add a new field, you
> @@ -347,7 +362,7 @@ extern const VMStateInfo vmstate_info_qtailq;
>  .info   = &(_info),  \
>  .size   = sizeof(_type), \
>  .flags  = VMS_VARRAY_UINT32|VMS_MULTIPLY_ELEMENTS,   \
> -.offset = offsetof(_state, _field),  \
> +.offset = vmstate_offset_varray(_state, _field, _type),  \
>  }
>  
>  #define VMSTATE_ARRAY_TEST(_field, _state, _num, _test, _info, _type) {\
> @@ -376,7 +391,7 @@ extern const VMStateInfo vmstate_info_qtailq;
>  .info   = &(_info),  \
>  .size   = sizeof(_type), \
>  .flags  = VMS_VARRAY_INT32,  \
> -.offset = offsetof(_state, _field),  \
> +.offset = vmstate_offset_varray(_state, _field, _type),  \
>  }
>  
>  #define VMSTATE_VARRAY_INT32(_field, _state, _field_num, _version, _info, 
> _type) {\
> @@ -416,7 +431,7 @@ extern const VMStateInfo vmstate_info_qtailq;
>  .info   = &(_info),  \
>  .size   = sizeof(_type), \
>  .flags  = VMS_VARRAY_UINT16, \
> -.offset = offsetof(_state, _field),  \
> +.offset = vmstate_offset_varray(_state, _field, _type),  \
>  }
>  
>  #define VMSTATE_VSTRUCT_TEST(_field, _state, _test, _version, _vmsd, _type, 
> _struct_version) { \
> @@ -520,7 +535,7 @@ extern const VMStateInfo

Re: [Qemu-devel] Sphinx and docs/index.rst: dead code?

2019-07-25 Thread John Snow




On 7/25/19 12:42 PM, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 17:34, John Snow  wrote:
>> Yup; I think a single point of entry would be nice -- I think we need to
>> start hosting our sphinx documentation because it's confusing that we
>> have both the traditional manual (hosted by Stefan Weil) and this newer
>> one that isn't available anywhere.
>>
>> The interop manual in particular is crucial to get hosted.
> 
> Yes, this would be a good thing.
> 
>> We could perhaps formalize this as follows:
>>
>> - index.rst, which is an "absolutely everything included" single point
>> of entry manual for developers and contributors,
>>
>> - user.rst, which could be a single point of entry for end users, to be
>> bundled in distro packaging.
> 
> This means you'll end up building 90% of our documentation twice,
> which is something I was trying to avoid with the current setup.
> 

Why? Wouldn't it suffice to build just one of the top-level docs just once?

(I guess if you later decided to build the other top-level doc later it
would duplicate the work, but is that the usual case?)

> It occurs to me that we don't necessarily need the 'top level'
> page to be generated by Sphinx -- we could just ship an index.html
> which has helpful links to the individual manuals.
> 

True; we can leave it as a manual process and check the build artifact
into the repo if we want. [We likely ought to leave the source in the
tree in that case though, if we want to update the theming and other stuff.]

> (https://wiki.qemu.org/Features/Documentation is the current
> plan and lists the various manuals we'll end up with. 'user'
> in that plan means the documentation for the user-mode emulation.)
> 

Ah, whoops -- sorry for the namespace collision. I'll call it
'user-manual' or something instead if I talk about this in the future.

Thanks for the wiki page.

> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH 4/4] block-backend: Queue requests while drained

2019-07-25 Thread Eric Blake

On 7/25/19 11:27 AM, Kevin Wolf wrote:
> This fixes device like IDE that can still start new requests from I/O
> handlers in the CPU thread while the block backend is drained.
> 
> The basic assumption is that in a drain section, no new requests should
> be allowed through a BlockBackend (blk_drained_begin/end don't exist,
> we get drain sections only on the node level). However, there are two
> special cases where requests should not be queued:
> 
> 1. Block jobs: We already make sure that block jobs are paused in a
>drain section, so they won't start new requests. However, if the
>drain_begin is called on the job's BlockBackend first, it can happen
>that we deadlock because the job stays busy until it reaches a pause
>point - which it can't if it's requests aren't processed any more.

its (remember, "it's" is only okay if "it is" works as well)

> 
>The proper solution here would be to make all requests through the
>job's filter node instead of using a BlockBackend. For now, just
>disabling request queuin on the job BlockBackend is simpler.

queuing

> 
> 2. In test cases where making requests through bdrv_* would be
>cumbersome because we'd need a BdrvChild. As we already got the
>functionality to disable request queuing from 1., use it in tests,
>too, for convenience.
> 
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [RFC 13/19] fuzz: add ctrl vq support to virtio-net in libqos

2019-07-25 Thread Oleinik, Alexander

On Thu, 2019-07-25 at 12:25 -0400, John Snow wrote:
> 
> On 7/24/19 11:23 PM, Oleinik, Alexander wrote:
> > Signed-off-by: Alexander Oleinik 
> 
> Is there some explanation for why the below patch does what the
> subject
> line claims for the uninitiated?
When multiqueue mode (VIRTIO_NET_F_MQ) is disabled, virtio-net sets up
three queues. 0:receiveq, 1:transmitq and 2:controlq. 
> I don't know why increasing the number of queues from 2 to 3 here is
> correct in the general case, OR why it would "add ctrl vq support".
> (Or what it has to do with fuzzing, in general.)

Prior to the change, accessing the ctrl vq through QOS, would trigger a
segfault, since only two queues were allocated to QVirtioDevice*
interface->queues.

Also, when VIRTIO_NET_F_MQ is enabled, the number of queues is 2*N + 1,
so I think in that case n->n_queues is also short by one in the code
below.

> [Only responding because this landed in tests/libqos, which I do try
> to
> keep an eye on, but this patch is opaque to me. --js]
> 
> > ---
> >  tests/libqos/virtio-net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tests/libqos/virtio-net.c b/tests/libqos/virtio-net.c
> > index 66405b646e..247a0a17a8 100644
> > --- a/tests/libqos/virtio-net.c
> > +++ b/tests/libqos/virtio-net.c
> > @@ -51,7 +51,7 @@ static void virtio_net_setup(QVirtioNet
> > *interface)
> >  if (features & (1u << VIRTIO_NET_F_MQ)) {
> >  interface->n_queues = qvirtio_config_readw(vdev, 8) * 2;
> >  } else {
> > -interface->n_queues = 2;
> > +interface->n_queues = 3;
> >  }
> >  
> >  interface->queues = g_new(QVirtQueue *, interface->n_queues);
> >

Re: [Qemu-devel] [PATCH 3/4] mirror: Keep target drained until graph changes are done

2019-07-25 Thread Eric Blake

On 7/25/19 11:27 AM, Kevin Wolf wrote:
> Calling bdrv_drained_end() for target_bs can restarts requests too

restart

> early, so that they would execute on mirror_top_bs, which however has
> already dropped all permissions.
> 
> Keep the target node drained until all graph changes have completed.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/mirror.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 8cb75fb409..7483051f8d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -644,6 +644,11 @@ static int mirror_exit_common(Job *job)
>  bdrv_ref(mirror_top_bs);
>  bdrv_ref(target_bs);
>  
> +/* The mirror job has no requests in flight any more, but we need to
> + * drain potential other users of the BDS before changing the graph. */

Is checkpatch going to gripe about your comment style,

> +assert(s->in_drain);
> +bdrv_drained_begin(target_bs);
> +
>  /* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
>   * inserting target_bs at s->to_replace, where we might not be able to 
> get
>   * these permissions.
> @@ -684,12 +689,7 @@ static int mirror_exit_common(Job *job)
>  bdrv_reopen_set_read_only(target_bs, ro, NULL);
>  }
>  
> -/* The mirror job has no requests in flight any more, but we need to
> - * drain potential other users of the BDS before changing the graph. 
> */

even though it is just code motion?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH for-4.1? 1/2] stellaris_input: Fix vmstate description of buttons field

2019-07-25 Thread Dr. David Alan Gilbert

* Peter Maydell (peter.mayd...@linaro.org) wrote:
> gamepad_state::buttons is a pointer to an array of structs,
> not an array of structs, so should be declared in the vmstate
> with VMSTATE_STRUCT_VARRAY_POINTER_INT32; otherwise we
> corrupt memory on incoming migration.
> 
> We bump the vmstate version field as the easiest way to
> deal with the migration break, since migration wouldn't have
> worked reliably before anyway.
> 
> Signed-off-by: Peter Maydell 

OK, it would be great to change num_buttons to uint32_t and make that a
UINT32 at some point;  it's hard to press negative buttons.


Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hw/input/stellaris_input.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/input/stellaris_input.c b/hw/input/stellaris_input.c
> index 20c87d86f40..3a666d61d47 100644
> --- a/hw/input/stellaris_input.c
> +++ b/hw/input/stellaris_input.c
> @@ -60,12 +60,14 @@ static const VMStateDescription vmstate_stellaris_button 
> = {
>  
>  static const VMStateDescription vmstate_stellaris_gamepad = {
>  .name = "stellaris_gamepad",
> -.version_id = 1,
> -.minimum_version_id = 1,
> +.version_id = 2,
> +.minimum_version_id = 2,
>  .fields = (VMStateField[]) {
>  VMSTATE_INT32(extension, gamepad_state),
> -VMSTATE_STRUCT_VARRAY_INT32(buttons, gamepad_state, num_buttons, 0,
> -  vmstate_stellaris_button, gamepad_button),
> +VMSTATE_STRUCT_VARRAY_POINTER_INT32(buttons, gamepad_state,
> +num_buttons,
> +vmstate_stellaris_button,
> +gamepad_button),
>  VMSTATE_END_OF_LIST()
>  }
>  };
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 7/7] iotests: Disable 126 for some vmdk subformats

2019-07-25 Thread Eric Blake

On 7/25/19 10:57 AM, Max Reitz wrote:
> Several vmdk subformats do not work with iotest 126, so disable them.
> 
> (twoGbMaxExtentSparse actually should work, but fixing that is a bit
> difficult.  The problem is that the vmdk descriptor file will contain a
> referenc to "image:base.vmdk", which the block layer cannot open because

reference

> it does not know the protocol "image".  This is not trivial to solve,
> because I suppose real protocols like "http://; should be supported.
> Making vmdk treat all paths with a potential protocol prefix that the
> block layer does not recognize as plain files seems a bit weird,
> though.  Ignoring this problem does not seem too bad.)
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/126 | 6 ++
>  1 file changed, 6 insertions(+)

Are you aiming to get any of this series in -rc3, or is it firmly 4.2
material?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v4] qapi: Add InetSocketAddress member keep-alive

2019-07-25 Thread Daniel P . Berrangé

On Thu, Jul 25, 2019 at 11:38:56AM -0500, Eric Blake wrote:
> On 7/25/19 10:26 AM, Markus Armbruster wrote:
> > Vladimir Sementsov-Ogievskiy  writes:
> > 
> >> It's needed to provide keepalive for nbd client to track server
> >> availability.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > 
> > Reviewed-by: Markus Armbruster 
> 
> It looks like this could go in any number of ways (Markus since it
> touches qapi, Dan since it touches sockets, or me since it was written
> for use with NBD); but unless anyone complains, I'm happy to queue this
> through my NBD tree for 4.2.

I'm fine with you queuing it, since I have nothing pending myself
to bundle it up with

  Acked-by: Daniel P. Berrangé 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] Sphinx and docs/index.rst: dead code?

2019-07-25 Thread Peter Maydell

On Thu, 25 Jul 2019 at 17:34, John Snow  wrote:
> Yup; I think a single point of entry would be nice -- I think we need to
> start hosting our sphinx documentation because it's confusing that we
> have both the traditional manual (hosted by Stefan Weil) and this newer
> one that isn't available anywhere.
>
> The interop manual in particular is crucial to get hosted.

Yes, this would be a good thing.

> We could perhaps formalize this as follows:
>
> - index.rst, which is an "absolutely everything included" single point
> of entry manual for developers and contributors,
>
> - user.rst, which could be a single point of entry for end users, to be
> bundled in distro packaging.

This means you'll end up building 90% of our documentation twice,
which is something I was trying to avoid with the current setup.

It occurs to me that we don't necessarily need the 'top level'
page to be generated by Sphinx -- we could just ship an index.html
which has helpful links to the individual manuals.

(https://wiki.qemu.org/Features/Documentation is the current
plan and lists the various manuals we'll end up with. 'user'
in that plan means the documentation for the user-mode emulation.)

thanks
-- PMM

Re: [Qemu-devel] Exploring Sphinx, autodoc, apidoc, and coverage tools for python/qemu

2019-07-25 Thread John Snow

On 7/25/19 5:02 AM, Peter Maydell wrote:
> On Wed, 24 Jul 2019 at 22:06, John Snow  wrote:
>> And then you can edit e.g. the top-level index.rst TOC in docs/index.rst
>> to look like this:
>>
>> ```
>> .. toctree::
>>:maxdepth: 2
>>:caption: Contents:
>>
>>interop/index
>>devel/index
>>specs/index
>>modules
>> ```
> 
> This is obviously just prototyping, but you don't want to put
> anything new in this top level index.rst -- it's only used
> for by-hand full-tree docs builds. Builds from the makefile
> rules will just build our separate manuals (interop, devel,
> specs) separately. You want to put your new documentation into
> whichever manual is best suited (probably devel/, but possibly
> interop/ in some cases?)
> 

Yes, understood -- I was prototyping in a "fresh" / empty repository, so
I was keeping the instructions reasonably comparable to what
sphinx-quickstart might produce if people wanted to explore this outside
of the complexity of the QEMU tree.

If the experiment had gone better, I'd have wanted to either model this
as a Developer sub-manual, or a new top-level manual under "Python
Library" or some such.

(It's hard to say which the QMP library is: it WAS internal developer
tooling, but I'm prototyping turning qmp-shell into something that could
be considered a reference implementation for interop that non-developers
might have a genuine interest in using for non-libvirt scenarios.)

> (Will read the rest of this email later.)
> 

TLDR: I wanted to document for the mailing list that Autodoc seems to
have some shortcomings that doesn't make it very attractive for
documenting our python utilities in a rigorous way.

One of the benefits, in my mind, of using doc generation utilities for
documenting API is the ability to fail the build when the documentation
has observable omissions/mistakes.

Autodoc doesn't seem capable of providing that; though it still may be
more useful than nothing.

--js

> thanks
> -- PMM
>

Re: [Qemu-devel] [PATCH v4] qapi: Add InetSocketAddress member keep-alive

2019-07-25 Thread Eric Blake

On 7/25/19 10:26 AM, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy  writes:
> 
>> It's needed to provide keepalive for nbd client to track server
>> availability.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> 
> Reviewed-by: Markus Armbruster 

It looks like this could go in any number of ways (Markus since it
touches qapi, Dan since it touches sockets, or me since it was written
for use with NBD); but unless anyone complains, I'm happy to queue this
through my NBD tree for 4.2.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH for-4.1? 2/2] vmstate.h: Type check VMSTATE_STRUCT_VARRAY macros

2019-07-25 Thread Peter Maydell

The VMSTATE_STRUCT_VARRAY_UINT32 macro is intended to handle
migrating a field which is an array of structs, but where instead of
migrating the entire array we only migrate a variable number of
elements of it.

The VMSTATE_STRUCT_VARRAY_POINTER_UINT32 macro is intended to handle
migrating a field which is of pointer type, and points to a
dynamically allocated array of structs of variable size.

We weren't actually checking that the field passed to
VMSTATE_STRUCT_VARRAY_UINT32 really is an array, with the result that
accidentally using it where the _POINTER_ macro was intended would
compile but silently corrupt memory on migration.

Add type-checking that enforces that the field passed in is
really of the right array type. This applies to all the VMSTATE
macros which use flags including VMS_VARRAY_* but not VMS_POINTER.

Signed-off-by: Peter Maydell 
---
 include/migration/vmstate.h | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ca68584eba4..2df333c3612 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -227,8 +227,19 @@ extern const VMStateInfo vmstate_info_bitmap;
 extern const VMStateInfo vmstate_info_qtailq;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
+/* Check that t2 is an array of t1 of size n */
 #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
 #define type_check_pointer(t1,t2) ((t1**)0 - (t2*)0)
+/*
+ * type of element 0 of the specified (array) field of the type.
+ * Note that if the field is a pointer then this will return the
+ * pointed-to type rather than complaining.
+ */
+#define typeof_elt_of_field(type, field) typeof(((type *)0)->field[0])
+/* Check that field f in struct type t2 is an array of t1, of any size */
+#define type_check_varray(t1, t2, f) \
+(type_check(t1, typeof_elt_of_field(t2, f))  \
+ + QEMU_BUILD_BUG_ON_ZERO(!QEMU_IS_ARRAY(((t2 *)0)->f)))
 
 #define vmstate_offset_value(_state, _field, _type)  \
 (offsetof(_state, _field) +  \
@@ -253,6 +264,10 @@ extern const VMStateInfo vmstate_info_qtailq;
 vmstate_offset_array(_state, _field, uint8_t,\
  sizeof(typeof_field(_state, _field)))
 
+#define vmstate_offset_varray(_state, _field, _type) \
+(offsetof(_state, _field) +  \
+ type_check_varray(_type, _state, _field))
+
 /* In the macros below, if there is a _version, that means the macro's
  * field will be processed only if the version being received is >=
  * the _version specified.  In general, if you add a new field, you
@@ -347,7 +362,7 @@ extern const VMStateInfo vmstate_info_qtailq;
 .info   = &(_info),  \
 .size   = sizeof(_type), \
 .flags  = VMS_VARRAY_UINT32|VMS_MULTIPLY_ELEMENTS,   \
-.offset = offsetof(_state, _field),  \
+.offset = vmstate_offset_varray(_state, _field, _type),  \
 }
 
 #define VMSTATE_ARRAY_TEST(_field, _state, _num, _test, _info, _type) {\
@@ -376,7 +391,7 @@ extern const VMStateInfo vmstate_info_qtailq;
 .info   = &(_info),  \
 .size   = sizeof(_type), \
 .flags  = VMS_VARRAY_INT32,  \
-.offset = offsetof(_state, _field),  \
+.offset = vmstate_offset_varray(_state, _field, _type),  \
 }
 
 #define VMSTATE_VARRAY_INT32(_field, _state, _field_num, _version, _info, 
_type) {\
@@ -416,7 +431,7 @@ extern const VMStateInfo vmstate_info_qtailq;
 .info   = &(_info),  \
 .size   = sizeof(_type), \
 .flags  = VMS_VARRAY_UINT16, \
-.offset = offsetof(_state, _field),  \
+.offset = vmstate_offset_varray(_state, _field, _type),  \
 }
 
 #define VMSTATE_VSTRUCT_TEST(_field, _state, _test, _version, _vmsd, _type, 
_struct_version) { \
@@ -520,7 +535,7 @@ extern const VMStateInfo vmstate_info_qtailq;
 .vmsd   = &(_vmsd),  \
 .size   = sizeof(_type), \
 .flags  = VMS_STRUCT|VMS_VARRAY_UINT8,   \
-.offset = offsetof(_state, _field),  \
+.offset = vmstate_offset_varray(_state, _field, _type),  \
 }
 
 /* a variable length array (i.e. _type *_field) but we know the
@@ -573,7 +588,7 @@ extern const VMStateInfo vmstate_info_qtailq;
 .vmsd   = &(_vmsd),  \
 .size   =

[Qemu-devel] [PATCH for-4.1? 0/2] Typecheck VMSTATE VARRAY macros and fix bug found

2019-07-25 Thread Peter Maydell

Damien's patch to fix a pl330 vmstate mixup between
VMSTATE_STRUCT_VARRAY_UINT32 and VMSTATE_STRUCT_VARRAY_POINTER_UINT32
led me to think about whether we could catch that particular
mixup. It turns out that we can, by adding a type check that the
field given to the macro is really an array of the correct type.

This only found one other instance of the same bug, in the
stellaris_input device; patch 1 fixes that and then patch 2
is the improved type checking.

We should probably also go through and look at the other
VMSTATE macros that use a raw 'offsetof' rather than one
of the vmstate_offset_foo type-checking macros, and see if
we can add type checks there. (Documentation of the macros
would be nice too...)

I've marked this as for-4.1? because the stellaris bugfix
definitely seems worth including in the release but I'm less
certain about whether to put in the typecheck -- David/Juan can
decide that.

thanks
-- PMM

Based-on: <20190724143553.21557-1-damien.he...@greensocs.com>
("pl330: fix vmstate description" -- otherwise the new typecheck
will cause a compile failure due to presence of the bug that
patch fixes)

Peter Maydell (2):
  stellaris_input: Fix vmstate description of buttons field
  vmstate.h: Type check VMSTATE_STRUCT_VARRAY macros

 include/migration/vmstate.h | 27 +--
 hw/input/stellaris_input.c  | 10 ++
 2 files changed, 27 insertions(+), 10 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH for-4.1? 1/2] stellaris_input: Fix vmstate description of buttons field

2019-07-25 Thread Peter Maydell

gamepad_state::buttons is a pointer to an array of structs,
not an array of structs, so should be declared in the vmstate
with VMSTATE_STRUCT_VARRAY_POINTER_INT32; otherwise we
corrupt memory on incoming migration.

We bump the vmstate version field as the easiest way to
deal with the migration break, since migration wouldn't have
worked reliably before anyway.

Signed-off-by: Peter Maydell 
---
 hw/input/stellaris_input.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/input/stellaris_input.c b/hw/input/stellaris_input.c
index 20c87d86f40..3a666d61d47 100644
--- a/hw/input/stellaris_input.c
+++ b/hw/input/stellaris_input.c
@@ -60,12 +60,14 @@ static const VMStateDescription vmstate_stellaris_button = {
 
 static const VMStateDescription vmstate_stellaris_gamepad = {
 .name = "stellaris_gamepad",
-.version_id = 1,
-.minimum_version_id = 1,
+.version_id = 2,
+.minimum_version_id = 2,
 .fields = (VMStateField[]) {
 VMSTATE_INT32(extension, gamepad_state),
-VMSTATE_STRUCT_VARRAY_INT32(buttons, gamepad_state, num_buttons, 0,
-  vmstate_stellaris_button, gamepad_button),
+VMSTATE_STRUCT_VARRAY_POINTER_INT32(buttons, gamepad_state,
+num_buttons,
+vmstate_stellaris_button,
+gamepad_button),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.20.1

Re: [Qemu-devel] Sphinx and docs/index.rst: dead code?

2019-07-25 Thread John Snow




On 7/25/19 5:08 AM, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 00:22, John Snow  wrote:
>>
>> Does anything actually use this file? It doesn't appear to be used for
>> generating the HTML manuals.
> 
> It's there for if you want to do a "build all the manuals into
> a single document" -- see the comments at the top of docs/conf.py.
> Basically this is there because for QEMU's own purposes we want
> to build several separate manuals (devel, interop, etc) so we
> can install for the user only the ones that are user-facing
> (ie "not devel"). But it seemed to me at the time to be worth
> also supporting the "build a full thing" for the benefit of
> sites like readthedocs. Currently the only way to use it is
> if you hand-run an appropriate sphinx-build command.

Yup; I think a single point of entry would be nice -- I think we need to
start hosting our sphinx documentation because it's confusing that we
have both the traditional manual (hosted by Stefan Weil) and this newer
one that isn't available anywhere.

The interop manual in particular is crucial to get hosted.

> (The fact that we will need to do some running of other
> commands to autogenerate .rst input from .hx files might mean
> that it's not really possible to support this sort of "third party
> site" docs setup in the end; but for the moment we retain the
> option to do this.)
> 

OK, sure -- just wasn't sure if the feature was coming or going. I'll
consider it "coming" then.

We could perhaps formalize this as follows:

- index.rst, which is an "absolutely everything included" single point
of entry manual for developers and contributors,

- user.rst, which could be a single point of entry for end users, to be
bundled in distro packaging.

Then we can build either a single-target dev manual, a single-target
user manual, or any of the individual component manuals. I'm not sure
which should be the default, though.

> Doing a top-level build will also complain about a couple
> random .rst files in docs/ not being in the toc tree --
> we need to fix this at some point anyway by putting that
> documentation into its proper place in the manual set.
> 
>> It looks like we might use it for latex, man and texinfo output from
>> sphinx judging by docs/conf.py, but I don't think we actually use sphinx
>> to generate such output, so I think this is all dead code.
> 
> We will generate the manpage output in the same way we do
> HTML, by invoking sphinx-build on the different manual
> directories -- nothing in-tree does this today but this
> patch:
> https://patchew.org/QEMU/20190618104718.25433-1-peter.mayd...@linaro.org/
> does the conversion of the qemu-ga docs/manpage and should
> be ready to go in once 4.1 is out the door.
> 
> thanks
> -- PMM
> 

Nice!

Thanks for the info.

--js

Re: [Qemu-devel] [PATCH v5 30/42] qemu-img: Use child access functions

2019-07-25 Thread Max Reitz

On 24.07.19 11:54, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 16:15, Vladimir Sementsov-Ogievskiy wrote:
>> 19.06.2019 18:49, Max Reitz wrote:
>>> On 19.06.19 11:18, Vladimir Sementsov-Ogievskiy wrote:
 13.06.2019 1:09, Max Reitz wrote:
> This changes iotest 204's output, because blkdebug on top of a COW node
> used to make qemu-img map disregard the rest of the backing chain (the
> backing chain was broken by the filter).  With this patch, the
> allocation in the base image is reported correctly.
>
> Signed-off-by: Max Reitz 
> ---
>    qemu-img.c | 36 
>    tests/qemu-iotests/204.out |  1 +
>    2 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index 07b6e2a808..7bfa6e5d40 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -992,7 +992,7 @@ static int img_commit(int argc, char **argv)
>    if (!blk) {
>    return 1;
>    }
> -    bs = blk_bs(blk);
> +    bs = bdrv_skip_implicit_filters(blk_bs(blk));

 if filename is json, describing explicit filter over normal node, bs will 
 be
 explicit filter ...

>    qemu_progress_init(progress, 1.f);
>    qemu_progress_print(0.f, 100);
> @@ -1009,7 +1009,7 @@ static int img_commit(int argc, char **argv)
>    /* This is different from QMP, which by default uses the 
> deepest file in
>     * the backing chain (i.e., the very base); however, the 
> traditional
>     * behavior of qemu-img commit is using the immediate backing 
> file. */
> -    base_bs = backing_bs(bs);
> +    base_bs = bdrv_filtered_cow_bs(bs);
>    if (!base_bs) {

 and here we'll fail.
>>>
>>> Right, will change to bdrv_backing_chain_next().
>>>
>    error_setg(_err, "Image does not have a backing 
> file");
>    goto done;
> @@ -1626,19 +1626,18 @@ static int 
> convert_iteration_sectors(ImgConvertState *s, int64_t sector_num)
>    if (s->sector_next_status <= sector_num) {
>    int64_t count = n * BDRV_SECTOR_SIZE;
> +    BlockDriverState *src_bs = blk_bs(s->src[src_cur]);
> +    BlockDriverState *base;
>    if (s->target_has_backing) {
> -
> -    ret = bdrv_block_status(blk_bs(s->src[src_cur]),
> -    (sector_num - src_cur_offset) *
> -    BDRV_SECTOR_SIZE,
> -    count, , NULL, NULL);
> +    base = bdrv_backing_chain_next(src_bs);

 As you described in another patch, will not we here get allocated in base 
 as allocated, because of
 counting filters above base?
>>>
>>> Damn, yes.  So
>>>
>>>  base = bdrv_filtered_cow_bs(bdrv_skip_rw_filters(src_bs));
>>>
>>> I suppose.
>>>
 Hmm. I now think, why filters don't report everything as unallocated, 
 would not it be more correct
 than fallthrough to child?
>>>
>>> I don’t know, actually.  Maybe, maybe not.
>>>
>    } else {
> -    ret = bdrv_block_status_above(blk_bs(s->src[src_cur]), NULL,
> -  (sector_num - src_cur_offset) *
> -  BDRV_SECTOR_SIZE,
> -  count, , NULL, NULL);
> +    base = NULL;
>    }
> +    ret = bdrv_block_status_above(src_bs, base,
> +  (sector_num - src_cur_offset) *
> +  BDRV_SECTOR_SIZE,
> +  count, , NULL, NULL);
>    if (ret < 0) {
>    error_report("error while reading block status of sector 
> %" PRId64
>     ": %s", sector_num, strerror(-ret));
>>>
>>> [...]
>>>
> @@ -2949,7 +2950,7 @@ static int img_map(int argc, char **argv)
>    if (!blk) {
>    return 1;
>    }
> -    bs = blk_bs(blk);
> +    bs = bdrv_skip_implicit_filters(blk_bs(blk));

 Hmm, another thought about implicit filters, how they could be here in 
 qemu-img?
>>>
>>> Hm, I don’t think they can.
>>>
 If implicit are only
 job filters. Do you prepared it for implicit COR? But we discussed with 
 Kevin that we'd better deprecate
 copy-on-read option..

 So, if implicit filters are for compatibility, we'll have them only in 
 block-jobs. So, seems no reason to support
 them in qemu-img.
>>>
>>> Seems reasonable, yes.
>>>
 Also, in block-jobs, we can abandon creating implicit filters above any 
 filter nodes, as well as abandon creating any
 filter nodes above implicit filters. This will still support old 
 scenarios, but

Re: [Qemu-devel] [PATCH v2 09/11] iotests: Convert to preallocated encrypted qcow2

2019-07-25 Thread Max Reitz

On 25.07.19 17:30, Maxim Levitsky wrote:
> On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
>> Add a test case for converting an empty image (which only returns zeroes
>> when read) to a preallocated encrypted qcow2 image.
>> qcow2_has_zero_init() should return 0 then, thus forcing qemu-img
>> convert to create zero clusters.
>>
>> Signed-off-by: Max Reitz 
>> Acked-by: Stefano Garzarella 
>> Tested-by: Stefano Garzarella 
>> ---
>>  tests/qemu-iotests/188 | 20 +++-
>>  tests/qemu-iotests/188.out |  4 
>>  2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/188 b/tests/qemu-iotests/188
>> index be7278aa65..afca44df54 100755
>> --- a/tests/qemu-iotests/188
>> +++ b/tests/qemu-iotests/188
>> @@ -48,7 +48,7 @@ SECRETALT="secret,id=sec0,data=platypus"
>>  
>>  _make_test_img --object $SECRET -o 
>> "encrypt.format=luks,encrypt.key-secret=sec0,encrypt.iter-time=10" $size
>>  
>> -IMGSPEC="driver=$IMGFMT,file.filename=$TEST_IMG,encrypt.key-secret=sec0"
>> +IMGSPEC="driver=$IMGFMT,encrypt.key-secret=sec0,file.filename=$TEST_IMG"
> This change I think doesn't change anything
> 
>>  
>>  QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
>>  
>> @@ -68,6 +68,24 @@ echo
>>  echo "== verify open failure with wrong password =="
>>  $QEMU_IO --object $SECRETALT -c "read -P 0xa 0 $size" --image-opts $IMGSPEC 
>> | _filter_qemu_io | _filter_testdir
>>  
>> +_cleanup_test_img
>> +
>> +echo
>> +echo "== verify that has_zero_init returns false when preallocating =="
>> +
>> +# Empty source file
>> +if [ -n "$TEST_IMG_FILE" ]; then
>> +TEST_IMG_FILE="${TEST_IMG_FILE}.orig" _make_test_img $size
>> +else
>> +TEST_IMG="${TEST_IMG}.orig" _make_test_img $size
>> +fi
> 
> I wonder why do we have TEST_IMG_FILE and TEST_IMG, I don't know iotests well 
> enough
> From the quick look at the code, the TEST_IMG_FILE is an actual file, while 
> TEST_IMG can
> be various URL like address.

In theory, $TEST_IMG is what you give to the various qemu commands for
what you want to test.  It can be a URL, a plain path, or even in option
syntax (think file.filename=$TEST_IMG_FILE).  $TEST_IMG_FILE points to
the actual file on the local filesystem.

In practice, $TEST_IMG_FILE can be empty and then you only have
$TEST_IMG to work with.  Also, many tests only support the file protocol
anyway, which is exactly one such case, so they just use $TEST_IMG all
the time.

Max

>> +
>> +$QEMU_IMG convert -O "$IMGFMT" --object $SECRET \
>> +-o 
>> "encrypt.format=luks,encrypt.key-secret=sec0,encrypt.iter-time=10,preallocation=metadata"
>>  \
>> +"${TEST_IMG}.orig" "$TEST_IMG"
>> +
>> +$QEMU_IMG compare --object $SECRET --image-opts "${IMGSPEC}.orig" "$IMGSPEC"
>> +
>>  
>>  # success, all done
>>  echo "*** done"
>> diff --git a/tests/qemu-iotests/188.out b/tests/qemu-iotests/188.out
>> index 97b1402671..c568ef3701 100644
>> --- a/tests/qemu-iotests/188.out
>> +++ b/tests/qemu-iotests/188.out
>> @@ -15,4 +15,8 @@ read 16777216/16777216 bytes at offset 0
>>  
>>  == verify open failure with wrong password ==
>>  qemu-io: can't open: Invalid password, cannot unlock any keyslot
>> +
>> +== verify that has_zero_init returns false when preallocating ==
>> +Formatting 'TEST_DIR/t.IMGFMT.orig', fmt=IMGFMT size=16777216
>> +Images are identical.
>>  *** done
> 
> Reviewed-by: Maxim Levitsky 
> Best regards,
>   Maxim Levitsky
> 




signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 4/4] block-backend: Queue requests while drained

2019-07-25 Thread Kevin Wolf

This fixes device like IDE that can still start new requests from I/O
handlers in the CPU thread while the block backend is drained.

The basic assumption is that in a drain section, no new requests should
be allowed through a BlockBackend (blk_drained_begin/end don't exist,
we get drain sections only on the node level). However, there are two
special cases where requests should not be queued:

1. Block jobs: We already make sure that block jobs are paused in a
   drain section, so they won't start new requests. However, if the
   drain_begin is called on the job's BlockBackend first, it can happen
   that we deadlock because the job stays busy until it reaches a pause
   point - which it can't if it's requests aren't processed any more.

   The proper solution here would be to make all requests through the
   job's filter node instead of using a BlockBackend. For now, just
   disabling request queuin on the job BlockBackend is simpler.

2. In test cases where making requests through bdrv_* would be
   cumbersome because we'd need a BdrvChild. As we already got the
   functionality to disable request queuing from 1., use it in tests,
   too, for convenience.

Signed-off-by: Kevin Wolf 
---
 include/sysemu/block-backend.h | 11 +++---
 block/backup.c |  1 +
 block/block-backend.c  | 69 +-
 block/commit.c |  2 +
 block/mirror.c |  6 ++-
 blockjob.c |  3 ++
 tests/test-bdrv-drain.c|  1 +
 7 files changed, 76 insertions(+), 17 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 7320b58467..d453a4e9a1 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -104,6 +104,7 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, 
uint64_t *shared_perm);
 
 void blk_set_allow_write_beyond_eof(BlockBackend *blk, bool allow);
 void blk_set_allow_aio_context_change(BlockBackend *blk, bool allow);
+void blk_set_disable_request_queuing(BlockBackend *blk, bool disable);
 void blk_iostatus_enable(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
@@ -119,10 +120,10 @@ BlockBackend *blk_by_qdev_id(const char *id, Error 
**errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
unsigned int bytes, QEMUIOVector *qiov,
-   BdrvRequestFlags flags);
+   BdrvRequestFlags flags, bool 
wait_while_drained);
 int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
-   unsigned int bytes, QEMUIOVector *qiov,
-   BdrvRequestFlags flags);
+unsigned int bytes, QEMUIOVector *qiov,
+BdrvRequestFlags flags, bool 
wait_while_drained);
 
 static inline int coroutine_fn blk_co_pread(BlockBackend *blk, int64_t offset,
 unsigned int bytes, void *buf,
@@ -130,7 +131,7 @@ static inline int coroutine_fn blk_co_pread(BlockBackend 
*blk, int64_t offset,
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
 
-return blk_co_preadv(blk, offset, bytes, , flags);
+return blk_co_preadv(blk, offset, bytes, , flags, true);
 }
 
 static inline int coroutine_fn blk_co_pwrite(BlockBackend *blk, int64_t offset,
@@ -139,7 +140,7 @@ static inline int coroutine_fn blk_co_pwrite(BlockBackend 
*blk, int64_t offset,
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, bytes);
 
-return blk_co_pwritev(blk, offset, bytes, , flags);
+return blk_co_pwritev(blk, offset, bytes, , flags, true);
 }
 
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
diff --git a/block/backup.c b/block/backup.c
index 715e1d3be8..f66b2f4ee7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -635,6 +635,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 if (ret < 0) {
 goto error;
 }
+blk_set_disable_request_queuing(job->target, true);
 
 job->on_source_error = on_source_error;
 job->on_target_error = on_target_error;
diff --git a/block/block-backend.c b/block/block-backend.c
index fdd6b01ecf..603b281743 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -79,6 +79,9 @@ struct BlockBackend {
 QLIST_HEAD(, BlockBackendAioNotifier) aio_notifiers;
 
 int quiesce_counter;
+CoQueue queued_requests;
+bool disable_request_queuing;
+
 VMChangeStateEntry *vmsh;
 bool force_allow_inactivate;
 
@@ -339,6 +342,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 
 block_acct_init(>stats);
 
+qemu_co_queue_init(>queued_requests);
 notifier_list_init(>remove_bs_notifiers);

[Qemu-devel] [PATCH 0/4] block-backend: Queue requests while drained

2019-07-25 Thread Kevin Wolf

This series fixes the problem that devices like IDE, which submit
requests as a direct result of I/O from the CPU thread, can continue to
submit new requests even in a drained section.

In order to avoid a dependency for this series, I borrowed a patch from
Max.

Kevin Wolf (3):
  block: Remove blk_pread_unthrottled()
  mirror: Keep target drained until graph changes are done
  block-backend: Queue requests while drained

Max Reitz (1):
  block: Reduce (un)drains when replacing a child

 include/sysemu/block-backend.h | 13 +++---
 block.c| 49 +---
 block/backup.c |  1 +
 block/block-backend.c  | 85 +++---
 block/commit.c |  2 +
 block/mirror.c | 20 
 blockjob.c |  3 ++
 hw/block/hd-geometry.c |  7 +--
 tests/test-bdrv-drain.c|  1 +
 9 files changed, 118 insertions(+), 63 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH 3/4] mirror: Keep target drained until graph changes are done

2019-07-25 Thread Kevin Wolf

Calling bdrv_drained_end() for target_bs can restarts requests too
early, so that they would execute on mirror_top_bs, which however has
already dropped all permissions.

Keep the target node drained until all graph changes have completed.

Signed-off-by: Kevin Wolf 
---
 block/mirror.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 8cb75fb409..7483051f8d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -644,6 +644,11 @@ static int mirror_exit_common(Job *job)
 bdrv_ref(mirror_top_bs);
 bdrv_ref(target_bs);
 
+/* The mirror job has no requests in flight any more, but we need to
+ * drain potential other users of the BDS before changing the graph. */
+assert(s->in_drain);
+bdrv_drained_begin(target_bs);
+
 /* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
  * inserting target_bs at s->to_replace, where we might not be able to get
  * these permissions.
@@ -684,12 +689,7 @@ static int mirror_exit_common(Job *job)
 bdrv_reopen_set_read_only(target_bs, ro, NULL);
 }
 
-/* The mirror job has no requests in flight any more, but we need to
- * drain potential other users of the BDS before changing the graph. */
-assert(s->in_drain);
-bdrv_drained_begin(target_bs);
 bdrv_replace_node(to_replace, target_bs, _err);
-bdrv_drained_end(target_bs);
 if (local_err) {
 error_report_err(local_err);
 ret = -EPERM;
@@ -704,7 +704,6 @@ static int mirror_exit_common(Job *job)
 aio_context_release(replace_aio_context);
 }
 g_free(s->replaces);
-bdrv_unref(target_bs);
 
 /*
  * Remove the mirror filter driver from the graph. Before this, get rid of
@@ -724,9 +723,12 @@ static int mirror_exit_common(Job *job)
 bs_opaque->job = NULL;
 
 bdrv_drained_end(src);
+bdrv_drained_end(target_bs);
+
 s->in_drain = false;
 bdrv_unref(mirror_top_bs);
 bdrv_unref(src);
+bdrv_unref(target_bs);
 
 return ret;
 }
-- 
2.20.1

[Qemu-devel] [PATCH 1/4] block: Remove blk_pread_unthrottled()

2019-07-25 Thread Kevin Wolf

The functionality offered by blk_pread_unthrottled() goes back to commit
498e386c584. Then, we couldn't perform I/O throttling with synchronous
requests because timers wouldn't be executed in polling loops. So the
commit automatically disabled I/O throttling as soon as a synchronous
request was issued.

However, for geometry detection during disk initialisation, we always
used (and still use) synchronous requests even if guest requests use AIO
later. Geometry detection was not wanted to disable I/O throttling, so
bdrv_pread_unthrottled() was introduced which disabled throttling only
temporarily.

All of this isn't necessary any more because we do run timers in polling
loop and even synchronous requests are now using coroutine
infrastructure internally. For this reason, commit 90c78624f already
removed the automatic disabling of I/O throttling.

It's time to get rid of the workaround for the removed code, and its
abuse of blk_root_drained_begin()/end(), as well.

Signed-off-by: Kevin Wolf 
---
 include/sysemu/block-backend.h |  2 --
 block/block-backend.c  | 16 
 hw/block/hd-geometry.c |  7 +--
 3 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 733c4957eb..7320b58467 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -117,8 +117,6 @@ char *blk_get_attached_dev_id(BlockBackend *blk);
 BlockBackend *blk_by_dev(void *dev);
 BlockBackend *blk_by_qdev_id(const char *id, Error **errp);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
-int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
-  int bytes);
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
unsigned int bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags);
diff --git a/block/block-backend.c b/block/block-backend.c
index 0056b526b8..fdd6b01ecf 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1237,22 +1237,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, 
uint8_t *buf,
 return rwco.ret;
 }
 
-int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
-  int count)
-{
-int ret;
-
-ret = blk_check_byte_request(blk, offset, count);
-if (ret < 0) {
-return ret;
-}
-
-blk_root_drained_begin(blk->root);
-ret = blk_pread(blk, offset, buf, count);
-blk_root_drained_end(blk->root, NULL);
-return ret;
-}
-
 int blk_pwrite_zeroes(BlockBackend *blk, int64_t offset,
   int bytes, BdrvRequestFlags flags)
 {
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 79384a2b0a..dcbccee294 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -63,12 +63,7 @@ static int guess_disk_lchs(BlockBackend *blk,
 
 blk_get_geometry(blk, _sectors);
 
-/**
- * The function will be invoked during startup not only in sync I/O mode,
- * but also in async I/O mode. So the I/O throttling function has to
- * be disabled temporarily here, not permanently.
- */
-if (blk_pread_unthrottled(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
+if (blk_pread(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
 return -1;
 }
 /* test msdos magic */
-- 
2.20.1

[Qemu-devel] [PATCH 2/4] block: Reduce (un)drains when replacing a child

2019-07-25 Thread Kevin Wolf

From: Max Reitz 

Currently, bdrv_replace_child_noperm() undrains the parent until it is
completely undrained, then re-drains it after attaching the new child
node.

This is a problem with bdrv_drop_intermediate(): We want to keep the
whole subtree drained, including parents, while the operation is
under way.  bdrv_replace_child_noperm() breaks this by allowing every
parent to become unquiesced briefly, and then redraining it.

In fact, there is no reason why the parent should become unquiesced and
be allowed to submit requests to the new child node if that new node is
supposed to be kept drained.  So if anything, we have to drain the
parent before detaching the old child node.  Conversely, we have to
undrain it only after attaching the new child node.

Thus, change the whole drain algorithm here: Calculate the number of
times we have to drain/undrain the parent before replacing the child
node then drain it (if necessary), replace the child node, and then
undrain it.

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block.c | 49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index cbd8da5f3b..e1595bd058 100644
--- a/block.c
+++ b/block.c
@@ -2238,13 +2238,27 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
   BlockDriverState *new_bs)
 {
 BlockDriverState *old_bs = child->bs;
-int i;
+int new_bs_quiesce_counter;
+int drain_saldo;
 
 assert(!child->frozen);
 
 if (old_bs && new_bs) {
 assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
 }
+
+new_bs_quiesce_counter = (new_bs ? new_bs->quiesce_counter : 0);
+drain_saldo = new_bs_quiesce_counter - child->parent_quiesce_counter;
+
+/*
+ * If the new child node is drained but the old one was not, flush
+ * all outstanding requests to the old child node.
+ */
+while (drain_saldo > 0 && child->role->drained_begin) {
+bdrv_parent_drained_begin_single(child, true);
+drain_saldo--;
+}
+
 if (old_bs) {
 /* Detach first so that the recursive drain sections coming from @child
  * are already gone and we only end the drain sections that came from
@@ -2252,28 +2266,22 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 if (child->role->detach) {
 child->role->detach(child);
 }
-while (child->parent_quiesce_counter) {
-bdrv_parent_drained_end_single(child);
-}
 QLIST_REMOVE(child, next_parent);
-} else {
-assert(child->parent_quiesce_counter == 0);
 }
 
 child->bs = new_bs;
 
 if (new_bs) {
 QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
-if (new_bs->quiesce_counter) {
-int num = new_bs->quiesce_counter;
-if (child->role->parent_is_bds) {
-num -= bdrv_drain_all_count;
-}
-assert(num >= 0);
-for (i = 0; i < num; i++) {
-bdrv_parent_drained_begin_single(child, true);
-}
-}
+
+/*
+ * Detaching the old node may have led to the new node's
+ * quiesce_counter having been decreased.  Not a problem, we
+ * just need to recognize this here and then invoke
+ * drained_end appropriately more often.
+ */
+assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
+drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
 
 /* Attach only after starting new drained sections, so that recursive
  * drain sections coming from @child don't get an extra .drained_begin
@@ -2282,6 +2290,15 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 child->role->attach(child);
 }
 }
+
+/*
+ * If the old child node was drained but the new one is not, allow
+ * requests to come in only after the new node has been attached.
+ */
+while (drain_saldo < 0 && child->role->drained_end) {
+bdrv_parent_drained_end_single(child);
+drain_saldo++;
+}
 }
 
 /*
-- 
2.20.1

Re: [Qemu-devel] [RFC 13/19] fuzz: add ctrl vq support to virtio-net in libqos

2019-07-25 Thread John Snow

On 7/24/19 11:23 PM, Oleinik, Alexander wrote:
> Signed-off-by: Alexander Oleinik 

Is there some explanation for why the below patch does what the subject
line claims for the uninitiated?

I don't know why increasing the number of queues from 2 to 3 here is
correct in the general case, OR why it would "add ctrl vq support".
(Or what it has to do with fuzzing, in general.)

[Only responding because this landed in tests/libqos, which I do try to
keep an eye on, but this patch is opaque to me. --js]

> ---
>  tests/libqos/virtio-net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/libqos/virtio-net.c b/tests/libqos/virtio-net.c
> index 66405b646e..247a0a17a8 100644
> --- a/tests/libqos/virtio-net.c
> +++ b/tests/libqos/virtio-net.c
> @@ -51,7 +51,7 @@ static void virtio_net_setup(QVirtioNet *interface)
>  if (features & (1u << VIRTIO_NET_F_MQ)) {
>  interface->n_queues = qvirtio_config_readw(vdev, 8) * 2;
>  } else {
> -interface->n_queues = 2;
> +interface->n_queues = 3;
>  }
>  
>  interface->queues = g_new(QVirtQueue *, interface->n_queues);
>

Re: [Qemu-devel] [PATCH v2 02/11] mirror: Fix bdrv_has_zero_init() use

2019-07-25 Thread Max Reitz

On 25.07.19 18:21, Max Reitz wrote:
> On 25.07.19 17:28, Maxim Levitsky wrote:

[...]

>> For example, QMP reference states that MIRROR_SYNC_MODE_TOP copies data in 
>> the topmost image to the destination.
>> If there is only the topmost image, I could image the caller assume that 
>> target is identical to the source.
> 
> It doesn’t say that it copies the data in the topmost image.  It says it
> copies the data *allocated* in the topmost image.  It follows that it
> will not copy any data that is not allocated.

(I just saw that MirrorSyncMode indeed just says “data in the topmost
image”.  I was looking at DriveBackup and blockdev-backup, which say
“only the sectors allocated in the topmost image”.  I think what
DriveBackup and blockdev-backup say takes precedence here.)

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] ppc: remove redundant capability check for unset irq

2019-07-25 Thread Greg Kurz

On Thu, 25 Jul 2019 10:40:11 -0500
Shivaprasad G Bhat  wrote:

> The KVM_CAP_PPC_UNSET_IRQ is part of kernel since v2.6.36.
> Kernels older than that are not supported anymore.
> So, remove the checks.
> 

Ok to drop the dead paths but we do need this cap to be present.

int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int level)
{
unsigned virq = level ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
...
}

Please add an error message and exit in kvm_arch_init() like you did for
SET_LEVEL.

> Signed-off-by: Shivaprasad G Bhat 
> ---
>  target/ppc/kvm.c |4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 5ab5e6c6a9..94a2ecb84f 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -55,7 +55,6 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>  KVM_CAP_LAST_INFO
>  };
>  
> -static int cap_interrupt_unset;
>  static int cap_segstate;
>  static int cap_booke_sregs;
>  static int cap_ppc_smt;
> @@ -104,7 +103,6 @@ static int kvmppc_get_dec_bits(void);
>  
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> -cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
>  cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
>  cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
>  cap_ppc_smt_possible = kvm_vm_check_extension(s, 
> KVM_CAP_PPC_SMT_POSSIBLE);
> @@ -1309,7 +1307,7 @@ int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int 
> level)
>  return 0;
>  }
>  
> -if (!kvm_enabled() || !cap_interrupt_unset) {
> +if (!kvm_enabled()) {
>  return 0;
>  }
>  
>

Re: [Qemu-devel] [PATCH v2 02/11] mirror: Fix bdrv_has_zero_init() use

2019-07-25 Thread Max Reitz

On 25.07.19 17:28, Maxim Levitsky wrote:
> On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
>> bdrv_has_zero_init() only has meaning for newly created images or image
>> areas.  If the mirror job itself did not create the image, it cannot
>> rely on bdrv_has_zero_init()'s result to carry any meaning.
>>
>> This is the case for drive-mirror with mode=existing and always for
>> blockdev-mirror.
>>
>> Note that we only have to zero-initialize the target with sync=full,
>> because other modes actually do not promise that the target will contain
>> the same data as the source after the job -- sync=top only promises to
>> copy anything allocated in the top layer, and sync=none will only copy
>> new I/O.  (Which is how mirror has always handled it.)
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  include/block/block_int.h   |  2 ++
>>  block/mirror.c  | 11 ---
>>  blockdev.c  | 16 +---
>>  tests/test-block-iothread.c |  2 +-
>>  4 files changed, 24 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 3aa1e832a8..6a0b1b5008 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1116,6 +1116,7 @@ BlockJob *commit_active_start(const char *job_id, 
>> BlockDriverState *bs,
>>   * @buf_size: The amount of data that can be in flight at one time.
>>   * @mode: Whether to collapse all images in the chain to the target.
>>   * @backing_mode: How to establish the target's backing chain after 
>> completion.
>> + * @zero_target: Whether the target should be explicitly zero-initialized
>>   * @on_source_error: The action to take upon error reading from the source.
>>   * @on_target_error: The action to take upon error writing to the target.
>>   * @unmap: Whether to unmap target where source sectors only contain zeroes.
>> @@ -1135,6 +1136,7 @@ void mirror_start(const char *job_id, BlockDriverState 
>> *bs,
>>int creation_flags, int64_t speed,
>>uint32_t granularity, int64_t buf_size,
>>MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>> +  bool zero_target,
>>BlockdevOnError on_source_error,
>>BlockdevOnError on_target_error,
>>bool unmap, const char *filter_node_name,
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 8cb75fb409..50188ce6e9 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -51,6 +51,8 @@ typedef struct MirrorBlockJob {
>>  Error *replace_blocker;
>>  bool is_none_mode;
>>  BlockMirrorBackingMode backing_mode;
>> +/* Whether the target image requires explicit zero-initialization */
>> +bool zero_target;
>>  MirrorCopyMode copy_mode;
>>  BlockdevOnError on_source_error, on_target_error;
>>  bool synced;
>> @@ -763,7 +765,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
>> *s)
>>  int ret;
>>  int64_t count;
>>  
>> -if (base == NULL && !bdrv_has_zero_init(target_bs)) {
>> +if (s->zero_target) {
> The justification for removing base here, is that it can be != NULL only
> when MIRROR_SYNC_MODE_TOP. So looks OK

Or with sync=none, or when doing an active commit,

>>  if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>>  bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>>  return 0;
>> @@ -1501,6 +1503,7 @@ static BlockJob *mirror_start_job(
>>   const char *replaces, int64_t speed,
>>   uint32_t granularity, int64_t buf_size,
>>   BlockMirrorBackingMode backing_mode,
>> + bool zero_target,
>>   BlockdevOnError on_source_error,
>>   BlockdevOnError on_target_error,
>>   bool unmap,
>> @@ -1628,6 +1631,7 @@ static BlockJob *mirror_start_job(
>>  s->on_target_error = on_target_error;
>>  s->is_none_mode = is_none_mode;
>>  s->backing_mode = backing_mode;
>> +s->zero_target = zero_target;
>>  s->copy_mode = copy_mode;
>>  s->base = base;
>>  s->granularity = granularity;
>> @@ -1713,6 +1717,7 @@ void mirror_start(const char *job_id, BlockDriverState 
>> *bs,
>>int creation_flags, int64_t speed,
>>uint32_t granularity, int64_t buf_size,
>>MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>> +  bool zero_target,
>>BlockdevOnError on_source_error,
>>BlockdevOnError on_target_error,
>>bool unmap, const char *filter_node_name,
>> @@ -1728,7 +1733,7 @@ void mirror_start(const char *job_id, BlockDriverState 
>> *bs,
>>  is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>>  base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>>

Re: [Qemu-devel] [PATCH for-4.2 00/24] target/arm: Implement ARMv8.1-VHE

2019-07-25 Thread Alex Bennée



Richard Henderson  writes:

> About half of this patch set is cleanup of the qemu tlb handling
> leading up to the actual implementation of VHE, and the biggest
> piece of that: The EL2&0 translation regime.
>
> Testing so far has been limited to booting a debian 9 system with
> a 4.9 kernel, and a fedora 30 system with a 5.1 kernel.  Both have
> KVM enabled, and both report enabling VHE is successful.

So you haven't booted a kernel via KVM inside the emulation yet? That
might explain why I was struggling to do so. For example single-stepping
through the guest kernel gets stuck. I suspect that means some state
machine doesn't quite work (or possibly that is an existing bug, I need
to check without VHE as well).

Anyway I've done my pass for now. I'll come back to the later patches
when more stuff is working.

>
>
> r~
>
>
> Richard Henderson (24):
>   cputlb: Add tlb_set_asid_for_mmuidx
>   cputlb: Add tlb_flush_asid_by_mmuidx and friends
>   target/arm: Install ASIDs for long-form from EL1
>   target/arm: Install ASIDs for short-form from EL1
>   target/arm: Install ASIDs for EL2
>   target/arm: Define isar_feature_aa64_vh
>   target/arm: Enable HCR_E2H for VHE
>   target/arm: Add CONTEXTIDR_EL2
>   target/arm: Add TTBR1_EL2
>   target/arm: Update CNTVCT_EL0 for VHE
>   target/arm: Add the hypervisor virtual counter
>   target/arm: Add VHE system register redirection and aliasing
>   target/arm: Split out vae1_tlbmask, vmalle1_tlbmask
>   target/arm: Simplify tlb_force_broadcast alternatives
>   target/arm: Reorganize ARMMMUIdx
>   target/arm: Add regime_has_2_ranges
>   target/arm: Update arm_mmu_idx for VHE
>   target/arm: Update arm_sctlr for VHE
>   target/arm: Install asids for E2&0 translation regime
>   target/arm: Flush tlbs for E2&0 translation regime
>   target/arm: Update arm_phys_excp_target_el for TGE
>   target/arm: Update regime_is_user for EL2&0
>   target/arm: Update {fp,sve}_exception_el for VHE
>   target/arm: Enable ARMv8.1-VHE in -cpu max
>
>  include/exec/cpu-all.h |  11 +
>  include/exec/cpu-defs.h|   2 +
>  include/exec/exec-all.h|  35 ++
>  include/qom/cpu.h  |   1 +
>  target/arm/cpu-qom.h   |   1 +
>  target/arm/cpu.h   | 259 +-
>  target/arm/internals.h |  62 ++-
>  target/arm/translate.h |   2 +-
>  accel/tcg/cputlb.c |  77 +++
>  target/arm/arch_dump.c |   2 +-
>  target/arm/cpu.c   |   2 +
>  target/arm/cpu64.c |   1 +
>  target/arm/debug_helper.c  |  50 +-
>  target/arm/helper-a64.c|   2 +-
>  target/arm/helper.c| 985 ++---
>  target/arm/m_helper.c  |   6 +-
>  target/arm/pauth_helper.c  |   9 +-
>  target/arm/translate-a64.c |  14 +-
>  target/arm/translate.c |  17 +-
>  19 files changed, 1058 insertions(+), 480 deletions(-)


--
Alex Bennée

Re: [Qemu-devel] [PATCH for 4.1?] pl330: fix vmstate description

2019-07-25 Thread Peter Maydell

On Wed, 24 Jul 2019 at 15:36, Damien Hedde  wrote:
>
> Fix the pl330 main and queue vmstate description.
> There were missing POINTER flags causing crashes during
> incoming migration because:
> + PL330State chan field is a pointer to an array
> + PL330Queue queue field is a pointer to an array
>
> Also bump corresponding vmsd version numbers.
>
> Signed-off-by: Damien Hedde 
> ---
>
> I found this while working on reset with xilinx-zynq machine.
>
> I'm not sure what's the vmsd version policy in such cases (for
> backward compatibility). I've simply bumped them since migration
> was not working anyway (vmstate_load_state was erasing critical part
> of PL330State and causing segfaults while loading following fields).
>
> Tested doing migration with the xilinx-zynq-a9 machine.

Reviewed-by: Peter Maydell 

I worked out that we can catch this category of bug by adding
type-checking to the VMSTATE_STRUCT_VARRAY_UINT32 macro and
friends that ensures that the passed in field name is really
an array and not a pointer. This also caught at least one
other bug of the same type...patches to follow later.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3] qapi: add dirty-bitmaps to query-named-block-nodes result

2019-07-25 Thread John Snow




On 7/25/19 2:06 AM, Markus Armbruster wrote:
> John Snow  writes:
> 
>> On 7/24/19 12:47 AM, Markus Armbruster wrote:
>>> John Snow  writes:
>>>
 From: Vladimir Sementsov-Ogievskiy 

 Let's add a possibility to query dirty-bitmaps not only on root nodes.
 It is useful when dealing both with snapshots and incremental backups.

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 [Added deprecation information. --js]
 Signed-off-by: John Snow 
 ---
  block/qapi.c |  5 +
  qapi/block-core.json |  6 +-
  qemu-deprecated.texi | 12 
  3 files changed, 22 insertions(+), 1 deletion(-)

 diff --git a/block/qapi.c b/block/qapi.c
 index 917435f022..15f1030264 100644
 --- a/block/qapi.c
 +++ b/block/qapi.c
 @@ -79,6 +79,11 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend 
 *blk,
  info->backing_file = g_strdup(bs->backing_file);
  }
  
 +if (!QLIST_EMPTY(>dirty_bitmaps)) {
 +info->has_dirty_bitmaps = true;
 +info->dirty_bitmaps = bdrv_query_dirty_bitmaps(bs);
 +}
 +
  info->detect_zeroes = bs->detect_zeroes;
  
  if (blk && blk_get_public(blk)->throttle_group_member.throttle_state) 
 {
 diff --git a/qapi/block-core.json b/qapi/block-core.json
 index 0d43d4f37c..9210ae233d 100644
 --- a/qapi/block-core.json
 +++ b/qapi/block-core.json
 @@ -360,6 +360,9 @@
  # @write_threshold: configured write threshold for the device.
  #   0 if disabled. (Since 2.3)
  #
 +# @dirty-bitmaps: dirty bitmaps information (only present if node
 +# has one or more dirty bitmaps) (Since 4.2)
 +#
  # Since: 0.14.0
  #
  ##
 @@ -378,7 +381,7 @@
  '*bps_wr_max_length': 'int', '*iops_max_length': 'int',
  '*iops_rd_max_length': 'int', '*iops_wr_max_length': 'int',
  '*iops_size': 'int', '*group': 'str', 'cache': 
 'BlockdevCacheInfo',
 -'write_threshold': 'int' } }
 +'write_threshold': 'int', '*dirty-bitmaps': 
 ['BlockDirtyInfo'] } }
  
  ##
  # @BlockDeviceIoStatus:
 @@ -656,6 +659,7 @@
  #
  # @dirty-bitmaps: dirty bitmaps information (only present if the
  # driver has one or more dirty bitmaps) (Since 2.0)
 +# Deprecated in 4.2; see BlockDirtyInfo instead.
  #
  # @io-status: @BlockDeviceIoStatus. Only present if the device
  # supports it and the VM is configured to stop on errors
 diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
 index c90b08d553..6374b66546 100644
 --- a/qemu-deprecated.texi
 +++ b/qemu-deprecated.texi
 @@ -134,6 +134,18 @@ The ``status'' field of the ``BlockDirtyInfo'' 
 structure, returned by
  the query-block command is deprecated. Two new boolean fields,
  ``recording'' and ``busy'' effectively replace it.
  
 +@subsection query-block result field dirty-bitmaps (Since 4.2)
 +
 +The ``dirty-bitmaps`` field of the ``BlockInfo`` structure, returned by
 +the query-block command is itself now deprecated. The ``dirty-bitmaps``
 +field of the ``BlockDeviceInfo`` struct should be used instead, which is 
 the
 +type of the ``inserted`` field in query-block replies, as well as the
 +type of array items in query-named-block-nodes.
>>>
>>> Would the text be clearer if it talked only about commands, not about
>>> types?
>>>
>>> Here's my (laconic) try:
>>>
>>>@subsection query-block result field dirty-bitmaps (Since 4.2)
>>>
>>>In the result of query-block, member ``dirty-bitmaps'' has been moved
>>>into member ``inserted''.
>>>
>>
>> Yeah, that's probably better in terms of strictly what the deprecation
>> is. I was trying to imply that the output will also now be visible in
>> other commands as well, but that's not the deprecation -- that's the new
>> feature.
>>
>> ACK
>>
>>> Aside: same for existing @subsection query-block result field
>>> dirty-bitmaps[i].status (since 4.0).
>>>
>>
>> (Probably not worth editing deprecation text that was already published.)
> 
> Maybe, maybe not.  I'm not making demands.
> 
 +Since the ``dirty-bitmaps`` field is optionally present in both the old 
 and
 +new locations, clients must use introspection to learn where to anticipate
 +the field if/when it does appear in command output.
 +
>>>
>>> I find this hint a bit confusing.  Do we need it?
>>>
>>
>> I think so, yes: it's nice to inform readers of how to cope with the
>> deprecation.
>>
>>> If yes, laconic me again:
>>>
>>>Clients should use introspection to learn whether ``dirty-bitmaps'' is
>>>in the new location.
>>>
>>
>> Too terse. I want my documentation to greet me in the morning by reading
>> me the local newspaper while I brush my teeth.
>>
>> Yours says

Re: [Qemu-devel] [PATCH for-4.2 18/24] target/arm: Update arm_sctlr for VHE

2019-07-25 Thread Alex Bennée



Richard Henderson  writes:

> Use this function in many more places in order to select
> the correct control.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/cpu.h  | 10 ++
>  target/arm/arch_dump.c|  2 +-
>  target/arm/helper-a64.c   |  2 +-
>  target/arm/helper.c   | 10 +-
>  target/arm/pauth_helper.c |  9 +
>  5 files changed, 14 insertions(+), 19 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 7310adfd9b..7efbb488d9 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -3104,11 +3104,13 @@ static inline bool arm_sctlr_b(CPUARMState *env)
>  static inline uint64_t arm_sctlr(CPUARMState *env, int el)
>  {
>  if (el == 0) {
> -/* FIXME: ARMv8.1-VHE S2 translation regime.  */
> -return env->cp15.sctlr_el[1];
> -} else {
> -return env->cp15.sctlr_el[el];
> +if (arm_el_is_aa64(env, 2) && (arm_hcr_el2_eff(env) & HCR_E2H)) {
> +el = 2;
> +} else {
> +el = 1;
> +}
>  }
> +return env->cp15.sctlr_el[el];
>  }
>
>
> diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
> index 26a2c09868..5fbd008d8c 100644
> --- a/target/arm/arch_dump.c
> +++ b/target/arm/arch_dump.c
> @@ -320,7 +320,7 @@ int cpu_get_dump_info(ArchDumpInfo *info,
>   * dump a hypervisor that happens to be running an opposite-endian
>   * kernel.
>   */
> -info->d_endian = (env->cp15.sctlr_el[1] & SCTLR_EE) != 0
> +info->d_endian = (arm_sctlr(env, 1) & SCTLR_EE) != 0
>   ? ELFDATA2MSB : ELFDATA2LSB;
>
>  return 0;
> diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
> index 060699b901..3bf1b731e7 100644
> --- a/target/arm/helper-a64.c
> +++ b/target/arm/helper-a64.c
> @@ -70,7 +70,7 @@ static void daif_check(CPUARMState *env, uint32_t op,
> uint32_t imm, uintptr_t ra)
>  {
>  /* DAIF update to PSTATE. This is OK from EL0 only if UMA is set.  */
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UMA)) {
> +if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_UMA)) {
>  raise_exception_ra(env, EXCP_UDEF,
> syn_aa64_sysregtrap(0, extract32(op, 0, 3),
> extract32(op, 3, 3), 4,
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 54c328b844..db13a8f9c0 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -3868,7 +3868,7 @@ static void aa64_fpsr_write(CPUARMState *env, const 
> ARMCPRegInfo *ri,
>  static CPAccessResult aa64_daif_access(CPUARMState *env, const ARMCPRegInfo 
> *ri,
> bool isread)
>  {
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UMA)) {
> +if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_UMA)) {
>  return CP_ACCESS_TRAP;
>  }
>  return CP_ACCESS_OK;
> @@ -3887,7 +3887,7 @@ static CPAccessResult aa64_cacheop_access(CPUARMState 
> *env,
>  /* Cache invalidate/clean: NOP, but EL0 must UNDEF unless
>   * SCTLR_EL1.UCI is set.
>   */
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UCI)) {
> +if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_UCI)) {
>  return CP_ACCESS_TRAP;
>  }
>  return CP_ACCESS_OK;
> @@ -4114,7 +4114,7 @@ static CPAccessResult aa64_zva_access(CPUARMState *env, 
> const ARMCPRegInfo *ri,
>  /* We don't implement EL2, so the only control on DC ZVA is the
>   * bit in the SCTLR which can prohibit access for EL0.
>   */
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_DZE)) {
> +if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_DZE)) {
>  return CP_ACCESS_TRAP;
>  }
>  return CP_ACCESS_OK;
> @@ -5344,7 +5344,7 @@ static CPAccessResult ctr_el0_access(CPUARMState *env, 
> const ARMCPRegInfo *ri,
>  /* Only accessible in EL0 if SCTLR.UCT is set (and only in AArch64,
>   * but the AArch32 CTR has its own reginfo struct)
>   */
> -if (arm_current_el(env) == 0 && !(env->cp15.sctlr_el[1] & SCTLR_UCT)) {
> +if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_UCT)) {
>  return CP_ACCESS_TRAP;
>  }
>  return CP_ACCESS_OK;
> @@ -8161,7 +8161,7 @@ static void take_aarch32_exception(CPUARMState *env, 
> int new_mode,
>  env->uncached_cpsr = (env->uncached_cpsr & ~CPSR_M) | new_mode;
>  /* Set new mode endianness */
>  env->uncached_cpsr &= ~CPSR_E;
> -if (env->cp15.sctlr_el[arm_current_el(env)] & SCTLR_EE) {
> +if (arm_sctlr(env, arm_current_el(env)) & SCTLR_EE) {
>  env->uncached_cpsr |= CPSR_E;
>  }
>  /* J and IL must always be cleared for exception entry */
> diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
> index d3194f2043..42c9141bb7 100644
> --- a/target/arm/pauth_helper.c
> +++

Re: [Qemu-devel] [PATCH for-4.2 17/24] target/arm: Update arm_mmu_idx for VHE

2019-07-25 Thread Alex Bennée



Richard Henderson  writes:

> This covers initial generation in arm_mmu_idx, and reconstruction
> in core_to_arm_mmu_idx.  As a conseqeuence, we also need a bit in
> TBFLAGS in order to make the latter reliable.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/cpu.h|  2 ++
>  target/arm/helper.c | 42 +++---
>  2 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 4b537c4613..7310adfd9b 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -3158,6 +3158,8 @@ FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)
>  /* Target EL if we take a floating-point-disabled exception */
>  FIELD(TBFLAG_ANY, FPEXC_EL, 24, 2)
>  FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
> +/* For A profile only, if EL2 is AA64 and HCR_EL2.E2H is set.  */
> +FIELD(TBFLAG_ANY, E2H, 22, 1)
>
>  /* Bit usage when in AArch32 state: */
>  FIELD(TBFLAG_A32, THUMB, 0, 1)
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 2d5658f9e3..54c328b844 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -11250,21 +11250,29 @@ int fp_exception_el(CPUARMState *env, int cur_el)
>
>  ARMMMUIdx core_to_arm_mmu_idx(CPUARMState *env, int mmu_idx)
>  {
> +bool e2h;
> +
>  if (arm_feature(env, ARM_FEATURE_M)) {
>  return mmu_idx | ARM_MMU_IDX_M;
>  }
>
>  mmu_idx |= ARM_MMU_IDX_A;
> +if (mmu_idx & ARM_MMU_IDX_S) {
> +return mmu_idx;
> +}
> +
> +e2h = (env->cp15.hcr_el2 & HCR_E2H) != 0;
> +if (!arm_el_is_aa64(env, 2)) {
> +e2h = false;
> +}
> +
>  switch (mmu_idx) {
>  case ARMMMUIdx_E0:
> -return ARMMMUIdx_EL10_0;
> +return e2h ? ARMMMUIdx_EL20_0 : ARMMMUIdx_EL10_0;
>  case ARMMMUIdx_E1:
>  return ARMMMUIdx_EL10_1;
>  case ARMMMUIdx_E2:
> -case ARMMMUIdx_SE0:
> -case ARMMMUIdx_SE1:
> -case ARMMMUIdx_SE3:
> -return mmu_idx;
> +return e2h ? ARMMMUIdx_EL20_2 : ARMMMUIdx_E2;
>  default:
>  g_assert_not_reached();
>  }
> @@ -11292,24 +11300,28 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState 
> *env, bool secstate)
>
>  ARMMMUIdx arm_mmu_idx(CPUARMState *env)
>  {
> +bool e2h, sec;
>  int el;
>
>  if (arm_feature(env, ARM_FEATURE_M)) {
>  return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
>  }
>
> +sec = arm_is_secure_below_el3(env);
> +e2h = (env->cp15.hcr_el2 & HCR_E2H) != 0;
> +if (!arm_el_is_aa64(env, 2)) {
> +e2h = false;
> +}
> +
>  el = arm_current_el(env);
>  switch (el) {
>  case 0:
> -/* TODO: ARMv8.1-VHE */
> +return sec ? ARMMMUIdx_SE0 : e2h ? ARMMMUIdx_EL20_0 : 
> ARMMMUIdx_EL10_0;
>  case 1:
> -return (arm_is_secure_below_el3(env)
> -? ARMMMUIdx_SE0 + el
> -: ARMMMUIdx_EL10_0 + el);
> +return sec ? ARMMMUIdx_SE1 : ARMMMUIdx_EL10_1;
>  case 2:
> -/* TODO: ARMv8.1-VHE */
>  /* TODO: ARMv8.4-SecEL2 */
> -return ARMMMUIdx_E2;
> +return e2h ? ARMMMUIdx_EL20_2 : ARMMMUIdx_E2;
>  case 3:
>  return ARMMMUIdx_SE3;
>  default:
> @@ -11421,6 +11433,14 @@ void cpu_get_tb_cpu_state(CPUARMState *env, 
> target_ulong *pc,
>
>  flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, 
> arm_to_core_mmu_idx(mmu_idx));
>
> +/*
> + * Include E2H in TBFLAGS so that core_to_arm_mmu_idx can
> + * reliably determine E1&0 vs E2&0 regimes.
> + */
> +if (arm_el_is_aa64(env, 2) && (env->cp15.hcr_el2 & HCR_E2H)) {
> +flags = FIELD_DP32(flags, TBFLAG_ANY, E2H, 1);
> +}
> +
>  /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
>   * states defined in the ARM ARM for software singlestep:
>   *  SS_ACTIVE   PSTATE.SS   State


--
Alex Bennée

[Qemu-devel] [PATCH 5/7] iotests: Disable broken streamOptimized tests

2019-07-25 Thread Max Reitz

streamOptimized does not support writes that do not span exactly one
cluster.  Furthermore, it cannot rewrite already allocated clusters.
As such, many iotests do not work with it.  Disable them.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/002 | 1 +
 tests/qemu-iotests/003 | 1 +
 tests/qemu-iotests/005 | 3 ++-
 tests/qemu-iotests/009 | 1 +
 tests/qemu-iotests/010 | 1 +
 tests/qemu-iotests/011 | 1 +
 tests/qemu-iotests/017 | 3 ++-
 tests/qemu-iotests/018 | 3 ++-
 tests/qemu-iotests/019 | 3 ++-
 tests/qemu-iotests/020 | 3 ++-
 tests/qemu-iotests/027 | 1 +
 tests/qemu-iotests/032 | 1 +
 tests/qemu-iotests/033 | 1 +
 tests/qemu-iotests/034 | 3 ++-
 tests/qemu-iotests/037 | 3 ++-
 tests/qemu-iotests/063 | 3 ++-
 tests/qemu-iotests/072 | 1 +
 tests/qemu-iotests/105 | 3 ++-
 tests/qemu-iotests/197 | 1 +
 tests/qemu-iotests/215 | 1 +
 tests/qemu-iotests/251 | 1 +
 21 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/002 b/tests/qemu-iotests/002
index fd413bce48..1a0d411df5 100755
--- a/tests/qemu-iotests/002
+++ b/tests/qemu-iotests/002
@@ -38,6 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt generic
 _supported_proto generic
+_unsupported_imgopts "subformat=streamOptimized"
 
 
 size=128M
diff --git a/tests/qemu-iotests/003 b/tests/qemu-iotests/003
index ccd3a39dfb..33eeade0de 100755
--- a/tests/qemu-iotests/003
+++ b/tests/qemu-iotests/003
@@ -38,6 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt generic
 _supported_proto generic
+_unsupported_imgopts "subformat=streamOptimized"
 
 size=128M
 offset=67M
diff --git a/tests/qemu-iotests/005 b/tests/qemu-iotests/005
index 9c7681c19b..58442762fe 100755
--- a/tests/qemu-iotests/005
+++ b/tests/qemu-iotests/005
@@ -43,7 +43,8 @@ _supported_fmt generic
 _supported_proto generic
 _supported_os Linux
 _unsupported_imgopts "subformat=twoGbMaxExtentFlat" \
- "subformat=twoGbMaxExtentSparse"
+ "subformat=twoGbMaxExtentSparse" \
+ "subformat=streamOptimized"
 
 # vpc is limited to 127GB, so we can't test it here
 if [ "$IMGFMT" = "vpc" ]; then
diff --git a/tests/qemu-iotests/009 b/tests/qemu-iotests/009
index 51b200db1d..4dc7d210f9 100755
--- a/tests/qemu-iotests/009
+++ b/tests/qemu-iotests/009
@@ -38,6 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt generic
 _supported_proto generic
+_unsupported_imgopts "subformat=streamOptimized"
 
 
 size=6G
diff --git a/tests/qemu-iotests/010 b/tests/qemu-iotests/010
index 48c533f632..df809b3088 100755
--- a/tests/qemu-iotests/010
+++ b/tests/qemu-iotests/010
@@ -38,6 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt generic
 _supported_proto generic
+_unsupported_imgopts "subformat=streamOptimized"
 
 
 size=6G
diff --git a/tests/qemu-iotests/011 b/tests/qemu-iotests/011
index 8b1fce069a..d02c38c461 100755
--- a/tests/qemu-iotests/011
+++ b/tests/qemu-iotests/011
@@ -38,6 +38,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 
 _supported_fmt generic
 _supported_proto generic
+_unsupported_imgopts "subformat=streamOptimized"
 
 
 size=6G
diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 79875de454..0a4b854e65 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -41,7 +41,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow qcow2 vmdk qed
 _supported_proto generic
 _unsupported_proto vxhs
-_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
+_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat" 
\
+ "subformat=streamOptimized"
 
 TEST_OFFSETS="0 4294967296"
 
diff --git a/tests/qemu-iotests/018 b/tests/qemu-iotests/018
index 78169838ba..c69ce09209 100755
--- a/tests/qemu-iotests/018
+++ b/tests/qemu-iotests/018
@@ -41,7 +41,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow qcow2 vmdk qed
 _supported_proto file
 _supported_os Linux
-_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
+_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat" 
\
+ "streamOptimized"
 
 TEST_OFFSETS="0 4294967296"
 
diff --git a/tests/qemu-iotests/019 b/tests/qemu-iotests/019
index a56dd30bed..b4f5234609 100755
--- a/tests/qemu-iotests/019
+++ b/tests/qemu-iotests/019
@@ -47,7 +47,8 @@ _supported_proto file
 _supported_os Linux
 _unsupported_imgopts "subformat=monolithicFlat" \
  "subformat=twoGbMaxExtentFlat" \
- "subformat=twoGbMaxExtentSparse"
+ "subformat=twoGbMaxExtentSparse" \
+ "subformat=streamOptimized"
 
 TEST_OFFSETS="0 4294967296"
 CLUSTER_SIZE=65536
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 6b0ebb37d2..f41b92f35f 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -44,7 +44,8 @@ _supported_fmt qcow qcow2 vmdk qed
 _supported_proto file

[Qemu-devel] [PATCH 7/7] iotests: Disable 126 for some vmdk subformats

2019-07-25 Thread Max Reitz

Several vmdk subformats do not work with iotest 126, so disable them.

(twoGbMaxExtentSparse actually should work, but fixing that is a bit
difficult.  The problem is that the vmdk descriptor file will contain a
referenc to "image:base.vmdk", which the block layer cannot open because
it does not know the protocol "image".  This is not trivial to solve,
because I suppose real protocols like "http://; should be supported.
Making vmdk treat all paths with a potential protocol prefix that the
block layer does not recognize as plain files seems a bit weird,
though.  Ignoring this problem does not seem too bad.)

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/126 | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/qemu-iotests/126 b/tests/qemu-iotests/126
index 9b0dcf9255..8e55d7c843 100755
--- a/tests/qemu-iotests/126
+++ b/tests/qemu-iotests/126
@@ -33,6 +33,12 @@ status=1 # failure is the default!
 
 # Needs backing file support
 _supported_fmt qcow qcow2 qed vmdk
+# (1) Flat vmdk images do not support backing files
+# (2) Split vmdk images simply fail this test right now.  Fixing that
+# is left for another day.
+_unsupported_imgopts "subformat=monolithicFlat" \
+ "subformat=twoGbMaxExtentFlat" \
+ "subformat=twoGbMaxExtentSparse"
 # This is the default protocol (and we want to test the difference between
 # colons which separate a protocol prefix from the rest and colons which are
 # just part of the filename, so we cannot test protocols which require a 
prefix)
-- 
2.21.0

[Qemu-devel] [PATCH 4/7] vmdk: Reject invalid compressed writes

2019-07-25 Thread Max Reitz

Compressed writes generally have to write full clusters, not just in
theory but also in practice when it comes to vmdk's streamOptimized
subformat.  It currently is just silently broken for writes with
non-zero in-cluster offsets:

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
wrote 4096/4096 bytes at offset 4096
4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
read failed: Invalid argument

(The technical reason is that vmdk_write_extent() just writes the
incomplete compressed data actually to offset 4k.  When reading the
data, vmdk_read_extent() looks at offset 0 and finds the compressed data
size to be 0, because that is what it reads from there.  This yields an
error.)

For incomplete writes with zero in-cluster offsets, the error path when
reading the rest of the cluster is a bit different, but the result is
the same:

$ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
$ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
read failed: Invalid argument

(Here, vmdk_read_extent() finds the data and then sees that the
uncompressed data is short.)

It is better to reject invalid writes than to make the user believe they
might have succeeded and then fail when trying to read it back.

Signed-off-by: Max Reitz 
---
 block/vmdk.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/vmdk.c b/block/vmdk.c
index db6acfc31e..641acacfe0 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t 
cluster_offset,
 if (extent->compressed) {
 void *compressed_data;
 
+/* Only whole clusters */
+if (offset_in_cluster ||
+n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
+(n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
+ offset + n_bytes != extent->end_sector * SECTOR_SIZE))
+{
+ret = -EINVAL;
+goto out;
+}
+
 if (!extent->has_marker) {
 ret = -EINVAL;
 goto out;
-- 
2.21.0

Re: [Qemu-devel] [PATCH for-4.2 16/24] target/arm: Add regime_has_2_ranges

2019-07-25 Thread Alex Bennée



Richard Henderson  writes:

Can you elucidate what we mean by having 2 ranges?


> Signed-off-by: Richard Henderson 
> ---
>  target/arm/internals.h | 16 
>  target/arm/helper.c| 22 +-
>  target/arm/translate-a64.c |  3 +--
>  3 files changed, 22 insertions(+), 19 deletions(-)
>
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index 047cbfd1d7..f653e0e7f5 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -824,6 +824,22 @@ static inline void arm_call_el_change_hook(ARMCPU *cpu)
>  }
>  }
>
> +/* Return true if this address translation regime has two ranges.  */
> +static inline bool regime_has_2_ranges(ARMMMUIdx mmu_idx)
> +{
> +switch (mmu_idx) {
> +case ARMMMUIdx_Stage1_E0:
> +case ARMMMUIdx_Stage1_E1:
> +case ARMMMUIdx_EL10_0:
> +case ARMMMUIdx_EL10_1:
> +case ARMMMUIdx_EL20_0:
> +case ARMMMUIdx_EL20_2:
> +return true;
> +default:
> +return false;
> +}
> +}
> +
>  /* Return true if this address translation regime is secure */
>  static inline bool regime_is_secure(CPUARMState *env, ARMMMUIdx mmu_idx)
>  {
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 240a6ed168..2d5658f9e3 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -8999,15 +8999,8 @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx 
> mmu_idx, bool is_aa64,
>  }
>
>  if (is_aa64) {
> -switch (regime_el(env, mmu_idx)) {
> -case 1:
> -if (!is_user) {
> -xn = pxn || (user_rw & PAGE_WRITE);
> -}
> -break;
> -case 2:
> -case 3:
> -break;
> +if (regime_has_2_ranges(mmu_idx) && !is_user) {
> +xn = pxn || (user_rw & PAGE_WRITE);
>  }
>  } else if (arm_feature(env, ARM_FEATURE_V7)) {
>  switch (regime_el(env, mmu_idx)) {
> @@ -9541,7 +9534,6 @@ ARMVAParameters aa64_va_parameters_both(CPUARMState 
> *env, uint64_t va,
>  ARMMMUIdx mmu_idx)
>  {
>  uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
> -uint32_t el = regime_el(env, mmu_idx);
>  bool tbi, tbid, epd, hpd, using16k, using64k;
>  int select, tsz;
>
> @@ -9551,7 +9543,7 @@ ARMVAParameters aa64_va_parameters_both(CPUARMState 
> *env, uint64_t va,
>   */
>  select = extract64(va, 55, 1);
>
> -if (el > 1) {
> +if (!regime_has_2_ranges(mmu_idx)) {
>  tsz = extract32(tcr, 0, 6);
>  using64k = extract32(tcr, 14, 1);
>  using16k = extract32(tcr, 15, 1);
> @@ -9707,10 +9699,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> target_ulong address,
>  param = aa64_va_parameters(env, address, mmu_idx,
> access_type != MMU_INST_FETCH);
>  level = 0;
> -/* If we are in 64-bit EL2 or EL3 then there is no TTBR1, so mark it
> - * invalid.
> - */
> -ttbr1_valid = (el < 2);
> +ttbr1_valid = regime_has_2_ranges(mmu_idx);
>  addrsize = 64 - 8 * param.tbi;
>  inputsize = 64 - param.tsz;
>  } else {
> @@ -11361,8 +11350,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, 
> target_ulong *pc,
>  ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
>  int tbii, tbid;
>
> -/* FIXME: ARMv8.1-VHE S2 translation regime.  */
> -if (regime_el(env, stage1) < 2) {
> +if (regime_has_2_ranges(mmu_idx)) {
>  ARMVAParameters p1 = aa64_va_parameters_both(env, -1, 
> stage1);
>  tbid = (p1.tbi << 1) | p0.tbi;
>  tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index a9e65fe347..5679349310 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -176,8 +176,7 @@ static void gen_top_byte_ignore(DisasContext *s, TCGv_i64 
> dst,
>  if (tbi == 0) {
>  /* Load unmodified address */
>  tcg_gen_mov_i64(dst, src);
> -} else if (s->current_el >= 2) {
> -/* FIXME: ARMv8.1-VHE S2 translation regime.  */
> +} else if (!regime_has_2_ranges(s->mmu_idx)) {
>  /* Force tag byte to all zero */
>  tcg_gen_extract_i64(dst, src, 0, 56);
>  } else {


--
Alex Bennée

[Qemu-devel] [PATCH 1/7] iotests: Fix _filter_img_create()

2019-07-25 Thread Max Reitz

fe646693acc changed qemu-img create's output so that it no longer prints
single quotes around parameter values.  The subformat and adapter_type
filters in _filter_img_create() have never been adapted to that change.

Fixes: fe646693acc13ac48b98435d14149ab04dc597bc
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/059.out   | 16 
 tests/qemu-iotests/common.filter |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 4fab42a28c..77d8984428 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -13,17 +13,17 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 qemu-io: can't open device TEST_DIR/t.vmdk: L1 size too big
 
 === Testing monolithicFlat creation and opening ===
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 
subformat=monolithicFlat
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 2 GiB (2147483648 bytes)
 
 === Testing monolithicFlat with zeroed_grain ===
 qemu-img: TEST_DIR/t.IMGFMT: Flat image can't enable zeroed grain
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 
subformat=monolithicFlat
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648
 
 === Testing big twoGbMaxExtentFlat ===
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824000 
subformat=twoGbMaxExtentFlat
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824000
 image: TEST_DIR/t.vmdk
 file format: vmdk
 virtual size: 0.977 TiB (1073741824000 bytes)
@@ -2038,7 +2038,7 @@ Format specific information:
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': Invalid extent line: RW 12582912 
VMFS "dummy.IMGFMT" 1
 
 === Testing truncated sparse ===
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=107374182400 
subformat=monolithicSparse
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=107374182400
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': File truncated, expecting at 
least 13172736 bytes
 
 === Converting to streamOptimized from image with small cluster size===
@@ -2049,7 +2049,7 @@ wrote 512/512 bytes at offset 10240
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 === Testing monolithicFlat with internally generated JSON file name ===
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
subformat=monolithicFlat
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 qemu-io: can't open: Cannot use relative extent paths with VMDK descriptor 
file 'json:{"image": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}, 
"driver": "blkdebug", "inject-error.0.event": "read_aio"}'
 
 === Testing version 3 ===
@@ -2259,7 +2259,7 @@ read 512/512 bytes at offset 64931328
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 === Testing 4TB monolithicFlat creation and IO ===
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4398046511104 
subformat=monolithicFlat
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4398046511104
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 4 TiB (4398046511104 bytes)
@@ -2333,7 +2333,7 @@ read 1024/1024 bytes at offset 966367641600
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 === Testing qemu-img map on extents ===
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33285996544 
subformat=monolithicSparse
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33285996544
 wrote 1024/1024 bytes at offset 65024
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 1024/1024 bytes at offset 2147483136
@@ -2344,7 +2344,7 @@ Offset  Length  Mapped to   File
 0   0x2 0x3fTEST_DIR/t.vmdk
 0x7fff  0x2 0x41TEST_DIR/t.vmdk
 0x14000 0x1 0x43TEST_DIR/t.vmdk
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33285996544 
subformat=twoGbMaxExtentSparse
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33285996544
 wrote 1024/1024 bytes at offset 65024
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 1024/1024 bytes at offset 2147483136
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 35fddc746f..64dd9200f3 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -130,8 +130,8 @@ _filter_img_create()
 -e "s# compat6=\\(on\\|off\\)##g" \
 -e "s# static=\\(on\\|off\\)##g" \
 -e "s# zeroed_grain=\\(on\\|off\\)##g" \
--e "s# subformat='[^']*'##g" \
--e "s# adapter_type='[^']*'##g" \
+-e "s# subformat=[^ ]*##g" \
+-e "s# adapter_type=[^ ]*##g" \
 -e "s# hwversion=[^ ]*##g" \
 -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
 -e "s# block_size=[0-9]\\+##g" \
-- 
2.21.0

[Qemu-devel] [PATCH 3/7] iotests: Keep testing broken relative extent paths

2019-07-25 Thread Max Reitz

We had a test for a case where relative extent paths did not work, but
unfortunately we just fixed the underlying problem, so it works now.
This patch adds a new test case that still fails.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/059 | 27 +++
 tests/qemu-iotests/059.out |  4 
 2 files changed, 31 insertions(+)

diff --git a/tests/qemu-iotests/059 b/tests/qemu-iotests/059
index fbed5f9483..2a883d0f21 100755
--- a/tests/qemu-iotests/059
+++ b/tests/qemu-iotests/059
@@ -114,6 +114,8 @@ $QEMU_IMG convert -f qcow2 -O vmdk -o 
subformat=streamOptimized "$TEST_IMG.qcow2
 
 echo
 echo "=== Testing monolithicFlat with internally generated JSON file name ==="
+
+echo '--- blkdebug ---'
 # Should work, because bdrv_dirname() works fine with blkdebug
 IMGOPTS="subformat=monolithicFlat" _make_test_img 64M
 $QEMU_IO -c "open -o 
driver=$IMGFMT,file.driver=blkdebug,file.image.filename=$TEST_IMG,file.inject-error.0.event=read_aio"
 \
@@ -122,6 +124,31 @@ $QEMU_IO -c "open -o 
driver=$IMGFMT,file.driver=blkdebug,file.image.filename=$TE
 | _filter_testdir | _filter_imgfmt | _filter_img_info
 _cleanup_test_img
 
+echo '--- quorum ---'
+# Should not work, because bdrv_dirname() does not work with blkdebug
+IMGOPTS="subformat=monolithicFlat" _make_test_img 64M
+cp "$TEST_IMG" "$TEST_IMG.orig"
+
+filename="json:{
+\"driver\": \"$IMGFMT\",
+\"file\": {
+\"driver\": \"quorum\",
+\"children\": [ {
+\"driver\": \"file\",
+\"filename\": \"$TEST_IMG\"
+}, {
+\"driver\": \"file\",
+\"filename\": \"$TEST_IMG.orig\"
+} ],
+\"vote-threshold\": 1
+} }"
+
+filename=$(echo "$filename" | tr '\n' ' ' | sed -e 's/\s\+/ /g')
+$QEMU_IMG info "$filename" 2>&1 \
+| sed -e "s/'json:[^']*'/\$QUORUM_FILE/g" \
+| _filter_testdir | _filter_imgfmt | _filter_img_info
+
+
 echo
 echo "=== Testing version 3 ==="
 _use_sample_img iotest-version3.vmdk.bz2
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 120cddd207..f8895ba434 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -2049,10 +2049,14 @@ wrote 512/512 bytes at offset 10240
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 === Testing monolithicFlat with internally generated JSON file name ===
+--- blkdebug ---
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 format name: IMGFMT
 cluster size: 0 bytes
 vm state offset: 0 bytes
+--- quorum ---
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Could not open $QUORUM_FILE: Cannot use relative paths with VMDK 
descriptor file $QUORUM_FILE: Cannot generate a base directory for quorum nodes
 
 === Testing version 3 ===
 image: TEST_DIR/iotest-version3.IMGFMT
-- 
2.21.0

[Qemu-devel] [PATCH 6/7] iotests: Disable 110 for vmdk.twoGbMaxExtentSparse

2019-07-25 Thread Max Reitz

The error message for the test case where we have a quorum node for
which no directory name can be generated is different: For
twoGbMaxExtentSparse, it complains that it cannot open the extent file.
For other (sub)formats, it just notes that it cannot determine the
backing file path.  Both are fine, but just disable twoGbMaxExtentSparse
for simplicity's sake.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/110 | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
index 2cdc7c8a72..2ef516baf1 100755
--- a/tests/qemu-iotests/110
+++ b/tests/qemu-iotests/110
@@ -40,7 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 # Any format supporting backing files
 _supported_fmt qed qcow qcow2 vmdk
 _supported_proto file
-_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat"
+_unsupported_imgopts "subformat=monolithicFlat" "subformat=twoGbMaxExtentFlat" 
\
+ "subformat=twoGbMaxExtentSparse"
 
 TEST_IMG_REL=$(basename "$TEST_IMG")
 
-- 
2.21.0

[Qemu-devel] [PATCH 2/7] vmdk: Use bdrv_dirname() for relative extent paths

2019-07-25 Thread Max Reitz

This makes iotest 033 pass with e.g. subformat=monolithicFlat.  It also
turns a former error in 059 into success.

Signed-off-by: Max Reitz 
---
 block/vmdk.c   | 54 --
 tests/qemu-iotests/059 |  7 +++--
 tests/qemu-iotests/059.out |  4 ++-
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index bd36ece125..db6acfc31e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1076,8 +1076,7 @@ static const char *next_line(const char *s)
 }
 
 static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
-  const char *desc_file_path, QDict *options,
-  Error **errp)
+  QDict *options, Error **errp)
 {
 int ret;
 int matches;
@@ -1087,6 +1086,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 const char *p, *np;
 int64_t sectors = 0;
 int64_t flat_offset;
+char *desc_file_dir = NULL;
 char *extent_path;
 BdrvChild *extent_file;
 BDRVVmdkState *s = bs->opaque;
@@ -1130,16 +1130,23 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 continue;
 }
 
-if (!path_is_absolute(fname) && !path_has_protocol(fname) &&
-!desc_file_path[0])
-{
-bdrv_refresh_filename(bs->file->bs);
-error_setg(errp, "Cannot use relative extent paths with VMDK "
-   "descriptor file '%s'", bs->file->bs->filename);
-return -EINVAL;
-}
+if (path_is_absolute(fname) || path_has_protocol(fname)) {
+extent_path = g_strdup(fname);
+} else {
+if (!desc_file_dir) {
+desc_file_dir = bdrv_dirname(bs->file->bs, errp);
+if (!desc_file_dir) {
+bdrv_refresh_filename(bs->file->bs);
+error_prepend(errp, "Cannot use relative paths with VMDK "
+  "descriptor file '%s': ",
+  bs->file->bs->filename);
+ret = -EINVAL;
+goto out;
+}
+}
 
-extent_path = path_combine(desc_file_path, fname);
+extent_path = g_strconcat(desc_file_dir, fname, NULL);
+}
 
 ret = snprintf(extent_opt_prefix, 32, "extents.%d", s->num_extents);
 assert(ret < 32);
@@ -1149,7 +1156,8 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 g_free(extent_path);
 if (local_err) {
 error_propagate(errp, local_err);
-return -EINVAL;
+ret = -EINVAL;
+goto out;
 }
 
 /* save to extents array */
@@ -1160,7 +1168,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 0, 0, 0, 0, 0, , errp);
 if (ret < 0) {
 bdrv_unref_child(bs, extent_file);
-return ret;
+goto out;
 }
 extent->flat_start_offset = flat_offset << 9;
 } else if (!strcmp(type, "SPARSE") || !strcmp(type, "VMFSSPARSE")) {
@@ -1175,24 +1183,27 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 g_free(buf);
 if (ret) {
 bdrv_unref_child(bs, extent_file);
-return ret;
+goto out;
 }
 extent = >extents[s->num_extents - 1];
 } else if (!strcmp(type, "SESPARSE")) {
 ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
 if (ret) {
 bdrv_unref_child(bs, extent_file);
-return ret;
+goto out;
 }
 extent = >extents[s->num_extents - 1];
 } else {
 error_setg(errp, "Unsupported extent type '%s'", type);
 bdrv_unref_child(bs, extent_file);
-return -ENOTSUP;
+ret = -ENOTSUP;
+goto out;
 }
 extent->type = g_strdup(type);
 }
-return 0;
+
+ret = 0;
+goto out;
 
 invalid:
 np = next_line(p);
@@ -1201,7 +1212,11 @@ invalid:
 np--;
 }
 error_setg(errp, "Invalid extent line: %.*s", (int)(np - p), p);
-return -EINVAL;
+ret = -EINVAL;
+
+out:
+g_free(desc_file_dir);
+return ret;
 }
 
 static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
@@ -1228,8 +1243,7 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int 
flags, char *buf,
 }
 s->create_type = g_strdup(ct);
 s->desc_offset = 0;
-ret = vmdk_parse_extents(buf, bs, bs->file->bs->exact_filename, options,
- errp);
+ret = vmdk_parse_extents(buf, bs, options, errp);
 exit:
 return ret;
 }
diff --git a/tests/qemu-iotests/059 b/tests/qemu-iotests/059
index 279aee6815..fbed5f9483 100755

Re: [Qemu-devel] [PATCH for-4.2 15/24] target/arm: Reorganize ARMMMUIdx

2019-07-25 Thread Alex Bennée



Richard Henderson  writes:

> Prepare for, but do not yet implement, the EL2&0 regime and the
> Secure EL2 regime.  Rename all of the a-profile symbols to make
> the distictions clearer.

Perhaps a summary of the renames would be useful here? My head is
spinning a little given the number that we have and not being familiar
with the naming convention.

  ARMMMUIdx[_StageN]_[M][S]Enn[_nn]

  _StageN - stage N only, otherwise assumed 1+2 lookup?
  M - M profile (interleaved with A profile)
  S - Secure
  Enn - visible exception level, so E02 is shared EL0 and EL2?
  _nn - not sure?

The cpu.h comment is very detailed but doesn't actually give me enough
information to decode an ARMMMUIdx when I come across it in the code.

Otherwise:

Reviewed-by: Alex Bennée 

>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/cpu.h   | 180 ++--
>  target/arm/internals.h |  46 ---
>  target/arm/translate.h |   2 +-
>  target/arm/helper.c| 237 ++---
>  target/arm/m_helper.c  |   6 +-
>  target/arm/translate-a64.c |  11 +-
>  target/arm/translate.c |  17 ++-
>  7 files changed, 273 insertions(+), 226 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index a0f10b60eb..4b537c4613 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2764,7 +2764,10 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> unsigned int excp_idx,
>   *  + NonSecure EL1 & 0 stage 1
>   *  + NonSecure EL1 & 0 stage 2
>   *  + NonSecure EL2
> - *  + Secure EL1 & EL0
> + *  + NonSecure EL2 & 0   (ARMv8.1-VHE)
> + *  + Secure EL0
> + *  + Secure EL1
> + *  + Secure EL2  (ARMv8.4-SecEL2)
>   *  + Secure EL3
>   * If EL3 is 32-bit:
>   *  + NonSecure PL1 & 0 stage 1
> @@ -2774,8 +2777,9 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> unsigned int excp_idx,
>   * (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)
>   *
>   * For QEMU, an mmu_idx is not quite the same as a translation regime 
> because:
> - *  1. we need to split the "EL1 & 0" regimes into two mmu_idxes, because 
> they
> - * may differ in access permissions even if the VA->PA map is the same
> + *  1. we need to split the "EL1 & 0" and "EL2 & 0" regimes into two 
> mmu_idxes,
> + * because they may differ in access permissions even if the VA->PA map 
> is
> + * the same
>   *  2. we want to cache in our TLB the full VA->IPA->PA lookup for a stage 
> 1+2
>   * translation, which means that we have one mmu_idx that deals with two
>   * concatenated translation regimes [this sort of combined s1+2 TLB is
> @@ -2787,19 +2791,26 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> unsigned int excp_idx,
>   *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
>   * translation regimes, because they map reasonably well to each other
>   * and they can't both be active at the same time.
> - * This gives us the following list of mmu_idx values:
> + *  5. we want to be able to use the TLB for accesses done as part of a
> + * stage1 page table walk, rather than having to walk the stage2 page
> + * table over and over.
>   *
> - * NS EL0 (aka NS PL0) stage 1+2
> - * NS EL1 (aka NS PL1) stage 1+2
> + * This gives us the following list of cases:
> + *
> + * NS EL0 (aka NS PL0) EL1&0 stage 1+2
> + * NS EL1 (aka NS PL1) EL1&0 stage 1+2
> + * NS EL0 (aka NS PL0) EL2&0
> + * NS EL2 (aka NS PL2) EL2&0
>   * NS EL2 (aka NS PL2)
> - * S EL3 (aka S PL1)
>   * S EL0 (aka S PL0)
>   * S EL1 (not used if EL3 is 32 bit)
> - * NS EL0+1 stage 2
> + * S EL2 (not used if EL3 is 32 bit)
> + * S EL3 (aka S PL1)
> + * NS EL0&1 stage 2
>   *
> - * (The last of these is an mmu_idx because we want to be able to use the TLB
> - * for the accesses done as part of a stage 1 page table walk, rather than
> - * having to walk the stage 2 page table over and over.)
> + * We then merge the two NS EL0 cases, and two NS EL2 cases to produce
> + * 8 different mmu_idx.  We retain separate symbols for these four cases
> + * in order to simplify distinguishing them in the code.
>   *
>   * R profile CPUs have an MPU, but can use the same set of MMU indexes
>   * as A profile. They only need to distinguish NS EL0 and NS EL1 (and
> @@ -2837,62 +2848,93 @@ static inline bool arm_excp_unmasked(CPUState *cs, 
> unsigned int excp_idx,
>   * For M profile we arrange them to have a bit for priv, a bit for negpri
>   * and a bit for secure.
>   */
> -#define ARM_MMU_IDX_A 0x10 /* A profile */
> -#define ARM_MMU_IDX_NOTLB 0x20 /* does not have a TLB */
> -#define ARM_MMU_IDX_M 0x40 /* M profile */
> +#define ARM_MMU_IDX_S 0x04  /* Secure */
> +#define ARM_MMU_IDX_A 0x10  /* A profile */
> +#define ARM_MMU_IDX_M 0x20  /* M profile */
> +#define ARM_MMU_IDX_NOTLB 0x100 /* does not have a TLB */
>
> -/* meanings of the bits for M profile mmu idx values */
> -#define ARM_MMU_IDX_M_PRIV 0x1
> +/* Meanings of the bits for A profile

[Qemu-devel] [PATCH 0/7] vmdk: Misc fixes

2019-07-25 Thread Max Reitz

I made the mistake of trying to run the iotests with all non-default
subformats our vmdk driver has to offer:
- monolithicFlat
- twoGbMaxExtentSparse
- twoGbMaxExtentFlat
- streamOptimized

Many things broke, so this series fixes what I found.  It’s mostly just
iotest fixes, but there are actually two real fixes in here.


Max Reitz (7):
  iotests: Fix _filter_img_create()
  vmdk: Use bdrv_dirname() for relative extent paths
  iotests: Keep testing broken relative extent paths
  vmdk: Reject invalid compressed writes
  iotests: Disable broken streamOptimized tests
  iotests: Disable 110 for vmdk.twoGbMaxExtentSparse
  iotests: Disable 126 for some vmdk subformats

 block/vmdk.c | 64 ++--
 tests/qemu-iotests/002   |  1 +
 tests/qemu-iotests/003   |  1 +
 tests/qemu-iotests/005   |  3 +-
 tests/qemu-iotests/009   |  1 +
 tests/qemu-iotests/010   |  1 +
 tests/qemu-iotests/011   |  1 +
 tests/qemu-iotests/017   |  3 +-
 tests/qemu-iotests/018   |  3 +-
 tests/qemu-iotests/019   |  3 +-
 tests/qemu-iotests/020   |  3 +-
 tests/qemu-iotests/027   |  1 +
 tests/qemu-iotests/032   |  1 +
 tests/qemu-iotests/033   |  1 +
 tests/qemu-iotests/034   |  3 +-
 tests/qemu-iotests/037   |  3 +-
 tests/qemu-iotests/059   | 34 -
 tests/qemu-iotests/059.out   | 24 +++-
 tests/qemu-iotests/063   |  3 +-
 tests/qemu-iotests/072   |  1 +
 tests/qemu-iotests/105   |  3 +-
 tests/qemu-iotests/110   |  3 +-
 tests/qemu-iotests/126   |  6 +++
 tests/qemu-iotests/197   |  1 +
 tests/qemu-iotests/215   |  1 +
 tests/qemu-iotests/251   |  1 +
 tests/qemu-iotests/common.filter |  4 +-
 27 files changed, 131 insertions(+), 43 deletions(-)

-- 
2.21.0

[Qemu-devel] [PATCH 3/3] vpc: Do not return RAW from block_status

2019-07-25 Thread Max Reitz

vpc is not really a passthrough driver, even when using the fixed
subformat (where host and guest offsets are equal).  It should handle
preallocation like all other drivers do, namely by returning
DATA | RECURSE instead of RAW.

There is no tangible difference but the fact that bdrv_is_allocated() no
longer falls through to the protocol layer.

Signed-off-by: Max Reitz 
---
 block/vpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index d4776ee8a5..b25aab0425 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -737,7 +737,7 @@ static int coroutine_fn 
vpc_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 *map = offset;
 *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_RECURSE;
 }
 
 qemu_co_mutex_lock(>lock);
-- 
2.21.0

[Qemu-devel] [PATCH 2/3] vmdk: Make block_status recurse for flat extents

2019-07-25 Thread Max Reitz

Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
Signed-off-by: Max Reitz 
---
 block/vmdk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/vmdk.c b/block/vmdk.c
index bd36ece125..fd78fd0ccf 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1692,6 +1692,9 @@ static int coroutine_fn 
vmdk_co_block_status(BlockDriverState *bs,
 if (!extent->compressed) {
 ret |= BDRV_BLOCK_OFFSET_VALID;
 *map = cluster_offset + index_in_cluster;
+if (extent->flat) {
+ret |= BDRV_BLOCK_RECURSE;
+}
 }
 *file = extent->file->bs;
 break;
-- 
2.21.0

[Qemu-devel] [PATCH 1/3] vdi: Make block_status recurse for fixed images

2019-07-25 Thread Max Reitz

Suggested-by: Vladimir Sementsov-Ogievskiy 
Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
Signed-off-by: Max Reitz 
---
 block/vdi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/vdi.c b/block/vdi.c
index b9845a4cbd..40d40c34d5 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -542,7 +542,8 @@ static int coroutine_fn 
vdi_co_block_status(BlockDriverState *bs,
 *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
 index_in_block;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID |
+(s->header.image_type == VDI_TYPE_STATIC ? BDRV_BLOCK_RECURSE : 0);
 }
 
 static int coroutine_fn
-- 
2.21.0

[Qemu-devel] [PATCH 0/3] block: Make various formats' block_status recurse again

2019-07-25 Thread Max Reitz

Hi,

69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
would only go down to the protocol layer if the format layer returned
BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
information whether a given range in the image is zero or not.
Generally, this is because the image is preallocated and thus all ranges
appear as zeroes.

However, it only implemented this preallocation detection for qcow2.
There are more formats that support preallocation, though: vdi, vhdx,
vmdk, vpc.  (Funny how they all start with “v”.)

For vdi, vmdk, and vpc, the fix is rather simple, because they really
have different subformats depending on whether an image is preallocated
or not.  This makes the check very simple.

vhdx is more like qcow2, where after the image has been created, it
isn’t clear whether it’s been preallocated or everything is allocated
because everything was already written to.  69f47505ee added a heuristic
to qcow2 to get around this, but I think that’s too much for vhdx.  I
just left it unfixed, because I don’t care that much, honestly (and I
don’t think anyone else does).


Max Reitz (3):
  vdi: Make block_status recurse for fixed images
  vmdk: Make block_status recurse for flat extents
  vpc: Do not return RAW from block_status

 block/vdi.c  | 3 ++-
 block/vmdk.c | 3 +++
 block/vpc.c  | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.21.0

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Sergio Lopez


Michael S. Tsirkin  writes:

> On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
>> > 
>> > Paolo Bonzini  writes:
>> > 
>> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> > >> The microvm design has a premise and it can be answered definitively
>> > >> through performance analysis.
>> > >> 
>> > >> If I had to explain to someone why PCI or ACPI significantly slows
>> > >> things down, I couldn't honestly do so.  I say significantly because
>> > >> PCI init definitely requires more vmexits but can it be a small
>> > >> number?  For ACPI I have no idea why it would consume significant
>> > >> amounts of time.
>> > >
>> > > My guess is that it's just a lot of code that has to run. :(
>> > 
>> > I think I haven't shared any numbers about ACPI.
>> > 
>> > I don't have details about where exactly the time is spent, but
>> > compiling a guest kernel without ACPI decreases the average boot time in
>> > ~12ms, and the kernel's unstripped ELF binary size goes down in a
>> > whooping ~300KiB.
>> 
>> At least the binary size is hardly surprising.
>> 
>> I'm guessing you built in lots of drivers.
>> 
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

I just tried disabling everything that menuconfig allowed me to. Saves
~27KiB and doesn't improve boot time.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

I also tried enabling this one in my original config. It saves ~11.5KiB,
and has on impact on boot time either.

>> 
>> > On the other hand, removing ACPI from QEMU decreases its initialization
>> > time in ~5ms, and the binary size is ~183KiB smaller.
>> 
>> Yes - ACPI generation uses a ton of allocations and data copies.
>> 
>> Need to play with pre-allocation strategies. Maybe something
>> as simple as:
>> 
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index f3fdfefcd5..24becc069e 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
>> *machine)
>>  acpi_get_pci_holes(_hole, _hole64);
>>  acpi_get_slic_oem(_oem);
>>  
>> +#define DEFAULT_ARRAY_SIZE 16
>>  table_offsets = g_array_new(false, true /* clear */,
>> -sizeof(uint32_t));
>> +sizeof(uint32_t),
>> +DEFAULT_ARRAY_SIZE);
>>  ACPI_BUILD_DPRINTF("init ACPI tables\n");
>>  
>>  bios_linker_loader_alloc(tables->linker,
>> 
>> will already help a bit.
>> 
>> > 
>> > IMHO, those are pretty relevant savings on both fronts.
>> > 
>> > >> Until we have this knowledge, the premise of microvm is unproven and
>> > >> merging it would be premature because maybe we can get into the same
>> > >> ballpark by optimizing existing code.
>> > >> 
>> > >> I'm sorry for being a pain.  I actually think the analysis will
>> > >> support microvm, but it still needs to be done in order to justify it.
>> > >
>> > > No, you're not a pain, you're explaining your reasoning and that helps.
>> > >
>> > > To me *maintainability is the biggest consideration* when introducing a
>> > > new feature.  "We can do just as well with q35" is a good reason to
>> > > deprecate and delete microvm, but not a good reason to reject it now as
>> > > long as microvm is good enough in terms of maintainability.  Keeping it
>> > > out of tree only makes it harder to do this kind of experiment.  virtio
>> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
>> > > thing to have even for the ARM virt machine type.
>> > >
>> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
>> > > 30% of the whole boot time and reducing it is the hardest part.  If
>> > > having microvm in tree can help reducing it, good.  Yes, it will get
>> > > users, but most likely they will have to support pc or q35 as a fallback
>> > > so we could still delete microvm at any time with the due deprecation
>> > > period if it turns out to be a failed experiment.
>> > >
>> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
>> > > an implementation detail as long as the ROM size doesn't change and/or
>> > > we don't do versioned machine types.  So we can switch from one to the
>> > > other at any time; we can also include qboot directly in QEMU's tree,
>> > > without going through a submodule, which also reduces the infrastructure
>> > > needed (mirrors, etc.) and makes it easier to delete it.
>> > >
>> > > Paolo
>> > >
>> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
>> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
>> > > end up measured as PCI in SeaBIOS, due to different init order, so the
>> > > real firmware cost of PAM and PCI

[Qemu-devel] [PULL v1 0/2] Merge tpm 2019/07/25 v1

2019-07-25 Thread Stefan Berger

This series of patches improves error handling with the TPM backend.

   Stefan

The following changes since commit 9d2e1fcd14c2bae5be1992214a03c0ddff714c80:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2019-07-22 13:20:49 +0100)

are available in the Git repository at:

  git://github.com/stefanberger/qemu-tpm.git tags/pull-tpm-2019-07-25-1

for you to fetch changes up to 7e095e84ba0b7c0a1ac45bc6824dace2fd352e56:

  tpm_emulator: Translate TPM error codes to strings (2019-07-25 11:37:10 -0400)


Stefan Berger (2):
  tpm: Exit in reset when backend indicates failure
  tpm_emulator: Translate TPM error codes to strings

 hw/tpm/tpm_crb.c  |  4 +++-
 hw/tpm/tpm_emulator.c | 60 
++--
 hw/tpm/tpm_int.h  | 13 +
 hw/tpm/tpm_tis.c  |  4 +++-
 4 files changed, 69 insertions(+), 12 deletions(-)

-- 
2.20.1

[Qemu-devel] [PULL v1 2/2] tpm_emulator: Translate TPM error codes to strings

2019-07-25 Thread Stefan Berger

Implement a function to translate TPM error codes to strings so that
at least the most common error codes can be translated to human
readable strings.

Signed-off-by: Stefan Berger 
Reviewed-by: Marc-André Lureau 
---
 hw/tpm/tpm_emulator.c | 60 +++
 hw/tpm/tpm_int.h  | 13 ++
 2 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/hw/tpm/tpm_emulator.c b/hw/tpm/tpm_emulator.c
index 1288cbcb8d..fc0b512f4f 100644
--- a/hw/tpm/tpm_emulator.c
+++ b/hw/tpm/tpm_emulator.c
@@ -82,6 +82,40 @@ typedef struct TPMEmulator {
 TPMBlobBuffers state_blobs;
 } TPMEmulator;
 
+struct tpm_error {
+uint32_t tpm_result;
+const char *string;
+};
+
+static const struct tpm_error tpm_errors[] = {
+/* TPM 1.2 error codes */
+{ TPM_BAD_PARAMETER   , "a parameter is bad" },
+{ TPM_FAIL, "operation failed" },
+{ TPM_KEYNOTFOUND , "key could not be found" },
+{ TPM_BAD_PARAM_SIZE  , "bad parameter size"},
+{ TPM_ENCRYPT_ERROR   , "encryption error" },
+{ TPM_DECRYPT_ERROR   , "decryption error" },
+{ TPM_BAD_KEY_PROPERTY, "bad key property" },
+{ TPM_BAD_MODE, "bad (encryption) mode" },
+{ TPM_BAD_VERSION , "bad version identifier" },
+{ TPM_BAD_LOCALITY, "bad locality" },
+/* TPM 2 error codes */
+{ TPM_RC_FAILURE , "operation failed" },
+{ TPM_RC_LOCALITY, "bad locality" },
+{ TPM_RC_INSUFFICIENT, "insufficient amount of data" },
+};
+
+static const char *tpm_emulator_strerror(uint32_t tpm_result)
+{
+size_t i;
+
+for (i = 0; i < ARRAY_SIZE(tpm_errors); i++) {
+if (tpm_errors[i].tpm_result == tpm_result) {
+return tpm_errors[i].string;
+}
+}
+return "";
+}
 
 static int tpm_emulator_ctrlcmd(TPMEmulator *tpm, unsigned long cmd, void *msg,
 size_t msg_len_in, size_t msg_len_out)
@@ -264,7 +298,8 @@ static int tpm_emulator_stop_tpm(TPMBackend *tb)
 
 res = be32_to_cpu(res);
 if (res) {
-error_report("tpm-emulator: TPM result for CMD_STOP: 0x%x", res);
+error_report("tpm-emulator: TPM result for CMD_STOP: 0x%x %s", res,
+ tpm_emulator_strerror(res));
 return -1;
 }
 
@@ -293,8 +328,9 @@ static int tpm_emulator_set_buffer_size(TPMBackend *tb,
 
 psbs.u.resp.tpm_result = be32_to_cpu(psbs.u.resp.tpm_result);
 if (psbs.u.resp.tpm_result != 0) {
-error_report("tpm-emulator: TPM result for set buffer size : 0x%x",
- psbs.u.resp.tpm_result);
+error_report("tpm-emulator: TPM result for set buffer size : 0x%x %s",
+ psbs.u.resp.tpm_result,
+ tpm_emulator_strerror(psbs.u.resp.tpm_result));
 return -1;
 }
 
@@ -339,7 +375,8 @@ static int tpm_emulator_startup_tpm_resume(TPMBackend *tb, 
size_t buffersize,
 
 res = be32_to_cpu(init.u.resp.tpm_result);
 if (res) {
-error_report("tpm-emulator: TPM result for CMD_INIT: 0x%x", res);
+error_report("tpm-emulator: TPM result for CMD_INIT: 0x%x %s", res,
+ tpm_emulator_strerror(res));
 goto err_exit;
 }
 return 0;
@@ -399,8 +436,9 @@ static int 
tpm_emulator_reset_tpm_established_flag(TPMBackend *tb,
 
 res = be32_to_cpu(reset_est.u.resp.tpm_result);
 if (res) {
-error_report("tpm-emulator: TPM result for rest establixhed flag: 
0x%x",
- res);
+error_report(
+"tpm-emulator: TPM result for rest established flag: 0x%x %s",
+res, tpm_emulator_strerror(res));
 return -1;
 }
 
@@ -638,7 +676,8 @@ static int tpm_emulator_get_state_blob(TPMEmulator *tpm_emu,
 res = be32_to_cpu(pgs.u.resp.tpm_result);
 if (res != 0 && (res & 0x800) == 0) {
 error_report("tpm-emulator: Getting the stateblob (type %d) failed "
- "with a TPM error 0x%x", type, res);
+ "with a TPM error 0x%x %s", type, res,
+ tpm_emulator_strerror(res));
 return -1;
 }
 
@@ -758,7 +797,8 @@ static int tpm_emulator_set_state_blob(TPMEmulator *tpm_emu,
 tpm_result = be32_to_cpu(pss.u.resp.tpm_result);
 if (tpm_result != 0) {
 error_report("tpm-emulator: Setting the stateblob (type %d) failed "
- "with a TPM error 0x%x", type, tpm_result);
+ "with a TPM error 0x%x %s", type, tpm_result,
+ tpm_emulator_strerror(tpm_result));
 return -1;
 }
 
@@ -888,8 +928,8 @@ static void tpm_emulator_shutdown(TPMEmulator *tpm_emu)
 error_report("tpm-emulator: Could not cleanly shutdown the TPM: %s",
  strerror(errno));
 } else if (res != 0) {
-error_report("tpm-emulator: TPM result for sutdown: 0x%x",
- be32_to_cpu(res));
+error_report("tpm-emulator: TPM result for shutdown: 0x%x %s",
+

[Qemu-devel] [PULL v1 1/2] tpm: Exit in reset when backend indicates failure

2019-07-25 Thread Stefan Berger

Exit() in the frontend reset function when the backend indicates
intialization failure.

Signed-off-by: Stefan Berger 
Reviewed-by: Marc-André Lureau 
---
 hw/tpm/tpm_crb.c | 4 +++-
 hw/tpm/tpm_tis.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
index 5e2db9e0c4..db0e3e7c67 100644
--- a/hw/tpm/tpm_crb.c
+++ b/hw/tpm/tpm_crb.c
@@ -273,7 +273,9 @@ static void tpm_crb_reset(void *dev)
 s->be_buffer_size = MIN(tpm_backend_get_buffer_size(s->tpmbe),
 CRB_CTRL_CMD_SIZE);
 
-tpm_backend_startup_tpm(s->tpmbe, s->be_buffer_size);
+if (tpm_backend_startup_tpm(s->tpmbe, s->be_buffer_size) < 0) {
+exit(1);
+}
 }
 
 static void tpm_crb_realize(DeviceState *dev, Error **errp)
diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 174618ac30..d6b3212890 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -910,7 +910,9 @@ static void tpm_tis_reset(DeviceState *dev)
 s->rw_offset = 0;
 }
 
-tpm_backend_startup_tpm(s->be_driver, s->be_buffer_size);
+if (tpm_backend_startup_tpm(s->be_driver, s->be_buffer_size) < 0) {
+exit(1);
+}
 }
 
 /* persistent state handling */
-- 
2.20.1

Re: [Qemu-devel] [PATCH] target-i386: kvm: 'kvm_get_supported_msrs' cleanup

2019-07-25 Thread Paolo Bonzini

On 25/07/19 17:16, Li Qiang wrote:
> Function 'kvm_get_supported_msrs' is only called once
> now, get rid of the static variable 'kvm_supported_msrs'.
> 
> Signed-off-by: Li Qiang 

Queued, thanks.

Paolo

> ---
>  target/i386/kvm.c | 185 +++---
>  1 file changed, 91 insertions(+), 94 deletions(-)
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index dbbb13772a..07c9250f45 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -1837,108 +1837,105 @@ static int kvm_get_supported_feature_msrs(KVMState 
> *s)
>  
>  static int kvm_get_supported_msrs(KVMState *s)
>  {
> -static int kvm_supported_msrs;
>  int ret = 0;
> +struct kvm_msr_list msr_list, *kvm_msr_list;
>  
> -/* first time */
> -if (kvm_supported_msrs == 0) {
> -struct kvm_msr_list msr_list, *kvm_msr_list;
> +/*
> + *  Obtain MSR list from KVM.  These are the MSRs that we must
> + *  save/restore.
> + */
> +msr_list.nmsrs = 0;
> +ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, _list);
> +if (ret < 0 && ret != -E2BIG) {
> +return ret;
> +}
> +/*
> + * Old kernel modules had a bug and could write beyond the provided
> + * memory. Allocate at least a safe amount of 1K.
> + */
> +kvm_msr_list = g_malloc0(MAX(1024, sizeof(msr_list) +
> +  msr_list.nmsrs *
> +  sizeof(msr_list.indices[0])));
>  
> -kvm_supported_msrs = -1;
> +kvm_msr_list->nmsrs = msr_list.nmsrs;
> +ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
> +if (ret >= 0) {
> +int i;
>  
> -/* Obtain MSR list from KVM.  These are the MSRs that we must
> - * save/restore */
> -msr_list.nmsrs = 0;
> -ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, _list);
> -if (ret < 0 && ret != -E2BIG) {
> -return ret;
> -}
> -/* Old kernel modules had a bug and could write beyond the provided
> -   memory. Allocate at least a safe amount of 1K. */
> -kvm_msr_list = g_malloc0(MAX(1024, sizeof(msr_list) +
> -  msr_list.nmsrs *
> -  sizeof(msr_list.indices[0])));
> -
> -kvm_msr_list->nmsrs = msr_list.nmsrs;
> -ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
> -if (ret >= 0) {
> -int i;
> -
> -for (i = 0; i < kvm_msr_list->nmsrs; i++) {
> -switch (kvm_msr_list->indices[i]) {
> -case MSR_STAR:
> -has_msr_star = true;
> -break;
> -case MSR_VM_HSAVE_PA:
> -has_msr_hsave_pa = true;
> -break;
> -case MSR_TSC_AUX:
> -has_msr_tsc_aux = true;
> -break;
> -case MSR_TSC_ADJUST:
> -has_msr_tsc_adjust = true;
> -break;
> -case MSR_IA32_TSCDEADLINE:
> -has_msr_tsc_deadline = true;
> -break;
> -case MSR_IA32_SMBASE:
> -has_msr_smbase = true;
> -break;
> -case MSR_SMI_COUNT:
> -has_msr_smi_count = true;
> -break;
> -case MSR_IA32_MISC_ENABLE:
> -has_msr_misc_enable = true;
> -break;
> -case MSR_IA32_BNDCFGS:
> -has_msr_bndcfgs = true;
> -break;
> -case MSR_IA32_XSS:
> -has_msr_xss = true;
> -break;
> -case HV_X64_MSR_CRASH_CTL:
> -has_msr_hv_crash = true;
> -break;
> -case HV_X64_MSR_RESET:
> -has_msr_hv_reset = true;
> -break;
> -case HV_X64_MSR_VP_INDEX:
> -has_msr_hv_vpindex = true;
> -break;
> -case HV_X64_MSR_VP_RUNTIME:
> -has_msr_hv_runtime = true;
> -break;
> -case HV_X64_MSR_SCONTROL:
> -has_msr_hv_synic = true;
> -break;
> -case HV_X64_MSR_STIMER0_CONFIG:
> -has_msr_hv_stimer = true;
> -break;
> -case HV_X64_MSR_TSC_FREQUENCY:
> -has_msr_hv_frequencies = true;
> -break;
> -case HV_X64_MSR_REENLIGHTENMENT_CONTROL:
> -has_msr_hv_reenlightenment = true;
> -break;
> -case MSR_IA32_SPEC_CTRL:
> -has_msr_spec_ctrl = true;
> -break;
> -case MSR_VIRT_SSBD:
> -

[Qemu-devel] [PATCH] ppc: remove redundant capability check for unset irq

2019-07-25 Thread Shivaprasad G Bhat

The KVM_CAP_PPC_UNSET_IRQ is part of kernel since v2.6.36.
Kernels older than that are not supported anymore.
So, remove the checks.

Signed-off-by: Shivaprasad G Bhat 
---
 target/ppc/kvm.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 5ab5e6c6a9..94a2ecb84f 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -55,7 +55,6 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
 
-static int cap_interrupt_unset;
 static int cap_segstate;
 static int cap_booke_sregs;
 static int cap_ppc_smt;
@@ -104,7 +103,6 @@ static int kvmppc_get_dec_bits(void);
 
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
-cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
 cap_segstate = kvm_check_extension(s, KVM_CAP_PPC_SEGSTATE);
 cap_booke_sregs = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_SREGS);
 cap_ppc_smt_possible = kvm_vm_check_extension(s, KVM_CAP_PPC_SMT_POSSIBLE);
@@ -1309,7 +1307,7 @@ int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int 
level)
 return 0;
 }
 
-if (!kvm_enabled() || !cap_interrupt_unset) {
+if (!kvm_enabled()) {
 return 0;
 }

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini

On 25/07/19 17:01, Michael S. Tsirkin wrote:
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

A lot of them are select'ed so it's not easy.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

That's what the NEMU guys experimented with.  It's not supported by our
DSDT since it uses ACPI GPE, and the reduction in code size is small
(about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).

Paolo

Re: [Qemu-devel] [PATCH v2 0/2] tpm: Improve on error handling

2019-07-25 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20190725150832.1180275-1-stef...@linux.vnet.ibm.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v2 0/2] tpm: Improve on error handling
Message-id: 20190725150832.1180275-1-stef...@linux.vnet.ibm.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
fatal: git fetch_pack: expected ACK/NAK, got 'ERR upload-pack: not our ref 
9c3e4e2c6d83e244e2136a6cdd5a2830bad82ca3'
fatal: The remote end hung up unexpectedly
error: Could not fetch 3c8cf5a9c21ff8782164d1def7f44bd888713384
Traceback (most recent call last):
  File "./patchew-cli", line 504, in test_one
git_clone_repo(clone, r["repo"], r["head"], logf)
  File "./patchew-cli", line 48, in git_clone_repo
stdout=logf, stderr=logf)
  File "/usr/lib64/python3.4/subprocess.py", line 558, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'remote', 'add', '-f', 
'--mirror=fetch', '3c8cf5a9c21ff8782164d1def7f44bd888713384', 
'https://github.com/patchew-project/qemu']' returned non-zero exit status 1



The full log is available at
http://patchew.org/logs/20190725150832.1180275-1-stef...@linux.vnet.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

2019-07-25 Thread Paolo Bonzini

On 25/07/19 16:46, Michael S. Tsirkin wrote:
> Actually, I think I have a better idea.
> At the moment we just get an exit on these reads and return all-ones.
> Yes, in theory there could be a UR bit set in a bunch of
> registers but in practice no one cares about these,
> and I don't think we implement them.
> So how about mapping a single page, read-only, and filling it
> with all-ones?

Yes, that's nice indeed. :)  But it does have some cost, in terms of
either number of VMAs or QEMU RSS since the MMCONFIG area is large.

What breaks if we return all zeroes?  Zero is not a valid vendor ID.

Paolo

[Qemu-devel] [PULL 12/12] virtio-balloon: free pbp more aggressively

2019-07-25 Thread Michael S. Tsirkin

Previous patches switched to a temporary pbp but that does not go far
enough: after device uses a buffer, guest is free to reuse it, so
tracking the page and freeing it later is wrong.

Free and reset the pbp after we push each element.

Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host 
page size")
Cc: qemu-sta...@nongnu.org #v4.0.0
Cc: David Hildenbrand 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-balloon.c | 37 -
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index fe9664e42c..25de154307 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -41,22 +41,19 @@ typedef struct PartiallyBalloonedPage {
 
 static void virtio_balloon_pbp_free(PartiallyBalloonedPage *pbp)
 {
-if (!pbp) {
+if (!pbp->bitmap) {
 return;
 }
 g_free(pbp->bitmap);
-g_free(pbp);
+pbp->bitmap = NULL;
 }
 
-static PartiallyBalloonedPage *virtio_balloon_pbp_alloc(ram_addr_t base_gpa,
-long subpages)
+static void virtio_balloon_pbp_alloc(PartiallyBalloonedPage *pbp,
+ ram_addr_t base_gpa,
+ long subpages)
 {
-PartiallyBalloonedPage *pbp = g_new0(PartiallyBalloonedPage, 1);
-
 pbp->base_gpa = base_gpa;
 pbp->bitmap = bitmap_new(subpages);
-
-return pbp;
 }
 
 static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
@@ -67,7 +64,7 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage 
*pbp,
 
 static void balloon_inflate_page(VirtIOBalloon *balloon,
  MemoryRegion *mr, hwaddr mr_offset,
- PartiallyBalloonedPage **pbp)
+ PartiallyBalloonedPage *pbp)
 {
 void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
 ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
@@ -104,22 +101,21 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
 base_gpa = memory_region_get_ram_addr(mr) + mr_offset -
(rb_offset - rb_aligned_offset);
 
-if (*pbp && !virtio_balloon_pbp_matches(*pbp, base_gpa)) {
+if (pbp->bitmap && !virtio_balloon_pbp_matches(pbp, base_gpa)) {
 /* We've partially ballooned part of a host page, but now
  * we're trying to balloon part of a different one.  Too hard,
  * give up on the old partial page */
-virtio_balloon_pbp_free(*pbp);
-*pbp = NULL;
+virtio_balloon_pbp_free(pbp);
 }
 
-if (!*pbp) {
-*pbp = virtio_balloon_pbp_alloc(base_gpa, subpages);
+if (!pbp->bitmap) {
+virtio_balloon_pbp_alloc(pbp, base_gpa, subpages);
 }
 
 set_bit((rb_offset - rb_aligned_offset) / BALLOON_PAGE_SIZE,
-(*pbp)->bitmap);
+pbp->bitmap);
 
-if (bitmap_full((*pbp)->bitmap, subpages)) {
+if (bitmap_full(pbp->bitmap, subpages)) {
 /* We've accumulated a full host page, we can actually discard
  * it now */
 
@@ -127,8 +123,7 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
 /* We ignore errors from ram_block_discard_range(), because it
  * has already reported them, and failing to discard a balloon
  * page is not fatal */
-virtio_balloon_pbp_free(*pbp);
-*pbp = NULL;
+virtio_balloon_pbp_free(pbp);
 }
 }
 
@@ -328,13 +323,14 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
-PartiallyBalloonedPage *pbp = NULL;
 VirtQueueElement *elem;
 MemoryRegionSection section;
 
 for (;;) {
+PartiallyBalloonedPage pbp = {};
 size_t offset = 0;
 uint32_t pfn;
+
 elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
 if (!elem) {
 break;
@@ -379,9 +375,8 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 virtqueue_push(vq, elem, offset);
 virtio_notify(vdev, vq);
 g_free(elem);
+virtio_balloon_pbp_free();
 }
-
-virtio_balloon_pbp_free(pbp);
 }
 
 static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
-- 
MST

[Qemu-devel] [PULL 03/12] ioapic: kvm: Skip route updates for masked pins

2019-07-25 Thread Michael S. Tsirkin

From: Jan Kiszka 

Masked entries will not generate interrupt messages, thus do no need to
be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
the kind

qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, 
high=0xff00, low=0x100)

if the masked entry happens to reference a non-present IRTE.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Jan Kiszka 
Message-Id: 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Peter Xu 
---
 hw/intc/ioapic.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index c408749876..e99c37cceb 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -197,9 +197,11 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 MSIMessage msg;
 struct ioapic_entry_info info;
 ioapic_entry_parse(s->ioredtbl[i], );
-msg.address = info.addr;
-msg.data = info.data;
-kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+if (!info.masked) {
+msg.address = info.addr;
+msg.data = info.data;
+kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+}
 }
 kvm_irqchip_commit_routes(kvm_state);
 }
-- 
MST

[Qemu-devel] [PULL 01/12] docs: clarify multiqueue vs multiple virtqueues

2019-07-25 Thread Michael S. Tsirkin

From: Stefan Hajnoczi 

The vhost-user specification does not explain when
VHOST_USER_PROTOCOL_F_MQ must be implemented.  This may lead
implementors of vhost-user masters to believe that this protocol feature
is required for any device that has multiple virtqueues.  That would be
a mistake since existing vhost-user slaves offer multiple virtqueues but
do not advertise VHOST_USER_PROTOCOL_F_MQ.

For example, a vhost-net device with one rx/tx queue pair is not
multiqueue.  The slave does not need to advertise
VHOST_USER_PROTOCOL_F_MQ.  Therefore the master must assume it has these
virtqueues and cannot rely on askingt the slave how many virtqueues
exist.

Extend the specification to explain the different between true
multiqueue and regular devices with a fixed virtqueue layout.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20190624091304.666-1-stefa...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Marc-André Lureau 
---
 docs/interop/vhost-user.rst | 17 +
 1 file changed, 17 insertions(+)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5750668aba..7827b710aa 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -324,6 +324,15 @@ must support changing some configuration aspects on the 
fly.
 Multiple queue support
 --
 
+Many devices have a fixed number of virtqueues.  In this case the master
+already knows the number of available virtqueues without communicating with the
+slave.
+
+Some devices do not have a fixed number of virtqueues.  Instead the maximum
+number of virtqueues is chosen by the slave.  The number can depend on host
+resource availability or slave implementation details.  Such devices are called
+multiple queue devices.
+
 Multiple queue support allows the slave to advertise the maximum number of
 queues.  This is treated as a protocol extension, hence the slave has to
 implement protocol features first. The multiple queues feature is supported
@@ -339,6 +348,14 @@ queue in the sent message to identify a specified queue.
 The master enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``.
 vhost-user-net has historically automatically enabled the first queue pair.
 
+Slaves should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol
+feature, even for devices with a fixed number of virtqueues, since it is simple
+to implement and offers a degree of introspection.
+
+Masters must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for
+devices with a fixed number of virtqueues.  Only true multiqueue devices
+require this protocol feature.
+
 Migration
 -
 
-- 
MST

[Qemu-devel] [PULL 00/12] virtio, pc: fixes, cleanups

2019-07-25 Thread Michael S. Tsirkin

The following changes since commit bf8b024372bf8abf5a9f40bfa65eeefad23ff988:

  Update version for v4.1.0-rc2 release (2019-07-23 18:28:08 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 1b47b37c33ec01ae1efc527f4c97f97f93723bc4:

  virtio-balloon: free pbp more aggressively (2019-07-25 11:19:25 -0400)


virtio, pc: fixes, cleanups

A bunch of fixes all over the place.

Signed-off-by: Michael S. Tsirkin 


David Hildenbrand (7):
  virtio-balloon: Fix wrong sign extension of PFNs
  virtio-balloon: Fix QEMU crashes on pagesize > BALLOON_PAGE_SIZE
  virtio-balloon: Simplify deflate with pbp
  virtio-balloon: Better names for offset variables in inflate/deflate code
  virtio-balloon: Rework pbp tracking data
  virtio-balloon: Use temporary PBP only
  virtio-balloon: don't track subpages for the PBP

Evgeny Yakovlev (2):
  i386/acpi: fix gint overflow in crs_range_compare
  i386/acpi: show PCI Express bus on pxb-pcie expanders

Jan Kiszka (1):
  ioapic: kvm: Skip route updates for masked pins

Michael S. Tsirkin (1):
  virtio-balloon: free pbp more aggressively

Stefan Hajnoczi (1):
  docs: clarify multiqueue vs multiple virtqueues

 include/hw/virtio/virtio-balloon.h |   3 -
 hw/i386/acpi-build.c   |  17 --
 hw/intc/ioapic.c   |   8 ++-
 hw/virtio/virtio-balloon.c | 115 ++---
 docs/interop/vhost-user.rst|  17 ++
 5 files changed, 90 insertions(+), 70 deletions(-)

Re: [Qemu-devel] [PATCH v2 10/11] iotests: Test convert -n to pre-filled image

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/122 | 17 +
>  tests/qemu-iotests/122.out |  8 
>  2 files changed, 25 insertions(+)
> 
> diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
> index 85c3a8d047..059011ebb1 100755
> --- a/tests/qemu-iotests/122
> +++ b/tests/qemu-iotests/122
> @@ -257,6 +257,23 @@ for min_sparse in 4k 8k; do
>  $QEMU_IMG map --output=json "$TEST_IMG".orig | _filter_qemu_img_map
>  done
>  
> +
> +echo
> +echo '=== -n to a non-zero image ==='
> +echo
> +
> +# Keep source zero
> +_make_test_img 64M
> +
> +# Output is not zero, but has bdrv_has_zero_init() == 1
> +TEST_IMG="$TEST_IMG".orig _make_test_img 64M
> +$QEMU_IO -c "write -P 42 0 64k" "$TEST_IMG".orig | _filter_qemu_io
> +
> +# Convert with -n, which should not assume that the target is zeroed
> +$QEMU_IMG convert -O $IMGFMT -n "$TEST_IMG" "$TEST_IMG".orig
> +
> +$QEMU_IMG compare "$TEST_IMG" "$TEST_IMG".orig
> +
>  # success, all done
>  echo '*** done'
>  rm -f $seq.full
> diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
> index c576705284..849b6cc2ef 100644
> --- a/tests/qemu-iotests/122.out
> +++ b/tests/qemu-iotests/122.out
> @@ -220,4 +220,12 @@ convert -c -S 8k
>  { "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false},
>  { "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true},
>  { "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": 
> false}]
> +
> +=== -n to a non-zero image ===
> +
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
> +Formatting 'TEST_DIR/t.IMGFMT.orig', fmt=IMGFMT size=67108864
> +wrote 65536/65536 bytes at offset 0
> +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +Images are identical.
>  *** done


Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 09/11] iotests: Convert to preallocated encrypted qcow2

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> Add a test case for converting an empty image (which only returns zeroes
> when read) to a preallocated encrypted qcow2 image.
> qcow2_has_zero_init() should return 0 then, thus forcing qemu-img
> convert to create zero clusters.
> 
> Signed-off-by: Max Reitz 
> Acked-by: Stefano Garzarella 
> Tested-by: Stefano Garzarella 
> ---
>  tests/qemu-iotests/188 | 20 +++-
>  tests/qemu-iotests/188.out |  4 
>  2 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/188 b/tests/qemu-iotests/188
> index be7278aa65..afca44df54 100755
> --- a/tests/qemu-iotests/188
> +++ b/tests/qemu-iotests/188
> @@ -48,7 +48,7 @@ SECRETALT="secret,id=sec0,data=platypus"
>  
>  _make_test_img --object $SECRET -o 
> "encrypt.format=luks,encrypt.key-secret=sec0,encrypt.iter-time=10" $size
>  
> -IMGSPEC="driver=$IMGFMT,file.filename=$TEST_IMG,encrypt.key-secret=sec0"
> +IMGSPEC="driver=$IMGFMT,encrypt.key-secret=sec0,file.filename=$TEST_IMG"
This change I think doesn't change anything

>  
>  QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
>  
> @@ -68,6 +68,24 @@ echo
>  echo "== verify open failure with wrong password =="
>  $QEMU_IO --object $SECRETALT -c "read -P 0xa 0 $size" --image-opts $IMGSPEC 
> | _filter_qemu_io | _filter_testdir
>  
> +_cleanup_test_img
> +
> +echo
> +echo "== verify that has_zero_init returns false when preallocating =="
> +
> +# Empty source file
> +if [ -n "$TEST_IMG_FILE" ]; then
> +TEST_IMG_FILE="${TEST_IMG_FILE}.orig" _make_test_img $size
> +else
> +TEST_IMG="${TEST_IMG}.orig" _make_test_img $size
> +fi

I wonder why do we have TEST_IMG_FILE and TEST_IMG, I don't know iotests well 
enough
>From the quick look at the code, the TEST_IMG_FILE is an actual file, while 
>TEST_IMG can
be various URL like address.

> +
> +$QEMU_IMG convert -O "$IMGFMT" --object $SECRET \
> +-o 
> "encrypt.format=luks,encrypt.key-secret=sec0,encrypt.iter-time=10,preallocation=metadata"
>  \
> +"${TEST_IMG}.orig" "$TEST_IMG"
> +
> +$QEMU_IMG compare --object $SECRET --image-opts "${IMGSPEC}.orig" "$IMGSPEC"
> +
>  
>  # success, all done
>  echo "*** done"
> diff --git a/tests/qemu-iotests/188.out b/tests/qemu-iotests/188.out
> index 97b1402671..c568ef3701 100644
> --- a/tests/qemu-iotests/188.out
> +++ b/tests/qemu-iotests/188.out
> @@ -15,4 +15,8 @@ read 16777216/16777216 bytes at offset 0
>  
>  == verify open failure with wrong password ==
>  qemu-io: can't open: Invalid password, cannot unlock any keyslot
> +
> +== verify that has_zero_init returns false when preallocating ==
> +Formatting 'TEST_DIR/t.IMGFMT.orig', fmt=IMGFMT size=16777216
> +Images are identical.
>  *** done

Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 08/11] vhdx: Fix .bdrv_has_zero_init()

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> Fixed VHDX images cannot guarantee to be zero-initialized.  If the image
> has the "fixed" subformat, forward the call to the underlying storage
> node.
> 
> Reported-by: Stefano Garzarella 
> Signed-off-by: Max Reitz 
> ---
>  block/vhdx.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/block/vhdx.c b/block/vhdx.c
> index a02d1c99a7..6a09d0a55c 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -2075,6 +2075,30 @@ static int coroutine_fn vhdx_co_check(BlockDriverState 
> *bs,
>  return 0;
>  }
>  
> +static int vhdx_has_zero_init(BlockDriverState *bs)
> +{
> +BDRVVHDXState *s = bs->opaque;
> +int state;
> +
> +/*
> + * Check the subformat: Fixed images have all BAT entries present,
> + * dynamic images have none (right after creation).  It is
> + * therefore enough to check the first BAT entry.
> + */
> +if (!s->bat_entries) {
> +return 1;
> +}
> +
> +state = s->bat[0] & VHDX_BAT_STATE_BIT_MASK;
> +if (state == PAYLOAD_BLOCK_FULLY_PRESENT) {
> +/* Fixed subformat */
> +return bdrv_has_zero_init(bs->file->bs);
> +}
> +
> +/* Dynamic subformat */
> +return 1;
> +}
> +
>  static QemuOptsList vhdx_create_opts = {
>  .name = "vhdx-create-opts",
>  .head = QTAILQ_HEAD_INITIALIZER(vhdx_create_opts.head),
> @@ -2128,7 +2152,7 @@ static BlockDriver bdrv_vhdx = {
>  .bdrv_co_create_opts= vhdx_co_create_opts,
>  .bdrv_get_info  = vhdx_get_info,
>  .bdrv_co_check  = vhdx_co_check,
> -.bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init = vhdx_has_zero_init,
>  
>  .create_opts= _create_opts,
>  };

I am not familiar with VHDX format to be honest too, but knowing that dynamic 
format allows for growing
and static are preallocated this makes sense.

Its a bit amusing and not surprising that the the spec for this format is in 
.docx. 
I took a quick look to get a rough impression of the file format.


Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 07/11] vdi: Fix .bdrv_has_zero_init()

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> Static VDI images cannot guarantee to be zero-initialized.  If the image
> has been statically allocated, forward the call to the underlying
> storage node.
> 
> Reported-by: Stefano Garzarella 
> Signed-off-by: Max Reitz 
> Reviewed-by: Stefan Weil 
> Acked-by: Stefano Garzarella 
> Tested-by: Stefano Garzarella 
> ---
>  block/vdi.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index b9845a4cbd..0caa3f281d 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -988,6 +988,17 @@ static void vdi_close(BlockDriverState *bs)
>  error_free(s->migration_blocker);
>  }
>  
> +static int vdi_has_zero_init(BlockDriverState *bs)
> +{
> +BDRVVdiState *s = bs->opaque;
> +
> +if (s->header.image_type == VDI_TYPE_STATIC) {
> +return bdrv_has_zero_init(bs->file->bs);
> +} else {
> +return 1;
> +}
> +}
> +
>  static QemuOptsList vdi_create_opts = {
>  .name = "vdi-create-opts",
>  .head = QTAILQ_HEAD_INITIALIZER(vdi_create_opts.head),
> @@ -1028,7 +1039,7 @@ static BlockDriver bdrv_vdi = {
>  .bdrv_child_perm  = bdrv_format_default_perms,
>  .bdrv_co_create  = vdi_co_create,
>  .bdrv_co_create_opts = vdi_co_create_opts,
> -.bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init  = vdi_has_zero_init,
>  .bdrv_co_block_status = vdi_co_block_status,
>  .bdrv_make_empty = vdi_make_empty,
>  


I am not familiar with VDI format to be honest, but knowing that dynamic format 
allows for growing
and static are preallocated this makes sense.

I see that the code when it allocates a new block at the end of the file, 
actually zeroes it out, so most
likely this is right.


Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 06/11] qcow2: Fix .bdrv_has_zero_init()

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> If a qcow2 file is preallocated, it can no longer guarantee that it
> initially appears as filled with zeroes.
> 
> So implement .bdrv_has_zero_init() by checking whether the file is
> preallocated; if so, forward the call to the underlying storage node,
> except for when it is encrypted: Encrypted preallocated images always
> return effectively random data, so .bdrv_has_zero_init() must always
> return 0 for them.
> 
> .bdrv_has_zero_init_truncate() can remain bdrv_has_zero_init_1(),
> because it presupposes PREALLOC_MODE_OFF.
> 
> Reported-by: Stefano Garzarella 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2.c | 29 -
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 5c40f54d64..b4e73aa443 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -4631,6 +4631,33 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs,
>  return spec_info;
>  }
>  
> +static int qcow2_has_zero_init(BlockDriverState *bs)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +bool preallocated;
> +
> +if (qemu_in_coroutine()) {
> +qemu_co_mutex_lock(>lock);
> +}
> +/*
> + * Check preallocation status: Preallocated images have all L2
> + * tables allocated, nonpreallocated images have none.  It is
> + * therefore enough to check the first one.
> + */
> +preallocated = s->l1_size > 0 && s->l1_table[0] != 0;
> +if (qemu_in_coroutine()) {
> +qemu_co_mutex_unlock(>lock);
> +}
> +
> +if (!preallocated) {
> +return 1;
> +} else if (bs->encrypted) {
> +return 0;
> +} else {
> +return bdrv_has_zero_init(s->data_file->bs);
> +}
> +}
> +
>  static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
>int64_t pos)
>  {
> @@ -5186,7 +5213,7 @@ BlockDriver bdrv_qcow2 = {
>  .bdrv_child_perm  = bdrv_format_default_perms,
>  .bdrv_co_create_opts  = qcow2_co_create_opts,
>  .bdrv_co_create   = qcow2_co_create,
> -.bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init   = qcow2_has_zero_init,
>  .bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
>  .bdrv_co_block_status = qcow2_co_block_status,
>  


Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 05/11] block: Use bdrv_has_zero_init_truncate()

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  block/parallels.c | 2 +-
>  block/vhdx.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/parallels.c b/block/parallels.c
> index 00fae125d1..7cd2714b69 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -835,7 +835,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  goto fail_options;
>  }
>  
> -if (!bdrv_has_zero_init(bs->file->bs)) {
> +if (!bdrv_has_zero_init_truncate(bs->file->bs)) {
>  s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
>  }
>  
> diff --git a/block/vhdx.c b/block/vhdx.c
> index d6070b6fa8..a02d1c99a7 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1282,7 +1282,7 @@ static coroutine_fn int vhdx_co_writev(BlockDriverState 
> *bs, int64_t sector_num,
>  /* Queue another write of zero buffers if the underlying file
>   * does not zero-fill on file extension */
>  
> -if (bdrv_has_zero_init(bs->file->bs) == 0) {
> +if (bdrv_has_zero_init_truncate(bs->file->bs) == 0) {
>  use_zero_buffers = true;
>  
>  /* zero fill the front, if any */

Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 04/11] block: Implement .bdrv_has_zero_init_truncate()

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> We need to implement .bdrv_has_zero_init_truncate() for every block
> driver that supports truncation and has a .bdrv_has_zero_init()
> implementation.
> 
> Implement it the same way each driver implements .bdrv_has_zero_init().
> This is at least not any more unsafe than what we had before.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/file-posix.c | 1 +
>  block/file-win32.c | 1 +
>  block/gluster.c| 4 
>  block/nfs.c| 1 +
>  block/qcow2.c  | 1 +
>  block/qed.c| 1 +
>  block/raw-format.c | 6 ++
>  block/rbd.c| 1 +
>  block/sheepdog.c   | 1 +
>  block/ssh.c| 1 +
>  10 files changed, 18 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 4479cc7ab4..0208006f3c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2924,6 +2924,7 @@ BlockDriver bdrv_file = {
>  .bdrv_co_create = raw_co_create,
>  .bdrv_co_create_opts = raw_co_create_opts,
>  .bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
>  .bdrv_co_block_status = raw_co_block_status,
>  .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>  .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
> diff --git a/block/file-win32.c b/block/file-win32.c
> index 6b2d67b239..41f55dfece 100644
> --- a/block/file-win32.c
> +++ b/block/file-win32.c
> @@ -635,6 +635,7 @@ BlockDriver bdrv_file = {
>  .bdrv_close = raw_close,
>  .bdrv_co_create_opts = raw_co_create_opts,
>  .bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
>  
>  .bdrv_aio_preadv= raw_aio_preadv,
>  .bdrv_aio_pwritev   = raw_aio_pwritev,
> diff --git a/block/gluster.c b/block/gluster.c
> index f64dc5b01e..64028b2cba 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -1567,6 +1567,7 @@ static BlockDriver bdrv_gluster = {
>  .bdrv_co_writev   = qemu_gluster_co_writev,
>  .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
>  .bdrv_has_zero_init   = qemu_gluster_has_zero_init,
> +.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
>  #ifdef CONFIG_GLUSTERFS_DISCARD
>  .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
>  #endif
> @@ -1598,6 +1599,7 @@ static BlockDriver bdrv_gluster_tcp = {
>  .bdrv_co_writev   = qemu_gluster_co_writev,
>  .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
>  .bdrv_has_zero_init   = qemu_gluster_has_zero_init,
> +.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
>  #ifdef CONFIG_GLUSTERFS_DISCARD
>  .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
>  #endif
> @@ -1629,6 +1631,7 @@ static BlockDriver bdrv_gluster_unix = {
>  .bdrv_co_writev   = qemu_gluster_co_writev,
>  .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
>  .bdrv_has_zero_init   = qemu_gluster_has_zero_init,
> +.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
>  #ifdef CONFIG_GLUSTERFS_DISCARD
>  .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
>  #endif
> @@ -1666,6 +1669,7 @@ static BlockDriver bdrv_gluster_rdma = {
>  .bdrv_co_writev   = qemu_gluster_co_writev,
>  .bdrv_co_flush_to_disk= qemu_gluster_co_flush_to_disk,
>  .bdrv_has_zero_init   = qemu_gluster_has_zero_init,
> +.bdrv_has_zero_init_truncate  = qemu_gluster_has_zero_init,
>  #ifdef CONFIG_GLUSTERFS_DISCARD
>  .bdrv_co_pdiscard = qemu_gluster_co_pdiscard,
>  #endif
> diff --git a/block/nfs.c b/block/nfs.c
> index d93241b3bb..97c815085f 100644
> --- a/block/nfs.c
> +++ b/block/nfs.c
> @@ -863,6 +863,7 @@ static BlockDriver bdrv_nfs = {
>  .create_opts= _create_opts,
>  
>  .bdrv_has_zero_init = nfs_has_zero_init,
> +.bdrv_has_zero_init_truncate= nfs_has_zero_init,
>  .bdrv_get_allocated_file_size   = nfs_get_allocated_file_size,
>  .bdrv_co_truncate   = nfs_file_co_truncate,
>  
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 039bdc2f7e..5c40f54d64 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -5187,6 +5187,7 @@ BlockDriver bdrv_qcow2 = {
>  .bdrv_co_create_opts  = qcow2_co_create_opts,
>  .bdrv_co_create   = qcow2_co_create,
>  .bdrv_has_zero_init = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init_truncate = bdrv_has_zero_init_1,
>  .bdrv_co_block_status = qcow2_co_block_status,
>  
>  .bdrv_co_preadv = qcow2_co_preadv,
> diff --git a/block/qed.c b/block/qed.c
> index 77c7cef175..daaedb6864 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -1668,6 +1668,7 @@ static BlockDriver bdrv_qed = {
>  .bdrv_co_create   = bdrv_qed_co_create,
>  .bdrv_co_create_opts  = bdrv_qed_co_create_opts,
>  .bdrv_has_zero_init   = bdrv_has_zero_init_1,

Re: [Qemu-devel] [PATCH v2 03/11] block: Add bdrv_has_zero_init_truncate()

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> No .bdrv_has_zero_init() implementation returns 1 if growing the file
> would add non-zero areas (at least with PREALLOC_MODE_OFF), so using it
> in lieu of this new function was always safe.
> 
> But on the other hand, it is possible that growing an image that is not
> zero-initialized would still add a zero-initialized area, like when
> using nonpreallocating truncation on a preallocated image.  For callers
> that care only about truncation, not about creation with potential
> preallocation, this new function is useful.
> 
> Alternatively, we could have added a PreallocMode parameter to
> bdrv_has_zero_init().  But the only user would have been qemu-img
> convert, which does not have a plain PreallocMode value right now -- it
> would have to parse the creation option to obtain it.  Therefore, the
> simpler solution is to let bdrv_has_zero_init() inquire the
> preallocation status and add the new bdrv_has_zero_init_truncate() that
> presupposes PREALLOC_MODE_OFF.
> 
> Signed-off-by: Max Reitz 
> ---
>  include/block/block.h |  1 +
>  include/block/block_int.h |  7 +++
>  block.c   | 21 +
>  3 files changed, 29 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 50a07c1c33..5321d8afdf 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -438,6 +438,7 @@ int bdrv_pdiscard(BdrvChild *child, int64_t offset, 
> int64_t bytes);
>  int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
>  int bdrv_has_zero_init_1(BlockDriverState *bs);
>  int bdrv_has_zero_init(BlockDriverState *bs);
> +int bdrv_has_zero_init_truncate(BlockDriverState *bs);
>  bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs);
>  bool bdrv_can_write_zeroes_with_unmap(BlockDriverState *bs);
>  int bdrv_block_status(BlockDriverState *bs, int64_t offset,
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 6a0b1b5008..d7fc6b296b 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -420,9 +420,16 @@ struct BlockDriver {
>  /*
>   * Returns 1 if newly created images are guaranteed to contain only
>   * zeros, 0 otherwise.
> + * Must return 0 if .bdrv_has_zero_init_truncate() returns 0.
>   */
>  int (*bdrv_has_zero_init)(BlockDriverState *bs);
>  
> +/*
> + * Returns 1 if new areas added by growing the image with
> + * PREALLOC_MODE_OFF contain only zeros, 0 otherwise.
> + */
> +int (*bdrv_has_zero_init_truncate)(BlockDriverState *bs);
> +
>  /* Remove fd handlers, timers, and other event loop callbacks so the 
> event
>   * loop is no longer in use.  Called with no in-flight requests and in
>   * depth-first traversal order with parents before child nodes.
> diff --git a/block.c b/block.c
> index cbd8da5f3b..81ae44dcf3 100644
> --- a/block.c
> +++ b/block.c
> @@ -5066,6 +5066,27 @@ int bdrv_has_zero_init(BlockDriverState *bs)
>  return 0;
>  }
>  
> +int bdrv_has_zero_init_truncate(BlockDriverState *bs)
> +{
> +if (!bs->drv) {
> +return 0;
> +}
> +
> +if (bs->backing) {
> +/* Depends on the backing image length, but better safe than sorry */
> +return 0;
> +}
> +if (bs->drv->bdrv_has_zero_init_truncate) {
> +return bs->drv->bdrv_has_zero_init_truncate(bs);
> +}
> +if (bs->file && bs->drv->is_filter) {
> +return bdrv_has_zero_init_truncate(bs->file->bs);
> +}
> +
> +/* safe default */
> +return 0;
> +}
> +
>  bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs)
>  {
>  BlockDriverInfo bdi;


This looks like a very correct change, even for the sake
of clarifying the scope of bdrv_has_zero_init

Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

Re: [Qemu-devel] [PATCH v2 02/11] mirror: Fix bdrv_has_zero_init() use

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> bdrv_has_zero_init() only has meaning for newly created images or image
> areas.  If the mirror job itself did not create the image, it cannot
> rely on bdrv_has_zero_init()'s result to carry any meaning.
> 
> This is the case for drive-mirror with mode=existing and always for
> blockdev-mirror.
> 
> Note that we only have to zero-initialize the target with sync=full,
> because other modes actually do not promise that the target will contain
> the same data as the source after the job -- sync=top only promises to
> copy anything allocated in the top layer, and sync=none will only copy
> new I/O.  (Which is how mirror has always handled it.)
> 
> Signed-off-by: Max Reitz 
> ---
>  include/block/block_int.h   |  2 ++
>  block/mirror.c  | 11 ---
>  blockdev.c  | 16 +---
>  tests/test-block-iothread.c |  2 +-
>  4 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 3aa1e832a8..6a0b1b5008 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1116,6 +1116,7 @@ BlockJob *commit_active_start(const char *job_id, 
> BlockDriverState *bs,
>   * @buf_size: The amount of data that can be in flight at one time.
>   * @mode: Whether to collapse all images in the chain to the target.
>   * @backing_mode: How to establish the target's backing chain after 
> completion.
> + * @zero_target: Whether the target should be explicitly zero-initialized
>   * @on_source_error: The action to take upon error reading from the source.
>   * @on_target_error: The action to take upon error writing to the target.
>   * @unmap: Whether to unmap target where source sectors only contain zeroes.
> @@ -1135,6 +1136,7 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>int creation_flags, int64_t speed,
>uint32_t granularity, int64_t buf_size,
>MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
> +  bool zero_target,
>BlockdevOnError on_source_error,
>BlockdevOnError on_target_error,
>bool unmap, const char *filter_node_name,
> diff --git a/block/mirror.c b/block/mirror.c
> index 8cb75fb409..50188ce6e9 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -51,6 +51,8 @@ typedef struct MirrorBlockJob {
>  Error *replace_blocker;
>  bool is_none_mode;
>  BlockMirrorBackingMode backing_mode;
> +/* Whether the target image requires explicit zero-initialization */
> +bool zero_target;
>  MirrorCopyMode copy_mode;
>  BlockdevOnError on_source_error, on_target_error;
>  bool synced;
> @@ -763,7 +765,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
> *s)
>  int ret;
>  int64_t count;
>  
> -if (base == NULL && !bdrv_has_zero_init(target_bs)) {
> +if (s->zero_target) {
The justification for removing base here, is that it can be != NULL only
when MIRROR_SYNC_MODE_TOP. So looks OK


>  if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>  bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>  return 0;
> @@ -1501,6 +1503,7 @@ static BlockJob *mirror_start_job(
>   const char *replaces, int64_t speed,
>   uint32_t granularity, int64_t buf_size,
>   BlockMirrorBackingMode backing_mode,
> + bool zero_target,
>   BlockdevOnError on_source_error,
>   BlockdevOnError on_target_error,
>   bool unmap,
> @@ -1628,6 +1631,7 @@ static BlockJob *mirror_start_job(
>  s->on_target_error = on_target_error;
>  s->is_none_mode = is_none_mode;
>  s->backing_mode = backing_mode;
> +s->zero_target = zero_target;
>  s->copy_mode = copy_mode;
>  s->base = base;
>  s->granularity = granularity;
> @@ -1713,6 +1717,7 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>int creation_flags, int64_t speed,
>uint32_t granularity, int64_t buf_size,
>MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
> +  bool zero_target,
>BlockdevOnError on_source_error,
>BlockdevOnError on_target_error,
>bool unmap, const char *filter_node_name,
> @@ -1728,7 +1733,7 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>  is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>  base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>  mirror_start_job(job_id, bs, creation_flags, target, replaces,
> - speed, granularity, buf_size, backing_mode,
> + speed, granularity, buf_size, backing_mode, zero_target,

Re: [Qemu-devel] [PATCH v2 01/11] qemu-img: Fix bdrv_has_zero_init() use in convert

2019-07-25 Thread Maxim Levitsky

On Wed, 2019-07-24 at 19:12 +0200, Max Reitz wrote:
> bdrv_has_zero_init() only has meaning for newly created images or image
> areas.  If qemu-img convert did not create the image itself, it cannot
> rely on bdrv_has_zero_init()'s result to carry any meaning.
> 
> Signed-off-by: Max Reitz 
> ---
>  qemu-img.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 79983772de..0f4be80c10 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1578,6 +1578,7 @@ typedef struct ImgConvertState {
>  bool has_zero_init;
>  bool compressed;
>  bool unallocated_blocks_are_zero;
> +bool target_is_new;
>  bool target_has_backing;
>  int64_t target_backing_sectors; /* negative if unknown */
>  bool wr_in_order;
> @@ -1975,9 +1976,11 @@ static int convert_do_copy(ImgConvertState *s)
>  int64_t sector_num = 0;
>  
>  /* Check whether we have zero initialisation or can get it efficiently */
> -s->has_zero_init = s->min_sparse && !s->target_has_backing
> - ? bdrv_has_zero_init(blk_bs(s->target))
> - : false;
> +if (s->target_is_new && s->min_sparse && !s->target_has_backing) {
> +s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
> +} else {
> +s->has_zero_init = false;
> +}
>  
>  if (!s->has_zero_init && !s->target_has_backing &&
>  bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
> @@ -2423,6 +2426,8 @@ static int img_convert(int argc, char **argv)
>  }
>  }
>  
> +s.target_is_new = !skip_create;
> +
>  flags = s.min_sparse ? (BDRV_O_RDWR | BDRV_O_UNMAP) : BDRV_O_RDWR;
>  ret = bdrv_parse_cache_mode(cache, , );
>  if (ret < 0) {


Reviewed-by: Maxim Levitsky 
Best regards,
Maxim Levitsky

[Qemu-devel] [PATCH] ati-vga: Attempt to handle CRTC offset not exact multiple of stride

2019-07-25 Thread BALATON Zoltan

MacOS seems to need this and the resulting vbe_start_addr seems
correct but I'm not sure this is how it should work (picture is still
broken with MacOS but probably due to firmware problems now).

It also occured to me that these CRTC regs are also present in VGA so
I wonder if they should be shared in case some drivers try to poke
them via VGA regs. Added a comment noting this although in practice
drivers I've tried so far program the card accessing ati regs so I did
not attempt to change it.

Signed-off-by: BALATON Zoltan 
---
 hw/display/ati.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/display/ati.c b/hw/display/ati.c
index b849f5d510..d1db07dd2f 100644
--- a/hw/display/ati.c
+++ b/hw/display/ati.c
@@ -50,6 +50,7 @@ static void ati_vga_switch_mode(ATIVGAState *s)
 s->mode = EXT_MODE;
 if (s->regs.crtc_gen_cntl & CRTC2_EN) {
 /* CRT controller enabled, use CRTC values */
+/* FIXME Should these be the same as VGA CRTC regs? */
 uint32_t offs = s->regs.crtc_offset & 0x07ff;
 int stride = (s->regs.crtc_pitch & 0x7ff) * 8;
 int bpp = 0;
@@ -103,14 +104,20 @@ static void ati_vga_switch_mode(ATIVGAState *s)
 if (stride) {
 vbe_ioport_write_index(>vga, 0, VBE_DISPI_INDEX_VIRT_WIDTH);
 vbe_ioport_write_data(>vga, 0, stride);
-if (offs % stride == 0) {
-vbe_ioport_write_index(>vga, 0, 
VBE_DISPI_INDEX_Y_OFFSET);
-vbe_ioport_write_data(>vga, 0, offs / stride);
-} else {
-/* FIXME what to do with this? */
-error_report("VGA offset is not multiple of pitch, "
- "expect bad picture");
+if (offs % stride) {
+/* FIXME Is this correct? */
+warn_report("CRTC offset is not multiple of pitch");
+vbe_ioport_write_index(>vga, 0,
+   VBE_DISPI_INDEX_X_OFFSET);
+vbe_ioport_write_data(>vga, 0,
+  offs % stride / (bpp / 8));
 }
+vbe_ioport_write_index(>vga, 0, VBE_DISPI_INDEX_Y_OFFSET);
+vbe_ioport_write_data(>vga, 0, offs / stride);
+DPRINTF("VBE offset (%d,%d), vbe_start_addr=%x\n",
+s->vga.vbe_regs[VBE_DISPI_INDEX_X_OFFSET],
+s->vga.vbe_regs[VBE_DISPI_INDEX_Y_OFFSET],
+s->vga.vbe_start_addr);
 }
 }
 } else {
-- 
2.13.7

Re: [Qemu-devel] [PATCH v4] qapi: Add InetSocketAddress member keep-alive

2019-07-25 Thread Markus Armbruster

Vladimir Sementsov-Ogievskiy  writes:

> It's needed to provide keepalive for nbd client to track server
> availability.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space

2019-07-25 Thread Michael S. Tsirkin

On Thu, Jul 25, 2019 at 08:00:23AM -0600, Alex Williamson wrote:
> On Thu, 25 Jul 2019 06:43:04 -0400
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Jul 25, 2019 at 02:37:33PM +0800, Peter Xu wrote:
> > > After re-read Michael's reply, I feel like what Michael suggested is
> > > that we can simply ignore the bus-number-change case by the guest OS
> > > for now, but I might be wrong.  
> > That's what I suggested, yes.
> 
> "by the guest OS", yes, that's the part that's beyond the scope of this
> effort.  If we deem it necessary to support IVHD DMA alias though, it's
> the guest firmware that determines the initial bus numbers, which is
> part of the DeviceID used to defined IVHD entries and we would not be
> able to ignore that initial programming.  Everything related to the
> guest OS renumber PCI buses is out of scope, the guest firmware
> programming initial bus numbers is not.  

Right. That's par for the course, we have same issues with MCFG
and others. bios programs it and then we generate acpi based on
that.

>Thanks,
> 
> Alex

Re: [Qemu-devel] [PATCH v4] virtio-balloon: free pbp more aggressively

2019-07-25 Thread David Hildenbrand

On 25.07.19 17:18, Michael S. Tsirkin wrote:
> Previous patches switched to a temporary pbp but that does not go far
> enough: after device uses a buffer, guest is free to reuse it, so
> tracking the page and freeing it later is wrong.
> 
> Free and reset the pbp after we push each element.
> 
> Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host 
> page size")
> Cc: qemu-sta...@nongnu.org #v4.0.0
> Cc: David Hildenbrand 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> changes from v2, v3:
> v3 was same as v2, sent by mistake
> v4 fixes a typo
> 
>  hw/virtio/virtio-balloon.c | 37 -
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index fe9664e42c..25de154307 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -41,22 +41,19 @@ typedef struct PartiallyBalloonedPage {
>  
>  static void virtio_balloon_pbp_free(PartiallyBalloonedPage *pbp)
>  {
> -if (!pbp) {
> +if (!pbp->bitmap) {
>  return;
>  }
>  g_free(pbp->bitmap);
> -g_free(pbp);
> +pbp->bitmap = NULL;
>  }
>  
> -static PartiallyBalloonedPage *virtio_balloon_pbp_alloc(ram_addr_t base_gpa,
> -long subpages)
> +static void virtio_balloon_pbp_alloc(PartiallyBalloonedPage *pbp,
> + ram_addr_t base_gpa,
> + long subpages)
>  {
> -PartiallyBalloonedPage *pbp = g_new0(PartiallyBalloonedPage, 1);
> -
>  pbp->base_gpa = base_gpa;
>  pbp->bitmap = bitmap_new(subpages);
> -
> -return pbp;
>  }
>  
>  static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
> @@ -67,7 +64,7 @@ static bool 
> virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
>  
>  static void balloon_inflate_page(VirtIOBalloon *balloon,
>   MemoryRegion *mr, hwaddr mr_offset,
> - PartiallyBalloonedPage **pbp)
> + PartiallyBalloonedPage *pbp)
>  {
>  void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
>  ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
> @@ -104,22 +101,21 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>  base_gpa = memory_region_get_ram_addr(mr) + mr_offset -
> (rb_offset - rb_aligned_offset);
>  
> -if (*pbp && !virtio_balloon_pbp_matches(*pbp, base_gpa)) {
> +if (pbp->bitmap && !virtio_balloon_pbp_matches(pbp, base_gpa)) {
>  /* We've partially ballooned part of a host page, but now
>   * we're trying to balloon part of a different one.  Too hard,
>   * give up on the old partial page */
> -virtio_balloon_pbp_free(*pbp);
> -*pbp = NULL;
> +virtio_balloon_pbp_free(pbp);
>  }
>  
> -if (!*pbp) {
> -*pbp = virtio_balloon_pbp_alloc(base_gpa, subpages);
> +if (!pbp->bitmap) {
> +virtio_balloon_pbp_alloc(pbp, base_gpa, subpages);
>  }
>  
>  set_bit((rb_offset - rb_aligned_offset) / BALLOON_PAGE_SIZE,
> -(*pbp)->bitmap);
> +pbp->bitmap);
>  
> -if (bitmap_full((*pbp)->bitmap, subpages)) {
> +if (bitmap_full(pbp->bitmap, subpages)) {
>  /* We've accumulated a full host page, we can actually discard
>   * it now */
>  
> @@ -127,8 +123,7 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>  /* We ignore errors from ram_block_discard_range(), because it
>   * has already reported them, and failing to discard a balloon
>   * page is not fatal */
> -virtio_balloon_pbp_free(*pbp);
> -*pbp = NULL;
> +virtio_balloon_pbp_free(pbp);
>  }
>  }
>  
> @@ -328,13 +323,14 @@ static void balloon_stats_set_poll_interval(Object 
> *obj, Visitor *v,
>  static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
>  VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> -PartiallyBalloonedPage *pbp = NULL;
>  VirtQueueElement *elem;
>  MemoryRegionSection section;
>  
>  for (;;) {
> +PartiallyBalloonedPage pbp = {};
>  size_t offset = 0;
>  uint32_t pfn;
> +
>  elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>  if (!elem) {
>  break;
> @@ -379,9 +375,8 @@ static void virtio_balloon_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
>  virtqueue_push(vq, elem, offset);
>  virtio_notify(vdev, vq);
>  g_free(elem);
> +virtio_balloon_pbp_free();
>  }
> -
> -virtio_balloon_pbp_free(pbp);
>  }
>  
>  static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq)
> 

Works with hugetlbfs on x86 and LGTM:

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb

1 2 3 4 >

1 - 100 of 324 matches

Mail list logo