date:20180302

Re: [Qemu-devel] Outreachy 2017-DecemberMarch Aspirant for Vulkan-ize_virgl Project

2018-03-02 Thread Anusha Srivastava

Hi Marc-Andre,

Any update on this as to how to get started on this ?



On Mar 1, 2018 13:46, "Stefan Hajnoczi"  wrote:

> On Wed, Feb 28, 2018 at 5:40 PM, Anusha Srivastava
>  wrote:
> > Stefan/Marc-Andre,
> >
> > Is it possible to take this project up now for Outreachy/GSOC 2018 ?
>
> I have provided Marc-André with information on how to add this project
> idea to Outreachy.
>
> Stefan
>

Re: [Qemu-devel] [PATCH v4 1/2] tpm: extend TPM emulator with state migration support

2018-03-02 Thread Stefan Berger


On 03/02/2018 05:14 AM, Dr. David Alan Gilbert wrote:

* Marc-André Lureau (marcandre.lur...@gmail.com) wrote:

Hi Stefan

On Thu, Mar 1, 2018 at 8:59 PM, Stefan Berger
 wrote:

Extend the TPM emulator backend device with state migration support.

The external TPM emulator 'swtpm' provides a protocol over
its control channel to retrieve its state blobs. We implement
functions for getting and setting the different state blobs.
In case the setting of the state blobs fails, we return a
negative errno code to fail the start of the VM.

Since we have an external TPM emulator, we need to make sure
that we do not migrate the state for as long as it is busy
processing a request. We need to wait for notification that
the request has completed processing.

Because qemu will have to deal with an external state, managed by the
tpm emulator (different from other storage/nvram):

- the emulator statedir must be different between source and dest? Can
it be the same?
- what is the story regarding versionning? Can you migrate from/to any
version, or only the same version?
- can the host source and destination be of different endianess?
- how is suspend and snapshot working?

There are probably other complications that I can't think of
immediately, but David or Juan help may help avoid mistakes.

Yes, I think that just about covers it.


It probably deserves a few explanations in docs/specs/tpm.txt.


Signed-off-by: Stefan Berger 
---
  hw/tpm/tpm_emulator.c | 312 --

It would be worth to add some basic tests. Either growing our own
tests/tpm-emu.c test emulator

or checking that swtpm is present (and of required version?) to
run/skip the test a more complete test.


  1 file changed, 302 insertions(+), 10 deletions(-)

diff --git a/hw/tpm/tpm_emulator.c b/hw/tpm/tpm_emulator.c
index b787aee..da877e5 100644
--- a/hw/tpm/tpm_emulator.c
+++ b/hw/tpm/tpm_emulator.c
@@ -55,6 +55,19 @@
  #define TPM_EMULATOR_IMPLEMENTS_ALL_CAPS(S, cap) (((S)->caps & (cap)) == 
(cap))

  /* data structures */
+
+/* blobs from the TPM; part of VM state when migrating */
+typedef struct TPMBlobBuffers {
+uint32_t permanent_flags;
+TPMSizedBuffer permanent;
+
+uint32_t volatil_flags;
+TPMSizedBuffer volatil;
+
+uint32_t savestate_flags;
+TPMSizedBuffer savestate;
+} TPMBlobBuffers;
+
  typedef struct TPMEmulator {
  TPMBackend parent;

@@ -70,6 +83,8 @@ typedef struct TPMEmulator {

  unsigned int established_flag:1;
  unsigned int established_flag_cached:1;
+
+TPMBlobBuffers state_blobs;

Suspicious addition, it shouldn't be necessary in running state. It
could be allocated on demand? Even better if the field is removed, but
this may not be possible with VMSTATE macros.


  } TPMEmulator;


@@ -301,7 +316,8 @@ static int tpm_emulator_set_buffer_size(TPMBackend *tb,
  return 0;
  }

-static int tpm_emulator_startup_tpm(TPMBackend *tb, size_t buffersize)
+static int _tpm_emulator_startup_tpm(TPMBackend *tb, size_t buffersize,
+ bool is_resume)

Using underscore-prefixed names is discouraged. Call it tpm_emulator_init?


  {
  TPMEmulator *tpm_emu = TPM_EMULATOR(tb);
  ptm_init init = {
@@ -309,12 +325,17 @@ static int tpm_emulator_startup_tpm(TPMBackend *tb, 
size_t buffersize)
  };
  ptm_res res;

+DPRINTF("%s   is_resume: %d", __func__, is_resume);
+

extra spaces, you may also add buffersize while at it.

Also it would be good to use trace_'s where possible rather than
DPRINTFs.


Here's a branch where I am doing that now:

https://github.com/stefanberger/qemu-tpm/commits/tracing

I would leave a DEBUG_TIS in the tpm_tis.c, though.


That TPMSizedBuffer is not really useful. Use GArray or qemu Buffer ?
(and we could drop TPMSizedBuffer).

Also, given this is an arbitrary sized chunk, this should probably use
g_try_malloc and check the result rather than letting g_malloc assert on
failure (especially true on the source side).


Will fix. Thanks.





+n = qemu_chr_fe_read_all(_emu->ctrl_chr, tsb->buffer, totlength);
+if (n != totlength) {
+error_report("tpm-emulator: Could not read stateblob (type %d) : %s",
+ type, strerror(errno));
+return -1;
+}
+tsb->size = totlength;
+
+DPRINTF("got state blob type %d, %d bytes, flags 0x%08x\n",
+type, tsb->size, *flags);
+
+return 0;
+}
+
+static int tpm_emulator_get_state_blobs(TPMEmulator *tpm_emu)
+{
+TPMBlobBuffers *state_blobs = _emu->state_blobs;
+
+if (tpm_emulator_get_state_blob(tpm_emu, PTM_BLOB_TYPE_PERMANENT,
+_blobs->permanent,
+_blobs->permanent_flags) < 0 ||
+tpm_emulator_get_state_blob(tpm_emu, PTM_BLOB_TYPE_VOLATILE,
+_blobs->volatil,
+_blobs->volatil_flags) < 0

Re: [Qemu-devel] [PULL] RISC-V QEMU Port Submission v8

2018-03-02 Thread Michael Clark

On Sat, Mar 3, 2018 at 3:22 AM, Peter Maydell 
wrote:

> On 2 March 2018 at 13:55, Michael Clark  wrote:
> >
> > *** Changelog ***
> >
> > v8
> >
> > - - Added linux-user/riscv/target_elf.h during rebase
> > - - Make resetvec configurable and clear mpp and mie on reset
> > - - Use SiFive E31, E51, U34 and U54 cores in SiFive machines
> > - - Define SiFive E31, E51, U34 and U54 cores
> > - - Refactor CPU core definition in preparation for vendor cores
> > - - Prevent S or U mode unless S or U extensions are present
> > - - SiFive E Series cores have no MMU
> > - - SiFive E Series cores have U mode
> > - - Make privileged ISA v1.10 implicit in CPU types
> > - - Remove DRAM_BASE and EXT_IO_BASE as they vary by machine
> > - - Correctly handle mtvec and stvec alignment with respect to RVC
> > - - Print more machine mode state in riscv_cpu_dump_state
> > - - Make riscv_isa_string use compact extension order method
> > - - Fix bug introduced in v6 RISCV_CPU_TYPE_NAME macro change
> > - - Parameterize spike v1.9.1 config string
> > - - Coalesce spike_v1.9.1 and spike_v1.10 machines
> > - - Rename sifive_e300 to sifive_e, and sifive_u500 to sifive_u
>
> Please don't send pull requests until after patches have been put
> on list and been reviewed. A minor update to a pullreq is OK if
> it's something like a trivial compiler fix or just dropping some
> patches that had problems, but if you have this many changes that
> deserves a fresh patchset to be sent to the list for review.
>
> (For the QEMU workflow, a pull request isn't a request for patch
> review, it's a statement that patches have all had review and
> are ready to go into master immediately.)

My apoligies. I won't do this again.

I have some very very minor cleanups that do not affect logic, but perhaps
we could address this after getting approval to make a pull request for v8.

My qemu-devel branch holds changes against the latest rebase:

- https://github.com/michaeljclark/riscv-qemu/tree/qemu-devel

Someone raised timebase frequency on the RISC-V sw-dev and after looking at
the code I noticed we had a hard-coded value for a few of the constants we
put in device tree, and i spotted a missed rename. I'm going to have to
learn about the qemu-devel process for trivial fixes...

Thanks for bearing with me, and my apologies for not following QEMU
workflow ettiquite with the v8 series. the tag is locked down and signed in
any case.

Regards,
Michael.

Re: [Qemu-devel] [PATCH v8 23/23] RISC-V Build Infrastructure

2018-03-02 Thread Michael Clark

Let me know if you have a branch for me to pull and rebase against.

We are passing all build and make check tests in travis (except for a
couple of build timeouts because we are hitting the default 50 minute
timeout)

https://travis-ci.org/riscv/riscv-qemu/builds/348234736

On Sat, Mar 3, 2018 at 3:33 AM, Eric Blake  wrote:

> On 03/02/2018 07:51 AM, Michael Clark wrote:
>
>> This adds RISC-V into the build system enabling the following targets:
>>
>> - riscv32-softmmu
>> - riscv64-softmmu
>> - riscv32-linux-user
>> - riscv64-linux-user
>>
>> This adds defaults configs for RISC-V, enables the build for the RISC-V
>> CPU core, hardware, and Linux User Emulation. The 'qemu-binfmt-conf.sh'
>> script is updated to add the RISC-V ELF magic.
>>
>> Expected checkpatch errors for consistency reasons:
>>
>> ERROR: line over 90 characters
>> FILE: scripts/qemu-binfmt-conf.sh
>>
>> Reviewed-by: Richard Henderson 
>> Signed-off-by: Sagar Karandikar 
>> Signed-off-by: Michael Clark 
>> ---
>>   arch_init.c|  2 ++
>>   configure  | 13 +
>>   cpus.c |  6 ++
>>   default-configs/riscv32-linux-user.mak |  1 +
>>   default-configs/riscv32-softmmu.mak|  4 
>>   default-configs/riscv64-linux-user.mak |  1 +
>>   default-configs/riscv64-softmmu.mak|  4 
>>   hw/riscv/Makefile.objs | 11 +++
>>   include/sysemu/arch_init.h |  1 +
>>   qapi-schema.json   | 17 -
>>
>
> Will need rebasing to modify qapi/misc.json if my pending PULL request
> issues get resolved first:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg00469.html
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>

Re: [Qemu-devel] [PATCH v8 03/23] RISC-V CPU Core Definition

2018-03-02 Thread Michael Clark

Paraphrase this as, we should be closer to reproducing the behaviour of the
SiFive E31, E51, U34 and U54 cores when running RISC-V and SiFive
verification tests. i.e. now if one attempts to configure the MMU on E
cores one will get an illegal instruction trap.

We still have an E21 core to add but I need some more details on
behavioural differences between the SiFive E21 and E31. We're happy to keep
this in the riscv branch for after the 2.12 release. We don't have any more
features to add pre QEMU 2.12, and if we do we will maintain them on a
branch in the https://github.com/riscv/riscv-qemu.git repository.

So now I think we're mostly at the point where we will be in a rebase loop,
assuming we make it in to qemu before 2.12.

If we make a v9 it will contain extremely minor changes. e.g. replacing
hard-coded values with constants and fixing typos.

- https://github.com/michaeljclark/riscv-qemu/commits/qemu-devel

I believe we have all of the necessary licensing changes and sign-offs done.

On Sat, Mar 3, 2018 at 3:23 PM, Michael Clark  wrote:

> We were able to remove several ifdefs and figured out a problem with
> masking out cores for qemu-system-riscv32 and qemu-system-riscv64.
>
> This version of the core patch seems cleaner to me and we have fixed a few
> spec compliance issues with regard to alignment of mtvec/stvec when the C
> extension is enabled or disabled.
>
> We haven't addressed runtime changes to 'misa' as it is legal for 'misa'
> to be read-only. A later "feature" patch will add runtime support for misa
> changes, likely just invalidating the translation cache due to the limited
> number of bits in mmu_index. I can say for sure we not going to do anything
> this risky so close to soft-freeze. The last minute changes were focused at
> specification compliance for MMU vs NOMMU and respecting the 'S' and 'U'
> misa bits with respect to allowable privilege modes. SiFive's E Series
> cores do not support S mode but they do support U mode.
>
> We also had an internal discussion about S-mode vs mmu, and it was decided
> that it is a legal combination to have a nommu core that implements S mode.
> In this use case, M mode could configure PMP (Physical Memory Protection),
> and it would potentially possible to port FDPIC nommu linux to work on a
> core with S mode and nommu. The other S mode registers besides 'satp'
> (Supervisor Address Translation Pointer) are available in S-mode with nommu.
>
> The v8 patch series just tightens up compliance with the specification,
> and makes use of the mmu register (satp), selecting S-mode or calling SRET
> cause illegal instruction traps.
>
> Writes to mstatus.mpp of un unsupported modes (i.e. 'S' or 'U' misa bits
> not present) are silently dropped, as that is the behaviour I am told
> should be implemented. We may have to raise some issues against the RISC-V
> Privileged ISA specification to make sure that this behaviour is specified,
> as it currently does not specify explicitly the behaviour for a core that
> supports S-mode with no-mmu.
>
> Regards,
> Michael.
>
> On Sat, Mar 3, 2018 at 2:51 AM, Michael Clark  wrote:
>
>> Add CPU state header, CPU definitions and initialization routines
>>
>> Reviewed-by: Richard Henderson 
>> Signed-off-by: Sagar Karandikar 
>> Signed-off-by: Michael Clark 
>> ---
>>  target/riscv/cpu.c  | 432 ++
>> ++
>>  target/riscv/cpu.h  | 296 +
>>  target/riscv/cpu_bits.h | 411 ++
>> +++
>>  3 files changed, 1139 insertions(+)
>>  create mode 100644 target/riscv/cpu.c
>>  create mode 100644 target/riscv/cpu.h
>>  create mode 100644 target/riscv/cpu_bits.h
>>
>> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> new file mode 100644
>> index 000..4851890
>> --- /dev/null
>> +++ b/target/riscv/cpu.c
>> @@ -0,0 +1,432 @@
>> +/*
>> + * QEMU RISC-V CPU
>> + *
>> + * Copyright (c) 2016-2017 Sagar Karandikar, sag...@eecs.berkeley.edu
>> + * Copyright (c) 2017-2018 SiFive, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2 or later, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> along with
>> + * this program.  If not, see .
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "cpu.h"
>> +#include "exec/exec-all.h"
>> +#include "qapi/error.h"
>> +#include

Re: [Qemu-devel] [PATCH v8 03/23] RISC-V CPU Core Definition

2018-03-02 Thread Michael Clark

We were able to remove several ifdefs and figured out a problem with
masking out cores for qemu-system-riscv32 and qemu-system-riscv64.

This version of the core patch seems cleaner to me and we have fixed a few
spec compliance issues with regard to alignment of mtvec/stvec when the C
extension is enabled or disabled.

We haven't addressed runtime changes to 'misa' as it is legal for 'misa' to
be read-only. A later "feature" patch will add runtime support for misa
changes, likely just invalidating the translation cache due to the limited
number of bits in mmu_index. I can say for sure we not going to do anything
this risky so close to soft-freeze. The last minute changes were focused at
specification compliance for MMU vs NOMMU and respecting the 'S' and 'U'
misa bits with respect to allowable privilege modes. SiFive's E Series
cores do not support S mode but they do support U mode.

We also had an internal discussion about S-mode vs mmu, and it was decided
that it is a legal combination to have a nommu core that implements S mode.
In this use case, M mode could configure PMP (Physical Memory Protection),
and it would potentially possible to port FDPIC nommu linux to work on a
core with S mode and nommu. The other S mode registers besides 'satp'
(Supervisor Address Translation Pointer) are available in S-mode with nommu.

The v8 patch series just tightens up compliance with the specification, and
makes use of the mmu register (satp), selecting S-mode or calling SRET
cause illegal instruction traps.

Writes to mstatus.mpp of un unsupported modes (i.e. 'S' or 'U' misa bits
not present) are silently dropped, as that is the behaviour I am told
should be implemented. We may have to raise some issues against the RISC-V
Privileged ISA specification to make sure that this behaviour is specified,
as it currently does not specify explicitly the behaviour for a core that
supports S-mode with no-mmu.

Regards,
Michael.

On Sat, Mar 3, 2018 at 2:51 AM, Michael Clark  wrote:

> Add CPU state header, CPU definitions and initialization routines
>
> Reviewed-by: Richard Henderson 
> Signed-off-by: Sagar Karandikar 
> Signed-off-by: Michael Clark 
> ---
>  target/riscv/cpu.c  | 432 ++
> ++
>  target/riscv/cpu.h  | 296 +
>  target/riscv/cpu_bits.h | 411 ++
> +++
>  3 files changed, 1139 insertions(+)
>  create mode 100644 target/riscv/cpu.c
>  create mode 100644 target/riscv/cpu.h
>  create mode 100644 target/riscv/cpu_bits.h
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> new file mode 100644
> index 000..4851890
> --- /dev/null
> +++ b/target/riscv/cpu.c
> @@ -0,0 +1,432 @@
> +/*
> + * QEMU RISC-V CPU
> + *
> + * Copyright (c) 2016-2017 Sagar Karandikar, sag...@eecs.berkeley.edu
> + * Copyright (c) 2017-2018 SiFive, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2 or later, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program.  If not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "cpu.h"
> +#include "exec/exec-all.h"
> +#include "qapi/error.h"
> +#include "migration/vmstate.h"
> +
> +/* RISC-V CPU definitions */
> +
> +static const char riscv_exts[26] = "IMAFDQECLBJTPVNSUHKORWXYZG";
> +
> +const char * const riscv_int_regnames[] = {
> +  "zero", "ra  ", "sp  ", "gp  ", "tp  ", "t0  ", "t1  ", "t2  ",
> +  "s0  ", "s1  ", "a0  ", "a1  ", "a2  ", "a3  ", "a4  ", "a5  ",
> +  "a6  ", "a7  ", "s2  ", "s3  ", "s4  ", "s5  ", "s6  ", "s7  ",
> +  "s8  ", "s9  ", "s10 ", "s11 ", "t3  ", "t4  ", "t5  ", "t6  "
> +};
> +
> +const char * const riscv_fpr_regnames[] = {
> +  "ft0 ", "ft1 ", "ft2 ", "ft3 ", "ft4 ", "ft5 ", "ft6 ",  "ft7 ",
> +  "fs0 ", "fs1 ", "fa0 ", "fa1 ", "fa2 ", "fa3 ", "fa4 ",  "fa5 ",
> +  "fa6 ", "fa7 ", "fs2 ", "fs3 ", "fs4 ", "fs5 ", "fs6 ",  "fs7 ",
> +  "fs8 ", "fs9 ", "fs10", "fs11", "ft8 ", "ft9 ", "ft10",  "ft11"
> +};
> +
> +const char * const riscv_excp_names[] = {
> +"misaligned_fetch",
> +"fault_fetch",
> +"illegal_instruction",
> +"breakpoint",
> +"misaligned_load",
> +"fault_load",
> +"misaligned_store",
> +"fault_store",
> +"user_ecall",
> +"supervisor_ecall",
> +"hypervisor_ecall",
> +"machine_ecall",
> +"exec_page_fault",
> +"load_page_fault",
> +"reserved",
> +

Re: [Qemu-devel] [patches] Re: [PULL] RISC-V QEMU Port Submission

2018-03-02 Thread Michael Clark

On Thu, Mar 1, 2018 at 9:40 AM, Michael Clark  wrote:

>
>
> On Thu, Mar 1, 2018 at 12:53 AM, Peter Maydell 
> wrote:
>
>> On 28 February 2018 at 00:09, Michael Clark  wrote:
>> > I've just talked to SiFive about this. They have agreed that we can
>> remove
>> > the sifive_e300 and sifive_u500 boards from the patch series that we are
>> > going to submit upstream again later this week or early next week. These
>> > machines and their devices are pretty easy for us to maintain in the
>> riscv
>> > or a sifive repo. This trims the number of machines from 5 to 3 and
>> lets us
>> > remove the SiFiveUART and SiFivePRCI from the next patch series we are
>> > going to submit. e.g. v8
>>
>> Models of boards which people actively want and are using are fine
>> (though indeed you can save them for a later round of patches if you
>> like). And it sounds like the 1.9.1 stuff is genuinely wanted, so
>> thanks for the clarification there. What I want to avoid is boards
>> going into QEMU just because you happened to have them already. Once
>> a board model goes into QEMU it's a commitment to supporting it for
>> the long term, and getting rid of it again is hard.
>
>
> Most folk in the community at large are interested in the 'spike' and
> 'virt' machines. Well, the linux distributors are currently targetting
> 'virt' as it has working network and block storage.
>
> It's mostly SiFive customers that are interested in the SiFive machines.
>
> SiFive do intend to submit their machines upstream however we've decided
> that we want to review the naming of the machine/board/SOC in light of
> several new models, as there are infact more MCU/SOC models than are
> currently represented in the RISC-V QEMU port (E21, E31, E51, U54, U54MC).
> Those are the SOCs and then there are boards like the HiFive1, the LoFive,
> the HiFive Unleashed and several others (the 7th RISC-V Foundation Workshop
> had an electronic badge with the FE310 designed by antmicro). SiFive might
> like to review the naming and re-jig the SOC / board relationship.
> Presently we created two generic boards to represent the MMU-less E-series
> (sifive_e300) and the U-series (sifive_u500) however that doesn't cover the
> E21 and E51. As well as silicon, there are soft-cores, including soft-cores
> from other vendors (who have not yet submitted anything to the QEMU port).
> After reflecting on this we don't think the naming of the two sifive
> machines is quite right and they are not yet complete emulations. The 
> sifive_u500
> is supposed to model the HiFive Unleashed however we don't have device
> emulation for all of the hardware widgets yet, and some of the drivers are
> not yet in upstream Linux. Linus essentially accepted the generic core,
> which is what we are now suggesting for QEMU. We are totally happy to
> submit them if folk don't mind us potentially renaming them later. This
> is 2 patches for 2 machines and 2 or 3 devices. It would reduce the patch
> series from 23 patches to 18 patches and we'd maintain the 5 files and
> associated headers out-of tree until things firm up. The core of the port
> is pretty solid as Fedora have build stage 4 images using SMP on the 'virt'
> machine.
>

We had internal discussions about the SiFive machines and SiFive decided to
rename SiFive E300 (sifive_e300.c) and SiFive U500 (sifive_u500.c) to
SiFive E Series (sifive_e.c) and SiFive U Series (sifive_u.c). This was the
result of synchronising with the product marketing folk.

- sifive_e will instantiate a e31 nommu core when run in qemu-system-riscv32
- sifive_e will instantiate a e51 nommu core when run in qemu-system-riscv64
- sifive_u will instantiate a u34 mmu core when run in qemu-system-riscv32
- sifive_u will instantiate a u54 mmu core when run in qemu-system-riscv64

The riscv port is interesting because the machines are designed to be
instantiated with either 32-bit or 64-bit cores. SiFive decided they wanted
to get this right before we contributed the SiFive boards. We think we now
have a model we are satisfied with submittting.

We did some last minute changes to strengthen the riscv-qemu port
specificaiton compliance. The nommu machines worked previously but the lack
of an mmu was not enforced. We've now disabled mmu on cores that lack mmu
(SiFive E series). We've also made some changes to strengthen specification
compliance. It is possible for a core to implement S mode and U mode
(Supervisor and User privilege mode) either with or without an mmu. We've
now matches the hardware behaviour and discard mstatus.mpp writes for modes
that are not supported by the core, making it not possible to enter S mode
on the E series cores. We are pretty confident that these are low risk
changes, and we have tested that Linux still works fine in the 'virt'
machine. The 'virt' machine is likely to be the most popular machine for
use by the Linux community.

Given SiFive has SOCs and Soft-core IP, we would

Re: [Qemu-devel] [PATCH v1 03/21] RISC-V CPU Core Definition

2018-03-02 Thread Michael Clark

Hi Antony,

As of v8 of the RISC-V QEMU target patch series, you can now define the
reset vector in your CPU initializer:

https://github.com/riscv/riscv-qemu/blob/qemu-upstream-v8/target/riscv/cpu.c#L110-L168

Michael.

On Fri, Jan 5, 2018 at 6:53 AM, Antony Pavlov 
wrote:

> On Thu, 4 Jan 2018 20:33:57 +1300
> Michael Clark  wrote:
>
> > On Thu, Jan 4, 2018 at 7:47 PM, Antony Pavlov 
> > wrote:
> >
> > > On Wed,  3 Jan 2018 13:44:07 +1300
> > > Michael Clark  wrote:
> > >
> > > > Add CPU state header, CPU definitions and initialization routines
> > > >
> > > > Signed-off-by: Michael Clark 
> > > > ---
> > > >  target/riscv/cpu.c  | 338 ++
> +
> > > >  target/riscv/cpu.h  | 363 ++
> > > 
> > > >  target/riscv/cpu_bits.h | 411 ++
> > > ++
> > > >  3 files changed, 1112 insertions(+)
> > > >  create mode 100644 target/riscv/cpu.c
> > > >  create mode 100644 target/riscv/cpu.h
> > > >  create mode 100644 target/riscv/cpu_bits.h
> > > >
> > > > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > >
> > > ...
> > >
> > > > +static void riscv_cpu_reset(CPUState *cs)
> > > > +{
> > > > +RISCVCPU *cpu = RISCV_CPU(cs);
> > > > +RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
> > > > +CPURISCVState *env = >env;
> > > > +
> > > > +mcc->parent_reset(cs);
> > > > +#ifndef CONFIG_USER_ONLY
> > > > +tlb_flush(cs);
> > > > +env->priv = PRV_M;
> > > > +env->mtvec = DEFAULT_MTVEC;
> > > > +#endif
> > > > +env->pc = DEFAULT_RSTVEC;
> > >
> > > The RISC-V Privileged Architecture Manual v1.10 states that
> > >
> > >   The pc is set to an implementation-defined reset vector.
> > >
> > > But hard-coded DEFAULT_RSTVEC leaves no chance for changing reset
> vector.
> > >
> > > Can we add a mechanism for changing reset vector?
> > >
> >
> > That can be added very easily at some point when necessary.
> >
> > All 5 RISC-V machines in the QEMU port currently have their emulated Mask
> > ROMs at 0x1000 so its not necessary until we add a machine that needs a
> > different value. I certainly wouldn't reject a patch that adds that
> > functionality if we had a machine with a different reset vector, although
> > given we have 5 machines using the same vector, it may remain a sensible
> > default. I would think twice about adding a property that no machines
> sets,
> > or duplicate code and have all machines set their reset vector even when
> > they are all the same? Shall we add the functionality when we need it?
>
> Actually it is me who needs this functionality.
>
> At the moment my board code needs this dirty hack:
>
> https://github.com/miet-riscv-workgroup/riscv-qemu/commit/
> bfc8221d89b9bb828f3742f17eb89d8513a75aae#diff-
> 429448b1b26e0bc4256cc290758c0ab5
>
> >
> > I'd categorise this as a feature request. #define DEFAULT_RSTVEC
> 0x1000
> > is the "implementation-defined reset vector"
> >
> > Folk on the RISC-V mailing list are actually seeking guidance on the
> blanks
> > in the RISC-V specification so it may be that a de-facto standard emerges
> > for some of these "implementation defined" blanks, in which case it may
> > become part of a platform spec (vs the ISA spec).
> >
> > E.g. there is the "reset-hivecs" property in the ARM emulation code
> > > so SoC-specific code can change reset vector.
>
> --
> Best regards,
>   Antony Pavlov
>

Re: [Qemu-devel] [PATCH 0/3] vfio/pci: ioeventfd support

2018-03-02 Thread Tian, Kevin

> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, March 3, 2018 2:03 AM
> 
> On Fri, 2 Mar 2018 07:08:51 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson
> > > Sent: Thursday, March 1, 2018 4:15 AM
> > >
> > > A vfio ioeventfd will perform the pre-specified device write on
> > > triggering of an eventfd.  When coupled with KVM ioeventfds, this
> > > feature allows a VM to trap a device page for virtualization, while
> > > also registering targeted ioeventfds to maintain performance of high
> > > frequency register writes within the trapped range.  Much like the
> > > existing interrupt eventfd/irqfd coupling, such writes can be handled
> > > entirely in the host kernel.
> > >
> > > The new VFIO device ioctl may be supported by any vfio bus driver,
> > > including mdev drivers, but the implementation here only enables
> > > vfio-pci.  This is intended as an acceleration path, bus drivers
> > > may choose which regions to support and userspace should always
> > > intend to fall back to non-accelerated handling when unavailable.
> > >
> >
> > it's a nice feature! A curious question. Is it possible for mdev driver
> > to directly create ioeventfd on specified offset? Currently ioeventfd
> > requires quirks in Qemu, which must know the device detail to
> > create ioeventfd and then connect vfio and kvm together. However
> > mdev instance is more software defined thus I'm not sure whether
> > asking Qemu to catch up quirk with underlying software logic could
> > be overwhelmed. Also in case of vendor driver emulating mdev
> > with same DID/VID as a real device, it might be difficult for Qemu
> > to figure out whether a vfio device is a real one or mdev one to
> > apply a mdev specific quirk. On the other hand, since vendor
> > driver knows all the logic constructing mdev, it would be more
> > convenient allowing vendor driver to directly create/destroy
> > ioeventfd on its demand?
> 
> Are file descriptors the right interface between kernel components if
> userspace is not involved in connecting them?  Seems like not.  vfio is
> designed to be independent of KVM with the necessary interactions
> between them orchestrated by userspace.  KVMGT is about the only
> exception to that and TBH I'm not thrilled about extending that
> further.  Thanks,
> 

why do you think KVMGT is the only exception? Isn't above scenario
general enough to any mdev implementation, just as what you're
trying to solve, that if a trapped operation happens too frequently on
mdev then we may want to optimize it? I understand fd is not a good 
way in-between kernel components, just asking here in case you
have some good idea in mind. If adding mdev specific quirk in 
Qemu is the only way for similar optimization, I'm fine with it. One
thing which I was not sure though is still about the scenario where 
mdev is constructed same as a real device then what would happen
if Qemu applies a mdev-specific quirk to both sides - of course 
just a mental thinking, not sure whether it happens in real...

Thanks
Kevin

[Qemu-devel] [PATCH qemu v4 1/2] qmp: Merge ObjectPropertyInfo and DevicePropertyInfo

2018-03-02 Thread Alexey Kardashevskiy

ObjectPropertyInfo is more generic and only missing @description.
This adds a description to ObjectPropertyInfo and removes
DevicePropertyInfo so the resulting ObjectPropertyInfo can be used
elsewhere.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* added (since 2.12) to the new field
* removed wrong reindents
---
 qapi-schema.json | 23 +--
 qdev-monitor.c   |  6 +++---
 qmp.c| 14 +++---
 3 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index 0262b9f..5cdf89a 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1266,10 +1266,12 @@
 #3) A link type in the form 'link' where subtype is a qdev
 #   device type name.  Link properties form the device model graph.
 #
+# @description: if specified, the description of the property (since 2.12).
+#
 # Since: 1.2
 ##
 { 'struct': 'ObjectPropertyInfo',
-  'data': { 'name': 'str', 'type': 'str' } }
+  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
 
 ##
 # @qom-list:
@@ -1425,34 +1427,19 @@
   'returns': [ 'ObjectTypeInfo' ] }
 
 ##
-# @DevicePropertyInfo:
-#
-# Information about device properties.
-#
-# @name: the name of the property
-# @type: the typename of the property
-# @description: if specified, the description of the property.
-#   (since 2.2)
-#
-# Since: 1.2
-##
-{ 'struct': 'DevicePropertyInfo',
-  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
-
-##
 # @device-list-properties:
 #
 # List properties associated with a device.
 #
 # @typename: the type name of a device
 #
-# Returns: a list of DevicePropertyInfo describing a devices properties
+# Returns: a list of ObjectPropertyInfo describing a devices properties
 #
 # Since: 1.2
 ##
 { 'command': 'device-list-properties',
   'data': { 'typename': 'str'},
-  'returns': [ 'DevicePropertyInfo' ] }
+  'returns': [ 'ObjectPropertyInfo' ] }
 
 ##
 # @xen-set-global-dirty-log:
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 8462381..ab9c46c 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -258,8 +258,8 @@ int qdev_device_help(QemuOpts *opts)
 {
 Error *local_err = NULL;
 const char *driver;
-DevicePropertyInfoList *prop_list;
-DevicePropertyInfoList *prop;
+ObjectPropertyInfoList *prop_list;
+ObjectPropertyInfoList *prop;
 
 driver = qemu_opt_get(opts, "driver");
 if (driver && is_help_option(driver)) {
@@ -295,7 +295,7 @@ int qdev_device_help(QemuOpts *opts)
 }
 }
 
-qapi_free_DevicePropertyInfoList(prop_list);
+qapi_free_ObjectPropertyInfoList(prop_list);
 return 1;
 
 error:
diff --git a/qmp.c b/qmp.c
index 793f6f3..42b7862 100644
--- a/qmp.c
+++ b/qmp.c
@@ -456,19 +456,19 @@ ObjectTypeInfoList *qmp_qom_list_types(bool 
has_implements,
 return ret;
 }
 
-/* Return a DevicePropertyInfo for a qdev property.
+/* Return a ObjectPropertyInfo for a qdev property.
  *
  * If a qdev property with the given name does not exist, use the given default
  * type.  If the qdev property info should not be shown, return NULL.
  *
  * The caller must free the return value.
  */
-static DevicePropertyInfo *make_device_property_info(ObjectClass *klass,
+static ObjectPropertyInfo *make_device_property_info(ObjectClass *klass,
  const char *name,
  const char *default_type,
  const char *description)
 {
-DevicePropertyInfo *info;
+ObjectPropertyInfo *info;
 Property *prop;
 
 do {
@@ -508,14 +508,14 @@ static DevicePropertyInfo 
*make_device_property_info(ObjectClass *klass,
 return info;
 }
 
-DevicePropertyInfoList *qmp_device_list_properties(const char *typename,
+ObjectPropertyInfoList *qmp_device_list_properties(const char *typename,
Error **errp)
 {
 ObjectClass *klass;
 Object *obj;
 ObjectProperty *prop;
 ObjectPropertyIterator iter;
-DevicePropertyInfoList *prop_list = NULL;
+ObjectPropertyInfoList *prop_list = NULL;
 
 klass = object_class_by_name(typename);
 if (klass == NULL) {
@@ -540,8 +540,8 @@ DevicePropertyInfoList *qmp_device_list_properties(const 
char *typename,
 
 object_property_iter_init(, obj);
 while ((prop = object_property_iter_next())) {
-DevicePropertyInfo *info;
-DevicePropertyInfoList *entry;
+ObjectPropertyInfo *info;
+ObjectPropertyInfoList *entry;
 
 /* Skip Object and DeviceState properties */
 if (strcmp(prop->name, "type") == 0 ||
-- 
2.11.0

[Qemu-devel] [PATCH qemu v4 2/2] qmp: Add qom-list-properties to list QOM object properties

2018-03-02 Thread Alexey Kardashevskiy

There is already 'device-list-properties' which does most of the job,
however it does not handle everything returned by qom-list-types such
as machines as they inherit directly from TYPE_OBJECT and not TYPE_DEVICE.
It does not handle abstract classes either.

This adds a new qom-list-properties command which prints properties
of a specific class and its instance. It is pretty much a simplified copy
of the device-list-properties handler.

Since it creates an object instance, device properties should appear
in the output as they are copied to QOM properties at the instance_init
hook.

This adds a object_class_property_iter_init() helper to allow class
properties enumeration uses it in the new QMP command to allow properties
listing for abstract classes.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* Used ObjectPropertyInfo instead of QOMPropertyInfo

v2:
* added abstract classes support, now things like "pci-device" or
"spapr-machine" show properties, previously these would produce
an "abstract class" error

# Conflicts:
#   qapi-schema.json
---
 qapi-schema.json | 15 +++
 include/qom/object.h | 16 
 qmp.c| 49 +
 qom/object.c |  7 +++
 4 files changed, 87 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 5cdf89a..3021e90 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1442,6 +1442,21 @@
   'returns': [ 'ObjectPropertyInfo' ] }
 
 ##
+# @qom-list-properties:
+#
+# List properties associated with a QOM object.
+#
+# @typename: the type name of an object
+#
+# Returns: a list of ObjectPropertyInfo describing object properties
+#
+# Since: 2.12
+##
+{ 'command': 'qom-list-properties',
+  'data': { 'typename': 'str'},
+  'returns': [ 'ObjectPropertyInfo' ] }
+
+##
 # @xen-set-global-dirty-log:
 #
 # Enable or disable the global dirty log mode.
diff --git a/include/qom/object.h b/include/qom/object.h
index dc73d59..ef07d78 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1017,6 +1017,22 @@ void object_property_iter_init(ObjectPropertyIterator 
*iter,
Object *obj);
 
 /**
+ * object_class_property_iter_init:
+ * @klass: the class
+ *
+ * Initializes an iterator for traversing all properties
+ * registered against an object class and all parent classes.
+ *
+ * It is forbidden to modify the property list while iterating,
+ * whether removing or adding properties.
+ *
+ * This can be used on abstract classes as it does not create a temporary
+ * instance.
+ */
+void object_class_property_iter_init(ObjectPropertyIterator *iter,
+ ObjectClass *klass);
+
+/**
  * object_property_iter_next:
  * @iter: the iterator instance
  *
diff --git a/qmp.c b/qmp.c
index 42b7862..acd9048 100644
--- a/qmp.c
+++ b/qmp.c
@@ -576,6 +576,55 @@ ObjectPropertyInfoList *qmp_device_list_properties(const 
char *typename,
 return prop_list;
 }
 
+ObjectPropertyInfoList *qmp_qom_list_properties(const char *typename,
+ Error **errp)
+{
+ObjectClass *klass;
+Object *obj = NULL;
+ObjectProperty *prop;
+ObjectPropertyIterator iter;
+ObjectPropertyInfoList *prop_list = NULL;
+
+klass = object_class_by_name(typename);
+if (klass == NULL) {
+error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+  "Class '%s' not found", typename);
+return NULL;
+}
+
+klass = object_class_dynamic_cast(klass, TYPE_OBJECT);
+if (klass == NULL) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "typename", 
TYPE_OBJECT);
+return NULL;
+}
+
+if (object_class_is_abstract(klass)) {
+object_class_property_iter_init(, klass);
+} else {
+obj = object_new(typename);
+object_property_iter_init(, obj);
+}
+while ((prop = object_property_iter_next())) {
+ObjectPropertyInfo *info;
+ObjectPropertyInfoList *entry;
+
+info = g_malloc0(sizeof(*info));
+info->name = g_strdup(prop->name);
+info->type = g_strdup(prop->type);
+info->has_description = !!prop->description;
+info->description = g_strdup(prop->description);
+
+entry = g_malloc0(sizeof(*entry));
+entry->value = info;
+entry->next = prop_list;
+prop_list = entry;
+}
+
+object_unref(obj);
+
+return prop_list;
+}
+
 CpuDefinitionInfoList *qmp_query_cpu_definitions(Error **errp)
 {
 return arch_query_cpu_definitions(errp);
diff --git a/qom/object.c b/qom/object.c
index 5dcee46..e7978bd 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1037,6 +1037,13 @@ ObjectProperty 
*object_property_iter_next(ObjectPropertyIterator *iter)
 return val;
 }
 
+void object_class_property_iter_init(ObjectPropertyIterator *iter,
+ ObjectClass *klass)
+{
+g_hash_table_iter_init(>iter,

[Qemu-devel] [PATCH qemu v4 0/2] qmp: Add qom-list-properties to list QOM object properties

2018-03-02 Thread Alexey Kardashevskiy


This adds a new QMP command to list any class/object properties,
not just devices.

I do not know how/if we need to address the concern about
abstract/nonabstract classes though.

Changes:
v4:
* cleanup in 1/2



Please comment. Thanks.



Alexey Kardashevskiy (2):
  qmp: Merge ObjectPropertyInfo and DevicePropertyInfo
  qmp: Add qom-list-properties to list QOM object properties

 qapi-schema.json | 38 ---
 include/qom/object.h | 16 +
 qdev-monitor.c   |  6 ++---
 qmp.c| 63 ++--
 qom/object.c |  7 ++
 5 files changed, 102 insertions(+), 28 deletions(-)

-- 
2.11.0

Re: [Qemu-devel] [PATCH qemu v3 1/2] qmp: Merge ObjectPropertyInfo and DevicePropertyInfo

2018-03-02 Thread Alexey Kardashevskiy

On 03/03/18 00:37, Eric Blake wrote:
> On 03/01/2018 07:09 AM, Alexey Kardashevskiy wrote:
>> ObjectPropertyInfo is more generic and only missing @description.
>> This adds a description to ObjectPropertyInfo and removes
>> DevicePropertyInfo so the resulting ObjectPropertyInfo can be used
>> elsewhere.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>   qapi-schema.json | 23 +--
>>   qdev-monitor.c   |  6 +++---
>>   qmp.c    | 20 ++--
>>   3 files changed, 18 insertions(+), 31 deletions(-)
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 0262b9f..87327e5 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1266,10 +1266,12 @@
>>   #    3) A link type in the form 'link' where subtype is a
>> qdev
>>   #   device type name.  Link properties form the device model
>> graph.
>>   #
>> +# @description: if specified, the description of the property.
> 
> Missing a '(since 2.12)' tag.
> 
>> +#
>>   # Since: 1.2
>>   ##
>>   { 'struct': 'ObjectPropertyInfo',
>> -  'data': { 'name': 'str', 'type': 'str' } }
>> +  'data': { 'name': 'str', 'type': 'str', '*description': 'str' } }
>>   +++ b/qmp.c
>> @@ -463,12 +463,12 @@ ObjectTypeInfoList *qmp_qom_list_types(bool
>> has_implements,
>>    *
>>    * The caller must free the return value.
>>    */
>> -static DevicePropertyInfo *make_device_property_info(ObjectClass *klass,
>> - const char *name,
>> - const char
>> *default_type,
>> - const char
>> *description)
>> +static ObjObjectPropertyInfoectPropertyInfo 
>> *make_device_property_info(ObjectClass *klass,
>> +  const char *name,
>> +  const char *default_type,
>> +  const char *description)
> 
> Why the indentation change?


Oh. Leftover from DevicePropertyInfo->(non-existng) OOMPropertyInfo. I'll
repost.


> 
>> @@ -508,14 +508,14 @@ static DevicePropertyInfo
>> *make_device_property_info(ObjectClass *klass,
>>   return info;
>>   }
>>   -DevicePropertyInfoList *qmp_device_list_properties(const char *typename,
>> -   Error **errp)
>> +ObjectPropertyInfoList *qmp_device_list_properties(const char *typename,
>> +    Error **errp)
> 
> and again
> 
> Otherwise looks okay
> 


-- 
Alexey

Re: [Qemu-devel] [Qemu-ppc] [PULL 00/24] ppc-for-2.12 queue 20180302

2018-03-02 Thread BALATON Zoltan


On Fri, 2 Mar 2018, BALATON Zoltan wrote:

On Fri, 2 Mar 2018, Peter Maydell wrote:

On 2 March 2018 at 06:03, David Gibson <da...@gibson.dropbear.id.au> wrote:
The following changes since commit 
0dc8ae5e8e693737dfe65ba02d0c6eccb58a9c67:


  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20180301-v2' 
into staging (2018-03-01 17:08:16 +)


are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-2.12-20180302

for you to fetch changes up to 57ae75b2e401f1d04f37a8cd26212eb3134c51a6:

  hw/ppc/spapr,e500: Use new property "stdout-path" for boot console 
(2018-03-02 12:24:44 +1100)



ppc patch queue 2018-03-02

Here's the next batch of accumulated spapr and ppc patches.
Highlights are:
* New Sam460ex machine type
* Yet more fixes related to vcpu id allocation for spapr
* Numerous macio cleanupsr
* Some enhancements to the Spectre/Meltdown fixes for pseries,
  allowing use of a better mitigation for indirect branch based
  exploits
* New pseries machine types with Spectre/Meltdown mitigations
  enabled (stop gap until libvirt and management understands the
  machine options)
* A handful of other fixes



Hi. This generates a compile error from some compilers in my test set
(I think just the older gccs):

/home/petmay01/linaro/qemu-for-merges/hw/ppc/ppc440_uc.c: In function
‘ppc460ex_pcie_realize’:
/home/petmay01/linaro/qemu-for-merges/hw/ppc/ppc440_uc.c:1054:5:
error: ‘id’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
snprintf(buf, sizeof(buf), "pcie%d-io", id);
^
cc1: all warnings being treated as errors

Looks like a valid complaint to me -- the realize function
should check that dcrn_base was set to a valid value, fail
realize if it wasn't, and have a 'default:' case in the
switch with g_assert_not_reached().


I've sent an updated patch (v3) that should fix this.


Wait, I've just realised this is not even in this pull request but already 
in master but it was not compiled before the machine that uses it was 
added now. So a v3 of the original patch is not appropriate. Instead, I've 
sent a fixup patch now that should be applied before this pull request to 
hopefully fix the problem. Sorry for the inconvenience this caused.


Thank you,
BALATON Zoltan

[Qemu-devel] [PATCH] ppc440_uc: Fix unintialized variable warning with older gcc

2018-03-02 Thread BALATON Zoltan

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_uc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 4e2523a..976ab2b 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -1050,6 +1050,9 @@ static void ppc460ex_pcie_realize(DeviceState *dev, Error 
**errp)
 case DCRN_PCIE1_BASE:
 id = 1;
 break;
+default:
+error_setg(errp, "invalid PCIe DCRN base");
+return;
 }
 snprintf(buf, sizeof(buf), "pcie%d-io", id);
 memory_region_init(>iomem, OBJECT(s), buf, UINT64_MAX);
-- 
2.7.6

[Qemu-devel] [Bug 1673976] Re: linux-user clone() can't handle glibc posix_spawn() (causes locale-gen to assert)

2018-03-02 Thread Éric Hoffman

Ok, yes you are right...

I have looked a bit more on the source code, and indeed, I think
understand the issue with the VFORK with QEMU.  Please correct me if I'm
wrong...

- In the syscall trap handler, it has to use the fork() function to emulate the 
vfork() due to restriction of the vfork() function (as QEMU must continue to 
control the flow of instruction past the call to vfork(), and do a lot more 
things in the child thread before ending up performing a execve() or _exit())
- Also, it can not do a wait() for the emulated child, as this child will 
continue to exist even after it calls execve(), so the parent would stall.
- Then, I taught about doing condition signalling, like waiting for a pthread 
condition signal that the child would send once it come to the point of 
performing the _exit() or execve(), but the child would, for example, need to 
know if execve() was successful, or otherwise the child would continue and set 
an error flag and then call _exit().  We do need that error flag before 
continuing the execution on the parent.  So we can not signal back to the 
parent that the 'emulated vfork' is OK before calling execve(), but we can not 
wait after execve() because if the call is successful, there is no return from 
that function, and code goes outside the control of QEMU.

So, I taught of an idea...  What if, in the TARGET_NR_clone syscall trap, when 
we are called upon a CLONE_VFORK, we do:
- Do a regular fork, as it's currently done, with CLONE_VM flag (so the child 
share the same memory as the parent).  However, we also set a state flag that 
we are in this 'vfork emulation' mode just before the fork (more on that 
bellow...).
- Let the parent wait for the child to terminate (again, more on that 
bellow...).
- Let the child return normally and continue execution, as if the parent was 
waiting.

Then, eventually the child will eventually either end up in the 
TARGET_NR_execve or __NR_exit_group syscall trap.  At which point:
- The child check if it is in 'vfork emulation' mode.  If not, then there's 
nothing special, just continue the way the code is currently written.  If the 
flag is set, then follow on with the steps bellow...
- The child set a flag that tell where it is (TARGET_NR_execve or 
__NR_exit_group, and the arguments passed to that syscall), and that everything 
is ok (it has not simply died meanwhile).
- The child terminate, which resume the parent's execution.

The parent then:
- Clear the 'vfork emulation' flag.
- Look at where the child left (was it performing TARGET_NR_execve or 
__NR_exit_group syscall?  What was the arguments passed to the syscall?).  This 
is pretty easy since the child was writing to the parent's memory space the 
whole time (CLONE_VM).  The parent could even use a flag allocated on it's 
stack before the fork(), since the child will have run with it's own stack 
during that time (so the parent stack is still intact).
- Now that we know what the child wanted to do (what syscall and which 
parameters), the parent (which at his point has no more 'leftover' child), can 
then do a *real* vfork, or otherwise return the proper error code.

It's a bit far fetched, and I'm far from implying that I know much about
QEMU, but this is an idea :-)  Sound like it's pretty straightforward
though.  Basically we just wait for the code between the _clone()
function and the _execve/_exit function to complete, at which point we
take action and we are in measure to assess the status code (and do the
real vfork).

Regards,
Eric

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1673976

Title:
  linux-user clone() can't handle glibc posix_spawn() (causes locale-gen
  to assert)

Status in QEMU:
  New

Bug description:
  I'm running a command (locale-gen) inside of an armv7h chroot mounted
  on my x86_64 desktop by putting qemu-arm-static into /usr/bin/ of the
  chroot file system and I get a core dump.

  locale-gen
  Generating locales...
    en_US.UTF-8...localedef: ../sysdeps/unix/sysv/linux/spawni.c:360: 
__spawnix: Assertion `ec >= 0' failed.
  qemu: uncaught target signal 6 (Aborted) - core dumped
  /usr/bin/locale-gen: line 41:34 Aborted (core dumped) 
localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale

  I've done this same thing successfully for years, but this breakage
  has appeared some time in the last 3 or so months. Possibly with the
  update to qemu version 2.8.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1673976/+subscriptions

Re: [Qemu-devel] [PULL 12/51] build-sys: compile with -Og or -O1 when --enable-debug

2018-03-02 Thread Alex Bennée


Peter Maydell  writes:

> On 16 January 2018 at 14:16, Paolo Bonzini  wrote:
>> From: Marc-André Lureau 
>>
>> When --enable-debug is turned on, configure doesn't set -O level, and
>> uses default compiler -O0 level, which is slow.
>>
>> Instead, use -Og if supported by the compiler (optimize debugging
>> experience), or -O1 (keeps code somewhat debuggable and works around
>> compiler bugs).
>
> This gives me a noticeably worse debug experience (using -Og),
> because gdb shows a lot more "" variables and
> function arguments. (I've been mildly irritated by this for
> the last few weeks and only just figured out why this was
> happening.)

I was wondering why my:

   ./configure --enable-debug --enable-debug-tcg --extra-cflags="-O0 -g3" 
--target-list=aarch64-linux-user

builds where showing that.

> Can we go back to the previous behaviour, please ? I don't
> care if the build is slow if I'm debugging, but I really do
> care that I don't have my variables and arguments all
> optimised away by the compiler so I can't tell what's going on.

+1

There is a lot of other stuff enabled when debugging which slows stuff
down anyway.

--
Alex Bennée

Re: [Qemu-devel] [PATCH 4/4] virtio-net: add linkspeed and duplex settings to virtio-net

2018-03-02 Thread Michael S. Tsirkin

On Fri, Mar 02, 2018 at 11:59:00AM -0500, Jason Baron wrote:
> On 03/02/2018 02:14 AM, Jason Wang wrote:
> > 
> > 
> > On 2018年03月02日 11:46, Jason Baron wrote:
> >> Although linkspeed and duplex can be set in a linux guest via 'ethtool
> >> -s',
> >> this requires custom ethtool commands for virtio-net by default.
> >>
> >> Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
> >> the hypervisor to export a linkspeed and duplex setting. The user can
> >> subsequently overwrite it later if desired via: 'ethtool -s'.
> >>
> >> Linkspeed and duplex settings can be set as:
> >> '-device virtio-net,speed=1,duplex=full'
> > 
> > I was thinking whether or not it's better to decide the duplex by the
> > type of backends.
> > 
> > E.g userspace and vhost-kernel implement a in fact half duplex. But dpdk
> > implement a full duplex.
> 
> Interesting - could this be derived only from the backend 'type'. IE:
> NET_CLIENT_DRIVER_TAP, NET_CLIENT_DRIVER_VHOST_USER...
> 
> 
> I was also thinking this could be specified as 'duplex=backend', in
> addition to the proposed 'duplex=full' or 'duplex=half'?
> 
> Thanks,
> 
> -Jason

I'd say it would make more sense to teach backends to obey what's
specified by the user. E.g. if vhost gets a duplex config,
create two threads.

But I think all that's for future, we can just fake it for
now - the current uses don't seem to particularly care about whether
virtio actually is or isn't a duplex.



> > 
> > Thanks
> > 
> >>
> >> where speed is [0...INT_MAX], and duplex is ["half"|"full"].
> >>
> >> Signed-off-by: Jason Baron
> >> Cc: "Michael S. Tsirkin"
> >> Cc: Jason Wang
> >> Cc:virtio-...@lists.oasis-open.org
> >> ---
> >

Re: [Qemu-devel] [PATCH 4/4] virtio-net: add linkspeed and duplex settings to virtio-net

2018-03-02 Thread Michael S. Tsirkin

On Fri, Mar 02, 2018 at 03:14:01PM +0800, Jason Wang wrote:
> 
> 
> On 2018年03月02日 11:46, Jason Baron wrote:
> > Although linkspeed and duplex can be set in a linux guest via 'ethtool -s',
> > this requires custom ethtool commands for virtio-net by default.
> > 
> > Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
> > the hypervisor to export a linkspeed and duplex setting. The user can
> > subsequently overwrite it later if desired via: 'ethtool -s'.
> > 
> > Linkspeed and duplex settings can be set as:
> > '-device virtio-net,speed=1,duplex=full'
> 
> I was thinking whether or not it's better to decide the duplex by the type
> of backends.
> 
> E.g userspace and vhost-kernel implement a in fact half duplex. But dpdk
> implement a full duplex.
> 
> Thanks

OTOH it's a priority for some people to be able to support migration
between different backend types. Breaking that won't be nice.

> > 
> > where speed is [0...INT_MAX], and duplex is ["half"|"full"].
> > 
> > Signed-off-by: Jason Baron
> > Cc: "Michael S. Tsirkin"
> > Cc: Jason Wang
> > Cc:virtio-...@lists.oasis-open.org
> > ---

Re: [Qemu-devel] [PATCH v2 0/2] ipmi: Fix vmstate transfer

2018-03-02 Thread Corey Minyard


On 03/02/2018 02:02 PM, Dr. David Alan Gilbert wrote:

* miny...@acm.org (miny...@acm.org) wrote:

I apologize for the resend, I left the list off the previous post.

This is unchanged since the previous post, two weeks ago.  I received
no comments, so I guess it's ok.  It's fairly broken now, so I would
like this fixed.

Sorry, I'll look at it on Monday; I was out last week and hadn't got
around to this set.


Thanks a bunch.  I have some doubt about how I handled the backwards
compatibility in the KCS code.  It works, but I'm not sure it's right.

-corey


Dave


Changes from v1:
   * Validate the data values in pre_load functions.
   * For KCS, instead of an old function, create a separate vmstate
 structure for the new version.  The name on the old vmstate
 structure wasn't specific enough, so a new name was needed,
 The old structure is set up to never be sent, but it can be
 received.

The following changes since commit 427cbc7e4136a061628cb4315cc8182ea36d772f:

   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2018-03-01 18:46:41 +)

are available in the git repository at:

   https://github.com/cminyard/qemu.git tags/ipmi-vmstate-fixes

for you to fetch changes up to 90797371d9a3138657e7b1f7ab4425eb67d6fd0a:

   ipmi: Use proper struct reference for BT vmstate (2018-03-02 07:48:39 -0600)


Fix the IPMI vmstate code to work correctly in all cases.  Heavily
tested under load.


Corey Minyard (2):
   ipmi: Use proper struct reference for KCS vmstate
   ipmi: Use proper struct reference for BT vmstate

  hw/ipmi/isa_ipmi_bt.c  | 61 ++-
  hw/ipmi/isa_ipmi_kcs.c | 77 --
  2 files changed, 123 insertions(+), 15 deletions(-)


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 0/2] ipmi: Fix vmstate transfer

2018-03-02 Thread Dr. David Alan Gilbert

* miny...@acm.org (miny...@acm.org) wrote:
> I apologize for the resend, I left the list off the previous post.
> 
> This is unchanged since the previous post, two weeks ago.  I received
> no comments, so I guess it's ok.  It's fairly broken now, so I would
> like this fixed.

Sorry, I'll look at it on Monday; I was out last week and hadn't got
around to this set.

Dave

> Changes from v1:
>   * Validate the data values in pre_load functions.
>   * For KCS, instead of an old function, create a separate vmstate
> structure for the new version.  The name on the old vmstate
> structure wasn't specific enough, so a new name was needed,
> The old structure is set up to never be sent, but it can be
> received.
> 
> The following changes since commit 427cbc7e4136a061628cb4315cc8182ea36d772f:
> 
>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
> (2018-03-01 18:46:41 +)
> 
> are available in the git repository at:
> 
>   https://github.com/cminyard/qemu.git tags/ipmi-vmstate-fixes
> 
> for you to fetch changes up to 90797371d9a3138657e7b1f7ab4425eb67d6fd0a:
> 
>   ipmi: Use proper struct reference for BT vmstate (2018-03-02 07:48:39 -0600)
> 
> 
> Fix the IPMI vmstate code to work correctly in all cases.  Heavily
> tested under load.
> 
> 
> Corey Minyard (2):
>   ipmi: Use proper struct reference for KCS vmstate
>   ipmi: Use proper struct reference for BT vmstate
> 
>  hw/ipmi/isa_ipmi_bt.c  | 61 ++-
>  hw/ipmi/isa_ipmi_kcs.c | 77 
> --
>  2 files changed, 123 insertions(+), 15 deletions(-)
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PULL v4 28/30] qapi: Move qapi-schema.json to qapi/, rename generated files

2018-03-02 Thread Eric Blake

From: Markus Armbruster 

Move qapi-schema.json to qapi/, so it's next to its modules, and all
files get generated to qapi/, not just the ones generated for modules.

Consistently name the generated files qapi-MODULE.EXT:
qmp-commands.[ch] become qapi-commands.[ch], qapi-event.[ch] become
qapi-events.[ch], and qmp-introspect.[ch] become qapi-introspect.[ch].
This gets rid of the temporary hacks in scripts/qapi/commands.py,
scripts/qapi/events.py, and scripts/qapi/common.py.

Signed-off-by: Markus Armbruster 
Message-Id: <20180211093607.27351-28-arm...@redhat.com>
Reviewed-by: Eric Blake 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Michael Roth 
[eblake: Fix trailing dot in tpm.c, undo temporary hack for OSX toolchain]
Signed-off-by: Eric Blake 
---
 docs/devel/qapi-code-gen.txt  | 30 +++---
 docs/devel/writing-qmp-commands.txt   |  2 +-
 docs/interop/qmp-intro.txt|  2 +-
 Makefile  | 42 +++
 Makefile.objs | 21 
 qapi/misc.json|  4 +--
 qapi-schema.json => qapi/qapi-schema.json | 32 +++
 include/qapi/visitor.h|  2 +-
 scripts/qapi/commands.py  |  7 --
 scripts/qapi/common.py|  5 ++--
 scripts/qapi/events.py|  9 +--
 scripts/qapi/introspect.py|  4 +--
 scripts/qapi/types.py |  6 ++---
 scripts/qapi/visit.py |  6 ++---
 include/qapi/qmp/qobject.h|  2 +-
 include/qom/object.h  |  2 +-
 backends/hostmem.c|  2 +-
 hmp.c |  2 +-
 monitor.c |  6 ++---
 net/filter-buffer.c   |  2 +-
 qga/commands-posix.c  |  2 +-
 qga/commands-win32.c  |  2 +-
 qga/commands.c|  2 +-
 qga/main.c|  2 +-
 qom/object.c  |  2 +-
 tests/test-qmp-cmds.c |  2 +-
 tests/test-qmp-event.c|  2 +-
 tests/test-qobject-input-visitor.c|  6 ++---
 tpm.c |  3 +--
 ui/vnc.c  |  2 +-
 .gitignore| 16 ++--
 qga/Makefile.objs |  2 +-
 tests/.gitignore  |  6 ++---
 tests/Makefile.include| 14 +--
 ui/cocoa.m|  2 +-
 35 files changed, 119 insertions(+), 134 deletions(-)
 rename qapi-schema.json => qapi/qapi-schema.json (85%)

diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index c86792add2e..25b7180a189 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -647,7 +647,7 @@ name an event 'MAX', since the generator also produces a C 
enumeration
 of all event names with a generated _MAX value at the end.  When
 'data' is also specified, additional info will be included in the
 event, with similar semantics to a 'struct' expression.  Finally there
-will be C API generated in qapi-event.h; when called by QEMU code, a
+will be C API generated in qapi-events.h; when called by QEMU code, a
 message with timestamp will be emitted on the wire.

 An example event is:
@@ -1147,15 +1147,15 @@ declares qmp_COMMAND() that the user must implement.

 The following files are generated:

-$(prefix)qmp-commands.c: Command marshal/dispatch functions for each
- QMP command defined in the schema
+$(prefix)qapi-commands.c: Command marshal/dispatch functions for each
+  QMP command defined in the schema

-$(prefix)qmp-commands.h: Function prototypes for the QMP commands
- specified in the schema
+$(prefix)qapi-commands.h: Function prototypes for the QMP commands
+  specified in the schema

 Example:

-$ cat qapi-generated/example-qmp-commands.h
+$ cat qapi-generated/example-qapi-commands.h
 [Uninteresting stuff omitted...]

 #ifndef EXAMPLE_QMP_COMMANDS_H
@@ -1170,7 +1170,7 @@ Example:
 void qmp_marshal_my_command(QDict *args, QObject **ret, Error **errp);

 #endif
-$ cat qapi-generated/example-qmp-commands.c
+$ cat qapi-generated/example-qapi-commands.c
 [Uninteresting stuff omitted...]

 static void qmp_marshal_output_UserDefOne(UserDefOne *ret_in, QObject 
**ret_out, Error **errp)
@@ -1243,14 +1243,14 @@ qapi_event_send_EVENT().

 The following files are created:

-$(prefix)qapi-event.h - Function prototypes for each event type, plus an
+$(prefix)qapi-events.h - Function prototypes for each event type, plus

[Qemu-devel] [PULL v4 23/30] qapi: Generate separate .h, .c for each module

2018-03-02 Thread Eric Blake

From: Markus Armbruster 

Our qapi-schema.json is composed of modules connected by include
directives, but the generated code is monolithic all the same: one
qapi-types.h with all the types, one qapi-visit.h with all the
visitors, and so forth.  These monolithic headers get included all
over the place.  In my "build everything" tree, adding a QAPI type
recompiles about 4800 out of 5100 objects.

We wouldn't write such monolithic headers by hand.  It stands to
reason that we shouldn't generate them, either.

Split up generated qapi-types.h to mirror the schema's modular
structure: one header per module.  Name the main module's header
qapi-types.h, and sub-module D/B.json's header D/qapi-types-B.h.

Mirror the schema's includes in the headers, so that qapi-types.h gets
you everything exactly as before.  If you need less, you can include
one or more of the sub-module headers.  To be exploited shortly.

Split up qapi-types.c, qapi-visit.h, qapi-visit.c, qmp-commands.h,
qmp-commands.c, qapi-event.h, qapi-event.c the same way.
qmp-introspect.h, qmp-introspect.c and qapi.texi remain monolithic.

The split of qmp-commands.c duplicates static helper function
qmp_marshal_output_str() in qapi-commands-char.c and
qapi-commands-misc.c.  This happens when commands returning the same
type occur in multiple modules.  Not worth avoiding.

Since I'm going to rename qapi-event.[ch] to qapi-events.[ch], and
qmp-commands.[ch] to qapi-commands.[ch], name the shards that way
already, to reduce churn.  This requires temporary hacks in
commands.py and events.py.  Similarly, c_name() must temporarily
be taught to munge '/' in common.py.  They'll go away with the rename.

Signed-off-by: Markus Armbruster 
Message-Id: <20180211093607.27351-23-arm...@redhat.com>
Reviewed-by: Eric Blake 
[eblake: declare a dummy variable in each .c file, to shut up OSX
toolchain warnings about empty .o files, including hacking c_name()]
Signed-off-by: Eric Blake 
---
 Makefile | 120 +++
 Makefile.objs|  65 -
 scripts/qapi/commands.py |  35 +-
 scripts/qapi/common.py   |  33 +++--
 scripts/qapi/events.py   |  19 ++--
 .gitignore   |  60 
 6 files changed, 310 insertions(+), 22 deletions(-)

diff --git a/Makefile b/Makefile
index 494ae382794..b12fcd5d8ff 100644
--- a/Makefile
+++ b/Makefile
@@ -92,10 +92,70 @@ include $(SRC_PATH)/rules.mak
 GENERATED_FILES = qemu-version.h config-host.h qemu-options.def
 GENERATED_FILES += qapi-builtin-types.h qapi-builtin-types.c
 GENERATED_FILES += qapi-types.h qapi-types.c
+GENERATED_FILES += qapi/qapi-types-block-core.h qapi/qapi-types-block-core.c
+GENERATED_FILES += qapi/qapi-types-block.h qapi/qapi-types-block.c
+GENERATED_FILES += qapi/qapi-types-char.h qapi/qapi-types-char.c
+GENERATED_FILES += qapi/qapi-types-common.h qapi/qapi-types-common.c
+GENERATED_FILES += qapi/qapi-types-crypto.h qapi/qapi-types-crypto.c
+GENERATED_FILES += qapi/qapi-types-introspect.h qapi/qapi-types-introspect.c
+GENERATED_FILES += qapi/qapi-types-migration.h qapi/qapi-types-migration.c
+GENERATED_FILES += qapi/qapi-types-net.h qapi/qapi-types-net.c
+GENERATED_FILES += qapi/qapi-types-rocker.h qapi/qapi-types-rocker.c
+GENERATED_FILES += qapi/qapi-types-run-state.h qapi/qapi-types-run-state.c
+GENERATED_FILES += qapi/qapi-types-sockets.h qapi/qapi-types-sockets.c
+GENERATED_FILES += qapi/qapi-types-tpm.h qapi/qapi-types-tpm.c
+GENERATED_FILES += qapi/qapi-types-trace.h qapi/qapi-types-trace.c
+GENERATED_FILES += qapi/qapi-types-transaction.h qapi/qapi-types-transaction.c
+GENERATED_FILES += qapi/qapi-types-ui.h qapi/qapi-types-ui.c
 GENERATED_FILES += qapi-builtin-visit.h qapi-builtin-visit.c
 GENERATED_FILES += qapi-visit.h qapi-visit.c
+GENERATED_FILES += qapi/qapi-visit-block-core.h qapi/qapi-visit-block-core.c
+GENERATED_FILES += qapi/qapi-visit-block.h qapi/qapi-visit-block.c
+GENERATED_FILES += qapi/qapi-visit-char.h qapi/qapi-visit-char.c
+GENERATED_FILES += qapi/qapi-visit-common.h qapi/qapi-visit-common.c
+GENERATED_FILES += qapi/qapi-visit-crypto.h qapi/qapi-visit-crypto.c
+GENERATED_FILES += qapi/qapi-visit-introspect.h qapi/qapi-visit-introspect.c
+GENERATED_FILES += qapi/qapi-visit-migration.h qapi/qapi-visit-migration.c
+GENERATED_FILES += qapi/qapi-visit-net.h qapi/qapi-visit-net.c
+GENERATED_FILES += qapi/qapi-visit-rocker.h qapi/qapi-visit-rocker.c
+GENERATED_FILES += qapi/qapi-visit-run-state.h qapi/qapi-visit-run-state.c
+GENERATED_FILES += qapi/qapi-visit-sockets.h qapi/qapi-visit-sockets.c
+GENERATED_FILES += qapi/qapi-visit-tpm.h qapi/qapi-visit-tpm.c
+GENERATED_FILES += qapi/qapi-visit-trace.h qapi/qapi-visit-trace.c
+GENERATED_FILES += qapi/qapi-visit-transaction.h qapi/qapi-visit-transaction.c
+GENERATED_FILES += qapi/qapi-visit-ui.h qapi/qapi-visit-ui.c
 GENERATED_FILES +=

[Qemu-devel] [PULL v4 00/30] QAPI patches for 2018-03-01

2018-03-02 Thread Eric Blake

The following changes since commit 136c67e07869227b21b3f627316e03679ce7b738:

  Merge remote-tracking branch 
'remotes/bkoppelmann/tags/pull-tricore-2018-03-02' into staging (2018-03-02 
16:56:20 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/ericb.git tags/pull-qapi-2018-03-01-v4

for you to fetch changes up to 418b1d0ae3a2cc992659f626a2a3f11944e0b259:

  qapi: Don't create useless directory qapi-generated (2018-03-02 13:48:26 
-0600)

v4: another attempt at silencing OSX warnings on empty .o [Peter]
(sending just the changed patches)


qapi patches for 2018-03-01

- Markus Armbruster: Modularize generated QAPI code


Eric Blake (1):
  watchdog: Consolidate QAPI into single file

Markus Armbruster (29):
  Include qapi/qmp/qerror.h exactly where needed
  qapi: Streamline boilerplate comment generation
  qapi: Generate up-to-date copyright notice
  qapi: Rename variable holding the QAPISchemaGenFOOVisitor
  qapi: New classes QAPIGenC, QAPIGenH, QAPIGenDoc
  qapi: Reduce use of global variables in generators some
  qapi: Turn generators into modules
  qapi-gen: New common driver for code and doc generators
  qapi-gen: Convert from getopt to argparse
  qapi: Touch generated files only when they change
  qapi: Improve include file name reporting in error messages
  qapi/common: Eliminate QAPISchema.exprs
  qapi: Lift error reporting from QAPISchema.__init__() to callers
  qapi: Concentrate QAPISchemaParser.exprs updates in .__init__()
  qapi: Record 'include' directives in parse tree
  qapi: Generate in source order
  qapi: Record 'include' directives in intermediate representation
  qapi: Rename generated qmp-marshal.c to qmp-commands.c
  qapi: Make code-generating visitors use QAPIGen more
  qapi/types qapi/visit: Generate built-in stuff into separate files
  qapi/common: Fix guardname() for funny filenames
  qapi: Generate separate .h, .c for each module
  Include less of the generated modular QAPI headers
  qapi: Empty out qapi-schema.json
  docs/devel/writing-qmp-commands: Update for modular QAPI
  docs: Correct outdated information on QAPI
  qapi: Move qapi-schema.json to qapi/, rename generated files
  Fix up dangling references to qmp-commands.* in comment and doc
  qapi: Don't create useless directory qapi-generated

 docs/devel/qapi-code-gen.txt   | 124 ---
 docs/devel/writing-qmp-commands.txt|  39 +--
 docs/interop/qmp-intro.txt |   3 +-
 docs/xen-save-devices-state.txt|   3 +-
 tests/qapi-schema/doc-good.texi|   3 +-
 configure  |   1 -
 Makefile   | 233 +
 Makefile.objs  |  80 -
 qapi-schema.json => qapi/misc.json | 105 +-
 qapi/qapi-schema.json  |  95 ++
 qapi/run-state.json|   9 +
 include/qapi/visitor.h |   2 +-
 scripts/qapi-gen.py|  57 
 scripts/qapi/__init__.py   |   0
 scripts/{qapi-commands.py => qapi/commands.py} | 155 -
 scripts/{qapi.py => qapi/common.py}| 362 +
 scripts/{qapi2texi.py => qapi/doc.py}  |  92 +++---
 scripts/{qapi-event.py => qapi/events.py}  | 128 +++-
 scripts/{qapi-introspect.py => qapi/introspect.py} | 123 +++
 scripts/{qapi-types.py => qapi/types.py}   | 185 ---
 scripts/{qapi-visit.py => qapi/visit.py}   | 189 ---
 crypto/cipherpriv.h|   2 +-
 include/block/block.h  |   2 +-
 include/block/dirty-bitmap.h   |   2 +-
 include/block/nbd.h|   2 +-
 include/chardev/char.h |   1 +
 include/crypto/cipher.h|   2 +-
 include/crypto/hash.h  |   2 +-
 include/crypto/hmac.h  |   2 +-
 include/crypto/secret.h|   1 +
 include/crypto/tlscreds.h  |   1 +
 include/hw/block/block.h   |   2 +-
 include/hw/block/fdc.h |   2 +-
 include/hw/ppc/spapr_drc.h |   1 +
 include/hw/qdev-properties.h   |   2 +
 include/io/dns-resolver.h  |   1 +
 include/migration/colo.h   |   2 +-
 include/migration/failover.h   |   2 +-
 include/migration/global_state.h   |   1 +

Re: [Qemu-devel] [PATCH 1/1] mm: gup: teach get_user_pages_unlocked to handle FOLL_NOWAIT

2018-03-02 Thread Andrew Morton

On Fri,  2 Mar 2018 18:43:43 +0100 Andrea Arcangeli  wrote:

> KVM is hanging during postcopy live migration with userfaultfd because
> get_user_pages_unlocked is not capable to handle FOLL_NOWAIT.
> 
> Earlier FOLL_NOWAIT was only ever passed to get_user_pages.
> 
> Specifically faultin_page (the callee of get_user_pages_unlocked
> caller) doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the
> page fault flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't
> actually released (even if nonblocking is not NULL). So it sets
> *nonblocking to zero and the caller won't release the mmap_sem
> thinking it was already released, but it wasn't because of
> FOLL_NOWAIT.
> 
> Reported-by: Dr. David Alan Gilbert 
> Tested-by: Dr. David Alan Gilbert 
> Signed-off-by: Andrea Arcangeli 

I added

Fixes: ce53053ce378c ("kvm: switch get_user_page_nowait() to 
get_user_pages_unlocked()")
Cc:

[Qemu-devel] [PATCH v3 1/2] target/arm: Add a core count property

2018-03-02 Thread Alistair Francis

The cortex A53 TRM specifies that bits 24 and 25 of the L2CTLR register
specify the number of cores in the processor, not the total number of
cores in the sytem. To report this correctly on machines with multiple
CPU clusters (ARM's big.LITTLE or Xilinx's ZynqMP) we need to allow
the machine to overwrite this value. To do this let's add an optional
property.

Signed-off-by: Alistair Francis 
---
V3:
 - Fix Linux user compile failure
V2:
 - Fix commit message and title.
 - Move the core_count default setting logic to the arm_cpu_realize()
   function.

 target/arm/cpu.h   | 5 +
 target/arm/cpu.c   | 6 ++
 target/arm/cpu64.c | 6 --
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8dd6b788df..3fa8fdad21 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -745,6 +745,11 @@ struct ARMCPU {
 /* Uniprocessor system with MP extensions */
 bool mp_is_up;
 
+/* Specify the number of cores in this CPU cluster. Used for the L2CTLR
+ * register.
+ */
+int32_t core_count;
+
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
  * various constant registers and reset values of non-constant
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6b77aaa445..3e4e9f1873 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -939,6 +939,11 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 cs->num_ases = 1;
 }
 cpu_address_space_init(cs, ARMASIdx_NS, "cpu-memory", cs->memory);
+
+/* No core_count specified, default to smp_cpus. */
+if (cpu->core_count == -1) {
+cpu->core_count = smp_cpus;
+}
 #endif
 
 qemu_init_vcpu(cs);
@@ -1765,6 +1770,7 @@ static Property arm_cpu_properties[] = {
 DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
 mp_affinity, ARM64_AFFINITY_INVALID),
 DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
+DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 4228713b19..dd9ba973f7 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -42,8 +42,10 @@ static inline void unset_feature(CPUARMState *env, int 
feature)
 #ifndef CONFIG_USER_ONLY
 static uint64_t a57_a53_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-/* Number of processors is in [25:24]; otherwise we RAZ */
-return (smp_cpus - 1) << 24;
+ARMCPU *cpu = arm_env_get_cpu(env);
+
+/* Number of cores is in [25:24]; otherwise we RAZ */
+return (cpu->core_count - 1) << 24;
 }
 #endif
 
-- 
2.14.1

Re: [Qemu-devel] [PATCH v2 5/5] s390x/cpumodel: Set up CPU model for AP device support

2018-03-02 Thread Tony Krowiak


On 03/01/2018 09:36 AM, David Hildenbrand wrote:

On 01.03.2018 15:12, Pierre Morel wrote:

On 28/02/2018 12:40, Cornelia Huck wrote:

On Wed, 28 Feb 2018 11:26:30 +0100
David Hildenbrand  wrote:


Then I request the following change in KVM:

If KVM_S390_VM_CPU_FEAT_AP is enabled in KVM, ECA.28 is _always_ set
(not just if an AP device is configured). This especially makes things a
lot easier when it comes to handling hotplugged CPUs and avoiding race
conditions when enabling these bits as mentioned in the KVM series.

KVM_S390_VM_CPU_FEAT_AP == AP instructions available for the guest
(don't throw an operation exception).

So this feature then really is guest ABI. The instructions are
available. If there is no device configured, bad luck.

Sounds sensible from my POV.


I have a concern with this proposition and with the original code:

Very good, this is exactly what I talked to Conny about yesterday.

Short version: CPU model is guest ABI, everything else is configuration.


1) ap=on is a guest ABI feature saying to the guest you can use AP
instructions

Indeed, that's what belongs into the CPU model.


2) How we provide AP instructions to the guest can be done in three
different ways:
   - SIE Interpretation
   - interception with VFIO
   - interception with emulation


Due to bad AP design we can't handle this like zPCI - completely in
QEMU. That's what should control it.


3) We implement this with a device in QEMU and a certain level kernel
support.

It seems possible to set or not ECA.28 , based on the type of kernel device:
- SIE interpretation -> MATRIX KVM device -> ECA.28
- Interception with VFIO and virtualization -> no ECA.28
- interception with emulation -> no ECA.28

I understand the concern with the vCPU but I think we can handle it with
an indirect variable
like:

SIE interpretation Device + KVM_S390_VM_CPU_FEAT_AP -> set the variable
ap_to_be_sie_interpreted=1
Then in vCPU initialization set ECA.28 based on this variable.

That's exactly why we have the cpu feature interface. E.g. CMMA -> if
not enabled, not interpreted by HW (however also not intercepted to user
space - no sense in doing that right now).

I'm not sure I am interpreting what you are saying here correctly, but
in the case of AP, if ECA.28 is not set, the AP instructions will not be
interpreted by HW but WILL be intercepted and forwarded to user space.



I think it let us more doors open, what is your opinion?

In general, for now we don't care. The kernel is the only AP bus provider.

If KVM presents AP support -> AP feature can be enabled. And it should
always enable it.

Once we have a QEMU AP bus implementation, things get more complicated.
We could provide the AP feature also without KVM (like zPCI).

But weather or not to enable the KVM control has to be concluded from
the other configuration. Only user space can now that and has to decide
before enabling AP in KVM.

So I think for now we are fine. Later, this might be tricky to check but
not impossible.

Maybe we are applying the wrong semantics to this feature. The
premise for this feature was to control the setting of ECA.28.
It grew beyond this premise because of observations related to
future considerations about emulation and full virtualization (i.e.,
the things Pierre mentioned above). Instead of this feature
indicating AP facilities are installed on the guest, it might
behoove us to return to its original intended purpose which was
to indicate that AP instructions executed by the guest are
interpreted by HW. In this case, we can resume setting it in
the vcpu setup like it was in the earlier patch series.



Regards,

Pierre

Re: [Qemu-devel] [PATCH QEMU v1 0/4] multiboot: bss_end_addr can be zero / cleanup

2018-03-02 Thread Jack Schwartz


Hi Kevin.

On 2018-01-15 07:54, Kevin Wolf wrote:

Am 21.12.2017 um 18:25 hat Jack Schwartz geschrieben:

Properly account for the possibility of multiboot kernels with a zero
bss_end_addr.  The Multiboot Specification, section 3.1.3 allows for
kernels without a bss section, by allowing a zeroed bss_end_addr multiboot
header field.

Do some cleanup to multiboot.c as well:
- Remove some unused variables.
- Use more intuitive header names when displaying fields in messages.
- Change fprintf(stderr...) to error_report

There are some conflicts with Anatol's (CCed) multiboot series:
https://lists.nongnu.org/archive/html/qemu-devel/2017-10/msg03003.html

None if these should be hard to resolve, but it would be good if you
could agree with each other whose patch series should come first, and
then the other one should be rebased on top of that.


Testing:
   1) Ran the "make check" test suite.
   2) Booted multiboot kernel with bss_end_addr=0.  (I rolled my own
  grub multiboot.elf test "kernel" by modifying source.)  Verified
  with gdb that new code that reads addresses/offsets from multiboot
  header was accessed.
   3) Booted multiboot kernel with non-zero bss_end_addr.
   4) Uncommented DEBUG_MULTIBOOT in multiboot.c and verified messages worked.
   5) Code has soaked in an internal repo for two months.

Can you integrate your test kernel from 2) in tests/multiboot/ so we can
keep this as a regression test?
If need be, would you be willing to accept updated versions of these 
patches (with another review, of course) without the test file?  I will 
deliver the test file later once I get company approvals.  I don't want 
the test file to continue holding everything up in the meantime.


Patchwork links, for reference:

1/4 multiboot: bss_end_addr can be zero
http://patchwork.ozlabs.org/patch/852049/

2/4 multiboot: Remove unused variables from multiboot.c
http://patchwork.ozlabs.org/patch/852045/

3/4 multiboot: Use header names when displaying fields
http://patchwork.ozlabs.org/patch/852046/

4/4 multiboot: fprintf(stderr...) -> error_report()
http://patchwork.ozlabs.org/patch/852051/


    Thanks,
    Jack



Jack Schwartz (4):
   multiboot: bss_end_addr can be zero
   multiboot: Remove unused variables from multiboot.c
   multiboot: Use header names when displaying fields
   multiboot: fprintf(stderr...) -> error_report()

Apart from the conflicts, the patches look good to me.

Kevin

[Qemu-devel] [PATCH v3 2/2] hw/arm: Set the core count for Xilinx's ZynqMP

2018-03-02 Thread Alistair Francis

Set the ARM CPU core count property for the A53's attached to the Xilnx
ZynqMP machine.

Signed-off-by: Alistair Francis 
Reviewed-by: Peter Maydell 
---

 hw/arm/xlnx-zynqmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 69227fd4c9..465796e97c 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -282,6 +282,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
**errp)
  s->virt, "has_el2", NULL);
 object_property_set_int(OBJECT(>apu_cpu[i]), GIC_BASE_ADDR,
 "reset-cbar", _abort);
+object_property_set_int(OBJECT(>apu_cpu[i]), num_apus,
+"core-count", _abort);
 object_property_set_bool(OBJECT(>apu_cpu[i]), true, "realized",
  );
 if (err) {
-- 
2.14.1

[Qemu-devel] [PATCH v3 0/2] Add a property to set the ARM CPU core count

2018-03-02 Thread Alistair Francis


Add an ARM CPU property which allows us to set the ARM CPU core count.
V3:
 - Fix Linux user mode compile failure

V2:
 - Fix commit message and title.
 - Move the core_count default setting logic to the arm_cpu_realize()
   function.


Alistair Francis (2):
  target/arm: Add a core count property
  hw/arm: Set the core count for Xilinx's ZynqMP

 target/arm/cpu.h | 5 +
 hw/arm/xlnx-zynqmp.c | 2 ++
 target/arm/cpu.c | 6 ++
 target/arm/cpu64.c   | 6 --
 4 files changed, 17 insertions(+), 2 deletions(-)

-- 
2.14.1

[Qemu-devel] [PULL 3/3] tcg: Add choose_vector_size

2018-03-02 Thread Richard Henderson

This unifies 5 copies of checks for supported vector size,
and in the process fixes a missing check in tcg_gen_gvec_2s.

This lead to an assertion failure for 64-bit vector multiply,
which is not available in the AVX instruction set.

Suggested-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 438 --
 1 file changed, 259 insertions(+), 179 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index bfe44bba81..22db1590d5 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -351,6 +351,42 @@ static void gen_dup_i64(unsigned vece, TCGv_i64 out, 
TCGv_i64 in)
 }
 }
 
+/* Select a supported vector type for implementing an operation on SIZE
+ * bytes.  If OP is 0, assume that the real operation to be performed is
+ * required by all backends.  Otherwise, make sure than OP can be performed
+ * on elements of size VECE in the selected type.  Do not select V64 if
+ * PREFER_I64 is true.  Return 0 if no vector type is selected.
+ */
+static TCGType choose_vector_type(TCGOpcode op, unsigned vece, uint32_t size,
+  bool prefer_i64)
+{
+if (TCG_TARGET_HAS_v256 && check_size_impl(size, 32)) {
+if (op == 0) {
+return TCG_TYPE_V256;
+}
+/* Recall that ARM SVE allows vector sizes that are not a
+ * power of 2, but always a multiple of 16.  The intent is
+ * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+ * It is hard to imagine a case in which v256 is supported
+ * but v128 is not, but check anyway.
+ */
+if (tcg_can_emit_vec_op(op, TCG_TYPE_V256, vece)
+&& (size % 32 == 0
+|| tcg_can_emit_vec_op(op, TCG_TYPE_V128, vece))) {
+return TCG_TYPE_V256;
+}
+}
+if (TCG_TARGET_HAS_v128 && check_size_impl(size, 16)
+&& (op == 0 || tcg_can_emit_vec_op(op, TCG_TYPE_V128, vece))) {
+return TCG_TYPE_V128;
+}
+if (TCG_TARGET_HAS_v64 && !prefer_i64 && check_size_impl(size, 8)
+&& (op == 0 || tcg_can_emit_vec_op(op, TCG_TYPE_V64, vece))) {
+return TCG_TYPE_V64;
+}
+return 0;
+}
+
 /* Set OPRSZ bytes at DOFS to replications of IN_32, IN_64 or IN_C.
  * Only one of IN_32 or IN_64 may be set;
  * IN_C is used if IN_32 and IN_64 are unset.
@@ -376,19 +412,12 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t 
oprsz,
 }
 }
 
-type = 0;
-if (TCG_TARGET_HAS_v256 && check_size_impl(oprsz, 32)) {
-type = TCG_TYPE_V256;
-} else if (TCG_TARGET_HAS_v128 && check_size_impl(oprsz, 16)) {
-type = TCG_TYPE_V128;
-} else if (TCG_TARGET_HAS_v64 && check_size_impl(oprsz, 8)
-   /* Prefer integer when 64-bit host and no variable dup.  */
-   && !(TCG_TARGET_REG_BITS == 64 && in_32 == NULL
-&& (in_64 == NULL || vece == MO_64))) {
-type = TCG_TYPE_V64;
-}
-
-/* Implement inline with a vector type, if possible.  */
+/* Implement inline with a vector type, if possible.
+ * Prefer integer when 64-bit host and no variable dup.
+ */
+type = choose_vector_type(0, vece, oprsz,
+  (TCG_TARGET_REG_BITS == 64 && in_32 == NULL
+   && (in_64 == NULL || vece == MO_64)));
 if (type != 0) {
 TCGv_vec t_vec = tcg_temp_new_vec(type);
 
@@ -414,21 +443,30 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t 
oprsz,
 }
 
 i = 0;
-if (TCG_TARGET_HAS_v256) {
+switch (type) {
+case TCG_TYPE_V256:
+/* Recall that ARM SVE allows vector sizes that are not a
+ * power of 2, but always a multiple of 16.  The intent is
+ * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+ */
 for (; i + 32 <= oprsz; i += 32) {
 tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V256);
 }
-}
-if (TCG_TARGET_HAS_v128) {
+/* fallthru */
+case TCG_TYPE_V128:
 for (; i + 16 <= oprsz; i += 16) {
 tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V128);
 }
-}
-if (TCG_TARGET_HAS_v64) {
+break;
+case TCG_TYPE_V64:
 for (; i < oprsz; i += 8) {
 tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
 }
+break;
+default:
+g_assert_not_reached();
 }
+
 tcg_temp_free_vec(t_vec);
 goto done;
 }
@@ -484,7 +522,7 @@ static void do_dup(unsigned vece, uint32_t dofs, uint32_t 
oprsz,
 }
 tcg_temp_free_i64(t_64);
 goto done;
-} 
+}
 }
 
 /* Otherwise implement out of line.  */
@@ -866,49 +904,55 @@ static void expand_4_vec(unsigned

Re: [Qemu-devel] [PATCH] sparc: fix leon3 casa instruction when MMU is disabled

2018-03-02 Thread Richard Henderson

On 03/02/2018 01:59 AM, KONRAD Frederic wrote:
> From: KONRAD Frederic 
> 
> Since the commit af7a06bac7d3abb2da48ef3277d2a415772d2ae8:
> `casa [..](10), .., ..` (and probably others alternate space instructions)
> triggers a data access exception when the MMU is disabled.
> 
> When we enter get_asi(...) dc->mem_idx is set to MMU_PHYS_IDX when the MMU
> is disabled. Just keep mem_idx unchanged in this case so we passthrough the
> MMU when it is disabled.
> 
> Signed-off-by: KONRAD Frederic 
> ---
> 
> Notes:
> Changes RFC -> V1:
>  * emit the instruction with MMU_PHYS_IDX instead of checking that the MMU
>is enabled in get_physical_address(..)
> 
>  target/sparc/translate.c | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Richard Henderson 

r~

[Qemu-devel] [PULL 35/37] block/ssh: Make ssh_grow_file() blocking

2018-03-02 Thread Kevin Wolf

From: Max Reitz 

At runtime (that is, during a future ssh_truncate()), the SSH session is
non-blocking.  However, ssh_truncate() (or rather, bdrv_truncate() in
general) is not a coroutine, so this resize operation needs to block.

For ssh_create(), that is fine, too; the session is never set to
non-blocking anyway.

Signed-off-by: Max Reitz 
Message-id: 20180214204915.7980-3-mre...@redhat.com
Reviewed-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Signed-off-by: Max Reitz 
---
 block/ssh.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index d6a68cb880..4bcf10334f 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
 return ret;
 }
 
+/* Note: This is a blocking operation */
 static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
 {
 ssize_t ret;
 char c[1] = { '\0' };
+int was_blocking = libssh2_session_get_blocking(s->session);
 
 /* offset must be strictly greater than the current size so we do
  * not overwrite anything */
 assert(offset > 0 && offset > s->attrs.filesize);
 
+libssh2_session_set_blocking(s->session, 1);
+
 libssh2_sftp_seek64(s->sftp_handle, offset - 1);
 ret = libssh2_sftp_write(s->sftp_handle, c, 1);
+
+libssh2_session_set_blocking(s->session, was_blocking);
+
 if (ret < 0) {
 sftp_error_setg(errp, s, "Failed to grow file");
 return -EIO;
-- 
2.13.6

[Qemu-devel] [PULL 37/37] qcow2: Replace align_offset() with ROUND_UP()

2018-03-02 Thread Kevin Wolf

From: Alberto Garcia 

The align_offset() function is equivalent to the ROUND_UP() macro so
there's no need to use the former. The ROUND_UP() name is also a bit
more explicit.

This patch uses ROUND_UP() instead of the slower QEMU_ALIGN_UP()
because align_offset() already requires that the second parameter is a
power of two.

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20180215131008.5153-1-be...@igalia.com
Signed-off-by: Max Reitz 
---
 block/qcow2.h  |  6 --
 block/qcow2-bitmap.c   |  4 ++--
 block/qcow2-cluster.c  |  4 ++--
 block/qcow2-refcount.c |  4 ++--
 block/qcow2-snapshot.c | 10 +-
 block/qcow2.c  | 14 +++---
 6 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 883802241f..1a84cc77b0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -480,12 +480,6 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State 
*s, int64_t offset)
 return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
-static inline int64_t align_offset(int64_t offset, int n)
-{
-offset = (offset + n - 1) & ~(n - 1);
-return offset;
-}
-
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
 return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 4f6fd863ea..5127276f90 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -413,8 +413,8 @@ static inline void 
bitmap_dir_entry_to_be(Qcow2BitmapDirEntry *entry)
 
 static inline int calc_dir_entry_size(size_t name_size, size_t extra_data_size)
 {
-return align_offset(sizeof(Qcow2BitmapDirEntry) +
-name_size + extra_data_size, 8);
+int size = sizeof(Qcow2BitmapDirEntry) + name_size + extra_data_size;
+return ROUND_UP(size, 8);
 }
 
 static inline int dir_entry_size(Qcow2BitmapDirEntry *entry)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e406b0f3b9..98908c4264 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -126,11 +126,11 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t 
min_size,
 
 new_l1_size2 = sizeof(uint64_t) * new_l1_size;
 new_l1_table = qemu_try_blockalign(bs->file->bs,
-   align_offset(new_l1_size2, 512));
+   ROUND_UP(new_l1_size2, 512));
 if (new_l1_table == NULL) {
 return -ENOMEM;
 }
-memset(new_l1_table, 0, align_offset(new_l1_size2, 512));
+memset(new_l1_table, 0, ROUND_UP(new_l1_size2, 512));
 
 if (s->l1_size) {
 memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index d46b69d7f3..126cca3276 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1204,7 +1204,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
  * l1_table_offset when it is the current s->l1_table_offset! Be careful
  * when changing this! */
 if (l1_table_offset != s->l1_table_offset) {
-l1_table = g_try_malloc0(align_offset(l1_size2, 512));
+l1_table = g_try_malloc0(ROUND_UP(l1_size2, 512));
 if (l1_size2 && l1_table == NULL) {
 ret = -ENOMEM;
 goto fail;
@@ -2553,7 +2553,7 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, 
int ign, int64_t offset,
 }
 
 /* align range to test to cluster boundaries */
-size = align_offset(offset_into_cluster(s, offset) + size, 
s->cluster_size);
+size = ROUND_UP(offset_into_cluster(s, offset) + size, s->cluster_size);
 offset = start_of_cluster(s, offset);
 
 if ((chk & QCOW2_OL_ACTIVE_L1) && s->l1_size) {
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 44243e0e95..cee25f582b 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -66,7 +66,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 
 for(i = 0; i < s->nb_snapshots; i++) {
 /* Read statically sized part of the snapshot header */
-offset = align_offset(offset, 8);
+offset = ROUND_UP(offset, 8);
 ret = bdrv_pread(bs->file, offset, , sizeof(h));
 if (ret < 0) {
 goto fail;
@@ -155,7 +155,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 offset = 0;
 for(i = 0; i < s->nb_snapshots; i++) {
 sn = s->snapshots + i;
-offset = align_offset(offset, 8);
+offset = ROUND_UP(offset, 8);
 offset += sizeof(h);
 offset += sizeof(extra);
 offset += strlen(sn->id_str);
@@ -215,7 +215,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 assert(id_str_size <= UINT16_MAX && name_size <= UINT16_MAX);
 h.id_str_size = cpu_to_be16(id_str_size);
 h.name_size = cpu_to_be16(name_size);
-offset =

[Qemu-devel] [PULL 2/3] tcg/i386: Support INDEX_op_dup2_vec for -m32

2018-03-02 Thread Richard Henderson

Unknown why -m32 was passing with gcc but not clang; it should have
failed for both.  This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.

Reported-by: Max Reitz 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index fc05909d1d..d7e59e79c5 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2696,6 +2696,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_x86_packus_vec:
 insn = packus_insn[vece];
 goto gen_simd;
+#if TCG_TARGET_REG_BITS == 32
+case INDEX_op_dup2_vec:
+/* Constraints have already placed both 32-bit inputs in xmm regs.  */
+insn = OPC_PUNPCKLDQ;
+goto gen_simd;
+#endif
 gen_simd:
 tcg_debug_assert(insn != OPC_UD2);
 if (type == TCG_TYPE_V256) {
@@ -3045,6 +3051,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_x86_vperm2i128_vec:
 case INDEX_op_x86_punpckl_vec:
 case INDEX_op_x86_punpckh_vec:
+#if TCG_TARGET_REG_BITS == 32
+case INDEX_op_dup2_vec:
+#endif
 return _x_x;
 case INDEX_op_dup_vec:
 case INDEX_op_shli_vec:
-- 
2.14.3

[Qemu-devel] [PULL 36/37] block/ssh: Add basic .bdrv_truncate()

2018-03-02 Thread Kevin Wolf

From: Max Reitz 

libssh2 does not seem to offer real truncation support, so we can only
grow files -- but that is better than nothing.

Signed-off-by: Max Reitz 
Message-id: 20180214204915.7980-4-mre...@redhat.com
Reviewed-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Signed-off-by: Max Reitz 
---
 block/ssh.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index 4bcf10334f..80a8b40dfa 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -1220,6 +1220,29 @@ static int64_t ssh_getlength(BlockDriverState *bs)
 return length;
 }
 
+static int ssh_truncate(BlockDriverState *bs, int64_t offset,
+PreallocMode prealloc, Error **errp)
+{
+BDRVSSHState *s = bs->opaque;
+
+if (prealloc != PREALLOC_MODE_OFF) {
+error_setg(errp, "Unsupported preallocation mode '%s'",
+   PreallocMode_str(prealloc));
+return -ENOTSUP;
+}
+
+if (offset < s->attrs.filesize) {
+error_setg(errp, "ssh driver does not support shrinking files");
+return -ENOTSUP;
+}
+
+if (offset == s->attrs.filesize) {
+return 0;
+}
+
+return ssh_grow_file(s, offset, errp);
+}
+
 static BlockDriver bdrv_ssh = {
 .format_name  = "ssh",
 .protocol_name= "ssh",
@@ -1232,6 +1255,7 @@ static BlockDriver bdrv_ssh = {
 .bdrv_co_readv= ssh_co_readv,
 .bdrv_co_writev   = ssh_co_writev,
 .bdrv_getlength   = ssh_getlength,
+.bdrv_truncate= ssh_truncate,
 .bdrv_co_flush_to_disk= ssh_co_flush,
 .create_opts  = _create_opts,
 };
-- 
2.13.6

[Qemu-devel] [PULL 33/37] qemu-img: Make resize error message more general

2018-03-02 Thread Kevin Wolf

From: Max Reitz 

The issue:

  $ qemu-img resize -f qcow2 foo.qcow2
  qemu-img: Expecting one image file name
  Try 'qemu-img --help' for more information

So we gave an image file name, but we omitted the length.  qemu-img
thinks the last argument is always the size and removes it immediately
from argv (by decrementing argc), and tries to verify that it is a valid
size only at a later point.

So we do not actually know whether that last argument we called "size"
is indeed a size or whether the user instead forgot to specify that size
but did give a file name.

Therefore, the error message should be more general.

Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1523458
Signed-off-by: Max Reitz 
Message-id: 20180205162745.23650-1-mre...@redhat.com
Reviewed-by: John Snow 
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 56edc15218..aa99fd32e9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3469,7 +3469,7 @@ static int img_resize(int argc, char **argv)
 }
 }
 if (optind != argc - 1) {
-error_exit("Expecting one image file name");
+error_exit("Expecting image file name and size");
 }
 filename = argv[optind++];
 
-- 
2.13.6

[Qemu-devel] [PULL 1/3] tcg: Improve tcg_gen_muli_i32/i64

2018-03-02 Thread Richard Henderson

Convert multiplication by power of two to left shift.

Reviewed-by: Emilio G. Cota 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 3467787323..34b96d68f3 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -277,9 +277,15 @@ void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
 
 void tcg_gen_muli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-TCGv_i32 t0 = tcg_const_i32(arg2);
-tcg_gen_mul_i32(ret, arg1, t0);
-tcg_temp_free_i32(t0);
+if (arg2 == 0) {
+tcg_gen_movi_i32(ret, 0);
+} else if (is_power_of_2(arg2)) {
+tcg_gen_shli_i32(ret, arg1, ctz32(arg2));
+} else {
+TCGv_i32 t0 = tcg_const_i32(arg2);
+tcg_gen_mul_i32(ret, arg1, t0);
+tcg_temp_free_i32(t0);
+}
 }
 
 void tcg_gen_div_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
@@ -1430,9 +1436,15 @@ void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
 
 void tcg_gen_muli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-TCGv_i64 t0 = tcg_const_i64(arg2);
-tcg_gen_mul_i64(ret, arg1, t0);
-tcg_temp_free_i64(t0);
+if (arg2 == 0) {
+tcg_gen_movi_i64(ret, 0);
+} else if (is_power_of_2(arg2)) {
+tcg_gen_shli_i64(ret, arg1, ctz64(arg2));
+} else {
+TCGv_i64 t0 = tcg_const_i64(arg2);
+tcg_gen_mul_i64(ret, arg1, t0);
+tcg_temp_free_i64(t0);
+}
 }
 
 void tcg_gen_div_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
-- 
2.14.3

[Qemu-devel] [PULL 34/37] block/ssh: Pull ssh_grow_file() from ssh_create()

2018-03-02 Thread Kevin Wolf

From: Max Reitz 

If we ever want to offer even rudimentary truncation functionality for
ssh, we should put the respective code into a reusable function.

Signed-off-by: Max Reitz 
Message-id: 20180214204915.7980-2-mre...@redhat.com
Reviewed-by: Eric Blake 
Reviewed-by: Richard W.M. Jones 
Signed-off-by: Max Reitz 
---
 block/ssh.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index 36d5d888d5..d6a68cb880 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -803,6 +803,26 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
 return ret;
 }
 
+static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
+{
+ssize_t ret;
+char c[1] = { '\0' };
+
+/* offset must be strictly greater than the current size so we do
+ * not overwrite anything */
+assert(offset > 0 && offset > s->attrs.filesize);
+
+libssh2_sftp_seek64(s->sftp_handle, offset - 1);
+ret = libssh2_sftp_write(s->sftp_handle, c, 1);
+if (ret < 0) {
+sftp_error_setg(errp, s, "Failed to grow file");
+return -EIO;
+}
+
+s->attrs.filesize = offset;
+return 0;
+}
+
 static QemuOptsList ssh_create_opts = {
 .name = "ssh-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(ssh_create_opts.head),
@@ -823,8 +843,6 @@ static int coroutine_fn ssh_co_create_opts(const char 
*filename, QemuOpts *opts,
 int64_t total_size = 0;
 QDict *uri_options = NULL;
 BDRVSSHState s;
-ssize_t r2;
-char c[1] = { '\0' };
 
 ssh_state_init();
 
@@ -850,14 +868,10 @@ static int coroutine_fn ssh_co_create_opts(const char 
*filename, QemuOpts *opts,
 }
 
 if (total_size > 0) {
-libssh2_sftp_seek64(s.sftp_handle, total_size-1);
-r2 = libssh2_sftp_write(s.sftp_handle, c, 1);
-if (r2 < 0) {
-sftp_error_setg(errp, , "truncate failed");
-ret = -EINVAL;
+ret = ssh_grow_file(, total_size, errp);
+if (ret < 0) {
 goto out;
 }
-s.attrs.filesize = total_size;
 }
 
 ret = 0;
-- 
2.13.6

Re: [Qemu-devel] [PATCH v2 0/2] Add a property to set the ARM CPU core count

2018-03-02 Thread Alistair Francis

On Fri, Mar 2, 2018 at 10:51 AM,   wrote:
> Hi,
>
> This series failed build test on ppcle host. Please find the details below.
>
...
>   CC  aarch64-softmmu/target/arm/helper-a64.o
>   LINKsparc64-softmmu/qemu-system-sparc64
>   CC  arm-softmmu/trace/control-target.o
>   CC  aarch64-softmmu/target/arm/gdbstub64.o
>   CC  aarch64-softmmu/target/arm/crypto_helper.o
>   CC  aarch64-softmmu/target/arm/arm-powerctl.o
>   LINKarmeb-linux-user/qemu-armeb
>   CC  arm-softmmu/gdbstub-xml.o
>   GEN trace/generated-helpers.c
>   CC  arm-softmmu/trace/generated-helpers.o
>   CC  aarch64-softmmu/trace/control-target.o
>   CC  aarch64-softmmu/gdbstub-xml.o
>   CC  aarch64-softmmu/trace/generated-helpers.o
>   LINKarm-linux-user/qemu-arm
> target/arm/cpu.o:(.toc+0x0): undefined reference to `smp_cpus'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-aarch64] Error 1
> make: *** [subdir-aarch64-linux-user] Error 2
> make: *** Waiting for unfinished jobs
> target/arm/cpu.o:(.toc+0x0): undefined reference to `smp_cpus'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-armeb] Error 1
> make: *** [subdir-armeb-linux-user] Error 2
>   LINKmipsel-linux-user/qemu-mipsel
> target/arm/cpu.o:(.toc+0x0): undefined reference to `smp_cpus'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-arm] Error 1
> make: *** [subdir-arm-linux-user] Error 2
>   LINKaarch64_be-linux-user/qemu-aarch64_be
> target/arm/cpu.o:(.toc+0x0): undefined reference to `smp_cpus'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-aarch64_be] Error 1
> make: *** [subdir-aarch64_be-linux-user] Error 2
>   LINKs390x-softmmu/qemu-system-s390x
>   LINKmipsn32el-linux-user/qemu-mipsn32el
>   LINKi386-softmmu/qemu-system-i386
>   LINKppc64-linux-user/qemu-ppc64
>   LINKx86_64-softmmu/qemu-system-x86_64
>   LINKppc64le-linux-user/qemu-ppc64le
>   LINKppc-linux-user/qemu-ppc
>   LINKmipsel-softmmu/qemu-system-mipsel
>   LINKppc64abi32-linux-user/qemu-ppc64abi32
>   LINKarm-softmmu/qemu-system-arm
>   LINKmips-softmmu/qemu-system-mips
>   LINKmips64el-softmmu/qemu-system-mips64el
>   LINKaarch64-softmmu/qemu-system-aarch64
>   LINKmips64-softmmu/qemu-system-mips64
>   LINKppcemb-softmmu/qemu-system-ppcemb
>   LINKppc-softmmu/qemu-system-ppc
>   LINKppc64-softmmu/qemu-system-ppc64
> === OUTPUT END ===

Argh! I rushed this through without waiting on the CI. I'll re-send V3.

Alistair

>
> Test command exited with code: 2
>
>
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PULL 29/37] block: test blk_aio_flush() with blk->root == NULL

2018-03-02 Thread Kevin Wolf

This patch adds test cases for the scenario where blk_aio_flush() is
called on a BlockBackend with no root.  Calling drain afterwards should
complete the requests with -ENOMEDIUM.

Signed-off-by: Kevin Wolf 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 tests/test-block-backend.c | 82 ++
 tests/Makefile.include |  2 ++
 2 files changed, 84 insertions(+)
 create mode 100644 tests/test-block-backend.c

diff --git a/tests/test-block-backend.c b/tests/test-block-backend.c
new file mode 100644
index 00..fd59f02bd0
--- /dev/null
+++ b/tests/test-block-backend.c
@@ -0,0 +1,82 @@
+/*
+ * BlockBackend tests
+ *
+ * Copyright (c) 2017 Kevin Wolf 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+
+static void test_drain_aio_error_flush_cb(void *opaque, int ret)
+{
+bool *completed = opaque;
+
+g_assert(ret == -ENOMEDIUM);
+*completed = true;
+}
+
+static void test_drain_aio_error(void)
+{
+BlockBackend *blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+BlockAIOCB *acb;
+bool completed = false;
+
+acb = blk_aio_flush(blk, test_drain_aio_error_flush_cb, );
+g_assert(acb != NULL);
+g_assert(completed == false);
+
+blk_drain(blk);
+g_assert(completed == true);
+
+blk_unref(blk);
+}
+
+static void test_drain_all_aio_error(void)
+{
+BlockBackend *blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+BlockAIOCB *acb;
+bool completed = false;
+
+acb = blk_aio_flush(blk, test_drain_aio_error_flush_cb, );
+g_assert(acb != NULL);
+g_assert(completed == false);
+
+blk_drain_all();
+g_assert(completed == true);
+
+blk_unref(blk);
+}
+
+int main(int argc, char **argv)
+{
+bdrv_init();
+qemu_init_main_loop(_abort);
+
+g_test_init(, , NULL);
+
+g_test_add_func("/block-backend/drain_aio_error", test_drain_aio_error);
+g_test_add_func("/block-backend/drain_all_aio_error",
+test_drain_all_aio_error);
+
+return g_test_run();
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 937cbd874a..b5aab848b3 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -83,6 +83,7 @@ gcov-files-test-hbitmap-y = blockjob.c
 check-unit-y += tests/test-bdrv-drain$(EXESUF)
 check-unit-y += tests/test-blockjob$(EXESUF)
 check-unit-y += tests/test-blockjob-txn$(EXESUF)
+check-unit-y += tests/test-block-backend$(EXESUF)
 check-unit-y += tests/test-x86-cpuid$(EXESUF)
 # all code tested by test-x86-cpuid is inside topology.h
 gcov-files-test-x86-cpuid-y =
@@ -613,6 +614,7 @@ tests/test-throttle$(EXESUF): tests/test-throttle.o 
$(test-block-obj-y)
 tests/test-bdrv-drain$(EXESUF): tests/test-bdrv-drain.o $(test-block-obj-y) 
$(test-util-obj-y)
 tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) 
$(test-util-obj-y)
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o 
$(test-block-obj-y) $(test-util-obj-y)
+tests/test-block-backend$(EXESUF): tests/test-block-backend.o 
$(test-block-obj-y) $(test-util-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) 
$(test-crypto-obj-y)
-- 
2.13.6

[Qemu-devel] [PULL 26/37] aio: rename aio_context_in_iothread() to in_aio_context_home_thread()

2018-03-02 Thread Kevin Wolf

From: Stefan Hajnoczi 

The name aio_context_in_iothread() is misleading because it also returns
true when called on the main AioContext from the main loop thread, which
is not an IOThread.

This patch renames it to in_aio_context_home_thread() and expands the
doc comment to make the semantics clearer.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 include/block/aio.h   | 7 +--
 include/block/block.h | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index e9aeeaec94..a1d6b9e249 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -534,11 +534,14 @@ void aio_co_enter(AioContext *ctx, struct Coroutine *co);
 AioContext *qemu_get_current_aio_context(void);
 
 /**
+ * in_aio_context_home_thread:
  * @ctx: the aio context
  *
- * Return whether we are running in the I/O thread that manages @ctx.
+ * Return whether we are running in the thread that normally runs @ctx.  Note
+ * that acquiring/releasing ctx does not affect the outcome, each AioContext
+ * still only has one home thread that is responsible for running it.
  */
-static inline bool aio_context_in_iothread(AioContext *ctx)
+static inline bool in_aio_context_home_thread(AioContext *ctx)
 {
 return ctx == qemu_get_current_aio_context();
 }
diff --git a/include/block/block.h b/include/block/block.h
index 947e8876cd..bc41ed253b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -372,7 +372,7 @@ void bdrv_drain_all(void);
 bool busy_ = true; \
 BlockDriverState *bs_ = (bs);  \
 AioContext *ctx_ = bdrv_get_aio_context(bs_);  \
-if (aio_context_in_iothread(ctx_)) {   \
+if (in_aio_context_home_thread(ctx_)) {\
 while ((cond) || busy_) {  \
 busy_ = aio_poll(ctx_, (cond));\
 waited_ |= !!(cond) | busy_;   \
-- 
2.13.6

[Qemu-devel] [PULL 31/37] block: rename .bdrv_create() to .bdrv_co_create_opts()

2018-03-02 Thread Kevin Wolf

From: Stefan Hajnoczi 

BlockDriver->bdrv_create() has been called from coroutine context since
commit 5b7e1542cfa41a281af9629d31cef03704d976e6 ("block: make
bdrv_create adopt coroutine").

Make this explicit by renaming to .bdrv_co_create_opts() and add the
coroutine_fn annotation.  This makes it obvious to block driver authors
that they may yield, use CoMutex, or other coroutine_fn APIs.
bdrv_co_create is reserved for the QAPI-based version that Kevin is
working on.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <20170705102231.20711-2-stefa...@redhat.com>
Signed-off-by: Paolo Bonzini 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 include/block/block_int.h |  3 ++-
 block.c   |  4 ++--
 block/crypto.c|  8 
 block/file-posix.c| 15 ---
 block/file-win32.c|  5 +++--
 block/gluster.c   | 13 +++--
 block/iscsi.c |  7 ---
 block/nfs.c   |  5 +++--
 block/parallels.c |  6 --
 block/qcow.c  |  5 +++--
 block/qcow2.c |  5 +++--
 block/qed.c   |  6 --
 block/raw-format.c|  5 +++--
 block/rbd.c   |  6 --
 block/sheepdog.c  | 10 +-
 block/ssh.c   |  5 +++--
 block/vdi.c   |  5 +++--
 block/vhdx.c  |  5 +++--
 block/vmdk.c  |  5 +++--
 block/vpc.c   |  5 +++--
 20 files changed, 74 insertions(+), 54 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index aef10296b0..64a5700f2b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -129,7 +129,8 @@ struct BlockDriver {
 int (*bdrv_file_open)(BlockDriverState *bs, QDict *options, int flags,
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
-int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
+int coroutine_fn (*bdrv_co_create_opts)(const char *filename, QemuOpts 
*opts,
+   Error **errp);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
diff --git a/block.c b/block.c
index a83037c2a5..86dd809041 100644
--- a/block.c
+++ b/block.c
@@ -420,7 +420,7 @@ static void coroutine_fn bdrv_create_co_entry(void *opaque)
 CreateCo *cco = opaque;
 assert(cco->drv);
 
-ret = cco->drv->bdrv_create(cco->filename, cco->opts, _err);
+ret = cco->drv->bdrv_co_create_opts(cco->filename, cco->opts, _err);
 error_propagate(>err, local_err);
 cco->ret = ret;
 }
@@ -439,7 +439,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 .err = NULL,
 };
 
-if (!drv->bdrv_create) {
+if (!drv->bdrv_co_create_opts) {
 error_setg(errp, "Driver '%s' does not support image creation", 
drv->format_name);
 ret = -ENOTSUP;
 goto out;
diff --git a/block/crypto.c b/block/crypto.c
index 3df66947c5..2ea116e6db 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -556,9 +556,9 @@ static int block_crypto_open_luks(BlockDriverState *bs,
  bs, options, flags, errp);
 }
 
-static int block_crypto_create_luks(const char *filename,
-QemuOpts *opts,
-Error **errp)
+static int coroutine_fn block_crypto_co_create_opts_luks(const char *filename,
+ QemuOpts *opts,
+ Error **errp)
 {
 return block_crypto_create_generic(Q_CRYPTO_BLOCK_FORMAT_LUKS,
filename, opts, errp);
@@ -617,7 +617,7 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_open  = block_crypto_open_luks,
 .bdrv_close = block_crypto_close,
 .bdrv_child_perm= bdrv_format_default_perms,
-.bdrv_create= block_crypto_create_luks,
+.bdrv_co_create_opts = block_crypto_co_create_opts_luks,
 .bdrv_truncate  = block_crypto_truncate,
 .create_opts= _crypto_create_opts_luks,
 
diff --git a/block/file-posix.c b/block/file-posix.c
index f1591c3849..7f2cc63c60 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1982,7 +1982,8 @@ static int64_t 
raw_get_allocated_file_size(BlockDriverState *bs)
 return (int64_t)st.st_blocks * 512;
 }
 
-static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn raw_co_create_opts(const char *filename, QemuOpts 
*opts,
+   Error **errp)
 {
 int fd;
 int result = 0;
@@ -2276,7 +2277,7 @@ BlockDriver bdrv_file = {
 .bdrv_reopen_commit = raw_reopen_commit,
 .bdrv_reopen_abort = raw_reopen_abort,
 .bdrv_close = raw_close,
-.bdrv_create = raw_create,
+

[Qemu-devel] [PULL 30/37] Revert "IDE: Do not flush empty CDROM drives"

2018-03-02 Thread Kevin Wolf

From: Stefan Hajnoczi 

This reverts commit 4da97120d51a4383aa96d741a2b837f8c4bbcd0b.

blk_aio_flush() now handles the blk->root == NULL case, so we no longer
need this workaround.

Cc: John Snow 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 hw/ide/core.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 257b429381..139c843514 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1087,15 +1087,7 @@ static void ide_flush_cache(IDEState *s)
 s->status |= BUSY_STAT;
 ide_set_retry(s);
 block_acct_start(blk_get_stats(s->blk), >acct, 0, BLOCK_ACCT_FLUSH);
-
-if (blk_bs(s->blk)) {
-s->pio_aiocb = blk_aio_flush(s->blk, ide_flush_cb, s);
-} else {
-/* XXX blk_aio_flush() crashes when blk_bs(blk) is NULL, remove this
- * temporary workaround when blk_aio_*() functions handle NULL blk_bs.
- */
-ide_flush_cb(s, 0);
-}
+s->pio_aiocb = blk_aio_flush(s->blk, ide_flush_cb, s);
 }
 
 static void ide_cfata_metadata_inquiry(IDEState *s)
-- 
2.13.6

[Qemu-devel] [PULL 25/37] docs: document how to use the l2-cache-entry-size parameter

2018-03-02 Thread Kevin Wolf

From: Alberto Garcia 

This patch updates docs/qcow2-cache.txt explaining how to use the new
l2-cache-entry-size parameter.

Here's a more detailed technical description of this feature:

   https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html

And here are some performance numbers:

   https://lists.gnu.org/archive/html/qemu-block/2017-12/msg00507.html

Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 docs/qcow2-cache.txt | 46 +++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index b0571de4b8..170191a242 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@
 qcow2 L2/refcount cache configuration
 =
-Copyright (C) 2015 Igalia, S.L.
+Copyright (C) 2015, 2018 Igalia, S.L.
 Author: Alberto Garcia 
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -118,8 +118,8 @@ There are three options available, and all of them take 
bytes:
 
 There are two things that need to be taken into account:
 
- - Both caches must have a size that is a multiple of the cluster
-   size.
+ - Both caches must have a size that is a multiple of the cluster size
+   (or the cache entry size: see "Using smaller cache sizes" below).
 
  - If you only set one of the options above, QEMU will automatically
adjust the others so that the L2 cache is 4 times bigger than the
@@ -143,6 +143,46 @@ much less often than the L2 cache, so it's perfectly 
reasonable to
 keep it small.
 
 
+Using smaller cache entries
+---
+The qcow2 L2 cache stores complete tables by default. This means that
+if QEMU needs an entry from an L2 table then the whole table is read
+from disk and is kept in the cache. If the cache is full then a
+complete table needs to be evicted first.
+
+This can be inefficient with large cluster sizes since it results in
+more disk I/O and wastes more cache memory.
+
+Since QEMU 2.12 you can change the size of the L2 cache entry and make
+it smaller than the cluster size. This can be configured using the
+"l2-cache-entry-size" parameter:
+
+   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096
+
+Some things to take into account:
+
+ - The L2 cache entry size has the same restrictions as the cluster
+   size (power of two, at least 512 bytes).
+
+ - Smaller entry sizes generally improve the cache efficiency and make
+   disk I/O faster. This is particularly true with solid state drives
+   so it's a good idea to reduce the entry size in those cases. With
+   rotating hard drives the situation is a bit more complicated so you
+   should test it first and stay with the default size if unsure.
+
+ - Try different entry sizes to see which one gives faster performance
+   in your case. The block size of the host filesystem is generally a
+   good default (usually 4096 bytes in the case of ext4).
+
+ - Only the L2 cache can be configured this way. The refcount cache
+   always uses the cluster size as the entry size.
+
+ - If the L2 cache is big enough to hold all of the image's L2 tables
+   (as explained in the "Choosing the right cache sizes" section
+   earlier in this document) then none of this is necessary and you
+   can omit the "l2-cache-entry-size" parameter altogether.
+
+
 Reducing the memory usage
 -
 It is possible to clean unused cache entries in order to reduce the
-- 
2.13.6

[Qemu-devel] [PULL 28/37] block: add BlockBackend->in_flight counter

2018-03-02 Thread Kevin Wolf

From: Stefan Hajnoczi 

BlockBackend currently relies on BlockDriverState->in_flight to track
requests for blk_drain().  There is a corner case where
BlockDriverState->in_flight cannot be used though: blk->root can be NULL
when there is no medium.  This results in a segfault when the NULL
pointer is dereferenced.

Introduce a BlockBackend->in_flight counter for aio requests so it works
even when blk->root == NULL.

Based on a patch by Kevin Wolf .

Signed-off-by: Kevin Wolf 
Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 block.c   |  2 +-
 block/block-backend.c | 60 +--
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 9e4da81213..a83037c2a5 100644
--- a/block.c
+++ b/block.c
@@ -4713,7 +4713,7 @@ out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-return bs->aio_context;
+return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
 AioWait *bdrv_get_aio_wait(BlockDriverState *bs)
diff --git a/block/block-backend.c b/block/block-backend.c
index 0266ac990b..a775a3dd2f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -73,6 +73,14 @@ struct BlockBackend {
 int quiesce_counter;
 VMChangeStateEntry *vmsh;
 bool force_allow_inactivate;
+
+/* Number of in-flight aio requests.  BlockDriverState also counts
+ * in-flight requests but aio requests can exist even when blk->root is
+ * NULL, so we cannot rely on its counter for that case.
+ * Accessed with atomic ops.
+ */
+unsigned int in_flight;
+AioWait wait;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -1225,11 +1233,22 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags 
flags)
 return bdrv_make_zero(blk->root, flags);
 }
 
+static void blk_inc_in_flight(BlockBackend *blk)
+{
+atomic_inc(>in_flight);
+}
+
+static void blk_dec_in_flight(BlockBackend *blk)
+{
+atomic_dec(>in_flight);
+aio_wait_kick(>wait);
+}
+
 static void error_callback_bh(void *opaque)
 {
 struct BlockBackendAIOCB *acb = opaque;
 
-bdrv_dec_in_flight(acb->common.bs);
+blk_dec_in_flight(acb->blk);
 acb->common.cb(acb->common.opaque, acb->ret);
 qemu_aio_unref(acb);
 }
@@ -1240,7 +1259,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 {
 struct BlockBackendAIOCB *acb;
 
-bdrv_inc_in_flight(blk_bs(blk));
+blk_inc_in_flight(blk);
 acb = blk_aio_get(_backend_aiocb_info, blk, cb, opaque);
 acb->blk = blk;
 acb->ret = ret;
@@ -1263,7 +1282,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
 if (acb->has_returned) {
-bdrv_dec_in_flight(acb->common.bs);
+blk_dec_in_flight(acb->rwco.blk);
 acb->common.cb(acb->common.opaque, acb->rwco.ret);
 qemu_aio_unref(acb);
 }
@@ -1284,7 +1303,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset, int bytes,
 BlkAioEmAIOCB *acb;
 Coroutine *co;
 
-bdrv_inc_in_flight(blk_bs(blk));
+blk_inc_in_flight(blk);
 acb = blk_aio_get(_aio_em_aiocb_info, blk, cb, opaque);
 acb->rwco = (BlkRwCo) {
 .blk= blk,
@@ -1521,14 +1540,41 @@ int blk_flush(BlockBackend *blk)
 
 void blk_drain(BlockBackend *blk)
 {
-if (blk_bs(blk)) {
-bdrv_drain(blk_bs(blk));
+BlockDriverState *bs = blk_bs(blk);
+
+if (bs) {
+bdrv_drained_begin(bs);
+}
+
+/* We may have -ENOMEDIUM completions in flight */
+AIO_WAIT_WHILE(>wait,
+blk_get_aio_context(blk),
+atomic_mb_read(>in_flight) > 0);
+
+if (bs) {
+bdrv_drained_end(bs);
 }
 }
 
 void blk_drain_all(void)
 {
-bdrv_drain_all();
+BlockBackend *blk = NULL;
+
+bdrv_drain_all_begin();
+
+while ((blk = blk_all_next(blk)) != NULL) {
+AioContext *ctx = blk_get_aio_context(blk);
+
+aio_context_acquire(ctx);
+
+/* We may have -ENOMEDIUM completions in flight */
+AIO_WAIT_WHILE(>wait, ctx,
+atomic_mb_read(>in_flight) > 0);
+
+aio_context_release(ctx);
+}
+
+bdrv_drain_all_end();
 }
 
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
-- 
2.13.6

[Qemu-devel] [PULL 0/3] tcg queued patches

2018-03-02 Thread Richard Henderson

Only three outstanding patches for now.


r~


Richard Henderson (3):
  tcg: Improve tcg_gen_muli_i32/i64
  tcg/i386: Support INDEX_op_dup2_vec for -m32
  tcg: Add choose_vector_size

 tcg/i386/tcg-target.inc.c |   9 +
 tcg/tcg-op-gvec.c | 438 +++---
 tcg/tcg-op.c  |  24 ++-
 3 files changed, 286 insertions(+), 185 deletions(-)

-- 
2.14.3

[Qemu-devel] [PULL 27/37] block: extract AIO_WAIT_WHILE() from BlockDriverState

2018-03-02 Thread Kevin Wolf

From: Stefan Hajnoczi 

BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop
activity while a condition evaluates to true.  This is used to implement
synchronous operations where it acts as a condvar between the IOThread
running the operation and the main loop waiting for the operation.  It
can also be called from the thread that owns the AioContext and in that
case it's just a nested event loop.

BlockBackend needs this behavior but doesn't always have a
BlockDriverState it can use.  This patch extracts BDRV_POLL_WHILE() into
the AioWait abstraction, which can be used with AioContext and isn't
tied to BlockDriverState anymore.

This feature could be built directly into AioContext but then all users
would kick the event loop even if they signal different conditions.
Imagine an AioContext with many BlockDriverStates, each time a request
completes any waiter would wake up and re-check their condition.  It's
nicer to keep a separate AioWait object for each condition instead.

Please see "block/aio-wait.h" for details on the API.

The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE()
and AioContext polling.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 include/block/aio-wait.h  | 116 ++
 include/block/block.h |  40 +++-
 include/block/block_int.h |   7 ++-
 block.c   |   5 ++
 block/io.c|  10 +---
 util/aio-wait.c   |  40 
 util/Makefile.objs|   2 +-
 7 files changed, 174 insertions(+), 46 deletions(-)
 create mode 100644 include/block/aio-wait.h
 create mode 100644 util/aio-wait.c

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
new file mode 100644
index 00..a48c744fa8
--- /dev/null
+++ b/include/block/aio-wait.h
@@ -0,0 +1,116 @@
+/*
+ * AioContext wait support
+ *
+ * Copyright (C) 2018 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_AIO_WAIT_H
+#define QEMU_AIO_WAIT_H
+
+#include "block/aio.h"
+
+/**
+ * AioWait:
+ *
+ * An object that facilitates synchronous waiting on a condition.  The main
+ * loop can wait on an operation running in an IOThread as follows:
+ *
+ *   AioWait *wait = ...;
+ *   AioContext *ctx = ...;
+ *   MyWork work = { .done = false };
+ *   schedule_my_work_in_iothread(ctx, );
+ *   AIO_WAIT_WHILE(wait, ctx, !work.done);
+ *
+ * The IOThread must call aio_wait_kick() to notify the main loop when
+ * work.done changes:
+ *
+ *   static void do_work(...)
+ *   {
+ *   ...
+ *   work.done = true;
+ *   aio_wait_kick(wait);
+ *   }
+ */
+typedef struct {
+/* Is the main loop waiting for a kick?  Accessed with atomic ops. */
+bool need_kick;
+} AioWait;
+
+/**
+ * AIO_WAIT_WHILE:
+ * @wait: the aio wait object
+ * @ctx: the aio context
+ * @cond: wait while this conditional expression is true
+ *
+ * Wait while a condition is true.  Use this to implement synchronous
+ * operations that require event loop activity.
+ *
+ * The caller must be sure that something calls aio_wait_kick() when the value
+ * of @cond might have changed.
+ *
+ * The caller's thread must be the IOThread that owns @ctx or the main loop
+ * thread (with @ctx acquired exactly once).  This function cannot be used to
+ * wait on conditions between two IOThreads since that could lead to deadlock,
+ * go via the main loop instead.
+ */
+#define AIO_WAIT_WHILE(wait, ctx, cond) ({  \
+bool waited_ = false;   \
+bool busy_ = true;  \
+AioWait *wait_ = (wait);\
+AioContext *ctx_ = (ctx);   \
+if (in_aio_context_home_thread(ctx_)) { \
+while ((cond) ||

[Qemu-devel] [PULL 21/37] block: Drop unused .bdrv_co_get_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Now that all drivers have been updated to provide the
byte-based .bdrv_co_block_status(), we can delete the sector-based
interface.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 include/block/block_int.h |  3 ---
 block/io.c| 50 ++-
 2 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index bf2598856c..5ae7738cf8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -215,9 +215,6 @@ struct BlockDriver {
  * as well as non-NULL pnum, map, and file; in turn, the driver
  * must return an error or set pnum to an aligned non-zero value.
  */
-int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file);
 int coroutine_fn (*bdrv_co_block_status)(BlockDriverState *bs,
 bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
 int64_t *map, BlockDriverState **file);
diff --git a/block/io.c b/block/io.c
index 5bae79f282..4c3dba0973 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1963,7 +1963,7 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 
 /* Must be non-NULL or bdrv_getlength() would have failed */
 assert(bs->drv);
-if (!bs->drv->bdrv_co_get_block_status && !bs->drv->bdrv_co_block_status) {
+if (!bs->drv->bdrv_co_block_status) {
 *pnum = bytes;
 ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
 if (offset + bytes == total_size) {
@@ -1981,53 +1981,23 @@ static int coroutine_fn 
bdrv_co_block_status(BlockDriverState *bs,
 
 /* Round out to request_alignment boundaries */
 align = bs->bl.request_alignment;
-if (bs->drv->bdrv_co_get_block_status && align < BDRV_SECTOR_SIZE) {
-align = BDRV_SECTOR_SIZE;
-}
 aligned_offset = QEMU_ALIGN_DOWN(offset, align);
 aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
 
-if (bs->drv->bdrv_co_get_block_status) {
-int count; /* sectors */
-int64_t longret;
-
-assert(QEMU_IS_ALIGNED(aligned_offset | aligned_bytes,
-   BDRV_SECTOR_SIZE));
-/*
- * The contract allows us to return pnum smaller than bytes, even
- * if the next query would see the same status; we truncate the
- * request to avoid overflowing the driver's 32-bit interface.
- */
-longret = bs->drv->bdrv_co_get_block_status(
-bs, aligned_offset >> BDRV_SECTOR_BITS,
-MIN(INT_MAX, aligned_bytes) >> BDRV_SECTOR_BITS, ,
-_file);
-if (longret < 0) {
-assert(INT_MIN <= longret);
-ret = longret;
-goto out;
-}
-if (longret & BDRV_BLOCK_OFFSET_VALID) {
-local_map = longret & BDRV_BLOCK_OFFSET_MASK;
-}
-ret = longret & ~BDRV_BLOCK_OFFSET_MASK;
-*pnum = count * BDRV_SECTOR_SIZE;
-} else {
-ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
-aligned_bytes, pnum, _map,
-_file);
-if (ret < 0) {
-*pnum = 0;
-goto out;
-}
-assert(*pnum); /* The block driver must make progress */
+ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+aligned_bytes, pnum, _map,
+_file);
+if (ret < 0) {
+*pnum = 0;
+goto out;
 }
 
 /*
- * The driver's result must be a multiple of request_alignment.
+ * The driver's result must be a non-zero multiple of request_alignment.
  * Clamp pnum and adjust map to original request.
  */
-assert(QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset);
+assert(*pnum && QEMU_IS_ALIGNED(*pnum, align) &&
+   align > offset - aligned_offset);
 *pnum -= offset - aligned_offset;
 if (*pnum > bytes) {
 *pnum = bytes;
-- 
2.13.6

[Qemu-devel] [PULL 20/37] vvfat: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vvfat driver accordingly.  Note that we
can rely on the block driver having already clamped limits to our
block size, and simplify accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/vvfat.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 7e06ebacf6..4a17a49e12 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3088,15 +3088,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 return ret;
 }
 
-static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
+static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *n,
+  int64_t *map,
+  BlockDriverState **file)
 {
-*n = bs->total_sectors - sector_num;
-if (*n > nb_sectors) {
-*n = nb_sectors;
-} else if (*n < 0) {
-return 0;
-}
+*n = bytes;
 return BDRV_BLOCK_DATA;
 }
 
@@ -3257,7 +3255,7 @@ static BlockDriver bdrv_vvfat = {
 
 .bdrv_co_preadv = vvfat_co_preadv,
 .bdrv_co_pwritev= vvfat_co_pwritev,
-.bdrv_co_get_block_status = vvfat_co_get_block_status,
+.bdrv_co_block_status   = vvfat_co_block_status,
 };
 
 static void bdrv_vvfat_init(void)
-- 
2.13.6

[Qemu-devel] [PULL 24/37] specs/qcow2: Fix documentation of the compressed cluster descriptor

2018-03-02 Thread Kevin Wolf

From: Alberto Garcia 

This patch fixes several mistakes in the documentation of the
compressed cluster descriptor:

1) the documentation claims that the cluster descriptor contains the
   number of sectors used to store the compressed data, but what it
   actually contains is the number of sectors *minus one* or, in other
   words, the number of additional sectors after the first one.

2) the width of the fields is incorrectly specified. The number of bits
   used by each field is

  x = 62 - (cluster_bits - 8)   for the offset field
  y = (cluster_bits - 8)for the size field

   So the offset field's location is [0, x-1], not [0, x] as stated.

3) the size field does not contain the size of the compressed data,
   but rather the number of sectors where that data is stored. The
   compressed data starts at the exact point specified in the offset
   field and ends when there's enough data to produce a cluster of
   decompressed data. Both points can be in the middle of a sector,
   allowing several compressed clusters to be stored next to one
   another, sharing sectors if necessary.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Alberto Garcia 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 docs/interop/qcow2.txt | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index d7fdb1fee3..feb711fb6a 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -426,10 +426,20 @@ Standard Cluster Descriptor:
 
 Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
 
-Bit  0 -  x:Host cluster offset. This is usually _not_ aligned to a
-cluster boundary!
+Bit  0 - x-1:   Host cluster offset. This is usually _not_ aligned to a
+cluster or sector boundary!
 
-   x+1 - 61:Compressed size of the images in sectors of 512 bytes
+ x - 61:Number of additional 512-byte sectors used for the
+compressed data, beyond the sector containing the offset
+in the previous field. Some of these sectors may reside
+in the next contiguous host cluster.
+
+Note that the compressed data does not necessarily occupy
+all of the bytes in the final sector; rather, decompression
+stops when it has produced a cluster of data.
+
+Another compressed cluster may map to the tail of the final
+sector used by this compressed cluster.
 
 If a cluster is unallocated, read requests shall read the data from the backing
 file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
-- 
2.13.6

[Qemu-devel] [PULL 16/37] vdi: Avoid bitrot of debugging code

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

Rework the debug define so that we always get -Wformat checking,
even when debugging is disabled.

Signed-off-by: Eric Blake 
Reviewed-by: Stefan Weil 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/vdi.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index fc1c614cb1..32b1763cde 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -87,12 +87,18 @@
 #define DEFAULT_CLUSTER_SIZE (1 * MiB)
 
 #if defined(CONFIG_VDI_DEBUG)
-#define logout(fmt, ...) \
-fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#define VDI_DEBUG 1
 #else
-#define logout(fmt, ...) ((void)0)
+#define VDI_DEBUG 0
 #endif
 
+#define logout(fmt, ...) \
+do {\
+if (VDI_DEBUG) {\
+fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__); \
+}   \
+} while (0)
+
 /* Image signature. */
 #define VDI_SIGNATURE 0xbeda107f
 
-- 
2.13.6

[Qemu-devel] [PULL 19/37] vpc: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/vpc.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/block/vpc.c b/block/vpc.c
index cfa5144e86..fba4492fd7 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -706,53 +706,54 @@ fail:
 return ret;
 }
 
-static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vpc_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
 BDRVVPCState *s = bs->opaque;
 VHDFooter *footer = (VHDFooter*) s->footer_buf;
-int64_t start, offset;
+int64_t image_offset;
 bool allocated;
-int64_t ret;
-int n;
+int ret;
+int64_t n;
 
 if (be32_to_cpu(footer->type) == VHD_FIXED) {
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs->file->bs;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
 qemu_co_mutex_lock(>lock);
 
-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, NULL);
-start = offset;
-allocated = (offset != -1);
+image_offset = get_image_offset(bs, offset, false, NULL);
+allocated = (image_offset != -1);
 *pnum = 0;
 ret = 0;
 
 do {
 /* All sectors in a block are contiguous (without using the bitmap) */
-n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
-  - sector_num;
-n = MIN(n, nb_sectors);
+n = ROUND_UP(offset + 1, s->block_size) - offset;
+n = MIN(n, bytes);
 
 *pnum += n;
-sector_num += n;
-nb_sectors -= n;
+offset += n;
+bytes -= n;
 /* *pnum can't be greater than one block for allocated
  * sectors since there is always a bitmap in between. */
 if (allocated) {
 *file = bs->file->bs;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+*map = image_offset;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 break;
 }
-if (nb_sectors == 0) {
+if (bytes == 0) {
 break;
 }
-offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false,
-  NULL);
-} while (offset == -1);
+image_offset = get_image_offset(bs, offset, false, NULL);
+} while (image_offset == -1);
 
 qemu_co_mutex_unlock(>lock);
 return ret;
@@ -1098,7 +1099,7 @@ static BlockDriver bdrv_vpc = {
 
 .bdrv_co_preadv = vpc_co_preadv,
 .bdrv_co_pwritev= vpc_co_pwritev,
-.bdrv_co_get_block_status   = vpc_co_get_block_status,
+.bdrv_co_block_status   = vpc_co_block_status,
 
 .bdrv_get_info  = vpc_get_info,
 
-- 
2.13.6

[Qemu-devel] [PULL 23/37] iotest 033: add misaligned write-zeroes test via truncate

2018-03-02 Thread Kevin Wolf

From: Anton Nefedov 

This new test case only makes sense for qcow2 while iotest 033 is generic;
however it matches the test purpose perfectly and also 033 contains those
do_test() tricks to pass the alignment, which won't look nice being
duplicated in other tests or moved to the common code.

Signed-off-by: Anton Nefedov 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/033 | 29 +
 tests/qemu-iotests/033.out | 13 +
 2 files changed, 42 insertions(+)

diff --git a/tests/qemu-iotests/033 b/tests/qemu-iotests/033
index 2cdfd1397a..a1d8357331 100755
--- a/tests/qemu-iotests/033
+++ b/tests/qemu-iotests/033
@@ -64,6 +64,9 @@ do_test()
} | $QEMU_IO $IO_EXTRA_ARGS
 }
 
+echo
+echo "=== Test aligned and misaligned write zeroes operations ==="
+
 for write_zero_cmd in "write -z" "aio_write -z"; do
 for align in 512 4k; do
echo
@@ -102,7 +105,33 @@ for align in 512 4k; do
 done
 done
 
+
+# Trigger truncate that would shrink qcow2 L1 table, which is done by
+#   clearing one entry (8 bytes) with bdrv_co_pwrite_zeroes()
+
+echo
+echo "=== Test misaligned write zeroes via truncate ==="
+echo
+
+# any size will do, but the smaller the size the smaller the required image
+CLUSTER_SIZE=$((4 * 1024))
+L2_COVERAGE=$(($CLUSTER_SIZE * $CLUSTER_SIZE / 8))
+_make_test_img $(($L2_COVERAGE * 2))
+
+do_test 512 "write -P 1 0 0x200" "$TEST_IMG" | _filter_qemu_io
+# next L2 table
+do_test 512 "write -P 1 $L2_COVERAGE 0x200" "$TEST_IMG" | _filter_qemu_io
+
+# only interested in qcow2 here; also other formats might respond with
+#  "not supported" error message
+if [ $IMGFMT = "qcow2" ]; then
+do_test 512 "truncate $L2_COVERAGE" "$TEST_IMG" | _filter_qemu_io
+fi
+
+do_test 512 "read -P 1 0 0x200" "$TEST_IMG" | _filter_qemu_io
+
 # success, all done
+echo
 echo "*** done"
 rm -f $seq.full
 status=0
diff --git a/tests/qemu-iotests/033.out b/tests/qemu-iotests/033.out
index 95929eff70..9683f6b290 100644
--- a/tests/qemu-iotests/033.out
+++ b/tests/qemu-iotests/033.out
@@ -1,6 +1,8 @@
 QA output created by 033
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 
+=== Test aligned and misaligned write zeroes operations ===
+
 == preparing image ==
 wrote 1024/1024 bytes at offset 512
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
@@ -164,4 +166,15 @@ read 512/512 bytes at offset 512
 read 3072/3072 bytes at offset 1024
 3 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+
+=== Test misaligned write zeroes via truncate ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 2097152
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 *** done
-- 
2.13.6

[Qemu-devel] [PULL 32/37] qcow2: make qcow2_co_create2() a coroutine_fn

2018-03-02 Thread Kevin Wolf

From: Stefan Hajnoczi 

qcow2_create2() calls qemu_co_mutex_lock().  Only a coroutine_fn may
call another coroutine_fn.  In fact, qcow2_create2 is always called from
coroutine context.

Rename the function to add the "co" moniker and add coroutine_fn.

Reported-by: Marc-André Lureau 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20170705102231.20711-3-stefa...@redhat.com>
Signed-off-by: Paolo Bonzini 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 93fb625dcb..7cf3c1518a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2725,11 +2725,12 @@ static uint64_t 
qcow2_opt_get_refcount_bits_del(QemuOpts *opts, int version,
 return refcount_bits;
 }
 
-static int qcow2_create2(const char *filename, int64_t total_size,
- const char *backing_file, const char *backing_format,
- int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version, int refcount_order,
- const char *encryptfmt, Error **errp)
+static int coroutine_fn
+qcow2_co_create2(const char *filename, int64_t total_size,
+ const char *backing_file, const char *backing_format,
+ int flags, size_t cluster_size, PreallocMode prealloc,
+ QemuOpts *opts, int version, int refcount_order,
+ const char *encryptfmt, Error **errp)
 {
 QDict *options;
 
@@ -2998,9 +2999,9 @@ static int coroutine_fn qcow2_co_create_opts(const char 
*filename, QemuOpts *opt
 
 refcount_order = ctz32(refcount_bits);
 
-ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
-cluster_size, prealloc, opts, version, refcount_order,
-encryptfmt, _err);
+ret = qcow2_co_create2(filename, size, backing_file, backing_fmt, flags,
+   cluster_size, prealloc, opts, version, 
refcount_order,
+   encryptfmt, _err);
 error_propagate(errp, local_err);
 
 finish:
-- 
2.13.6

[Qemu-devel] [PULL 13/37] qed: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qed driver accordingly, taking the opportunity
to inline qed_is_allocated_cb() into its lone caller (the callback
used to be important, until we switched qed to coroutines).  There is
no intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/qed.c | 76 +++--
 1 file changed, 24 insertions(+), 52 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index c6ff3ab015..a595220926 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -688,74 +688,46 @@ finish:
 return ret;
 }
 
-typedef struct {
-BlockDriverState *bs;
-Coroutine *co;
-uint64_t pos;
-int64_t status;
-int *pnum;
-BlockDriverState **file;
-} QEDIsAllocatedCB;
-
-/* Called with table_lock held.  */
-static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t 
len)
+static int coroutine_fn bdrv_qed_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t pos, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
-QEDIsAllocatedCB *cb = opaque;
-BDRVQEDState *s = cb->bs->opaque;
-*cb->pnum = len / BDRV_SECTOR_SIZE;
+BDRVQEDState *s = bs->opaque;
+size_t len = MIN(bytes, SIZE_MAX);
+int status;
+QEDRequest request = { .l2_table = NULL };
+uint64_t offset;
+int ret;
+
+qemu_co_mutex_lock(>table_lock);
+ret = qed_find_cluster(s, , pos, , );
+
+*pnum = len;
 switch (ret) {
 case QED_CLUSTER_FOUND:
-offset |= qed_offset_into_cluster(s, cb->pos);
-cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
-*cb->file = cb->bs->file->bs;
+*map = offset | qed_offset_into_cluster(s, pos);
+status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+*file = bs->file->bs;
 break;
 case QED_CLUSTER_ZERO:
-cb->status = BDRV_BLOCK_ZERO;
+status = BDRV_BLOCK_ZERO;
 break;
 case QED_CLUSTER_L2:
 case QED_CLUSTER_L1:
-cb->status = 0;
+status = 0;
 break;
 default:
 assert(ret < 0);
-cb->status = ret;
+status = ret;
 break;
 }
 
-if (cb->co) {
-aio_co_wake(cb->co);
-}
-}
-
-static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
-{
-BDRVQEDState *s = bs->opaque;
-size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
-QEDIsAllocatedCB cb = {
-.bs = bs,
-.pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
-.status = BDRV_BLOCK_OFFSET_MASK,
-.pnum = pnum,
-.file = file,
-};
-QEDRequest request = { .l2_table = NULL };
-uint64_t offset;
-int ret;
-
-qemu_co_mutex_lock(>table_lock);
-ret = qed_find_cluster(s, , cb.pos, , );
-qed_is_allocated_cb(, ret, offset, len);
-
-/* The callback was invoked immediately */
-assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
-
 qed_unref_l2_cache_entry(request.l2_table);
 qemu_co_mutex_unlock(>table_lock);
 
-return cb.status;
+return status;
 }
 
 static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
@@ -1594,7 +1566,7 @@ static BlockDriver bdrv_qed = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create  = bdrv_qed_create,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
+.bdrv_co_block_status = bdrv_qed_co_block_status,
 .bdrv_co_readv= bdrv_qed_co_readv,
 .bdrv_co_writev   = bdrv_qed_co_writev,
 .bdrv_co_pwrite_zeroes= bdrv_qed_co_pwrite_zeroes,
-- 
2.13.6

[Qemu-devel] [PULL 18/37] vmdk: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vmdk driver accordingly.  Drop the
now-unused vmdk_find_index_in_cluster().

Also, fix a pre-existing bug: if find_extent() fails (unlikely,
since the block layer did a bounds check), then we must return a
failure, rather than 0.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/vmdk.c | 38 ++
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index ef15ddbfd3..75f84213e6 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1304,33 +1304,27 @@ static inline uint64_t 
vmdk_find_offset_in_cluster(VmdkExtent *extent,
 return extent_relative_offset % cluster_size;
 }
 
-static inline uint64_t vmdk_find_index_in_cluster(VmdkExtent *extent,
-  int64_t sector_num)
-{
-uint64_t offset;
-offset = vmdk_find_offset_in_cluster(extent, sector_num * 
BDRV_SECTOR_SIZE);
-return offset / BDRV_SECTOR_SIZE;
-}
-
-static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vmdk_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
 BDRVVmdkState *s = bs->opaque;
 int64_t index_in_cluster, n, ret;
-uint64_t offset;
+uint64_t cluster_offset;
 VmdkExtent *extent;
 
-extent = find_extent(s, sector_num, NULL);
+extent = find_extent(s, offset >> BDRV_SECTOR_BITS, NULL);
 if (!extent) {
-return 0;
+return -EIO;
 }
 qemu_co_mutex_lock(>lock);
-ret = get_cluster_offset(bs, extent, NULL,
- sector_num * 512, false, ,
+ret = get_cluster_offset(bs, extent, NULL, offset, false, _offset,
  0, 0);
 qemu_co_mutex_unlock(>lock);
 
-index_in_cluster = vmdk_find_index_in_cluster(extent, sector_num);
+index_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
 switch (ret) {
 case VMDK_ERROR:
 ret = -EIO;
@@ -1345,18 +1339,14 @@ static int64_t coroutine_fn 
vmdk_co_get_block_status(BlockDriverState *bs,
 ret = BDRV_BLOCK_DATA;
 if (!extent->compressed) {
 ret |= BDRV_BLOCK_OFFSET_VALID;
-ret |= (offset + (index_in_cluster << BDRV_SECTOR_BITS))
-& BDRV_BLOCK_OFFSET_MASK;
+*map = cluster_offset + index_in_cluster;
 }
 *file = extent->file->bs;
 break;
 }
 
-n = extent->cluster_sectors - index_in_cluster;
-if (n > nb_sectors) {
-n = nb_sectors;
-}
-*pnum = n;
+n = extent->cluster_sectors * BDRV_SECTOR_SIZE - index_in_cluster;
+*pnum = MIN(n, bytes);
 return ret;
 }
 
@@ -2410,7 +2400,7 @@ static BlockDriver bdrv_vmdk = {
 .bdrv_close   = vmdk_close,
 .bdrv_create  = vmdk_create,
 .bdrv_co_flush_to_disk= vmdk_co_flush,
-.bdrv_co_get_block_status = vmdk_co_get_block_status,
+.bdrv_co_block_status = vmdk_co_block_status,
 .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
 .bdrv_has_zero_init   = vmdk_has_zero_init,
 .bdrv_get_specific_info   = vmdk_get_specific_info,
-- 
2.13.6

[Qemu-devel] [PULL 10/37] parallels: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/parallels.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index e1e3d80c88..3e952a9c14 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -261,23 +261,31 @@ static coroutine_fn int 
parallels_co_flush_to_os(BlockDriverState *bs)
 }
 
 
-static int64_t coroutine_fn parallels_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn parallels_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 BDRVParallelsState *s = bs->opaque;
-int64_t offset;
+int count;
 
+assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
 qemu_co_mutex_lock(>lock);
-offset = block_status(s, sector_num, nb_sectors, pnum);
+offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS, );
 qemu_co_mutex_unlock(>lock);
 
+*pnum = count * BDRV_SECTOR_SIZE;
 if (offset < 0) {
 return 0;
 }
 
+*map = offset * BDRV_SECTOR_SIZE;
 *file = bs->file->bs;
-return (offset << BDRV_SECTOR_BITS) |
-BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static coroutine_fn int parallels_co_writev(BlockDriverState *bs,
@@ -782,7 +790,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_open = parallels_open,
 .bdrv_close= parallels_close,
 .bdrv_child_perm  = bdrv_format_default_perms,
-.bdrv_co_get_block_status = parallels_co_get_block_status,
+.bdrv_co_block_status = parallels_co_block_status,
 .bdrv_has_zero_init   = bdrv_has_zero_init_1,
 .bdrv_co_flush_to_os  = parallels_co_flush_to_os,
 .bdrv_co_readv  = parallels_co_readv,
-- 
2.13.6

[Qemu-devel] [PULL 22/37] block: fix write with zero flag set and iovector provided

2018-03-02 Thread Kevin Wolf

From: Anton Nefedov 

The normal bdrv_co_pwritev() use is either
  - BDRV_REQ_ZERO_WRITE clear and iovector provided
  - BDRV_REQ_ZERO_WRITE set and iovector == NULL

while
  - the flag clear and iovector == NULL is an assertion failure
in bdrv_co_do_zero_pwritev()
  - the flag set and iovector provided is in fact allowed
(the flag prevails and zeroes are written)

However the alignment logic does not support the latter case so the padding
areas get overwritten with zeroes.

Currently, general functions like bdrv_rw_co() do provide iovector
regardless of flags. So, keep it supported and use bdrv_co_do_zero_pwritev()
alignment for it which also makes the code a bit more obvious anyway.

Signed-off-by: Anton Nefedov 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 4c3dba0973..4d3d1f640a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
  */
 tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_WRITE);
 
-if (!qiov) {
+if (flags & BDRV_REQ_ZERO_WRITE) {
 ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, );
 goto out;
 }
-- 
2.13.6

[Qemu-devel] [PULL 17/37] vdi: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vdi driver accordingly.  Note that the
TODO is already covered (the block layer guarantees bounds of its
requests), and that we can remove the now-unused s->block_sectors.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/vdi.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 32b1763cde..0780c82d82 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -172,8 +172,6 @@ typedef struct {
 uint32_t *bmap;
 /* Size of block (bytes). */
 uint32_t block_size;
-/* Size of block (sectors). */
-uint32_t block_sectors;
 /* First sector of block map. */
 uint32_t bmap_sector;
 /* VDI header (converted to host endianness). */
@@ -463,7 +461,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, 
int flags,
 bs->total_sectors = header.disk_size / SECTOR_SIZE;
 
 s->block_size = header.block_size;
-s->block_sectors = header.block_size / SECTOR_SIZE;
 s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
 s->header = header;
 
@@ -509,33 +506,29 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
 return 0;
 }
 
-static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset, int64_t bytes,
+int64_t *pnum, int64_t *map,
+BlockDriverState **file)
 {
-/* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
 BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
-size_t bmap_index = sector_num / s->block_sectors;
-size_t sector_in_block = sector_num % s->block_sectors;
-int n_sectors = s->block_sectors - sector_in_block;
+size_t bmap_index = offset / s->block_size;
+size_t index_in_block = offset % s->block_size;
 uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
-uint64_t offset;
 int result;
 
-logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
-if (n_sectors > nb_sectors) {
-n_sectors = nb_sectors;
-}
-*pnum = n_sectors;
+logout("%p, %" PRId64 ", %" PRId64 ", %p\n", bs, offset, bytes, pnum);
+*pnum = MIN(s->block_size - index_in_block, bytes);
 result = VDI_IS_ALLOCATED(bmap_entry);
 if (!result) {
 return 0;
 }
 
-offset = s->header.offset_data +
-  (uint64_t)bmap_entry * s->block_size +
-  sector_in_block * SECTOR_SIZE;
+*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
+index_in_block;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static int coroutine_fn
@@ -903,7 +896,7 @@ static BlockDriver bdrv_vdi = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create = vdi_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = vdi_co_get_block_status,
+.bdrv_co_block_status = vdi_co_block_status,
 .bdrv_make_empty = vdi_make_empty,
 
 .bdrv_co_preadv = vdi_co_preadv,
-- 
2.13.6

[Qemu-devel] [PULL 15/37] sheepdog: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the sheepdog driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Signed-off-by: Kevin Wolf 
---
 block/sheepdog.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index ac02b10fe0..3c3becf94d 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -3004,19 +3004,19 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState 
*bs, int64_t offset,
 return acb.ret;
 }
 
-static coroutine_fn int64_t
-sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int 
nb_sectors,
-   int *pnum, BlockDriverState **file)
+static coroutine_fn int
+sd_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
+   int64_t bytes, int64_t *pnum, int64_t *map,
+   BlockDriverState **file)
 {
 BDRVSheepdogState *s = bs->opaque;
 SheepdogInode *inode = >inode;
 uint32_t object_size = (UINT32_C(1) << inode->block_size_shift);
-uint64_t offset = sector_num * BDRV_SECTOR_SIZE;
 unsigned long start = offset / object_size,
-  end = DIV_ROUND_UP((sector_num + nb_sectors) *
- BDRV_SECTOR_SIZE, object_size);
+  end = DIV_ROUND_UP(offset + bytes, object_size);
 unsigned long idx;
-int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+*map = offset;
+int ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 
 for (idx = start; idx < end; idx++) {
 if (inode->data_vdi_id[idx] == 0) {
@@ -3033,9 +3033,9 @@ sd_co_get_block_status(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors,
 }
 }
 
-*pnum = (idx - start) * object_size / BDRV_SECTOR_SIZE;
-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+*pnum = (idx - start) * object_size;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
 *file = bs;
@@ -3113,7 +3113,7 @@ static BlockDriver bdrv_sheepdog = {
 .bdrv_co_writev   = sd_co_writev,
 .bdrv_co_flush_to_disk= sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status = sd_co_block_status,
 
 .bdrv_snapshot_create = sd_snapshot_create,
 .bdrv_snapshot_goto   = sd_snapshot_goto,
@@ -3149,7 +3149,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
 .bdrv_co_writev   = sd_co_writev,
 .bdrv_co_flush_to_disk= sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status = sd_co_block_status,
 
 .bdrv_snapshot_create = sd_snapshot_create,
 .bdrv_snapshot_goto   = sd_snapshot_goto,
@@ -3185,7 +3185,7 @@ static BlockDriver bdrv_sheepdog_unix = {
 .bdrv_co_writev   = sd_co_writev,
 .bdrv_co_flush_to_disk= sd_co_flush_to_disk,
 .bdrv_co_pdiscard = sd_co_pdiscard,
-.bdrv_co_get_block_status = sd_co_get_block_status,
+.bdrv_co_block_status = sd_co_block_status,
 
 .bdrv_snapshot_create = sd_snapshot_create,
 .bdrv_snapshot_goto   = sd_snapshot_goto,
-- 
2.13.6

[Qemu-devel] [PULL 14/37] raw: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the raw driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/raw-format.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index ab552c0954..830243a8e4 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -250,17 +250,17 @@ fail:
 return ret;
 }
 
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
+static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
+bool want_zero, int64_t offset,
+int64_t bytes, int64_t *pnum,
+int64_t *map,
 BlockDriverState **file)
 {
 BDRVRawState *s = bs->opaque;
-*pnum = nb_sectors;
+*pnum = bytes;
 *file = bs->file->bs;
-sector_num += s->offset / BDRV_SECTOR_SIZE;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
+*map = offset + s->offset;
+return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
@@ -496,7 +496,7 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = _co_pwritev,
 .bdrv_co_pwrite_zeroes = _co_pwrite_zeroes,
 .bdrv_co_pdiscard = _co_pdiscard,
-.bdrv_co_get_block_status = _co_get_block_status,
+.bdrv_co_block_status = _co_block_status,
 .bdrv_truncate= _truncate,
 .bdrv_getlength   = _getlength,
 .has_variable_length  = true,
-- 
2.13.6

[Qemu-devel] [PULL 09/37] null: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/null.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/block/null.c b/block/null.c
index 214d394fff..806a8631e4 100644
--- a/block/null.c
+++ b/block/null.c
@@ -223,22 +223,23 @@ static int null_reopen_prepare(BDRVReopenState 
*reopen_state,
 return 0;
 }
 
-static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
+static int coroutine_fn null_co_block_status(BlockDriverState *bs,
+ bool want_zero, int64_t offset,
+ int64_t bytes, int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVNullState *s = bs->opaque;
-off_t start = sector_num * BDRV_SECTOR_SIZE;
+int ret = BDRV_BLOCK_OFFSET_VALID;
 
-*pnum = nb_sectors;
+*pnum = bytes;
+*map = offset;
 *file = bs;
 
 if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
 }
+return ret;
 }
 
 static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
@@ -270,7 +271,7 @@ static BlockDriver bdrv_null_co = {
 .bdrv_co_flush_to_disk  = null_co_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,
 
-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,
 
 .bdrv_refresh_filename  = null_refresh_filename,
 };
@@ -290,7 +291,7 @@ static BlockDriver bdrv_null_aio = {
 .bdrv_aio_flush = null_aio_flush,
 .bdrv_reopen_prepare= null_reopen_prepare,
 
-.bdrv_co_get_block_status   = null_co_get_block_status,
+.bdrv_co_block_status   = null_co_block_status,
 
 .bdrv_refresh_filename  = null_refresh_filename,
 };
-- 
2.13.6

[Qemu-devel] [PULL 07/37] iscsi: Switch iscsi_allocmap_update() to byte-based

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the allocmap (no semantic change).  Callers that already had bytes
available are simpler, and callers that now scale to bytes will be
easier to switch to byte-based in the future.

Signed-off-by: Eric Blake 
Acked-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/iscsi.c | 90 +--
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 3414c21c7f..d2b0466775 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -458,24 +458,22 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int 
open_flags)
 }
 
 static void
-iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
-  int nb_sectors, bool allocated, bool valid)
+iscsi_allocmap_update(IscsiLun *iscsilun, int64_t offset,
+  int64_t bytes, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
 
 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-assert(cluster_sectors);
-cl_num_expanded = sector_num / cluster_sectors;
-nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   cluster_sectors) - cl_num_expanded;
+assert(iscsilun->cluster_size);
+cl_num_expanded = offset / iscsilun->cluster_size;
+nb_cls_expanded = DIV_ROUND_UP(offset + bytes,
+   iscsilun->cluster_size) - cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
-  - cl_num_shrunk;
+cl_num_shrunk = DIV_ROUND_UP(offset, iscsilun->cluster_size);
+nb_cls_shrunk = (offset + bytes) / iscsilun->cluster_size - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
 } else {
@@ -498,26 +496,26 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t 
sector_num,
 }
 
 static void
-iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t sector_num,
- int nb_sectors)
+iscsi_allocmap_set_allocated(IscsiLun *iscsilun, int64_t offset,
+ int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, true, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, true, true);
 }
 
 static void
-iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+iscsi_allocmap_set_unallocated(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
 /* Note: if cache.direct=on the fifth argument to iscsi_allocmap_update
  * is ignored, so this will in effect be an iscsi_allocmap_set_invalid.
  */
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, true);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, true);
 }
 
-static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t sector_num,
-   int nb_sectors)
+static void iscsi_allocmap_set_invalid(IscsiLun *iscsilun, int64_t offset,
+   int64_t bytes)
 {
-iscsi_allocmap_update(iscsilun, sector_num, nb_sectors, false, false);
+iscsi_allocmap_update(iscsilun, offset, bytes, false, false);
 }
 
 static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
@@ -531,34 +529,30 @@ static void iscsi_allocmap_invalidate(IscsiLun *iscsilun)
 }
 
 static inline bool
-iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t sector_num,
-int nb_sectors)
+iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t offset,
+int64_t bytes)
 {
 unsigned long size;
 if (iscsilun->allocmap == NULL) {
 return true;
 }
 assert(iscsilun->cluster_size);
-size = DIV_ROUND_UP(sector_num + nb_sectors,
-iscsilun->cluster_size >> BDRV_SECTOR_BITS);
+size = DIV_ROUND_UP(offset + bytes, iscsilun->cluster_size);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num * BDRV_SECTOR_SIZE /
-   iscsilun->cluster_size) == size);
+   offset / iscsilun->cluster_size) == size);
 }
 
 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
-   int64_t sector_num, int nb_sectors)
+   int64_t offset, int64_t bytes)
 {
 unsigned

[Qemu-devel] [PULL 01/37] block: Add .bdrv_co_block_status() callback

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based. Now that the block layer exposes byte-based allocation,
it's time to tackle the drivers.  Add a new callback that operates
on as small as byte boundaries. Subsequent patches will then update
individual drivers, then finally remove .bdrv_co_get_block_status().

The new code also passes through the 'want_zero' hint, which will
allow subsequent patches to further optimize callers that only care
about how much of the image is allocated (want_zero is false),
rather than full details about runs of zeroes and which offsets the
allocation actually maps to (want_zero is true).  As part of this
effort, fix another part of the documentation: the claim in commit
4c41cb4 that BDRV_BLOCK_ALLOCATED is short for 'DATA || ZERO' is a
lie at the block layer (see commit e88ae2264), even though it is
how the bit is computed from the driver layer.  After all, there
are intentionally cases where we return ZERO but not ALLOCATED at
the block layer, when we know that a read sees zero because the
backing file is too short.  Note that the driver interface is thus
slightly different than the public interface with regards to which
bits will be set, and what guarantees are provided on input.

We also add an assertion that any driver using the new callback will
make progress (the only time pnum will be 0 is if the block layer
already handled an out-of-bounds request, or if there is an error);
the old driver interface did not provide this guarantee, which
could lead to some inf-loops in drastic corner-case failures.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 include/block/block.h | 14 +++---
 include/block/block_int.h | 20 +++-
 block/io.c| 28 +++-
 3 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 19b3ab9cb5..947e8876cd 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -115,19 +115,19 @@ typedef struct HDGeometry {
  * BDRV_BLOCK_ZERO: offset reads as zero
  * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
  * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
- *   layer (short for DATA || ZERO), set by block layer
- * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this layer
+ *   layer rather than any backing, set by block layer
+ * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this
+ * layer, set by block layer
  *
  * Internal flag:
  * BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to request
  * that the block layer recompute the answer from the returned
  * BDS; must be accompanied by just BDRV_BLOCK_OFFSET_VALID.
  *
- * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK) of
- * the return value (old interface) or the entire map parameter (new
- * interface) represent the offset in the returned BDS that is allocated for
- * the corresponding raw data.  However, whether that offset actually
- * contains data also depends on BDRV_BLOCK_DATA, as follows:
+ * If BDRV_BLOCK_OFFSET_VALID is set, the map parameter represents the
+ * host offset within the returned BDS that is allocated for the
+ * corresponding raw guest data.  However, whether that offset
+ * actually contains data also depends on BDRV_BLOCK_DATA, as follows:
  *
  * DATA ZERO OFFSET_VALID
  *  ttt   sectors read as zero, returned file is zero at offset
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 5ea63f8fa8..c93722b43a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -202,15 +202,25 @@ struct BlockDriver {
 /*
  * Building block for bdrv_block_status[_above] and
  * bdrv_is_allocated[_above].  The driver should answer only
- * according to the current layer, and should not set
- * BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
- * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.  The block
- * layer guarantees input aligned to request_alignment, as well as
- * non-NULL pnum and file.
+ * according to the current layer, and should only need to set
+ * BDRV_BLOCK_DATA, BDRV_BLOCK_ZERO, BDRV_BLOCK_OFFSET_VALID,
+ * and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
+ * layer, the result should be 0 (and not BDRV_BLOCK_ZERO).  See
+ * block.h for the overall meaning of the bits.  As a hint, the
+ * flag want_zero is true if the caller cares more about precise
+ * mappings (favor accurate _OFFSET_VALID/_ZERO) or false for
+ * overall allocation (favor larger *pnum, perhaps by

[Qemu-devel] [PULL 12/37] qcow2: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow2 driver accordingly.

For now, we are ignoring the 'want_zero' hint.  However, it should
be relatively straightforward to honor the hint as a way to return
larger *pnum values when we have consecutive clusters with the same
data/zero status but which differ only in having non-consecutive
mappings.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 57a517e2bd..288b5299d8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1670,32 +1670,34 @@ static void qcow2_join_options(QDict *options, QDict 
*old_options)
 }
 }
 
-static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
+  bool want_zero,
+  int64_t offset, int64_t count,
+  int64_t *pnum, int64_t *map,
+  BlockDriverState **file)
 {
 BDRVQcow2State *s = bs->opaque;
 uint64_t cluster_offset;
 int index_in_cluster, ret;
 unsigned int bytes;
-int64_t status = 0;
+int status = 0;
 
-bytes = MIN(INT_MAX, nb_sectors * BDRV_SECTOR_SIZE);
+bytes = MIN(INT_MAX, count);
 qemu_co_mutex_lock(>lock);
-ret = qcow2_get_cluster_offset(bs, sector_num << BDRV_SECTOR_BITS, ,
-   _offset);
+ret = qcow2_get_cluster_offset(bs, offset, , _offset);
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
 }
 
-*pnum = bytes >> BDRV_SECTOR_BITS;
+*pnum = bytes;
 
 if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
 !s->crypto) {
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+index_in_cluster = offset & (s->cluster_size - 1);
+*map = cluster_offset | index_in_cluster;
 *file = bs->file->bs;
-status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+status |= BDRV_BLOCK_OFFSET_VALID;
 }
 if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
 status |= BDRV_BLOCK_ZERO;
@@ -4352,7 +4354,7 @@ BlockDriver bdrv_qcow2 = {
 .bdrv_child_perm  = bdrv_format_default_perms,
 .bdrv_create= qcow2_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
-.bdrv_co_get_block_status = qcow2_co_get_block_status,
+.bdrv_co_block_status = qcow2_co_block_status,
 
 .bdrv_co_preadv = qcow2_co_preadv,
 .bdrv_co_pwritev= qcow2_co_pwritev,
-- 
2.13.6

[Qemu-devel] [PULL 08/37] iscsi: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the iscsi driver accordingly.  In this case,
it is handy to teach iscsi_co_block_status() to handle a NULL map
and file parameter, even though the block layer passes non-NULL
values, because we also call the function directly.  For now, there
are no optimizations done based on the want_zero flag.

We can also make the simplification of asserting that the block
layer passed in aligned values.

Signed-off-by: Eric Blake 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/iscsi.c | 69 ---
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index d2b0466775..c228ca21c8 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -653,36 +653,36 @@ out_unlock:
 
 
 
-static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
-  int64_t sector_num,
-  int nb_sectors, int *pnum,
-  BlockDriverState **file)
+static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
+  bool want_zero, int64_t offset,
+  int64_t bytes, int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 IscsiLun *iscsilun = bs->opaque;
 struct scsi_get_lba_status *lbas = NULL;
 struct scsi_lba_status_descriptor *lbasd = NULL;
 struct IscsiTask iTask;
 uint64_t lba;
-int64_t ret;
+int ret;
 
 iscsi_co_init_iscsitask(iscsilun, );
 
-if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
-ret = -EINVAL;
-goto out;
-}
+assert(QEMU_IS_ALIGNED(offset | bytes, iscsilun->block_size));
 
 /* default to all sectors allocated */
-ret = BDRV_BLOCK_DATA;
-ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
-*pnum = nb_sectors;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+if (map) {
+*map = offset;
+}
+*pnum = bytes;
 
 /* LUN does not support logical block provisioning */
 if (!iscsilun->lbpme) {
 goto out;
 }
 
-lba = sector_qemu2lun(sector_num, iscsilun);
+lba = offset / iscsilun->block_size;
 
 qemu_mutex_lock(>mutex);
 retry:
@@ -727,12 +727,12 @@ retry:
 
 lbasd = >descriptors[0];
 
-if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
+if (lba != lbasd->lba) {
 ret = -EIO;
 goto out_unlock;
 }
 
-*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
+*pnum = lbasd->num_blocks * iscsilun->block_size;
 
 if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
 lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
@@ -743,15 +743,13 @@ retry:
 }
 
 if (ret & BDRV_BLOCK_ZERO) {
-iscsi_allocmap_set_unallocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
-   *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_unallocated(iscsilun, offset, *pnum);
 } else {
-iscsi_allocmap_set_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
- *pnum * BDRV_SECTOR_SIZE);
+iscsi_allocmap_set_allocated(iscsilun, offset, *pnum);
 }
 
-if (*pnum > nb_sectors) {
-*pnum = nb_sectors;
+if (*pnum > bytes) {
+*pnum = bytes;
 }
 out_unlock:
 qemu_mutex_unlock(>mutex);
@@ -760,7 +758,7 @@ out:
 if (iTask.task != NULL) {
 scsi_free_scsi_task(iTask.task);
 }
-if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
+if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) {
 *file = bs;
 }
 return ret;
@@ -800,25 +798,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState 
*bs,
  nb_sectors * BDRV_SECTOR_SIZE) &&
 !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
  nb_sectors * BDRV_SECTOR_SIZE)) {
-int pnum;
-BlockDriverState *file;
+int64_t pnum;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
-int head;
-int64_t ret;
-
-assert(cluster_sectors);
-head = sector_num % cluster_sectors;
-ret = iscsi_co_get_block_status(bs, sector_num - head,
-BDRV_REQUEST_MAX_SECTORS, ,
-);
+int64_t head;
+int ret;
+
+assert(iscsilun->cluster_size);
+head = (sector_num

[Qemu-devel] [PULL 11/37] qcow: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the qcow driver accordingly.  There is no
intent to optimize based on the want_zero flag for this format.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/qcow.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 8631155ac8..dead5029c6 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -524,23 +524,28 @@ static int get_cluster_offset(BlockDriverState *bs,
 return 1;
 }
 
-static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+static int coroutine_fn qcow_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
+ BlockDriverState **file)
 {
 BDRVQcowState *s = bs->opaque;
-int index_in_cluster, n, ret;
+int index_in_cluster, ret;
+int64_t n;
 uint64_t cluster_offset;
 
 qemu_co_mutex_lock(>lock);
-ret = get_cluster_offset(bs, sector_num << 9, 0, 0, 0, 0, _offset);
+ret = get_cluster_offset(bs, offset, 0, 0, 0, 0, _offset);
 qemu_co_mutex_unlock(>lock);
 if (ret < 0) {
 return ret;
 }
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-n = s->cluster_sectors - index_in_cluster;
-if (n > nb_sectors)
-n = nb_sectors;
+index_in_cluster = offset & (s->cluster_size - 1);
+n = s->cluster_size - index_in_cluster;
+if (n > bytes) {
+n = bytes;
+}
 *pnum = n;
 if (!cluster_offset) {
 return 0;
@@ -548,9 +553,9 @@ static int64_t coroutine_fn 
qcow_co_get_block_status(BlockDriverState *bs,
 if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypto) {
 return BDRV_BLOCK_DATA;
 }
-cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+*map = cluster_offset | index_in_cluster;
 *file = bs->file->bs;
-return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
@@ -1128,7 +1133,7 @@ static BlockDriver bdrv_qcow = {
 
 .bdrv_co_readv  = qcow_co_readv,
 .bdrv_co_writev = qcow_co_writev,
-.bdrv_co_get_block_status   = qcow_co_get_block_status,
+.bdrv_co_block_status   = qcow_co_block_status,
 
 .bdrv_make_empty= qcow_make_empty,
 .bdrv_co_pwritev_compressed = qcow_co_pwritev_compressed,
-- 
2.13.6

[Qemu-devel] [PULL 03/37] block: Switch passthrough drivers to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the generic helpers, and all passthrough clients
(blkdebug, commit, mirror, throttle) accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 include/block/block_int.h | 28 
 block/blkdebug.c  | 20 +++-
 block/commit.c|  2 +-
 block/io.c| 36 
 block/mirror.c|  2 +-
 block/throttle.c  |  2 +-
 6 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c93722b43a..bf2598856c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1041,23 +1041,27 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
uint64_t *nperm, uint64_t *nshared);
 
 /*
- * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
  */
-int64_t coroutine_fn bdrv_co_get_block_status_from_file(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors,
-int *pnum,
-BlockDriverState 
**file);
+int coroutine_fn bdrv_co_block_status_from_file(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes,
+int64_t *pnum,
+int64_t *map,
+BlockDriverState **file);
 /*
- * Default implementation for drivers to pass bdrv_co_get_block_status() to
+ * Default implementation for drivers to pass bdrv_co_block_status() to
  * their backing file.
  */
-int64_t coroutine_fn bdrv_co_get_block_status_from_backing(BlockDriverState 
*bs,
-   int64_t sector_num,
-   int nb_sectors,
-   int *pnum,
-   BlockDriverState 
**file);
+int coroutine_fn bdrv_co_block_status_from_backing(BlockDriverState *bs,
+   bool want_zero,
+   int64_t offset,
+   int64_t bytes,
+   int64_t *pnum,
+   int64_t *map,
+   BlockDriverState **file);
 const char *bdrv_get_parent_name(const BlockDriverState *bs);
 void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp);
 bool blk_dev_has_removable_media(BlockBackend *blk);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index d83f23febd..589712475a 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -627,15 +627,17 @@ static int coroutine_fn 
blkdebug_co_pdiscard(BlockDriverState *bs,
 return bdrv_co_pdiscard(bs->file->bs, offset, bytes);
 }
 
-static int64_t coroutine_fn blkdebug_co_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int coroutine_fn blkdebug_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
-assert(QEMU_IS_ALIGNED(sector_num | nb_sectors,
-   DIV_ROUND_UP(bs->bl.request_alignment,
-BDRV_SECTOR_SIZE)));
-return bdrv_co_get_block_status_from_file(bs, sector_num, nb_sectors,
-  pnum, file);
+assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
+return bdrv_co_block_status_from_file(bs, want_zero, offset, bytes,
+  pnum, map, file);
 }
 
 static void blkdebug_close(BlockDriverState *bs)
@@ -907,7 +909,7 @@ static BlockDriver bdrv_blkdebug = {
 .bdrv_co_flush_to_disk  = blkdebug_co_flush,

[Qemu-devel] [PULL 06/37] iscsi: Switch cluster_sectors to byte-based

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert all uses of
the cluster size in sectors, along with adding assertions that we
are not dividing by zero.

Improve some comment grammar while in the area.

Signed-off-by: Eric Blake 
Acked-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/iscsi.c | 56 +++-
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 421983dd6f..3414c21c7f 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -86,7 +86,7 @@ typedef struct IscsiLun {
 unsigned long *allocmap;
 unsigned long *allocmap_valid;
 long allocmap_size;
-int cluster_sectors;
+int cluster_size;
 bool use_16_for_rw;
 bool write_protected;
 bool lbpme;
@@ -430,9 +430,10 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int 
open_flags)
 {
 iscsi_allocmap_free(iscsilun);
 
+assert(iscsilun->cluster_size);
 iscsilun->allocmap_size =
-DIV_ROUND_UP(sector_lun2qemu(iscsilun->num_blocks, iscsilun),
- iscsilun->cluster_sectors);
+DIV_ROUND_UP(iscsilun->num_blocks * iscsilun->block_size,
+ iscsilun->cluster_size);
 
 iscsilun->allocmap = bitmap_try_new(iscsilun->allocmap_size);
 if (!iscsilun->allocmap) {
@@ -440,7 +441,7 @@ static int iscsi_allocmap_init(IscsiLun *iscsilun, int 
open_flags)
 }
 
 if (open_flags & BDRV_O_NOCACHE) {
-/* in case that cache.direct = on all allocmap entries are
+/* when cache.direct = on all allocmap entries are
  * treated as invalid to force a relookup of the block
  * status on every read request */
 return 0;
@@ -461,17 +462,19 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t 
sector_num,
   int nb_sectors, bool allocated, bool valid)
 {
 int64_t cl_num_expanded, nb_cls_expanded, cl_num_shrunk, nb_cls_shrunk;
+int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
 
 if (iscsilun->allocmap == NULL) {
 return;
 }
 /* expand to entirely contain all affected clusters */
-cl_num_expanded = sector_num / iscsilun->cluster_sectors;
+assert(cluster_sectors);
+cl_num_expanded = sector_num / cluster_sectors;
 nb_cls_expanded = DIV_ROUND_UP(sector_num + nb_sectors,
-   iscsilun->cluster_sectors) - 
cl_num_expanded;
+   cluster_sectors) - cl_num_expanded;
 /* shrink to touch only completely contained clusters */
-cl_num_shrunk = DIV_ROUND_UP(sector_num, iscsilun->cluster_sectors);
-nb_cls_shrunk = (sector_num + nb_sectors) / iscsilun->cluster_sectors
+cl_num_shrunk = DIV_ROUND_UP(sector_num, cluster_sectors);
+nb_cls_shrunk = (sector_num + nb_sectors) / cluster_sectors
   - cl_num_shrunk;
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
@@ -535,9 +538,12 @@ iscsi_allocmap_is_allocated(IscsiLun *iscsilun, int64_t 
sector_num,
 if (iscsilun->allocmap == NULL) {
 return true;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return !(find_next_bit(iscsilun->allocmap, size,
-   sector_num / iscsilun->cluster_sectors) == size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }
 
 static inline bool iscsi_allocmap_is_valid(IscsiLun *iscsilun,
@@ -547,9 +553,12 @@ static inline bool iscsi_allocmap_is_valid(IscsiLun 
*iscsilun,
 if (iscsilun->allocmap_valid == NULL) {
 return false;
 }
-size = DIV_ROUND_UP(sector_num + nb_sectors, iscsilun->cluster_sectors);
+assert(iscsilun->cluster_size);
+size = DIV_ROUND_UP(sector_num + nb_sectors,
+iscsilun->cluster_size >> BDRV_SECTOR_BITS);
 return (find_next_zero_bit(iscsilun->allocmap_valid, size,
-   sector_num / iscsilun->cluster_sectors) == 
size);
+   sector_num * BDRV_SECTOR_SIZE /
+   iscsilun->cluster_size) == size);
 }
 
 static int coroutine_fn
@@ -793,16 +802,21 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState 
*bs,
 BlockDriverState *file;
 /* check the block status from the beginning of the cluster
  * containing the start sector */
-int64_t ret = iscsi_co_get_block_status(bs,
-  sector_num - sector_num % iscsilun->cluster_sectors,
-

[Qemu-devel] [PULL 05/37] gluster: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the gluster driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping find_allocation(), and merely state that
all bytes are allocated.

We can also drop redundant bounds checks that are already
guaranteed by the block layer.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/gluster.c | 70 -
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 3f17b7819d..1a07d221d1 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1362,68 +1362,66 @@ exit:
 }
 
 /*
- * Returns the allocation status of the specified sectors.
+ * Returns the allocation status of the specified offset.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
+ * The block layer guarantees 'offset' and 'bytes' are within bounds.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
+ * 'bytes' is the max value 'pnum' should be set to.
  *
- * (Based on raw_co_get_block_status() from file-posix.c.)
+ * (Based on raw_co_block_status() from file-posix.c.)
  */
-static int64_t coroutine_fn qemu_gluster_co_get_block_status(
-BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-BlockDriverState **file)
+static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
 {
 BDRVGlusterState *s = bs->opaque;
-off_t start, data = 0, hole = 0;
-int64_t total_size;
+off_t data = 0, hole = 0;
 int ret = -EINVAL;
 
 if (!s->fd) {
 return ret;
 }
 
-start = sector_num * BDRV_SECTOR_SIZE;
-total_size = bdrv_getlength(bs);
-if (total_size < 0) {
-return total_size;
-} else if (start >= total_size) {
-*pnum = 0;
-return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+if (!want_zero) {
+*pnum = bytes;
+*map = offset;
+*file = bs;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
-ret = find_allocation(bs, start, , );
+ret = find_allocation(bs, offset, , );
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
 
+*map = offset;
 *file = bs;
 
-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
 
@@ -1451,7 +1449,7 @@ static BlockDriver bdrv_gluster = {
 #ifdef

[Qemu-devel] [PULL 02/37] nvme: Drop pointless .bdrv_co_get_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

Commit bdd6a90 has a bug: drivers should never directly set
BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed).
Instead, drivers should report BDRV_BLOCK_DATA if it knows that
data comes from this BDS.

But let's look at the bigger picture: semantically, the nvme
driver is similar to the nbd, null, and raw drivers (no backing
file, all data comes from this BDS).  But while two of those
other drivers have to supply the callback (null because it can
special-case BDRV_BLOCK_ZERO, raw because it can special-case
a different offset), in this case the block layer defaults are
good enough without the callback at all (similar to nbd).

So, fix the bug by deletion ;)

Signed-off-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/nvme.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 75078022f6..8bca57aae6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1072,18 +1072,6 @@ static int nvme_reopen_prepare(BDRVReopenState 
*reopen_state,
 return 0;
 }
 
-static int64_t coroutine_fn nvme_co_get_block_status(BlockDriverState *bs,
- int64_t sector_num,
- int nb_sectors, int *pnum,
- BlockDriverState **file)
-{
-*pnum = nb_sectors;
-*file = bs;
-
-return BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
-   (sector_num << BDRV_SECTOR_BITS);
-}
-
 static void nvme_refresh_filename(BlockDriverState *bs, QDict *opts)
 {
 QINCREF(opts);
@@ -1183,8 +1171,6 @@ static BlockDriver bdrv_nvme = {
 .bdrv_co_flush_to_disk= nvme_co_flush,
 .bdrv_reopen_prepare  = nvme_reopen_prepare,
 
-.bdrv_co_get_block_status = nvme_co_get_block_status,
-
 .bdrv_refresh_filename= nvme_refresh_filename,
 .bdrv_refresh_limits  = nvme_refresh_limits,
 
-- 
2.13.6

[Qemu-devel] [PULL 04/37] file-posix: Switch to .bdrv_co_block_status()

2018-03-02 Thread Kevin Wolf

From: Eric Blake 

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the file protocol driver accordingly.

In want_zero mode, we continue to report fine-grained hole
information (the caller wants as much mapping detail as possible);
but when not in that mode, the caller prefers larger *pnum and
merely cares about what offsets are allocated at this layer, rather
than where the holes live.  Since holes still read as zeroes at
this layer (rather than deferring to a backing layer), we can take
the shortcut of skipping lseek(), and merely state that all bytes
are allocated.

We can also drop redundant bounds checks that are already
guaranteed by the block layer.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 64 +-
 1 file changed, 30 insertions(+), 34 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ca49c1a98a..f1591c3849 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2131,25 +2131,24 @@ static int find_allocation(BlockDriverState *bs, off_t 
start,
 }
 
 /*
- * Returns the allocation status of the specified sectors.
+ * Returns the allocation status of the specified offset.
  *
- * If 'sector_num' is beyond the end of the disk image the return value is 0
- * and 'pnum' is set to 0.
+ * The block layer guarantees 'offset' and 'bytes' are within bounds.
  *
- * 'pnum' is set to the number of sectors (including and immediately following
- * the specified sector) that are known to be in the same
+ * 'pnum' is set to the number of bytes (including and immediately following
+ * the specified offset) that are known to be in the same
  * allocated/unallocated state.
  *
- * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
- * beyond the end of the disk image it will be clamped.
+ * 'bytes' is the max value 'pnum' should be set to.
  */
-static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
-int64_t sector_num,
-int nb_sectors, int *pnum,
-BlockDriverState **file)
-{
-off_t start, data = 0, hole = 0;
-int64_t total_size;
+static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
+bool want_zero,
+int64_t offset,
+int64_t bytes, int64_t *pnum,
+int64_t *map,
+BlockDriverState **file)
+{
+off_t data = 0, hole = 0;
 int ret;
 
 ret = fd_open(bs);
@@ -2157,39 +2156,36 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
 return ret;
 }
 
-start = sector_num * BDRV_SECTOR_SIZE;
-total_size = bdrv_getlength(bs);
-if (total_size < 0) {
-return total_size;
-} else if (start >= total_size) {
-*pnum = 0;
-return 0;
-} else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
-nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
+if (!want_zero) {
+*pnum = bytes;
+*map = offset;
+*file = bs;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
 
-ret = find_allocation(bs, start, , );
+ret = find_allocation(bs, offset, , );
 if (ret == -ENXIO) {
 /* Trailing hole */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_ZERO;
 } else if (ret < 0) {
 /* No info available, so pretend there are no holes */
-*pnum = nb_sectors;
+*pnum = bytes;
 ret = BDRV_BLOCK_DATA;
-} else if (data == start) {
-/* On a data extent, compute sectors to the end of the extent,
+} else if (data == offset) {
+/* On a data extent, compute bytes to the end of the extent,
  * possibly including a partial sector at EOF. */
-*pnum = MIN(nb_sectors, DIV_ROUND_UP(hole - start, BDRV_SECTOR_SIZE));
+*pnum = MIN(bytes, hole - offset);
 ret = BDRV_BLOCK_DATA;
 } else {
-/* On a hole, compute sectors to the beginning of the next extent.  */
-assert(hole == start);
-*pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
+/* On a hole, compute bytes to the beginning of the next extent.  */
+assert(hole == offset);
+*pnum = MIN(bytes, data - offset);
 ret = BDRV_BLOCK_ZERO;
 }
+*map = offset;
 *file = bs;
-return ret | BDRV_BLOCK_OFFSET_VALID | start;
+return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static coroutine_fn BlockAIOCB

[Qemu-devel] [PULL 00/37] Block layer patches

2018-03-02 Thread Kevin Wolf

The following changes since commit 86f4c7e05b1c44dbe1b329a51f311f10aef6ff34:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180302' 
into staging (2018-03-02 14:37:10 +)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 9d9b4b640f9e583ff4b24dc762630945f3ccc16d:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2018-03-02' into 
queue-block (2018-03-02 18:45:03 +0100)


Block layer patches


Alberto Garcia (3):
  specs/qcow2: Fix documentation of the compressed cluster descriptor
  docs: document how to use the l2-cache-entry-size parameter
  qcow2: Replace align_offset() with ROUND_UP()

Anton Nefedov (2):
  block: fix write with zero flag set and iovector provided
  iotest 033: add misaligned write-zeroes test via truncate

Eric Blake (21):
  block: Add .bdrv_co_block_status() callback
  nvme: Drop pointless .bdrv_co_get_block_status()
  block: Switch passthrough drivers to .bdrv_co_block_status()
  file-posix: Switch to .bdrv_co_block_status()
  gluster: Switch to .bdrv_co_block_status()
  iscsi: Switch cluster_sectors to byte-based
  iscsi: Switch iscsi_allocmap_update() to byte-based
  iscsi: Switch to .bdrv_co_block_status()
  null: Switch to .bdrv_co_block_status()
  parallels: Switch to .bdrv_co_block_status()
  qcow: Switch to .bdrv_co_block_status()
  qcow2: Switch to .bdrv_co_block_status()
  qed: Switch to .bdrv_co_block_status()
  raw: Switch to .bdrv_co_block_status()
  sheepdog: Switch to .bdrv_co_block_status()
  vdi: Avoid bitrot of debugging code
  vdi: Switch to .bdrv_co_block_status()
  vmdk: Switch to .bdrv_co_block_status()
  vpc: Switch to .bdrv_co_block_status()
  vvfat: Switch to .bdrv_co_block_status()
  block: Drop unused .bdrv_co_get_block_status()

Kevin Wolf (2):
  block: test blk_aio_flush() with blk->root == NULL
  Merge remote-tracking branch 'mreitz/tags/pull-block-2018-03-02' into 
queue-block

Max Reitz (4):
  qemu-img: Make resize error message more general
  block/ssh: Pull ssh_grow_file() from ssh_create()
  block/ssh: Make ssh_grow_file() blocking
  block/ssh: Add basic .bdrv_truncate()

Stefan Hajnoczi (6):
  aio: rename aio_context_in_iothread() to in_aio_context_home_thread()
  block: extract AIO_WAIT_WHILE() from BlockDriverState
  block: add BlockBackend->in_flight counter
  Revert "IDE: Do not flush empty CDROM drives"
  block: rename .bdrv_create() to .bdrv_co_create_opts()
  qcow2: make qcow2_co_create2() a coroutine_fn

 docs/interop/qcow2.txt |  16 -
 docs/qcow2-cache.txt   |  46 -
 block/qcow2.h  |   6 --
 include/block/aio-wait.h   | 116 
 include/block/aio.h|   7 +-
 include/block/block.h  |  54 ---
 include/block/block_int.h  |  61 ++---
 block.c|  11 ++-
 block/blkdebug.c   |  20 +++---
 block/block-backend.c  |  60 +++--
 block/commit.c |   2 +-
 block/crypto.c |   8 +--
 block/file-posix.c |  79 +++---
 block/file-win32.c |   5 +-
 block/gluster.c|  83 ---
 block/io.c |  98 +++
 block/iscsi.c  | 164 -
 block/mirror.c |   2 +-
 block/nfs.c|   5 +-
 block/null.c   |  23 ---
 block/nvme.c   |  14 
 block/parallels.c  |  28 +---
 block/qcow.c   |  32 +
 block/qcow2-bitmap.c   |   4 +-
 block/qcow2-cluster.c  |   4 +-
 block/qcow2-refcount.c |   4 +-
 block/qcow2-snapshot.c |  10 +--
 block/qcow2.c  |  60 +
 block/qed.c|  82 ---
 block/raw-format.c |  21 +++---
 block/rbd.c|   6 +-
 block/sheepdog.c   |  36 +-
 block/ssh.c|  66 +++---
 block/throttle.c   |   2 +-
 block/vdi.c|  50 +++---
 block/vhdx.c   |   5 +-
 block/vmdk.c   |  43 +---
 block/vpc.c|  50 +++---
 block/vvfat.c  |  16 ++---
 hw/ide/core.c  |  10 +--
 qemu-img.c |   2 +-
 tests/test-block-backend.c |  82 +++
 util/aio-wait.c|  40 +++
 tests/Makefile.include |   2 +
 tests/qemu-iotests/033 |  29 
 tests/qemu-iotests/033.out |  13 
 util/Makefile.objs |   2 +-
 47 files changed, 973 insertions(+), 606 deletions(-)
 create mode 100644 i

Re: [Qemu-devel] [PULL 12/51] build-sys: compile with -Og or -O1 when --enable-debug

2018-03-02 Thread Peter Maydell

On 16 January 2018 at 14:16, Paolo Bonzini  wrote:
> From: Marc-André Lureau 
>
> When --enable-debug is turned on, configure doesn't set -O level, and
> uses default compiler -O0 level, which is slow.
>
> Instead, use -Og if supported by the compiler (optimize debugging
> experience), or -O1 (keeps code somewhat debuggable and works around
> compiler bugs).

This gives me a noticeably worse debug experience (using -Og),
because gdb shows a lot more "" variables and
function arguments. (I've been mildly irritated by this for
the last few weeks and only just figured out why this was
happening.)

Can we go back to the previous behaviour, please ? I don't
care if the build is slow if I'm debugging, but I really do
care that I don't have my variables and arguments all
optimised away by the compiler so I can't tell what's going on.

thanks
-- PMM

Re: [Qemu-devel] [PULL 0/4] tricore-patches

2018-03-02 Thread Peter Maydell

On 2 March 2018 at 11:57, Bastian Koppelmann
 wrote:
> The following changes since commit 427cbc7e4136a061628cb4315cc8182ea36d772f:
>
>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
> (2018-03-01 18:46:41 +)
>
> are available in the Git repository at:
>
>   https://github.com/bkoppelmann/qemu-tricore-upstream.git 
> tags/pull-tricore-2018-03-02
>
> for you to fetch changes up to ce46335c9f31f9eea1736d9c2d3a11d3a8c2cb6b:
>
>   tricore: renamed masking of PIE (2018-03-02 11:46:36 +0100)
>
> 
> tricore patches
>
> 
> David Brenken (4):
>   tricore: added some missing cpu instructions
>   tricore: added CORE_ID
>   tricore: renamed masking of IE
>   tricore: renamed masking of PIE
>
>  target/tricore/cpu.h |  7 +--
>  target/tricore/csfr.def  |  1 +
>  target/tricore/op_helper.c   | 29 +++--
>  target/tricore/translate.c   | 31 +--
>  target/tricore/tricore-opcodes.h |  3 +++
>  5 files changed, 53 insertions(+), 18 deletions(-)

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v3 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-02 Thread Michael S. Tsirkin

On Fri, Mar 02, 2018 at 04:47:29PM +0800, Wei Wang wrote:
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index af49e19..16a2aae 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h

...

> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);

So I think the rule is that no bitmap sync must happen
between these two, otherwise a hint might arrive and
override the sync output.

Should be documented I think.

-- 
MST

Re: [Qemu-devel] [PATCH v3 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-02 Thread Michael S. Tsirkin

On Fri, Mar 02, 2018 at 04:47:29PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq. Callers call the
> free_page_start API to start the reporting, which creates a thread to
> poll for free page hints. The free_page_stop API stops the reporting and
> makes the thread exit.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> CC: Michael S. Tsirkin 
> CC: Dr. David Alan Gilbert 
> CC: Juan Quintela 
> ---
>  balloon.c   |  49 +--
>  hw/virtio/virtio-balloon.c  | 172 
> +---
>  include/hw/virtio/virtio-balloon.h  |  14 +-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h|  15 ++-
>  5 files changed, 228 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index d8dd6fe..b0b0749 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,42 @@ static bool have_balloon(Error **errp)
>  return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> - QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -/* We're already registered one balloon handler.  How many can
> - * a guest really have?
> - */
> -return -1;
> +return balloon_free_page_support_fn &&
> +   balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_start(void)
> +{
> +balloon_free_page_start_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_stop(void)
> +{
> +balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +  QEMUBalloonStatus *stat_fn,
> +  QEMUBalloonFreePageSupport 
> *free_page_support_fn,
> +  QEMUBalloonFreePageStart *free_page_start_fn,
> +  QEMUBalloonFreePageStop *free_page_stop_fn,
> +  void *opaque)
> +{
> +if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn 
> ||
> +balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +balloon_opaque) {
> +/* We already registered one balloon handler. */
> +return;
>  }
> -balloon_event_fn = event_func;
> -balloon_stat_fn = stat_func;
> +
> +balloon_event_fn = event_fn;
> +balloon_stat_fn = stat_fn;
> +balloon_free_page_support_fn = free_page_support_fn;
> +balloon_free_page_start_fn = free_page_start_fn;
> +balloon_free_page_stop_fn = free_page_stop_fn;
>  balloon_opaque = opaque;
> -return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque)
>  }
>  balloon_event_fn = NULL;
>  balloon_stat_fn = NULL;
> +balloon_free_page_support_fn = NULL;
> +balloon_free_page_start_fn = NULL;
> +balloon_free_page_stop_fn = NULL;
>  balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 4822449..4607879 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,100 @@ out:
>  }
>  }
>  
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +VirtQueueElement *elem;
> +VirtIOBalloon *dev = opaque;
> +VirtQueue *vq = dev->free_page_vq;
> +uint32_t id;
> +size_t size;
> +
> +/*
> + * Poll the vq till the status changed to STOP. This happens when
> + * the guest finishes reporting hints or the migration thread actively
> + * stops the reporting.
> + */
> +while (dev->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
> +elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +if (!elem) {
> +continue;
> +}
> +
> +if (elem->out_num) {
> +size = iov_to_buf(elem->out_sg, elem->out_num, 0, , 
> sizeof(id));
> +virtqueue_push(vq, elem, size);
> +g_free(elem);
> +if (unlikely(size != sizeof(id))) {
>

Re: [Qemu-devel] [PULL v3 00/30] QAPI patches for 2018-03-01

2018-03-02 Thread Eric Blake


On 03/02/2018 10:55 AM, Peter Maydell wrote:

On 2 March 2018 at 15:31, Eric Blake  wrote:

The following changes since commit 2e7b766594e17f786a6b2e5be690bc5b43ce6036:

   Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-03-01' into 
staging (2018-03-02 12:39:13 +)

are available in the Git repository at:

   git://repo.or.cz/qemu/ericb.git tags/pull-qapi-2018-03-01-v3

for you to fetch changes up to cfe6a7fc6da20349ef62966c437f18ed6c49fcd5:

   qapi: Don't create useless directory qapi-generated (2018-03-02 09:01:40 
-0600)

v3: (attempt to) fix warnings about empty .o files on OSX toolchain [Peter]
(sending just the changed patch)


qapi patches for 2018-03-01

- Markus Armbruster: Modularize generated QAPI code



As I suspected, this isn't sufficient to silence the warnings.


So 'static char' wasn't enough to create an export table, and I have to 
actually export something (while still ensuring it does not collide). 
Okay, I'll work up a v4 along those lines.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [Bug 1739371] Re: qemu-system-arm snapshot loadvm core dumped

2018-03-02 Thread Peter Maydell

*** This bug is a duplicate of bug 1739378 ***
https://bugs.launchpad.net/bugs/1739378

I'm going to close this bug as a duplicate of #1739378 (as noted in my
earlier comment).


** This bug has been marked a duplicate of bug 1739378
   migration state save/load of sdcard device is broken

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1739371

Title:
  qemu-system-arm snapshot loadvm core dumped

Status in QEMU:
  New

Bug description:
  Ubuntu Qemu is crashing trying to restore saved snapshot in qemu-system-arm.
  I've tried different guests kernels, but I wasn't lucky with any of them.

  The guest vm boots and I can use it normally. The issue is when I save
  the snapshot using "savevm Base0", "quit" and then I restore that
  snapshot using "-loadvm Base0" from the cmd line.

  The only difference I've noticed is tweaking the guest memory:
  * With -m 512 or 1024 it crashes as you can see below.
  * With -m 2048 it doesn't crash, it restores the vm and I can see the screen 
as it was, but the OS is halted. And it's not just the keyboard. I've tried 
saving the snapshot while it's booting with lot of lines being printed on 
screen and after restoring it, the OS is frozen.

  I also tried limiting the guest kernel memory using the mem parameter
  (mem=2048M) and disabling the kernel address space randomization
  (nokaslr) with the same results.

  OS: Ubuntu 16.04.3 LTS (xenial)

  $ qemu-system-arm --version
  QEMU emulator version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.16), Copyright (c) 
2003-2008 Fabrice Bellard

  $ qemu-system-arm -kernel kernel/vmlinuz-4.10.0-42-generic -initrd 
kernel/initrd.img-4.10.0-42-generic -M vexpress-a15 -m 512 -append 
'root=/dev/mmcblk0 rootwait console=tty0' -sd vexpress-4G.qcow2 -dtb 
device-tree/vexpress-v2p-ca15-tc1.dtb  -loadvm Base0
  pulseaudio: set_sink_input_volume() failed
  pulseaudio: Reason: Invalid argument
  pulseaudio: set_sink_input_mute() failed
  pulseaudio: Reason: Invalid argument
  qemu: fatal: Trying to execute code outside RAM or ROM at 0xc0321568

  R00=0001 R01= R02= R03=c0321560
  R04=c150 R05=c150529c R06=c1505234 R07=c14384d0
  R08= R09= R10=c1501f50 R11=c1501f3c
  R12=c1501f40 R13=c1501f30 R14=c030a184 R15=c0321568
  PSR=60070093 -ZC- A S svc32
  s00=6374652f s01=636f6c2f d00=636f6c2f6374652f
  s02=7273752f s03=6962732f d01=6962732f7273752f
  s04=6e612f6e s05=6f726361 d02=6f7263616e612f6e
  s06=7c7c206e s07=63202820 d03=632028207c7c206e
  s08=202f2064 s09=72202626 d04=72202626202f2064
  s10=702d6e75 s11=73747261 d05=73747261702d6e75
  s12=722d2d20 s13=726f7065 d06=726f7065722d2d20
  s14=652f2074 s15=632f6374 d07=632f6374652f2074
  s16= s17= d08=
  s18= s19= d09=
  s20= s21= d10=
  s22= s23= d11=
  s24= s25= d12=
  s26= s27= d13=
  s28= s29= d14=
  s30= s31= d15=
  s32= s33= d16=
  s34= s35= d17=
  s36= s37= d18=
  s38= s39= d19=
  s40= s41= d20=
  s42= s43= d21=
  s44= s45= d22=
  s46= s47= d23=
  s48= s49= d24=
  s50= s51= d25=
  s52= s53= d26=
  s54= s55= d27=
  s56= s57= d28=
  s58= s59= d29=
  s60= s61= d30=
  s62= s63= d31=
  FPSCR: 
  Aborted (core dumped)

  As I said above, the same happens when -m 1024 is used.

  I have a different issue when I use the qemu git master version, but
  I'm submiting a different ticket for that.

  Cheers,
  Gus

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1739371/+subscriptions

Re: [Qemu-devel] [PATCH 0/3] vfio/pci: ioeventfd support

2018-03-02 Thread Alex Williamson

On Fri, 2 Mar 2018 07:08:51 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson
> > Sent: Thursday, March 1, 2018 4:15 AM
> > 
> > A vfio ioeventfd will perform the pre-specified device write on
> > triggering of an eventfd.  When coupled with KVM ioeventfds, this
> > feature allows a VM to trap a device page for virtualization, while
> > also registering targeted ioeventfds to maintain performance of high
> > frequency register writes within the trapped range.  Much like the
> > existing interrupt eventfd/irqfd coupling, such writes can be handled
> > entirely in the host kernel.
> > 
> > The new VFIO device ioctl may be supported by any vfio bus driver,
> > including mdev drivers, but the implementation here only enables
> > vfio-pci.  This is intended as an acceleration path, bus drivers
> > may choose which regions to support and userspace should always
> > intend to fall back to non-accelerated handling when unavailable.
> >   
> 
> it's a nice feature! A curious question. Is it possible for mdev driver
> to directly create ioeventfd on specified offset? Currently ioeventfd
> requires quirks in Qemu, which must know the device detail to
> create ioeventfd and then connect vfio and kvm together. However
> mdev instance is more software defined thus I'm not sure whether 
> asking Qemu to catch up quirk with underlying software logic could
> be overwhelmed. Also in case of vendor driver emulating mdev
> with same DID/VID as a real device, it might be difficult for Qemu
> to figure out whether a vfio device is a real one or mdev one to
> apply a mdev specific quirk. On the other hand, since vendor
> driver knows all the logic constructing mdev, it would be more
> convenient allowing vendor driver to directly create/destroy
> ioeventfd on its demand?

Are file descriptors the right interface between kernel components if
userspace is not involved in connecting them?  Seems like not.  vfio is
designed to be independent of KVM with the necessary interactions
between them orchestrated by userspace.  KVMGT is about the only
exception to that and TBH I'm not thrilled about extending that
further.  Thanks,

Alex

[Qemu-devel] [PATCH v2 2/2] hw/arm: Set the core count for Xilinx's ZynqMP

2018-03-02 Thread Alistair Francis

Set the ARM CPU core count property for the A53's attached to the Xilnx
ZynqMP machine.

Signed-off-by: Alistair Francis 
Reviewed-by: Peter Maydell 
---

 hw/arm/xlnx-zynqmp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 69227fd4c9..465796e97c 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -282,6 +282,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
**errp)
  s->virt, "has_el2", NULL);
 object_property_set_int(OBJECT(>apu_cpu[i]), GIC_BASE_ADDR,
 "reset-cbar", _abort);
+object_property_set_int(OBJECT(>apu_cpu[i]), num_apus,
+"core-count", _abort);
 object_property_set_bool(OBJECT(>apu_cpu[i]), true, "realized",
  );
 if (err) {
-- 
2.14.1

Re: [Qemu-devel] [PATCH v3] vmgenid: allow VM Generation ID modification via QMP/HMP

2018-03-02 Thread Benjamin Warren via Qemu-devel

On Fri, Mar 2, 2018 at 2:57 AM, Daniel P. Berrangé 
wrote:

> On Fri, Mar 02, 2018 at 10:37:20AM +0200, Or Idgar wrote:
> > From: Or Idgar 
> >
> > This patch allow changing the Virtual Machine Generation
> > ID through QMP/HMP while the vm guest is running.
> > the spec (http://go.microsoft.com/fwlink/?LinkId=260709)
> > mentions that "when the generation ID changes, execute an
> > ACPI Notify operation on the generation ID device".
> > To test it we need the ability to set the generation ID
> > online. QMP/HMP allows that.
>
> If we want to test vmgenid, why don't we simply trigger one of
> the mgmt actions that are required to change vmgenid. eg do a
> savevm, followed by a loadvm to restore from snapshot. This would
> be better testing of the real world usage of this feature.
>
> I agree.  Verifying functionality in a Windows guest is tricky, but in
Linux you can build and load this (old and badly written) device driver:

https://github.com/ben-skyportsystems/vmgenid-test

You should see the 'notices' number increment after restoring a snapshot.

Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/
> dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/
> dberrange :|
>

regards,
Ben

Re: [Qemu-devel] [PATCH 2/4] rocker: drop local duplex definitions

2018-03-02 Thread Michael S. Tsirkin

On Thu, Mar 01, 2018 at 10:46:34PM -0500, Jason Baron wrote:
> Make use of duplex definitions from net/eth.h.
> 
> Signed-off-by: Jason Baron 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Jiri Pirko 
> Cc: virtio-...@lists.oasis-open.org
> ---
>  hw/net/rocker/rocker_fp.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/hw/net/rocker/rocker_fp.c b/hw/net/rocker/rocker_fp.c
> index 4b3c984..13a14a0 100644
> --- a/hw/net/rocker/rocker_fp.c
> +++ b/hw/net/rocker/rocker_fp.c
> @@ -16,17 +16,13 @@
>  
>  #include "qemu/osdep.h"
>  #include "net/clients.h"
> +#include "net/eth.h"
>  
>  #include "rocker.h"
>  #include "rocker_hw.h"
>  #include "rocker_fp.h"
>  #include "rocker_world.h"
>  
> -enum duplex {
> -DUPLEX_HALF = 0,
> -DUPLEX_FULL
> -};
> -

This is hardware emulation, so I actually suspect it
should be renamed ROCKER_FP_HALF/ROCKER_FP_FULL.

>  struct fp_port {
>  Rocker *r;
>  World *world;
> -- 
> 2.7.4

[Qemu-devel] [PATCH v2 0/2] Add a property to set the ARM CPU core count

2018-03-02 Thread Alistair Francis


Add an ARM CPU property which allows us to set the ARM CPU core count.

V2:
 - Fix commit message and title.
 - Move the core_count default setting logic to the arm_cpu_realize()
   function.


Alistair Francis (2):
  target/arm: Add a core count property
  hw/arm: Set the core count for Xilinx's ZynqMP

 target/arm/cpu.h | 5 +
 hw/arm/xlnx-zynqmp.c | 2 ++
 target/arm/cpu.c | 6 ++
 target/arm/cpu64.c   | 6 --
 4 files changed, 17 insertions(+), 2 deletions(-)

-- 
2.14.1

[Qemu-devel] [PATCH v2 1/2] target/arm: Add a core count property

2018-03-02 Thread Alistair Francis

The cortex A53 TRM specifies that bits 24 and 25 of the L2CTLR register
specify the number of cores in the processor, not the total number of
cores in the sytem. To report this correctly on machines with multiple
CPU clusters (ARM's big.LITTLE or Xilinx's ZynqMP) we need to allow
the machine to overwrite this value. To do this let's add an optional
property.

Signed-off-by: Alistair Francis 
---
V2:
 - Fix commit message and title.
 - Move the core_count default setting logic to the arm_cpu_realize()
   function.

 target/arm/cpu.h   | 5 +
 target/arm/cpu.c   | 6 ++
 target/arm/cpu64.c | 6 --
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8dd6b788df..3fa8fdad21 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -745,6 +745,11 @@ struct ARMCPU {
 /* Uniprocessor system with MP extensions */
 bool mp_is_up;
 
+/* Specify the number of cores in this CPU cluster. Used for the L2CTLR
+ * register.
+ */
+int32_t core_count;
+
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
  * various constant registers and reset values of non-constant
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6b77aaa445..83590decde 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -877,6 +877,11 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 cpu->has_mpu = false;
 }
 
+/* No core_count specified, default to smp_cpus. */
+if (cpu->core_count == -1) {
+cpu->core_count = smp_cpus;
+}
+
 if (arm_feature(env, ARM_FEATURE_PMSA) &&
 arm_feature(env, ARM_FEATURE_V7)) {
 uint32_t nr = cpu->pmsav7_dregion;
@@ -1765,6 +1770,7 @@ static Property arm_cpu_properties[] = {
 DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
 mp_affinity, ARM64_AFFINITY_INVALID),
 DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
+DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 4228713b19..dd9ba973f7 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -42,8 +42,10 @@ static inline void unset_feature(CPUARMState *env, int 
feature)
 #ifndef CONFIG_USER_ONLY
 static uint64_t a57_a53_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-/* Number of processors is in [25:24]; otherwise we RAZ */
-return (smp_cpus - 1) << 24;
+ARMCPU *cpu = arm_env_get_cpu(env);
+
+/* Number of cores is in [25:24]; otherwise we RAZ */
+return (cpu->core_count - 1) << 24;
 }
 #endif
 
-- 
2.14.1

Re: [Qemu-devel] [PATCH 1/4] eth: add speed and duplex definitions

2018-03-02 Thread Michael S. Tsirkin

On Thu, Mar 01, 2018 at 10:46:33PM -0500, Jason Baron wrote:
> Pull in definitions for SPEED_UNKNOWN, DUPLEX_UNKNOWN, DUPLEX_HALF,
> and DUPLEX_FULL.
> 
> Signed-off-by: Jason Baron 
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: virtio-...@lists.oasis-open.org
> ---
>  include/net/eth.h | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/net/eth.h b/include/net/eth.h
> index 09054a5..9843678 100644
> --- a/include/net/eth.h
> +++ b/include/net/eth.h
> @@ -417,4 +417,11 @@ bool
>  eth_parse_ipv6_hdr(const struct iovec *pkt, int pkt_frags,
> size_t ip6hdr_off, eth_ip6_hdr_info *info);
>  
> +/* ethtool defines - from linux/ethtool.h */
> +#define SPEED_UNKNOWN   -1
> +
> +#define DUPLEX_HALF 0x00
> +#define DUPLEX_FULL 0x01
> +#define DUPLEX_UNKNOWN  0xff
> +
>  #endif

While that's not a lot, I think we should import linux/ethtool.h into
include/standard-headers/linux/ using scripts/update-linux-headers.sh

> -- 
> 2.7.4

Re: [Qemu-devel] [Qemu-ppc] [PULL 00/24] ppc-for-2.12 queue 20180302

2018-03-02 Thread BALATON Zoltan


On Fri, 2 Mar 2018, Peter Maydell wrote:

On 2 March 2018 at 06:03, David Gibson <da...@gibson.dropbear.id.au> wrote:

The following changes since commit 0dc8ae5e8e693737dfe65ba02d0c6eccb58a9c67:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20180301-v2' into 
staging (2018-03-01 17:08:16 +)

are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-2.12-20180302

for you to fetch changes up to 57ae75b2e401f1d04f37a8cd26212eb3134c51a6:

  hw/ppc/spapr,e500: Use new property "stdout-path" for boot console 
(2018-03-02 12:24:44 +1100)


ppc patch queue 2018-03-02

Here's the next batch of accumulated spapr and ppc patches.
Highlights are:
* New Sam460ex machine type
* Yet more fixes related to vcpu id allocation for spapr
* Numerous macio cleanupsr
* Some enhancements to the Spectre/Meltdown fixes for pseries,
  allowing use of a better mitigation for indirect branch based
  exploits
* New pseries machine types with Spectre/Meltdown mitigations
  enabled (stop gap until libvirt and management understands the
  machine options)
* A handful of other fixes



Hi. This generates a compile error from some compilers in my test set
(I think just the older gccs):

/home/petmay01/linaro/qemu-for-merges/hw/ppc/ppc440_uc.c: In function
‘ppc460ex_pcie_realize’:
/home/petmay01/linaro/qemu-for-merges/hw/ppc/ppc440_uc.c:1054:5:
error: ‘id’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
snprintf(buf, sizeof(buf), "pcie%d-io", id);
^
cc1: all warnings being treated as errors

Looks like a valid complaint to me -- the realize function
should check that dcrn_base was set to a valid value, fail
realize if it wasn't, and have a 'default:' case in the
switch with g_assert_not_reached().


I've sent an updated patch (v3) that should fix this.

Thank you,
BALATON Zoltan

[Qemu-devel] [PATCH v3] ppc4xx: Add device models found in PPC440 core SoCs

2018-03-02 Thread BALATON Zoltan

These devices are found in newer SoCs based on 440 core e.g. the 460EX
(http://www.embeddeddeveloper.com/assets/processors/amcc/datasheets/
PP460EX_DS2063.pdf)

Signed-off-by: BALATON Zoltan 
Signed-off-by: David Gibson 
---
v3: Fixed warning from older gcc about variable used uninitialised

 hw/ppc/ppc440.h|   26 +
 hw/ppc/ppc440_uc.c | 1162 
 include/hw/pci/pcie_host.h |2 +-
 3 files changed, 1189 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/ppc440.h
 create mode 100644 hw/ppc/ppc440_uc.c

diff --git a/hw/ppc/ppc440.h b/hw/ppc/ppc440.h
new file mode 100644
index 000..ad27db1
--- /dev/null
+++ b/hw/ppc/ppc440.h
@@ -0,0 +1,26 @@
+/*
+ * QEMU PowerPC 440 shared definitions
+ *
+ * Copyright (c) 2012 FranÃ§ois Revol
+ * Copyright (c) 2016-2018 BALATON Zoltan
+ *
+ * This work is licensed under the GNU GPL license version 2 or later.
+ *
+ */
+
+#ifndef PPC440_H
+#define PPC440_H
+
+#include "hw/ppc/ppc.h"
+
+void ppc4xx_l2sram_init(CPUPPCState *env);
+void ppc4xx_cpr_init(CPUPPCState *env);
+void ppc4xx_sdr_init(CPUPPCState *env);
+void ppc440_sdram_init(CPUPPCState *env, int nbanks,
+   MemoryRegion *ram_memories,
+   hwaddr *ram_bases, hwaddr *ram_sizes,
+   int do_init);
+void ppc4xx_ahb_init(CPUPPCState *env);
+void ppc460ex_pcie_init(CPUPPCState *env);
+
+#endif /* PPC440_H */
diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
new file mode 100644
index 000..976ab2b
--- /dev/null
+++ b/hw/ppc/ppc440_uc.c
@@ -0,0 +1,1162 @@
+/*
+ * QEMU PowerPC 440 embedded processors emulation
+ *
+ * Copyright (c) 2012 FranÃ§ois Revol
+ * Copyright (c) 2016-2018 BALATON Zoltan
+ *
+ * This work is licensed under the GNU GPL license version 2 or later.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "cpu.h"
+#include "hw/hw.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "hw/ppc/ppc.h"
+#include "hw/pci/pci.h"
+#include "sysemu/block-backend.h"
+#include "hw/ppc/ppc440.h"
+
+/*/
+/* L2 Cache as SRAM */
+/* FIXME:fix names */
+enum {
+DCR_L2CACHE_BASE  = 0x30,
+DCR_L2CACHE_CFG   = DCR_L2CACHE_BASE,
+DCR_L2CACHE_CMD,
+DCR_L2CACHE_ADDR,
+DCR_L2CACHE_DATA,
+DCR_L2CACHE_STAT,
+DCR_L2CACHE_CVER,
+DCR_L2CACHE_SNP0,
+DCR_L2CACHE_SNP1,
+DCR_L2CACHE_END   = DCR_L2CACHE_SNP1,
+};
+
+/* base is 460ex-specific, cf. U-Boot, ppc4xx-isram.h */
+enum {
+DCR_ISRAM0_BASE   = 0x20,
+DCR_ISRAM0_SB0CR  = DCR_ISRAM0_BASE,
+DCR_ISRAM0_SB1CR,
+DCR_ISRAM0_SB2CR,
+DCR_ISRAM0_SB3CR,
+DCR_ISRAM0_BEAR,
+DCR_ISRAM0_BESR0,
+DCR_ISRAM0_BESR1,
+DCR_ISRAM0_PMEG,
+DCR_ISRAM0_CID,
+DCR_ISRAM0_REVID,
+DCR_ISRAM0_DPC,
+DCR_ISRAM0_END= DCR_ISRAM0_DPC
+};
+
+enum {
+DCR_ISRAM1_BASE   = 0xb0,
+DCR_ISRAM1_SB0CR  = DCR_ISRAM1_BASE,
+/* single bank */
+DCR_ISRAM1_BEAR   = DCR_ISRAM1_BASE + 0x04,
+DCR_ISRAM1_BESR0,
+DCR_ISRAM1_BESR1,
+DCR_ISRAM1_PMEG,
+DCR_ISRAM1_CID,
+DCR_ISRAM1_REVID,
+DCR_ISRAM1_DPC,
+DCR_ISRAM1_END= DCR_ISRAM1_DPC
+};
+
+typedef struct ppc4xx_l2sram_t {
+MemoryRegion bank[4];
+uint32_t l2cache[8];
+uint32_t isram0[11];
+} ppc4xx_l2sram_t;
+
+#ifdef MAP_L2SRAM
+static void l2sram_update_mappings(ppc4xx_l2sram_t *l2sram,
+   uint32_t isarc, uint32_t isacntl,
+   uint32_t dsarc, uint32_t dsacntl)
+{
+if (l2sram->isarc != isarc ||
+(l2sram->isacntl & 0x8000) != (isacntl & 0x8000)) {
+if (l2sram->isacntl & 0x8000) {
+/* Unmap previously assigned memory region */
+memory_region_del_subregion(get_system_memory(),
+>isarc_ram);
+}
+if (isacntl & 0x8000) {
+/* Map new instruction memory region */
+memory_region_add_subregion(get_system_memory(), isarc,
+>isarc_ram);
+}
+}
+if (l2sram->dsarc != dsarc ||
+(l2sram->dsacntl & 0x8000) != (dsacntl & 0x8000)) {
+if (l2sram->dsacntl & 0x8000) {
+/* Beware not to unmap the region we just mapped */
+if (!(isacntl & 0x8000) || l2sram->dsarc != isarc) {
+/* Unmap previously assigned memory region */
+memory_region_del_subregion(get_system_memory(),
+>dsarc_ram);
+}
+}
+if (dsacntl & 0x8000) {
+/* Beware not to remap the region we just mapped */
+if (!(isacntl & 0x8000) || dsarc != isarc) {
+/* Map new

Re: [Qemu-devel] [PATCH v1 1/2] target/arm: Add a cluster size property

2018-03-02 Thread Peter Maydell

On 2 March 2018 at 17:35, Alistair Francis  wrote:
> Ok will fix. I'll send a V2 out straight away, as today is my last day
> and it'd be great if I could get this done.

Sure -- if there are any further nits in v2 I'll just fix them
up when I put it into target-arm.next.

-- PMM

[Qemu-devel] [PATCH 0/1] FOLL_NOWAIT and get_user_pages_unlocked

2018-03-02 Thread Andrea Arcangeli

Hello,

KVM is hanging on postcopy live migration.

David tracked it down to commit
ce53053ce378c21e7ffc45241fd67d6ee79daa2b and the problem is pretty
obvious then.

Either we teach get_user_pages_locked/unlocked to handle FOLL_NOWAIT
(so faultin_nopage works right even when the nonblocking pointer is
not NULL) or we need to revert part of commit
ce53053ce378c21e7ffc45241fd67d6ee79daa2b and keep using FOLL_NOWAIT
only as parameter to get_user_pages (which won't ever set nonblocking
pointer to non-NULL). I suppose the former approach is preferred to be
more robust.

Thanks,
Andrea

Andrea Arcangeli (1):
  mm: gup: teach get_user_pages_unlocked to handle FOLL_NOWAIT

 mm/gup.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

[Qemu-devel] [PATCH 1/1] mm: gup: teach get_user_pages_unlocked to handle FOLL_NOWAIT

2018-03-02 Thread Andrea Arcangeli

KVM is hanging during postcopy live migration with userfaultfd because
get_user_pages_unlocked is not capable to handle FOLL_NOWAIT.

Earlier FOLL_NOWAIT was only ever passed to get_user_pages.

Specifically faultin_page (the callee of get_user_pages_unlocked
caller) doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the
page fault flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't
actually released (even if nonblocking is not NULL). So it sets
*nonblocking to zero and the caller won't release the mmap_sem
thinking it was already released, but it wasn't because of
FOLL_NOWAIT.

Reported-by: Dr. David Alan Gilbert 
Tested-by: Dr. David Alan Gilbert 
Signed-off-by: Andrea Arcangeli 
---
 mm/gup.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1b46e6e74881..6afae32571ca 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -516,7 +516,7 @@ static int faultin_page(struct task_struct *tsk, struct 
vm_area_struct *vma,
}
 
if (ret & VM_FAULT_RETRY) {
-   if (nonblocking)
+   if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
*nonblocking = 0;
return -EBUSY;
}
@@ -890,7 +890,10 @@ static __always_inline long __get_user_pages_locked(struct 
task_struct *tsk,
break;
}
if (*locked) {
-   /* VM_FAULT_RETRY didn't trigger */
+   /*
+* VM_FAULT_RETRY didn't trigger or it was a
+* FOLL_NOWAIT.
+*/
if (!pages_done)
pages_done = ret;
break;

Re: [Qemu-devel] [PATCH v1 1/2] target/arm: Add a cluster size property

2018-03-02 Thread Alistair Francis

On Fri, Mar 2, 2018 at 9:31 AM, Peter Maydell  wrote:
> On 2 March 2018 at 17:06, Alistair Francis  
> wrote:
>
> Subject should say "core count" rather than "cluster size" ?

Yes, that was left over. I'll fix.

>
>> The cortex A53 TRM specifices that bits 24 and 25 of the L2CTLR register
>
> "specifies"
>
>> specify the number of cores present and not the number of processors. To
>
> "the number of cores in the processor, not the total number of cores
> in the system".

Fixed these.

>
>> report this correctly on machines with multiple CPU clusters (ARM's
>> big.LITTLE or Xilinx's ZynqMP) we need to allow the machine to overwrite
>> this value. To do this let's add an optional property.
>>
>> Signed-off-by: Alistair Francis 
>> ---
>>
>>  target/arm/cpu.h   |  5 +
>>  target/arm/cpu.c   |  1 +
>>  target/arm/cpu64.c | 11 +--
>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 8dd6b788df..3fa8fdad21 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -745,6 +745,11 @@ struct ARMCPU {
>>  /* Uniprocessor system with MP extensions */
>>  bool mp_is_up;
>>
>> +/* Specify the number of cores in this CPU cluster. Used for the L2CTLR
>> + * register.
>> + */
>> +int32_t core_count;
>> +
>>  /* The instance init functions for implementation-specific subclasses
>>   * set these fields to specify the implementation-dependent values of
>>   * various constant registers and reset values of non-constant
>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
>> index 6b77aaa445..7a17ba1418 100644
>> --- a/target/arm/cpu.c
>> +++ b/target/arm/cpu.c
>> @@ -1765,6 +1765,7 @@ static Property arm_cpu_properties[] = {
>>  DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
>>  mp_affinity, ARM64_AFFINITY_INVALID),
>>  DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
>> +DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
>>  DEFINE_PROP_END_OF_LIST()
>>  };
>>
>> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
>> index 4228713b19..5e5ae44aeb 100644
>> --- a/target/arm/cpu64.c
>> +++ b/target/arm/cpu64.c
>> @@ -42,8 +42,15 @@ static inline void unset_feature(CPUARMState *env, int 
>> feature)
>>  #ifndef CONFIG_USER_ONLY
>>  static uint64_t a57_a53_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo 
>> *ri)
>>  {
>> -/* Number of processors is in [25:24]; otherwise we RAZ */
>> -return (smp_cpus - 1) << 24;
>> +ARMCPU *cpu = arm_env_get_cpu(env);
>> +
>> +/* Number of cores is in [25:24]; otherwise we RAZ */
>> +if (cpu->core_count == -1) {
>> +/* No core_count specified, default to smp_cpus. */
>> +return (smp_cpus - 1) << 24;
>> +} else {
>> +return (cpu->core_count - 1) << 24;
>> +}
>>  }
>>  #endif
>
> I think having arm_cpu_realizefn do
> if (cpu->core_count == -1) {
> cpu->core_count = smp_cpus;
> }
>
> would be neater than doing that check when the register is read.

Ok will fix. I'll send a V2 out straight away, as today is my last day
and it'd be great if I could get this done.

Alistair

>
> thanks
> -- PMM

Re: [Qemu-devel] [PATCH v1 2/2] hw/arm: Set the core count for Xilinx's ZynqMP

2018-03-02 Thread Peter Maydell

On 2 March 2018 at 17:06, Alistair Francis  wrote:
> Set the ARM CPU core count property for the A53's attached to the Xilnx
> ZynqMP machine.
>
> Signed-off-by: Alistair Francis 
> ---
>
>  hw/arm/xlnx-zynqmp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
> index 69227fd4c9..465796e97c 100644
> --- a/hw/arm/xlnx-zynqmp.c
> +++ b/hw/arm/xlnx-zynqmp.c
> @@ -282,6 +282,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error 
> **errp)
>   s->virt, "has_el2", NULL);
>  object_property_set_int(OBJECT(>apu_cpu[i]), GIC_BASE_ADDR,
>  "reset-cbar", _abort);
> +object_property_set_int(OBJECT(>apu_cpu[i]), num_apus,
> +"core-count", _abort);
>  object_property_set_bool(OBJECT(>apu_cpu[i]), true, "realized",
>   );
>  if (err) {

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v1 1/2] target/arm: Add a cluster size property

2018-03-02 Thread Peter Maydell

On 2 March 2018 at 17:06, Alistair Francis  wrote:

Subject should say "core count" rather than "cluster size" ?

> The cortex A53 TRM specifices that bits 24 and 25 of the L2CTLR register

"specifies"

> specify the number of cores present and not the number of processors. To

"the number of cores in the processor, not the total number of cores
in the system".

> report this correctly on machines with multiple CPU clusters (ARM's
> big.LITTLE or Xilinx's ZynqMP) we need to allow the machine to overwrite
> this value. To do this let's add an optional property.
>
> Signed-off-by: Alistair Francis 
> ---
>
>  target/arm/cpu.h   |  5 +
>  target/arm/cpu.c   |  1 +
>  target/arm/cpu64.c | 11 +--
>  3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8dd6b788df..3fa8fdad21 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -745,6 +745,11 @@ struct ARMCPU {
>  /* Uniprocessor system with MP extensions */
>  bool mp_is_up;
>
> +/* Specify the number of cores in this CPU cluster. Used for the L2CTLR
> + * register.
> + */
> +int32_t core_count;
> +
>  /* The instance init functions for implementation-specific subclasses
>   * set these fields to specify the implementation-dependent values of
>   * various constant registers and reset values of non-constant
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 6b77aaa445..7a17ba1418 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1765,6 +1765,7 @@ static Property arm_cpu_properties[] = {
>  DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
>  mp_affinity, ARM64_AFFINITY_INVALID),
>  DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> +DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
>  DEFINE_PROP_END_OF_LIST()
>  };
>
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 4228713b19..5e5ae44aeb 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -42,8 +42,15 @@ static inline void unset_feature(CPUARMState *env, int 
> feature)
>  #ifndef CONFIG_USER_ONLY
>  static uint64_t a57_a53_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
>  {
> -/* Number of processors is in [25:24]; otherwise we RAZ */
> -return (smp_cpus - 1) << 24;
> +ARMCPU *cpu = arm_env_get_cpu(env);
> +
> +/* Number of cores is in [25:24]; otherwise we RAZ */
> +if (cpu->core_count == -1) {
> +/* No core_count specified, default to smp_cpus. */
> +return (smp_cpus - 1) << 24;
> +} else {
> +return (cpu->core_count - 1) << 24;
> +}
>  }
>  #endif

I think having arm_cpu_realizefn do
if (cpu->core_count == -1) {
cpu->core_count = smp_cpus;
}

would be neater than doing that check when the register is read.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 1/2] net: add e1000e model to the "simple" -net/-nic options

2018-03-02 Thread Paolo Bonzini

On 02/03/2018 18:19, Thomas Huth wrote:
> On 02.03.2018 16:51, Paolo Bonzini wrote:
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  hw/pci/pci.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index e006b6ac71..af3c85a46f 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -1822,6 +1822,7 @@ static const char * const pci_nic_models[] = {
>>  "i82559er",
>>  "rtl8139",
>>  "e1000",
>> +"e1000e",
>>  "pcnet",
>>  "virtio",
>>  "sungem",
>> @@ -1835,6 +1836,7 @@ static const char * const pci_nic_names[] = {
>>  "i82559er",
>>  "rtl8139",
>>  "e1000",
>> +"e1000e",
>>  "pcnet",
>>  "virtio-net-pci",
>>  "sungem",
> 
> I think it would be better and more consisten to finally fix this mess
> and automatically allow all devices in the category
> DEVICE_CATEGORY_NETWORK that are derived from TYPE_PCI_DEVICE, instead
> of manually maintaining this list here...?

Yeah, that would make sense too (virtio would be special cased).  Thanks!

Paolo

Re: [Qemu-devel] [PATCH 2/2] q35: change default NIC to e1000e

2018-03-02 Thread Paolo Bonzini

On 02/03/2018 18:24, Thomas Huth wrote:
> On 02.03.2018 16:51, Paolo Bonzini wrote:
>> The e1000 NIC is getting old and is not a very good default for a
>> PCIe machine type.  Change it to e1000e, which should be supported
>> by a good number of guests.
> 
> Basically a good idea, but you can only do that for new machine types
> (pc-q35-2.12 and later), otherwise migration from an older version will
> fail if the user tries to migrate with the default model.

Yeah, I was just throwing out the idea.

Paolo

Re: [Qemu-devel] [PATCH 2/2] q35: change default NIC to e1000e

2018-03-02 Thread Thomas Huth

On 02.03.2018 16:51, Paolo Bonzini wrote:
> The e1000 NIC is getting old and is not a very good default for a
> PCIe machine type.  Change it to e1000e, which should be supported
> by a good number of guests.

Basically a good idea, but you can only do that for new machine types
(pc-q35-2.12 and later), otherwise migration from an older version will
fail if the user tries to migrate with the default model.

 Thomas

Re: [Qemu-devel] [PATCH] net: fix misaligned member access

2018-03-02 Thread Peter Maydell

On 9 February 2018 at 19:03, Marc-André Lureau
 wrote:
> Fixes the following ASAN warnings:
>
> /home/elmarco/src/qemu/hw/net/net_tx_pkt.c:201:27: runtime error: member 
> access within misaligned address 0x63128846 for type 'struct ip_header', 
> which requires 4 byte alignment
> 0x63128846: note: pointer points here
>  01 00 00 00 45 00  01 a9 01 00 00 00 40 11  78 45 00 00 00 00 ff ff  ff ff 
> 00 00 00 00 00 00  00 00
>  ^
> /home/elmarco/src/qemu/hw/net/net_tx_pkt.c:208:63: runtime error: member 
> access within misaligned address 0x63128846 for type 'struct ip_header', 
> which requires 4 byte alignment
> 0x63128846: note: pointer points here
>  01 00 00 00 45 00  01 a9 01 00 00 00 40 11  78 45 00 00 00 00 ff ff  ff ff 
> 00 00 00 00 00 00  00 00
>  ^
> /home/elmarco/src/qemu/hw/net/net_tx_pkt.c:210:13: runtime error: member 
> access within misaligned address 0x63128846 for type 'struct ip_header', 
> which requires 4 byte alignment
> 0x63128846: note: pointer points here
>  01 00 00 00 45 00  01 a9 01 00 00 00 40 11  78 45 00 00 00 00 ff ff  ff ff 
> 00 00 00 00 00 00  00 00
>
> Signed-off-by: Marc-André Lureau 
> ---
>  include/net/eth.h   | 4 +++-
>  hw/net/net_tx_pkt.c | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/eth.h b/include/net/eth.h
> index 09054a506d..e6dc8a7ba0 100644
> --- a/include/net/eth.h
> +++ b/include/net/eth.h
> @@ -194,7 +194,9 @@ struct tcp_hdr {
>  #define PKT_GET_IP_HDR(p) \
>  ((struct ip_header *)(((uint8_t *)(p)) + eth_get_l2_hdr_length(p)))
>  #define IP_HDR_GET_LEN(p) \
> -struct ip_header *)(p))->ip_ver_len & 0x0F) << 2)
> +((ldub_p(p + offsetof(struct ip_header, ip_ver_len)) & 0x0F) << 2)
> +#define IP_HDR_GET_P(p)   \
> +(ldub_p(p + offsetof(struct ip_header, ip_p)))
>  #define PKT_GET_IP_HDR_LEN(p) \
>  (IP_HDR_GET_LEN(PKT_GET_IP_HDR(p)))
>  #define PKT_GET_IP6_HDR(p)\
> diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
> index e29c881bc2..162f802dd7 100644
> --- a/hw/net/net_tx_pkt.c
> +++ b/hw/net/net_tx_pkt.c
> @@ -205,7 +205,7 @@ static bool net_tx_pkt_parse_headers(struct NetTxPkt *pkt)
>  return false;
>  }
>
> -pkt->l4proto = ((struct ip_header *) l3_hdr->iov_base)->ip_p;
> +pkt->l4proto = IP_HDR_GET_P(l3_hdr->iov_base);
>
>  if (IP_HDR_GET_LEN(l3_hdr->iov_base) != sizeof(struct ip_header)) {
>  /* copy optional IPv4 header data if any*/
> --
> 2.16.1.73.g5832b7e9f2

Reviewed-by: Peter Maydell 

and I'm going to apply this to master, because I'm fed up of the warnings
in my build system logs.

It looks like all these macros need to be fixed, though, not just these two.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 1/2] net: add e1000e model to the "simple" -net/-nic options

2018-03-02 Thread Thomas Huth

On 02.03.2018 16:51, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/pci/pci.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index e006b6ac71..af3c85a46f 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1822,6 +1822,7 @@ static const char * const pci_nic_models[] = {
>  "i82559er",
>  "rtl8139",
>  "e1000",
> +"e1000e",
>  "pcnet",
>  "virtio",
>  "sungem",
> @@ -1835,6 +1836,7 @@ static const char * const pci_nic_names[] = {
>  "i82559er",
>  "rtl8139",
>  "e1000",
> +"e1000e",
>  "pcnet",
>  "virtio-net-pci",
>  "sungem",

I think it would be better and more consisten to finally fix this mess
and automatically allow all devices in the category
DEVICE_CATEGORY_NETWORK that are derived from TYPE_PCI_DEVICE, instead
of manually maintaining this list here...?

 Thomas

1 2 3 4 >

1 - 100 of 315 matches

Mail list logo