date:20190812

Re: [Qemu-devel] [PATCH v4 11/29] typedefs: Separate incomplete types and function types

2019-08-12 Thread Markus Armbruster

Markus Armbruster  writes:

> Alex Bennée  writes:
>
>> Markus Armbruster  writes:
>>
>>> While there, drop the obsolete file comment.
>>>
>>> Signed-off-by: Markus Armbruster 
>>> Reviewed-by: Philippe Mathieu-Daudé 
>>> Tested-by: Philippe Mathieu-Daudé 
>>> ---
>>>  include/qemu/typedefs.h | 12 
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>>> index fcdaae58c4..29346648d4 100644
>>> --- a/include/qemu/typedefs.h
>>> +++ b/include/qemu/typedefs.h
>>> @@ -1,10 +1,10 @@
>>>  #ifndef QEMU_TYPEDEFS_H
>>>  #define QEMU_TYPEDEFS_H
>>>
>>> -/* A load of opaque types so that device init declarations don't have to
>>> -   pull in all the real definitions.  */
>>> -
>>> -/* Please keep this list in case-insensitive alphabetical order */
>>> +/*
>>> + * Incomplete struct types
>>
>> Maybe expand this a little...
>>
>>   "Incomplete struct types for modules that don't need the complete
>>   definitions but still pass around typed variables."?
>
> If we explain proper use of qemu/typedefs.h in HACKING, as discussed in
> review of v1[*], we could point there.

Perhaps rewriting the obsolete file comment would be better.  Something
like this:

/*
 * This header is for selectively avoiding #include just to get a
 * typedef name.
 *
 * Declaring a typedef name in its "obvious" place can result in
 * inclusion cycles, in particular for complete struct and union
 * types that need more types for their members.  It can also result
 * in headers pulling in many more headers, slowing down builds.
 *
 * You can break such cycles and unwanted dependencies by declaring
 * the typedef name here.
 *
 * For struct types used in only a few headers, judicious use of the
 * struct tag instead of the typedef name is commonly preferable.
 */

/*
 * Incomplete struct types
 * Please keep this list in case-insensitive alphabetical order.
 */
typedef struct AdapterInfo AdapterInfo;
[...]

/*
 * Pointer types
 * Such typedefs should be limited to cases where the typedef's users
 * are oblivious of its "pointer-ness".
 * Please keep this list in case-insensitive alphabetical order.
 */
typedef struct IRQState *qemu_irq;

/*
 * Function types
 */
typedef void SaveStateHandler(QEMUFile *f, void *opaque);
typedef int LoadStateHandler(QEMUFile *f, void *opaque, int version_id);
typedef void (*qemu_irq_handler)(void *opaque, int n, int level);

What do you think?

[...]

Re: [Qemu-devel] [RFC PATCH v2 15/39] target/i386: introduce function ck_cpuid

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> +enum {
> +CK_CPUID_MMX = 1,
> +CK_CPUID_3DNOW,
> +CK_CPUID_SSE,
> +CK_CPUID_SSE2,
> +CK_CPUID_SSE3,
> +CK_CPUID_SSSE3,
> +CK_CPUID_SSE4_1,
> +CK_CPUID_SSE4_2,
> +CK_CPUID_SSE4A,
> +CK_CPUID_AVX,
> +CK_CPUID_AVX2,
> +};

Name the enumeration,

> +static int ck_cpuid(CPUX86State *env, DisasContext *s, int ck_cpuid_feat)

and use it in the parameter.  Return bool and true on success.

Otherwise,
Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH] Add git-publish profile for security bugs

2019-08-12 Thread Gerd Hoffmann

> > +# https://wiki.qemu.org/SecurityProcess
> > +[gitpublishprofile "security"]
> > +base = master
> > +to = m...@redhat.com
> > +to = pmato...@redhat.com
> > +to = sstabell...@kernel.org
> > +to = secal...@redhat.com
> > +to = mdr...@linux.vnet.ibm.com
> > +to = p...@redhat.com
> > +suppresscc = all
> > 
> 
> Should we force inspect-emails = true here due to the nature of the
> security list? That way if we accidentally add extra CCs/etc there's a
> chance to review 'em.

That makes sense indeed.

> Also, should we update MAINTAINERS to match this script?
> 
> Responsible Disclosure, Reporting Security Issues
> -
> W: https://wiki.qemu.org/SecurityProcess
> M: Michael S. Tsirkin 
> L: secal...@redhat.com

Hmm, good question.  I took the list of addresses from the
SecurityProcess page.  Not sure why MAINTAINERS is not in sync even
though it links the page.  Is that intentional or just an oversight?

[ Cc'ing mst ]

cheers,
  Gerd

Re: [Qemu-devel] [RFC PATCH v2 14/39] target/i386: introduce mnemonic aliases for several gvec operations

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> It is helpful to introduce aliases for some general gvec operations as
> it makes a couple of instruction code generators simpler (added
> later).
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 23550a21d3..03b49411e5 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -4493,6 +4493,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
> int b)
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wunused-function"
>  
> +#define tcg_gen_gvec_andn(vece, dofs, aofs, bofs, oprsz, maxsz) \
> +tcg_gen_gvec_andc(vece, dofs, bofs, aofs, oprsz, maxsz)
> +#define tcg_gen_gvec_cmpeq(vece, dofs, aofs, bofs, oprsz, maxsz)\
> +tcg_gen_gvec_cmp(TCG_COND_EQ, vece, dofs, aofs, bofs, oprsz, maxsz)
> +#define tcg_gen_gvec_cmpgt(vece, dofs, aofs, bofs, oprsz, maxsz)\
> +tcg_gen_gvec_cmp(TCG_COND_GT, vece, dofs, aofs, bofs, oprsz, maxsz)
> +
>  static void gen_sse_ng(CPUX86State *env, DisasContext *s, int b)
>  {
>  enum {
> 

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 12/39] target/i386: introduce gen_sse_ng

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> This function serves as the point-of-intercept for all newly
> implemented instructions. If no new implementation exists, fall back
> to gen_sse.
> 
> Note: This changeset is intended for development only and shall not be
> included in the final patch series.
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)

Actually, I think this is a nice preparatory patch and should be included.

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 11/39] target/i386: introduce gen_(ld, st)d_env_A0

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> Similar in spirit to the already present gen_(ld,st)(q,o)_env_A0, it
> will prove useful in later commits for smaller-sized vector loads.
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 12 
>  1 file changed, 12 insertions(+)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 09/39] target/i386: make variable is_xmm const

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> The variable is_xmm does not change value after assignment, so make
> this fact explicit by marking it const.
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 17 ++---
>  1 file changed, 6 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 08/39] target/i386: make variable b1 const

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> The variable b1 does not change value once assigned. Make this fact
> explicit by marking it const.
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 05/39] target/i386: use prefix from DisasContext

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> Reduce scope of the local variable prefixes to enforce use of prefix
> from DisasContext instead.
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 113 
>  1 file changed, 57 insertions(+), 56 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 04/39] target/i386: use dflag from DisasContext

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> There already is a variable dflag in DisasContext, so reduce the scope
> of the local variable dflag to enforce use of the one in DisasContext.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 184 
>  1 file changed, 92 insertions(+), 92 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 03/39] target/i386: reduce scope of variable aflag

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> The variable aflag is not used in most of disas_insn; make this clear
> by explicitly reducing its scope to the block where it is used.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 07/39] target/i386: use pc_start from DisasContext

2019-08-12 Thread Richard Henderson

On 8/10/19 5:12 AM, Jan Bobek wrote:
> The variable pc_start is already a member of DisasContext. Remove the
> superfluous local variable.
> 
> Signed-off-by: Jan Bobek 
> ---
>  target/i386/translate.c | 131 
>  1 file changed, 65 insertions(+), 66 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH qemu] spapr_iommu: Fix xlate trace to print translated address

2019-08-12 Thread Alexey Kardashevskiy





On 12/08/2019 19:01, Philippe Mathieu-Daudé wrote:

Hi Alexey,

On 8/12/19 7:42 AM, Alexey Kardashevskiy wrote:

Currently we basically print IO address twice, fix this.

Fixes: 7e472264e9e2 ("PPC: spapr: iommu: rework traces")
Signed-off-by: Alexey Kardashevskiy 
---
  hw/ppc/spapr_iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index bd3d0256a65d..6fe57d799a10 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -137,7 +137,7 @@ static IOMMUTLBEntry 
spapr_tce_translate_iommu(IOMMUMemoryRegion *iommu,
  ret.addr_mask = ~page_mask;
  ret.perm = spapr_tce_iommu_access_flags(tce);
  }
-trace_spapr_iommu_xlate(tcet->liobn, addr, ret.iova, ret.perm,
+trace_spapr_iommu_xlate(tcet->liobn, addr, ret.translated_addr, ret.perm,
  ret.addr_mask);


But the trace format is:

spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned
perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64"
perm=%u mask=%x"

So this could be more appropriate:

   trace_spapr_iommu_xlate(tcet->liobn, ret.iova, ret.translated_addr,
   ret.perm, ret.addr_mask);

Anyhow your patch is an improvment, so regardless addr/ret.iova:



I'd rather want to see the raw input data than with some bits removed.


Reviewed-by: Philippe Mathieu-Daudé 


Thanks!



  
  return ret;




--
Alexey

[Qemu-devel] [PATCH v2] HACKING: Document 'struct' keyword usage

2019-08-12 Thread Eduardo Habkost

Sometimes we use the 'struct' keyword in headers to help us
reduce dependencies between header files.  Document that
practice.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Eduardo Habkost 
---
Changes v1 -> v2:
* Use paragraphs written by Paolo Bonzini at
  https://www.mail-archive.com/qemu-devel@nongnu.org/msg586214.html
* Fix typos spotted by Thomas Huth
---
 HACKING | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/HACKING b/HACKING
index 0fc3e0fc04..035276e668 100644
--- a/HACKING
+++ b/HACKING
@@ -100,7 +100,19 @@ pointer, you're guaranteed that it is used to modify the 
storage
 it points to, or it is aliased to another pointer that is.
 
 2.3. Typedefs
-Typedefs are used to eliminate the redundant 'struct' keyword.
+
+Typedefs are used to eliminate the redundant 'struct' keyword, since type
+names have a different style than other identifiers ("CamelCase" versus
+"snake_case").  Each struct should have a CamelCase name and a
+corresponding typedef.
+
+Since certain C compilers choke on duplicated typedefs, you should avoid
+them and declare a typedef only in one header file.  For common types,
+you can use "include/qemu/typedefs.h" for example.  However, as a matter
+of convenience it is also perfectly fine to use forward struct
+definitions instead of typedefs in headers and function prototypes; this
+avoids problems with duplicated typedefs and reduces the need to include
+headers from other headers.
 
 2.4. Reserved namespaces in C and POSIX
 Underscore capital, double underscore, and underscore 't' suffixes should be
-- 
2.21.0

Re: [Qemu-devel] [PATCH v4 22/29] Include hw/boards.h a bit less

2019-08-12 Thread Eduardo Habkost

On Mon, Aug 12, 2019 at 07:23:52AM +0200, Markus Armbruster wrote:
> hw/boards.h pulls in almost 60 headers.  The less we include it into
> headers, the better.  As a first step, drop superfluous inclusions,
> and downgrade some more to what's actually needed.  Gets rid of just
> one inclusion into a header.
> 
> Cc: Eduardo Habkost 
> Cc: Marcel Apfelbaum 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Alistair Francis 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [Qemu-devel] [PATCH-4.2 v2 5/5] target/riscv: Fix Floating Point register names

2019-08-12 Thread Palmer Dabbelt


On Tue, 30 Jul 2019 16:35:34 PDT (-0700), Alistair Francis wrote:

From: Atish Patra 

As per the RISC-V spec, Floating Point registers are named as f0..f31
so lets fix the register names accordingly.

Signed-off-by: Atish Patra 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f8d07bd20a..af1e9b7690 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -40,10 +40,10 @@ const char * const riscv_int_regnames[] = {
 };

 const char * const riscv_fpr_regnames[] = {
-  "ft0", "ft1", "ft2",  "ft3",  "ft4", "ft5", "ft6",  "ft7",
-  "fs0", "fs1", "fa0",  "fa1",  "fa2", "fa3", "fa4",  "fa5",
-  "fa6", "fa7", "fs2",  "fs3",  "fs4", "fs5", "fs6",  "fs7",
-  "fs8", "fs9", "fs10", "fs11", "ft8", "ft9", "ft10", "ft11"
+  "f0", "f1", "f2",  "f3",  "f4", "f5", "f6", "f7",
+  "f8", "f9", "f10",  "f11",  "f12", "f13", "f14", "f15",
+  "f16", "f17", "f18",  "f19",  "f20", "f21", "f22", "f23",
+  "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31"
 };

 const char * const riscv_excp_names[] = {


I actually don't think this one is right: riscv_int_regnames uses the ABI 
names, so this should match.  I'd be OK switching both of them, but not just 
one.


I've queued the other four patches.

Re: [Qemu-devel] [PATCH 1/3] riscv: sifive_u: Add support for loading initrd

2019-08-12 Thread Palmer Dabbelt


On Fri, 19 Jul 2019 06:40:43 PDT (-0700), li...@roeck-us.net wrote:

Add support for loading initrd with "-initrd "
to the sifive_u machine. This lets us boot into Linux without
disk drive.

Signed-off-by: Guenter Roeck 
---
 hw/riscv/sifive_u.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 71b8083..0657046 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -67,7 +67,7 @@ static const struct MemmapEntry {

 #define GEM_REVISION0x10070109

-static void create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
+static void *create_fdt(SiFiveUState *s, const struct MemmapEntry *memmap,
 uint64_t mem_size, const char *cmdline)
 {
 void *fdt;
@@ -244,11 +244,14 @@ static void create_fdt(SiFiveUState *s, const struct 
MemmapEntry *memmap,
 qemu_fdt_setprop_string(fdt, "/chosen", "bootargs", cmdline);
 }
 g_free(nodename);
+
+return fdt;
 }

 static void riscv_sifive_u_init(MachineState *machine)
 {
 const struct MemmapEntry *memmap = sifive_u_memmap;
+void *fdt;

 SiFiveUState *s = g_new0(SiFiveUState, 1);
 MemoryRegion *system_memory = get_system_memory();
@@ -269,13 +272,24 @@ static void riscv_sifive_u_init(MachineState *machine)
 main_mem);

 /* create device tree */
-create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);
+fdt = create_fdt(s, memmap, machine->ram_size, machine->kernel_cmdline);

 riscv_find_and_load_firmware(machine, BIOS_FILENAME,
  memmap[SIFIVE_U_DRAM].base);

 if (machine->kernel_filename) {
-riscv_load_kernel(machine->kernel_filename);
+uint64_t kernel_entry = riscv_load_kernel(machine->kernel_filename);
+
+if (machine->initrd_filename) {
+hwaddr start;
+hwaddr end = riscv_load_initrd(machine->initrd_filename,
+   machine->ram_size, kernel_entry,
+   );
+qemu_fdt_setprop_cell(fdt, "/chosen",
+  "linux,initrd-start", start);
+qemu_fdt_setprop_cell(fdt, "/chosen", "linux,initrd-end",
+  end);
+}
 }

 /* reset vector */


Thanks.  I've queued all three of these.

Re: [Qemu-devel] [RFC PATCH v2 02/17] fuzz: Add fuzzer configure options

2019-08-12 Thread Bandan Das

"Oleinik, Alexander"  writes:
...
>  if test "$supported_cpu" = "no"; then
>  echo
> @@ -7306,6 +7310,17 @@ fi
>  if test "$sheepdog" = "yes" ; then
>echo "CONFIG_SHEEPDOG=y" >> $config_host_mak
>  fi
> +if test "$fuzzing" = "yes" ; then
> +  QEMU_CFLAGS="$QEMU_CFLAGS -fsanitize=fuzzer,address  
> -fprofile-instr-generate"
> +  QEMU_CFLAGS="$QEMU_CFLAGS -fprofile-instr-generate -fcoverage-mapping"

What is the purpose of -fprofile-instr-generate ? Coverage info ? (Listed twice 
above)

Bandan

> +  QEMU_LDFLAGS="$LDFLAGS -fsanitize=fuzzer,address"
> +
> +  # Add tests/ to include path, since this is done in tests/Makefile.include,
> +  # and required for QOS objects to build. This can be removed if/when the
> +  # fuzzer is compiled using rules in tests/Makefile.include
> +  QEMU_INCLUDES="-iquote \$(SRC_PATH)/tests $QEMU_INCLUDES"
> +  echo "CONFIG_FUZZ=y" >> $config_host_mak
> +fi
>  
>  if test "$tcg_interpreter" = "yes"; then
>QEMU_INCLUDES="-iquote \$(SRC_PATH)/tcg/tci $QEMU_INCLUDES"

Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again

2019-08-12 Thread John Snow




On 8/12/19 3:11 PM, Max Reitz wrote:
> On 12.08.19 20:39, John Snow wrote:
>>
>>
>> On 7/25/19 11:55 AM, Max Reitz wrote:
>>> Hi,
>>>
>>> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
>>> would only go down to the protocol layer if the format layer returned
>>> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
>>> information whether a given range in the image is zero or not.
>>> Generally, this is because the image is preallocated and thus all ranges
>>> appear as zeroes.
>>>
>>> However, it only implemented this preallocation detection for qcow2.
>>> There are more formats that support preallocation, though: vdi, vhdx,
>>> vmdk, vpc.  (Funny how they all start with “v”.)
>>>
>>> For vdi, vmdk, and vpc, the fix is rather simple, because they really
>>> have different subformats depending on whether an image is preallocated
>>> or not.  This makes the check very simple.
>>>
>>> vhdx is more like qcow2, where after the image has been created, it
>>> isn’t clear whether it’s been preallocated or everything is allocated
>>> because everything was already written to.  69f47505ee added a heuristic
>>> to qcow2 to get around this, but I think that’s too much for vhdx.  I
>>> just left it unfixed, because I don’t care that much, honestly (and I
>>> don’t think anyone else does).
>>>
>>
>> What's the practical outcome of that, and is the limitation documented
>> somewhere?
> 
> The outcome is that it if you preallocate a vhdx image
> (subformat=fixed), you’ll see that all sectors contain data, even if
> they may be zero sectors on the filesystem level.
> 
> I don’t think it’s user-visible whatsoever.
> 

But it might mean that doing things with sync=top might over-allocate
data depending on the destination, wouldn't it?

That's not crucial, but it's possibly visible, no?

>> (I'm fine with not fixing it, I just want it documented somehow.)
> 
> I am really not inclined to start any documentation on the
> particularities with which qemu handles vhdx images.
> 
> (Especially so considering we don’t even have any documentation on the
> qcow2 case.  The stress in my paragraph was “heuristic”.  If you
> preallocate a qcow2 image, but then discard enough sectors that the
> heuristic thinks you didn’t, you’ll have the same effect.  Or if you
> grow a preallocated image without preallocating the new area.)
> 
> Max
> 

"But our qcow2 docs are also bad" is the kind of argument I can't
*really* disagree with, but...

(I wish we did have a documentation manual per-format that mentioned
some gotchas and general info about each format, but I can't really ask
you to do that now: I just worry when I see patches like this that the
knowledge or memory that there ever was a quirk will vanish immediately.)

--js

Re: [Qemu-devel] [Qemu-block] [PATCH 7/7] iotests: Disable 126 for some vmdk subformats

2019-08-12 Thread John Snow




On 7/25/19 11:57 AM, Max Reitz wrote:
> Several vmdk subformats do not work with iotest 126, so disable them.
> 
> (twoGbMaxExtentSparse actually should work, but fixing that is a bit
> difficult.  The problem is that the vmdk descriptor file will contain a
> referenc to "image:base.vmdk", which the block layer cannot open because

reference

> it does not know the protocol "image".  This is not trivial to solve,
> because I suppose real protocols like "http://; should be supported.
> Making vmdk treat all paths with a potential protocol prefix that the
> block layer does not recognize as plain files seems a bit weird,
> though.  Ignoring this problem does not seem too bad.)
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/126 | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tests/qemu-iotests/126 b/tests/qemu-iotests/126
> index 9b0dcf9255..8e55d7c843 100755
> --- a/tests/qemu-iotests/126
> +++ b/tests/qemu-iotests/126
> @@ -33,6 +33,12 @@ status=1   # failure is the default!
>  
>  # Needs backing file support
>  _supported_fmt qcow qcow2 qed vmdk
> +# (1) Flat vmdk images do not support backing files
> +# (2) Split vmdk images simply fail this test right now.  Fixing that
> +# is left for another day.

Which one? :)

> +_unsupported_imgopts "subformat=monolithicFlat" \
> + "subformat=twoGbMaxExtentFlat" \
> + "subformat=twoGbMaxExtentSparse"
>  # This is the default protocol (and we want to test the difference between
>  # colons which separate a protocol prefix from the rest and colons which are
>  # just part of the filename, so we cannot test protocols which require a 
> prefix)
> 

What exactly fails? Does the VMDK driver see `image:` and think it's a
special filename it needs to handle and fails to do so?

Re: [Qemu-devel] [Qemu-block] [PATCH 6/7] iotests: Disable 110 for vmdk.twoGbMaxExtentSparse

2019-08-12 Thread John Snow




On 7/25/19 11:57 AM, Max Reitz wrote:
> The error message for the test case where we have a quorum node for
> which no directory name can be generated is different: For
> twoGbMaxExtentSparse, it complains that it cannot open the extent file.
> For other (sub)formats, it just notes that it cannot determine the
> backing file path.  Both are fine, but just disable twoGbMaxExtentSparse
> for simplicity's sake.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/110 | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/110 b/tests/qemu-iotests/110
> index 2cdc7c8a72..2ef516baf1 100755
> --- a/tests/qemu-iotests/110
> +++ b/tests/qemu-iotests/110
> @@ -40,7 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  # Any format supporting backing files
>  _supported_fmt qed qcow qcow2 vmdk
>  _supported_proto file
> -_unsupported_imgopts "subformat=monolithicFlat" 
> "subformat=twoGbMaxExtentFlat"
> +_unsupported_imgopts "subformat=monolithicFlat" 
> "subformat=twoGbMaxExtentFlat" \
> + "subformat=twoGbMaxExtentSparse"
>  
>  TEST_IMG_REL=$(basename "$TEST_IMG")
>  
> 

Hm, fine. This is the second patch I've seen today that tries to work
around error messages changing because of configuration options, though.

Wonder if we need a more general purpose solution to this kind of thing.

Ah, well, regardless... good enough for now, I think, probably.

Reviewed-by: John Snow

Re: [Qemu-devel] [PATCH 3/9] vfio: unplug failover primary device before migration

2019-08-12 Thread Alex Williamson

On Mon, 12 Aug 2019 17:18:54 +0200
Cornelia Huck  wrote:

> On Fri,  2 Aug 2019 17:05:59 +0200
> Jens Freimann  wrote:
> 
> > As usual block all vfio-pci devices from being migrated, but make an
> > exception for failover primary devices. This is achieved by setting
> > unmigratable to 0 but also add a migration blocker for all vfio-pci
> > devices except failover primary devices. These will be unplugged before
> > migration happens by the migration handler of the corresponding
> > virtio-net standby device.
> > 
> > Signed-off-by: Jens Freimann 
> > ---
> >  hw/vfio/pci.c | 24 +++-
> >  hw/vfio/pci.h |  1 +
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index d6ae9bd4ac..398d26669b 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -35,6 +35,9 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "migration/blocker.h"
> > +#include "qemu/option.h"
> > +#include "qemu/option_int.h"
> >  
> >  #define TYPE_VFIO_PCI "vfio-pci"
> >  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
> > @@ -2693,6 +2696,12 @@ static void 
> > vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >  vdev->req_enabled = false;
> >  }
> >  
> > +static int has_standby_arg(void *opaque, const char *name,
> > +   const char *value, Error **errp)
> > +{
> > +return strcmp(name, "standby") == 0;
> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -2706,6 +2715,19 @@ static void vfio_realize(PCIDevice *pdev, Error 
> > **errp)
> >  int i, ret;
> >  bool is_mdev;
> >  
> > +if (qemu_opt_foreach(pdev->qdev.opts, has_standby_arg,
> > + (void *) pdev->qdev.opts, ) == 0) {
> > +error_setg(>migration_blocker,
> > +"VFIO device doesn't support migration");
> > +ret = migrate_add_blocker(vdev->migration_blocker, );
> > +if (err) {
> > +error_propagate(errp, err);
> > +error_free(vdev->migration_blocker);
> > +}
> > +} else {
> > +pdev->qdev.allow_unplug_during_migration = true;  
> 
> I think you add this only in the next patch?
> 
> > +}
> > +
> >  if (!vdev->vbasedev.sysfsdev) {
> >  if (!(~vdev->host.domain || ~vdev->host.bus ||
> >~vdev->host.slot || ~vdev->host.function)) {
> > @@ -3148,7 +3170,7 @@ static Property vfio_pci_dev_properties[] = {
> >  
> >  static const VMStateDescription vfio_pci_vmstate = {
> >  .name = "vfio-pci",
> > -.unmigratable = 1,
> > +.unmigratable = 0,
> >  };
> >  
> >  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index 27d58fc55b..0f6f8cb395 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -169,6 +169,7 @@ typedef struct VFIOPCIDevice {
> >  bool no_vfio_ioeventfd;
> >  bool enable_ramfb;
> >  VFIODisplay *dpy;
> > +Error *migration_blocker;
> >  } VFIOPCIDevice;
> >  
> >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);  
> 
> This patch interacts with support for vfio migration (last posted in
> <1562665760-26158-1-git-send-email-kwankh...@nvidia.com>, I've not seen
> a later version yet.)
> 
> With that, we'd have three cases to consider:
> 1) device is a failover primary
> 2) device has a migration region
> 3) none of the above
> 
> Can 1) and 2) happen simultaneously? If yes, what should take
> precedence?

Great questions.  I would assume that a user specifying this option
intends the behavior here regardless of the device's support for
migration, which could be made more clear and easier to test by adding
this option to other, otherwise migratable, QEMU NICs.

Also, I thought we agreed that "standby" was not a sufficiently
descriptive name for this device option and that this option would be
rejected with an error on non-Ethernet class devices[1].  Thanks,

Alex

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg04727.html

Re: [Qemu-devel] [PATCH 2/2] ssh: implement private key authentication

2019-08-12 Thread Eric Blake

On 7/29/19 6:08 AM, Kevin Wolf wrote:

>> On a different topic, how much of this work overlaps with the nbdkit ssh
>> plugin? Should we be duplicating efforts with both projects supporting
>> ssh natively, or is it worth considering getting qemu out of the ssh
>> business and instead connecting to an nbd device provided by nbdkit
>> connecting to ssh?
> 
> ssh behaves essentially like a filesystem whereas NBD behaves like a
> block device. This is especially relevant for everything related to the
> file size. As far as I know, using an image format like qcow2 that wants
> to grow the image file isn't possible over NBD, whereas I expect it to
> work with the ssh block driver.

Resizing NBD devices isn't available yet, but it is rapidly moving
higher on my list of TODO items to implement as an extension (the ideas
for how it should work have been tossed around, but having code to back
up those ideas is the next step).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH 4/7] vmdk: Reject invalid compressed writes

2019-08-12 Thread John Snow




On 8/12/19 5:03 PM, Max Reitz wrote:
> On 12.08.19 22:26, John Snow wrote:
>>
>>
>> On 7/25/19 11:57 AM, Max Reitz wrote:
>>> Compressed writes generally have to write full clusters, not just in
>>> theory but also in practice when it comes to vmdk's streamOptimized
>>> subformat.  It currently is just silently broken for writes with
>>> non-zero in-cluster offsets:
>>>
>>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>>> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
>>> wrote 4096/4096 bytes at offset 4096
>>> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
>>> read failed: Invalid argument
>>>
>>> (The technical reason is that vmdk_write_extent() just writes the
>>> incomplete compressed data actually to offset 4k.  When reading the
>>> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
>>> size to be 0, because that is what it reads from there.  This yields an
>>> error.)
>>>
>>> For incomplete writes with zero in-cluster offsets, the error path when
>>> reading the rest of the cluster is a bit different, but the result is
>>> the same:
>>>
>>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>>> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
>>> wrote 4096/4096 bytes at offset 0
>>> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
>>> read failed: Invalid argument
>>>
>>> (Here, vmdk_read_extent() finds the data and then sees that the
>>> uncompressed data is short.)
>>>
>>> It is better to reject invalid writes than to make the user believe they
>>> might have succeeded and then fail when trying to read it back.
>>>
>>> Signed-off-by: Max Reitz 
>>> ---
>>>  block/vmdk.c | 10 ++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/block/vmdk.c b/block/vmdk.c
>>> index db6acfc31e..641acacfe0 100644
>>> --- a/block/vmdk.c
>>> +++ b/block/vmdk.c
>>> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, 
>>> int64_t cluster_offset,
>>>  if (extent->compressed) {
>>>  void *compressed_data;
>>>  
>>> +/* Only whole clusters */
>>> +if (offset_in_cluster ||
>>> +n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
>>> +(n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
>>> + offset + n_bytes != extent->end_sector * SECTOR_SIZE))
>>> +{
>>> +ret = -EINVAL;
>>> +goto out;
>>> +}
>>> +
>>>  if (!extent->has_marker) {
>>>  ret = -EINVAL;
>>>  goto out;
>>>
>>
>> What does this look like from a guest's perspective? Is there something
>> that enforces the alignment in the graph for us?
>>
>> Or is it the case that indeed guests (or users via qemu-io) can request
>> invalid writes and we will halt the VM in those cases (in preference to
>> corrupting the disk)?
> 
> Have you ever tried using a streamOptimized VMDK disk with a guest?
> 

Nope! It's why I'm asking. I have no idea what the whole picture before
and after is.

> I haven’t, but I know that it won’t work. O:-)  If you try to write to
> an already allocated cluster, you’ll get an EIO and an error message via
> error_report() (“Could not write to allocated cluster for
> streamOptimized”).  So really, the only use of streamOptimized is as a
> qemu-img convert source/target, or as a backup/mirror target.  (Just
> like compressed clusters in qcow2 images.)
> 

OK, makes sense. Someone's going to try to use it in cases where it
doesn't make sense though, for sure.

> I suppose if I introduced streamOptimized support today, I wouldn’t just
> forward vmdk_co_pwritev_compressed() to vmdk_co_pwritev(), but instead
> make vmdk_co_pwritev_compressed() only work on streamOptimized images,
> and vmdk_co_pwritev() only on everything else.  Then it would be more clear.
> 
> Hm.  In fact, that’s a bug, isn’t it?  vmdk will accept compressed
> writes for any subformat, even if it doesn’t support compression.  So if
> you use -c and convert to vmdk, it will succeed, but the result won’t be
> compressed,
> 
> It’s also a bit weird to accept normal writes for streamOptimized, but
> I’m not sure whether that’s really a bug?  In any case, changing this
> behavior would not be backwards-compatible...  Should we deprecate
> normal writes to streamOptimized?
> 

If it's supposed to be the case that streamOptimized *only* gets
compressed, aligned writes -- it probably is a bug to do normal,
uncompressed writes, isn't it? Does that produce a usable image?

> Max
> 

Anyway, I'm fine with this patch because things aren't any worse, and
our support for non-native formats has always been kind of best-attempt.

Reviewed-by: John Snow

Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier

2019-08-12 Thread Peter Xu

On Mon, Aug 12, 2019 at 10:24:53AM -0600, Alex Williamson wrote:
> On Mon, 12 Aug 2019 09:45:27 +0200
> Peter Xu  wrote:
> 
> > This is a RFC series.
> > 
> > The VT-d code has some defects, one of them is that we cannot detect
> > the misuse of vIOMMU and vfio-pci early enough.
> > 
> > For example, logically this is not allowed:
> > 
> >   -device intel-iommu,caching-mode=off \
> >   -device vfio-pci,host=05:00.0
> 
> Do we require intel-iommu with intremap=on in order to get x2apic for
> large vCPU count guests?  If so, wouldn't it be a valid configuration
> for the user to specify:
> 
>-device intel-iommu,caching-mode=off,intremap=on \
>-device vfio-pci,host=05:00.0
> 
> so long as they never have any intention of the guest enabling DMA
> translation?  Would there be any advantage to this config versus
> caching-mode=on?  I suspect the overhead of CM=1 when only using
> interrupt remapping is small to non-existent, but are there other
> reasons for running with CM=0, perhaps guest drivers not supporting it?

AFAIU the major users of the vIOMMU should be guest DPDK apps and
nested device assignments.  For these users I would just make bold to
guess they are mostly using Linux so the majority should be safe.

For the minority, I do agree that above question is valid.  IMHO the
hard point is to find out those users and let them join the
discussion, then we can know how many will be affected and how.  I
think one way to achieve it could be that we merge the patchset like
this, then people will start to complain if there is any. :) I'm not
sure whether that's the best way to go.  I think that could still be a
serious option considering that it could potentially fix a more severe
issue (unexpected QEMU quits), and also reverting the patchset like
this one could be easy as well when really necessary (e.g., the
patchset will not bring machine state changes which might cause
migration issues, or so on).

> 
> I like the idea of being able to nak an incompatible hot-add rather
> than kill the VM, we could narrow that even further to look at not only
> whether caching-mode support is enabled, but also whether translation
> is enabled on the vIOMMU.  Ideally we might disallow the guest from
> enabling translation in such a configuration, but the Linux code is not
> written with the expectation that the hardware can refuse to enable
> translation and there are no capability bits to remove the DMA
> translation capability of the IOMMU.

This is an interesting view at least to me, while... I'm not sure we
should allow that even for emulation.  I'm just imaging such a patch
for the Linux kernel to allow failures on enabling DMAR - it'll be
only for QEMU emulation and I'm not sure whether upstream would like
such a patch.  After all, we are emulating the hardwares, and the
hardware will always succeed in enabling DMAR, AFAICT.  For Windows
and other OSs it could be even harder.  If without the support of all
these, we could simply have other risks of having hanging guests when
the driver is busy waiting for the DMAR status bit to be set.

> Still, we might want to think
> about which is the better user experience, to have the guest panic when
> DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have
> QEMU exit, or as proposed here, prevent all configurations where this
> might occur.  Thanks,

Agreed.  So far, a stricter rule could be a bit better than a hanging
guest to me.  Though that could be very subjective.

Thanks!

-- 
Peter Xu

Re: [Qemu-devel] [PATCH 2/2] ssh: implement private key authentication

2019-08-12 Thread Max Reitz

On 29.07.19 13:08, Kevin Wolf wrote:
> Am 26.07.2019 um 16:24 hat Eric Blake geschrieben:
>> On 7/26/19 9:09 AM, Pino Toscano wrote:
>>> Add a 'private-key' option which represents the path of a private key
>>> to use for authentication, and 'private-key-secret' as the name of an
>>> object with its passphrase.
>>>
>>> Signed-off-by: Pino Toscano 
>>
>>> +++ b/qapi/block-core.json
>>> @@ -3226,6 +3226,11 @@
>>>  # @password-secret: ID of a QCryptoSecret object providing a password
>>>  #   for authentication (since 4.2)
>>>  #
>>> +# @private-key: path to the private key (since 4.2)
>>> +#
>>> +# @private-key-secret:  ID of a QCryptoSecret object providing the 
>>> passphrase
>>> +#   for 'private-key' (since 4.2)
>>
>> Is password-secret intended to be mutually-exclusive with
>> private-key/private-key-secret?  If so, this should probably utilize an
>> enum for a discriminator
>> { 'enum': 'SshAuth', 'data': ['ssh-agent', 'password', 'private'key'] }
>>
>> then update BlockdevOptionsSsh to be a union type with an optional
>> discriminator (defaulting to ssh-agent) for back-compat, where
>> 'auth':'ssh-agent' needs no further fields, 'auth':'password' adds in a
>> 'secret' field for use as password, or where 'auth':'private-key' adds
>> in both 'key-file' and 'secret' for use as the two pieces needed for
>> private key use.
> 
> Can we actually support optional discriminators when we don't have
> defaults in the QAPI schema yet?

Just chiming in here, because I wanted to throw in that v4 of my “block:
Try to create well-typed json:{} filenames” series adds that.

>> On a different topic, how much of this work overlaps with the nbdkit ssh
>> plugin? Should we be duplicating efforts with both projects supporting
>> ssh natively, or is it worth considering getting qemu out of the ssh
>> business and instead connecting to an nbd device provided by nbdkit
>> connecting to ssh?
> 
> ssh behaves essentially like a filesystem whereas NBD behaves like a
> block device. This is especially relevant for everything related to the
> file size. As far as I know, using an image format like qcow2 that wants
> to grow the image file isn't possible over NBD, whereas I expect it to
> work with the ssh block driver.

Just using sshfs and file-posix would seem simpler to me, by the way.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH 4/7] vmdk: Reject invalid compressed writes

2019-08-12 Thread Max Reitz

On 12.08.19 22:26, John Snow wrote:
> 
> 
> On 7/25/19 11:57 AM, Max Reitz wrote:
>> Compressed writes generally have to write full clusters, not just in
>> theory but also in practice when it comes to vmdk's streamOptimized
>> subformat.  It currently is just silently broken for writes with
>> non-zero in-cluster offsets:
>>
>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
>> wrote 4096/4096 bytes at offset 4096
>> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
>> read failed: Invalid argument
>>
>> (The technical reason is that vmdk_write_extent() just writes the
>> incomplete compressed data actually to offset 4k.  When reading the
>> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
>> size to be 0, because that is what it reads from there.  This yields an
>> error.)
>>
>> For incomplete writes with zero in-cluster offsets, the error path when
>> reading the rest of the cluster is a bit different, but the result is
>> the same:
>>
>> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
>> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
>> wrote 4096/4096 bytes at offset 0
>> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
>> read failed: Invalid argument
>>
>> (Here, vmdk_read_extent() finds the data and then sees that the
>> uncompressed data is short.)
>>
>> It is better to reject invalid writes than to make the user believe they
>> might have succeeded and then fail when trying to read it back.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block/vmdk.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index db6acfc31e..641acacfe0 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, 
>> int64_t cluster_offset,
>>  if (extent->compressed) {
>>  void *compressed_data;
>>  
>> +/* Only whole clusters */
>> +if (offset_in_cluster ||
>> +n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
>> +(n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
>> + offset + n_bytes != extent->end_sector * SECTOR_SIZE))
>> +{
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>>  if (!extent->has_marker) {
>>  ret = -EINVAL;
>>  goto out;
>>
> 
> What does this look like from a guest's perspective? Is there something
> that enforces the alignment in the graph for us?
> 
> Or is it the case that indeed guests (or users via qemu-io) can request
> invalid writes and we will halt the VM in those cases (in preference to
> corrupting the disk)?

Have you ever tried using a streamOptimized VMDK disk with a guest?

I haven’t, but I know that it won’t work. O:-)  If you try to write to
an already allocated cluster, you’ll get an EIO and an error message via
error_report() (“Could not write to allocated cluster for
streamOptimized”).  So really, the only use of streamOptimized is as a
qemu-img convert source/target, or as a backup/mirror target.  (Just
like compressed clusters in qcow2 images.)

I suppose if I introduced streamOptimized support today, I wouldn’t just
forward vmdk_co_pwritev_compressed() to vmdk_co_pwritev(), but instead
make vmdk_co_pwritev_compressed() only work on streamOptimized images,
and vmdk_co_pwritev() only on everything else.  Then it would be more clear.

Hm.  In fact, that’s a bug, isn’t it?  vmdk will accept compressed
writes for any subformat, even if it doesn’t support compression.  So if
you use -c and convert to vmdk, it will succeed, but the result won’t be
compressed,

It’s also a bit weird to accept normal writes for streamOptimized, but
I’m not sure whether that’s really a bug?  In any case, changing this
behavior would not be backwards-compatible...  Should we deprecate
normal writes to streamOptimized?

Max

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH 4/7] vmdk: Reject invalid compressed writes

2019-08-12 Thread John Snow




On 7/25/19 11:57 AM, Max Reitz wrote:
> Compressed writes generally have to write full clusters, not just in
> theory but also in practice when it comes to vmdk's streamOptimized
> subformat.  It currently is just silently broken for writes with
> non-zero in-cluster offsets:
> 
> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
> $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
> wrote 4096/4096 bytes at offset 4096
> 4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
> read failed: Invalid argument
> 
> (The technical reason is that vmdk_write_extent() just writes the
> incomplete compressed data actually to offset 4k.  When reading the
> data, vmdk_read_extent() looks at offset 0 and finds the compressed data
> size to be 0, because that is what it reads from there.  This yields an
> error.)
> 
> For incomplete writes with zero in-cluster offsets, the error path when
> reading the rest of the cluster is a bit different, but the result is
> the same:
> 
> $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
> $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
> read failed: Invalid argument
> 
> (Here, vmdk_read_extent() finds the data and then sees that the
> uncompressed data is short.)
> 
> It is better to reject invalid writes than to make the user believe they
> might have succeeded and then fail when trying to read it back.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/vmdk.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index db6acfc31e..641acacfe0 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1731,6 +1731,16 @@ static int vmdk_write_extent(VmdkExtent *extent, 
> int64_t cluster_offset,
>  if (extent->compressed) {
>  void *compressed_data;
>  
> +/* Only whole clusters */
> +if (offset_in_cluster ||
> +n_bytes > (extent->cluster_sectors * SECTOR_SIZE) ||
> +(n_bytes < (extent->cluster_sectors * SECTOR_SIZE) &&
> + offset + n_bytes != extent->end_sector * SECTOR_SIZE))
> +{
> +ret = -EINVAL;
> +goto out;
> +}
> +
>  if (!extent->has_marker) {
>  ret = -EINVAL;
>  goto out;
> 

What does this look like from a guest's perspective? Is there something
that enforces the alignment in the graph for us?

Or is it the case that indeed guests (or users via qemu-io) can request
invalid writes and we will halt the VM in those cases (in preference to
corrupting the disk)?

Re: [Qemu-devel] [Qemu-block] [PATCH 3/7] iotests: Keep testing broken relative extent paths

2019-08-12 Thread John Snow




On 7/25/19 11:57 AM, Max Reitz wrote:
> We had a test for a case where relative extent paths did not work, but
> unfortunately we just fixed the underlying problem, so it works now.
> This patch adds a new test case that still fails.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/059 | 27 +++
>  tests/qemu-iotests/059.out |  4 
>  2 files changed, 31 insertions(+)
> 
> diff --git a/tests/qemu-iotests/059 b/tests/qemu-iotests/059
> index fbed5f9483..2a883d0f21 100755
> --- a/tests/qemu-iotests/059
> +++ b/tests/qemu-iotests/059
> @@ -114,6 +114,8 @@ $QEMU_IMG convert -f qcow2 -O vmdk -o 
> subformat=streamOptimized "$TEST_IMG.qcow2
>  
>  echo
>  echo "=== Testing monolithicFlat with internally generated JSON file name 
> ==="
> +
> +echo '--- blkdebug ---'
>  # Should work, because bdrv_dirname() works fine with blkdebug

^

>  IMGOPTS="subformat=monolithicFlat" _make_test_img 64M
>  $QEMU_IO -c "open -o 
> driver=$IMGFMT,file.driver=blkdebug,file.image.filename=$TEST_IMG,file.inject-error.0.event=read_aio"
>  \
> @@ -122,6 +124,31 @@ $QEMU_IO -c "open -o 
> driver=$IMGFMT,file.driver=blkdebug,file.image.filename=$TE
>  | _filter_testdir | _filter_imgfmt | _filter_img_info
>  _cleanup_test_img
>  
> +echo '--- quorum ---'
> +# Should not work, because bdrv_dirname() does not work with blkdebug

^ ? So uh, which is it?

(you wanted: s/blkdebug/quorum/)

> +IMGOPTS="subformat=monolithicFlat" _make_test_img 64M
> +cp "$TEST_IMG" "$TEST_IMG.orig"
> +
> +filename="json:{
> +\"driver\": \"$IMGFMT\",
> +\"file\": {
> +\"driver\": \"quorum\",
> +\"children\": [ {
> +\"driver\": \"file\",
> +\"filename\": \"$TEST_IMG\"
> +}, {
> +\"driver\": \"file\",
> +\"filename\": \"$TEST_IMG.orig\"
> +} ],
> +\"vote-threshold\": 1
> +} }"
> +
> +filename=$(echo "$filename" | tr '\n' ' ' | sed -e 's/\s\+/ /g')
> +$QEMU_IMG info "$filename" 2>&1 \
> +| sed -e "s/'json:[^']*'/\$QUORUM_FILE/g" \
> +| _filter_testdir | _filter_imgfmt | _filter_img_info
> +
> +
>  echo
>  echo "=== Testing version 3 ==="
>  _use_sample_img iotest-version3.vmdk.bz2
> diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
> index 120cddd207..f8895ba434 100644
> --- a/tests/qemu-iotests/059.out
> +++ b/tests/qemu-iotests/059.out
> @@ -2049,10 +2049,14 @@ wrote 512/512 bytes at offset 10240
>  512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  
>  === Testing monolithicFlat with internally generated JSON file name ===
> +--- blkdebug ---
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>  format name: IMGFMT
>  cluster size: 0 bytes
>  vm state offset: 0 bytes
> +--- quorum ---
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
> +qemu-img: Could not open $QUORUM_FILE: Cannot use relative paths with VMDK 
> descriptor file $QUORUM_FILE: Cannot generate a base directory for quorum 
> nodes
>  
>  === Testing version 3 ===
>  image: TEST_DIR/iotest-version3.IMGFMT
> 

With the paste-o fixed:

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-block] [PATCH 2/7] vmdk: Use bdrv_dirname() for relative extent paths

2019-08-12 Thread John Snow




On 7/25/19 11:57 AM, Max Reitz wrote:
> This makes iotest 033 pass with e.g. subformat=monolithicFlat.  It also
> turns a former error in 059 into success.
> 
> Signed-off-by: Max Reitz 

Seems roughly correct, but I only really gave it a cursory look; my
trust in you knowing the exact semantics of filename and path variables
because of those lengthy series is doing the heavy lifting here:

Reviewed-by: John Snow 

(And if it breaks, it's for 4.2, and it's just vmdk, we'll figure it out.)

> ---
>  block/vmdk.c   | 54 --
>  tests/qemu-iotests/059 |  7 +++--
>  tests/qemu-iotests/059.out |  4 ++-
>  3 files changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index bd36ece125..db6acfc31e 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1076,8 +1076,7 @@ static const char *next_line(const char *s)
>  }
>  
>  static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
> -  const char *desc_file_path, QDict *options,
> -  Error **errp)
> +  QDict *options, Error **errp)
>  {
>  int ret;
>  int matches;
> @@ -1087,6 +1086,7 @@ static int vmdk_parse_extents(const char *desc, 
> BlockDriverState *bs,
>  const char *p, *np;
>  int64_t sectors = 0;
>  int64_t flat_offset;
> +char *desc_file_dir = NULL;
>  char *extent_path;
>  BdrvChild *extent_file;
>  BDRVVmdkState *s = bs->opaque;
> @@ -1130,16 +1130,23 @@ static int vmdk_parse_extents(const char *desc, 
> BlockDriverState *bs,
>  continue;
>  }
>  
> -if (!path_is_absolute(fname) && !path_has_protocol(fname) &&
> -!desc_file_path[0])
> -{
> -bdrv_refresh_filename(bs->file->bs);
> -error_setg(errp, "Cannot use relative extent paths with VMDK "
> -   "descriptor file '%s'", bs->file->bs->filename);
> -return -EINVAL;
> -}
> +if (path_is_absolute(fname) || path_has_protocol(fname)) {
> +extent_path = g_strdup(fname);
> +} else {
> +if (!desc_file_dir) {
> +desc_file_dir = bdrv_dirname(bs->file->bs, errp);
> +if (!desc_file_dir) {
> +bdrv_refresh_filename(bs->file->bs);
> +error_prepend(errp, "Cannot use relative paths with VMDK 
> "
> +  "descriptor file '%s': ",
> +  bs->file->bs->filename);
> +ret = -EINVAL;
> +goto out;
> +}
> +}
>  
> -extent_path = path_combine(desc_file_path, fname);
> +extent_path = g_strconcat(desc_file_dir, fname, NULL);
> +}
>  
>  ret = snprintf(extent_opt_prefix, 32, "extents.%d", s->num_extents);
>  assert(ret < 32);
> @@ -1149,7 +1156,8 @@ static int vmdk_parse_extents(const char *desc, 
> BlockDriverState *bs,
>  g_free(extent_path);
>  if (local_err) {
>  error_propagate(errp, local_err);
> -return -EINVAL;
> +ret = -EINVAL;
> +goto out;
>  }
>  
>  /* save to extents array */
> @@ -1160,7 +1168,7 @@ static int vmdk_parse_extents(const char *desc, 
> BlockDriverState *bs,
>  0, 0, 0, 0, 0, , errp);
>  if (ret < 0) {
>  bdrv_unref_child(bs, extent_file);
> -return ret;
> +goto out;
>  }
>  extent->flat_start_offset = flat_offset << 9;
>  } else if (!strcmp(type, "SPARSE") || !strcmp(type, "VMFSSPARSE")) {
> @@ -1175,24 +1183,27 @@ static int vmdk_parse_extents(const char *desc, 
> BlockDriverState *bs,
>  g_free(buf);
>  if (ret) {
>  bdrv_unref_child(bs, extent_file);
> -return ret;
> +goto out;
>  }
>  extent = >extents[s->num_extents - 1];
>  } else if (!strcmp(type, "SESPARSE")) {
>  ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
>  if (ret) {
>  bdrv_unref_child(bs, extent_file);
> -return ret;
> +goto out;
>  }
>  extent = >extents[s->num_extents - 1];
>  } else {
>  error_setg(errp, "Unsupported extent type '%s'", type);
>  bdrv_unref_child(bs, extent_file);
> -return -ENOTSUP;
> +ret = -ENOTSUP;
> +goto out;
>  }
>  extent->type = g_strdup(type);
>  }
> -return 0;
> +
> +ret = 0;
> +goto out;
>  
>  invalid:
>  np = next_line(p);
> @@ -1201,7 +1212,11 @@ invalid:
>  np--;
>  }
>  error_setg(errp, "Invalid extent line: %.*s", (int)(np - p), p);
> -return -EINVAL;
> +ret

Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW

2019-08-12 Thread Max Reitz

On 12.08.19 21:46, Max Reitz wrote:
> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>> broken. It should be either fixed like I propose (by Max's suggestion)
>> or somehow forbidden (just forbid backing-file supporting node to be
>> file child of raw-format node).
> 
> I agree, I think only filters should return BDRV_BLOCK_RAW.
> 
> (And not even them, they should just be handled transparently by
> bdrv_co_block_status().  But that’s something for later.)
> 
>> Vladimir Sementsov-Ogievskiy (2):
>>   block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>   iotests: test mirroring qcow2 under raw format
>>
>>  block/raw-format.c |  2 +-
>>  tests/qemu-iotests/263 | 46 ++
>>  tests/qemu-iotests/263.out | 12 ++
>>  tests/qemu-iotests/group   |  1 +
>>  4 files changed, 60 insertions(+), 1 deletion(-)
>>  create mode 100755 tests/qemu-iotests/263
>>  create mode 100644 tests/qemu-iotests/263.out
> 
> Thanks, applied to my block-next branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Oops, maybe not.  221 needs to be adjusted.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW

2019-08-12 Thread Max Reitz

On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
> broken. It should be either fixed like I propose (by Max's suggestion)
> or somehow forbidden (just forbid backing-file supporting node to be
> file child of raw-format node).

I agree, I think only filters should return BDRV_BLOCK_RAW.

(And not even them, they should just be handled transparently by
bdrv_co_block_status().  But that’s something for later.)

> Vladimir Sementsov-Ogievskiy (2):
>   block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>   iotests: test mirroring qcow2 under raw format
> 
>  block/raw-format.c |  2 +-
>  tests/qemu-iotests/263 | 46 ++
>  tests/qemu-iotests/263.out | 12 ++
>  tests/qemu-iotests/group   |  1 +
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/263
>  create mode 100644 tests/qemu-iotests/263.out

Thanks, applied to my block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] tests/test-hbitmap: test next_zero and _next_dirty_area after truncate

2019-08-12 Thread John Snow




On 8/9/19 4:26 AM, Vladimir Sementsov-Ogievskiy wrote:
> 08.08.2019 3:04, John Snow wrote:
>>
>>
>> On 8/5/19 12:46 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> Test that hbitmap_next_zero and hbitmap_next_dirty_area can find things
>>> after old bitmap end.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>
>>> It's a follow-up for
>>>
>>>  [PATCH for-4.1] util/hbitmap: update orig_size on truncate
>>>
>>>   tests/test-hbitmap.c | 22 ++
>>>   1 file changed, 22 insertions(+)
>>>
>>> diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
>>> index 592d8219db..eed5d288cb 100644
>>> --- a/tests/test-hbitmap.c
>>> +++ b/tests/test-hbitmap.c
>>> @@ -1004,6 +1004,15 @@ static void test_hbitmap_next_zero_4(TestHBitmapData 
>>> *data, const void *unused)
>>>   test_hbitmap_next_zero_do(data, 4);
>>>   }
>>>   
>>> +static void test_hbitmap_next_zero_after_truncate(TestHBitmapData *data,
>>> +  const void *unused)
>>> +{
>>> +hbitmap_test_init(data, L1, 0);
>>> +hbitmap_test_truncate_impl(data, L1 * 2);
>>> +hbitmap_set(data->hb, 0, L1);
>>> +test_hbitmap_next_zero_check(data, 0);
>>> +}
>>> +
>>>   static void test_hbitmap_next_dirty_area_check(TestHBitmapData *data,
>>>  uint64_t offset,
>>>  uint64_t count)
>>> @@ -1104,6 +1113,15 @@ static void 
>>> test_hbitmap_next_dirty_area_4(TestHBitmapData *data,
>>>   test_hbitmap_next_dirty_area_do(data, 4);
>>>   }
>>>   
>>> +static void test_hbitmap_next_dirty_area_after_truncate(TestHBitmapData 
>>> *data,
>>> +const void *unused)
>>> +{
>>> +hbitmap_test_init(data, L1, 0);
>>> +hbitmap_test_truncate_impl(data, L1 * 2);
>>> +hbitmap_set(data->hb, L1 + 1, 1);
>>> +test_hbitmap_next_dirty_area_check(data, 0, UINT64_MAX);
>>> +}
>>> +
>>>   int main(int argc, char **argv)
>>>   {
>>>   g_test_init(, , NULL);
>>> @@ -1169,6 +1187,8 @@ int main(int argc, char **argv)
>>>test_hbitmap_next_zero_0);
>>>   hbitmap_test_add("/hbitmap/next_zero/next_zero_4",
>>>test_hbitmap_next_zero_4);
>>> +hbitmap_test_add("/hbitmap/next_zero/next_zero_after_truncate",
>>> + test_hbitmap_next_zero_after_truncate);
>>>   
>>>   hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_0",
>>>test_hbitmap_next_dirty_area_0);
>>> @@ -1176,6 +1196,8 @@ int main(int argc, char **argv)
>>>test_hbitmap_next_dirty_area_1);
>>>   hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_4",
>>>test_hbitmap_next_dirty_area_4);
>>> +
>>> hbitmap_test_add("/hbitmap/next_dirty_area/next_dirty_area_after_truncate",
>>> + test_hbitmap_next_dirty_area_after_truncate);
>>>   
>>>   g_test_run();
>>>   
>>>
>>
>> Tested-by: John Snow 
>> Reviewed-by: John Snow 
>>
>> And staged:
>>
>> Thanks, applied to my bitmaps tree:
>>
>> https://github.com/jnsnow/qemu/commits/bitmaps
> 
> Thanks! Hmm but I don't see the patch at this link, neither 01 and 03 from
> "[Qemu-devel] [PATCH 0/3] backup fixes for 4.1?"...
> 

Made a mistake with my git-push syntax, because the local branch was
named `bitmaps-next`. Should be properly synchronized now.

(Also, as Max noted, I need to fix my rebase attempt for the sync=top
routine.)

--js

Re: [Qemu-devel] [Qemu-block] [PATCH 1/7] iotests: Fix _filter_img_create()

2019-08-12 Thread John Snow




On 7/25/19 11:57 AM, Max Reitz wrote:
> fe646693acc changed qemu-img create's output so that it no longer prints
> single quotes around parameter values.  The subformat and adapter_type
> filters in _filter_img_create() have never been adapted to that change.
> 
> Fixes: fe646693acc13ac48b98435d14149ab04dc597bc
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/059.out   | 16 
>  tests/qemu-iotests/common.filter |  4 ++--
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 

either way, the test passes. and it was apparently our intent to filter
out the subformat, so I suppose this is correct.

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-block] qemu-iotest 059 fails with vmdk

2019-08-12 Thread Max Reitz

On 12.08.19 21:14, John Snow wrote:
> 
> 
> On 7/22/19 8:58 AM, Thomas Huth wrote:
>> Not sure if it has been reported before, but test 059 currently fails:
>>
>> 059  fail   [14:55:21] [14:55:26]output
>> mismatch (see 059.out.bad)
>> --- /home/thuth/devel/qemu/tests/qemu-iotests/059.out2019-07-19
>> 10:19:18.0 +0200
>> +++ /home/thuth/tmp/qemu-build/tests/qemu-iotests/059.out.bad
>> 2019-07-22
>> 14:55:26.0 +0200
>> @@ -27,7 +27,7 @@
>>  image: TEST_DIR/t.vmdk
>>  file format: vmdk
>>  virtual size: 0.977 TiB (1073741824000 bytes)
>> -disk size: 16 KiB
>> +disk size: 517 KiB
>>  Format specific information:
>>  cid: 
>>  parent cid: 
>> Failures: 059
>> Failed 1 of 1 tests
>>
>> ... I think this was working fine for me a couple of weeks ago, so I
>> assume this is a rather new bug?
>>
>>  Thomas
>>
> 
> I know this is a pretty late response, but "worksforme" -- did someone
> address this in the meantime? I don't see any commits to 059 in some
> time. (Not since March.)

I didn’t because I could never reproduce this failure.  (XFS/tmpfs here.)

I have “vmdk: Misc fixes” on list which came from running the iotests
with all possible subformats (which broke many things, but not 059 in
this way).

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] qemu-iotests 069 and 111 are failing on NetBSD

2019-08-12 Thread John Snow




On 7/25/19 4:34 AM, Thomas Huth wrote:
> On 24/07/2019 18.29, Paolo Bonzini wrote:
>> On 24/07/19 11:34, Thomas Huth wrote:
>>> In case somebody is interested, two of the "auto" iotests are failing
>>> on NetBSD due to non-matching output:
>>>
>>>   TESTiotest-qcow2: 069 [fail]
>>> --- /var/tmp/qemu-test.1BMupF/tests/qemu-iotests/069.out2019-07-24 
>>> 09:19:22.0 +
>>> +++ /var/tmp/qemu-test.1BMupF/tests/qemu-iotests/069.out.bad2019-07-24 
>>> 09:21:34.0 +
>>> @@ -4,5 +4,5 @@
>>>  
>>>  Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=131072
>>>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072 
>>> backing_file=TEST_DIR/t.IMGFMT.base
>>> -qemu-io: can't open device TEST_DIR/t.IMGFMT: Could not open backing file: 
>>> Could not open 'TEST_DIR/t.IMGFMT.base': No such file or directory
>>> +qemu-io: can't open device TEST_DIR/t.IMGFMT: Could not open backing file: 
>>> TEST_DIR/t.IMGFMT.base: stat failed: No such file or directory
>>>  *** done
>>>
>>> and:
>>>
>>>   TESTiotest-qcow2: 111 [fail]
>>> --- /var/tmp/qemu-test.1BMupF/tests/qemu-iotests/111.out2019-07-24 
>>> 09:19:22.0 +
>>> +++ /var/tmp/qemu-test.1BMupF/tests/qemu-iotests/111.out.bad2019-07-24 
>>> 09:21:40.0 +
>>> @@ -1,4 +1,4 @@
>>>  QA output created by 111
>>> -qemu-img: TEST_DIR/t.IMGFMT: Could not open 
>>> 'TEST_DIR/t.IMGFMT.inexistent': No such file or directory
>>> +qemu-img: TEST_DIR/t.IMGFMT: TEST_DIR/t.IMGFMT.inexistent: stat failed: No 
>>> such file or directory
>>>  Could not open backing image to determine size.
>>>  *** done
>>>
>>> It's currently not a problem yet since we're not running the
>>> iotests on NetBSD yet (since our netbsd VM image does not have
>>> bash and gsed installed yet), but if somebody has some spare
>>> minutes, it would be great if this could be fixed so that we
>>> can enable the iotests on NetBSD, too, one day...
>>
>> Is this (slightly ridiculous but effective) patch enough?
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 73a001ceb7..ce847f4d62 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -217,7 +217,7 @@ static int raw_normalize_devicepath(const char 
>> **filename, Error **errp)
>>  fname = *filename;
>>  dp = strrchr(fname, '/');
>>  if (lstat(fname, ) < 0) {
>> -error_setg_errno(errp, errno, "%s: stat failed", fname);
>> +error_setg_errno(errp, errno, "Could not open: '%s'", fname);
>>  return -errno;
>>  }
> 
> Yes, good idea! It works after removing the colon after "open"! :-)
> 
> With the colon removed:
> 
> Tested-by: Thomas Huth 
> 
>  Thomas
> 

Does someone intend to submit this patch formally?

--js

Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-12 Thread Max Reitz

On 12.08.19 19:14, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 16:09, Max Reitz wrote:
>> On 10.08.19 18:41, Vladimir Sementsov-Ogievskiy wrote:
>>> 09.08.2019 19:13, Max Reitz wrote:
 If the driver does not implement bdrv_get_allocated_file_size(), we
 should fall back to cumulating the allocated size of all non-COW
 children instead of just bs->file.

 Suggested-by: Vladimir Sementsov-Ogievskiy 
 Signed-off-by: Max Reitz 
 ---
block.c | 22 --
1 file changed, 20 insertions(+), 2 deletions(-)

 diff --git a/block.c b/block.c
 index 1070aa1ba9..6e1ddab056 100644
 --- a/block.c
 +++ b/block.c
 @@ -4650,9 +4650,27 @@ int64_t 
 bdrv_get_allocated_file_size(BlockDriverState *bs)
if (drv->bdrv_get_allocated_file_size) {
return drv->bdrv_get_allocated_file_size(bs);
}
 -if (bs->file) {
 -return bdrv_get_allocated_file_size(bs->file->bs);
 +
 +if (!QLIST_EMPTY(>children)) {
 +BdrvChild *child;
 +int64_t child_size, total_size = 0;
 +
 +QLIST_FOREACH(child, >children, next) {
 +if (child == bdrv_filtered_cow_child(bs)) {
 +/* Ignore COW backing files */
 +continue;
 +}
 +
 +child_size = bdrv_get_allocated_file_size(child->bs);
 +if (child_size < 0) {
 +return child_size;
 +}
 +total_size += child_size;
 +}
 +
 +return total_size;
}
 +
return -ENOTSUP;
}


>>>
>>> Hmm..
>>>
>>> 1. No children -> -ENOTSUP
>>> 2. Only cow child -> 0
>>> 3. Some non-cow children -> SUM
>>>
>>> It's all arguable (the strictest way is -ENOTSUP in either case),
>>> but if we want to fallback to SUM of non-cow children, 1. and 2. should 
>>> return
>>> the same.
>>
>> I don’t think 2 is possible at all.  If you have a COW child, you need
>> some other child to COW to.
>>
>> And in the weird (and probably impossible) case where a node really only
>> has a COW child, I’d say it’s correct that it has a disk size of 0 –
>> because it hasn’t COWed anything yet.  (Just like a new qcow2 image with
>> a backing file only has its metadata as its disk size.)
>>
> 
> Agreed. Then, why not return 0 on [1] ?

(1) Because that’s the current behavior. :-)

(2) Nodes that have no children are protocol nodes.  Protocol nodes
(apart from null) still have to store their data somewhere.  Therefore,
they must implement .bdrv_get_allocated_file_size() to report that.  If
they don’t, that doesn’t mean they don’t store any data, but only that
we don’t know how much data they store.

> Also, another idea: shouldn't we return 0 for filters, i.e. skip 
> filtered_rw_child too?
> [as filtered-child is more like backing child than file one, it's "less 
> owned" by its parent]

Why would we do that?  If I have a block device with a throttle node
attached to it and request how much space it uses, of course I will want
to know how much space the whole tree below it uses.

(Otherwise, bdrv_get_allocated_file_size() should only report anything
for protocol nodes, and 0 for everything else.)

Max

> with or without any of these suggestions:
> Reviewed-by: Vladimir Sementsov-Ogievskiy 





signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] qemu-iotest 059 fails with vmdk

2019-08-12 Thread John Snow




On 7/22/19 8:58 AM, Thomas Huth wrote:
> Not sure if it has been reported before, but test 059 currently fails:
> 
> 059  fail   [14:55:21] [14:55:26]output
> mismatch (see 059.out.bad)
> --- /home/thuth/devel/qemu/tests/qemu-iotests/059.out 2019-07-19
> 10:19:18.0 +0200
> +++ /home/thuth/tmp/qemu-build/tests/qemu-iotests/059.out.bad 2019-07-22
> 14:55:26.0 +0200
> @@ -27,7 +27,7 @@
>  image: TEST_DIR/t.vmdk
>  file format: vmdk
>  virtual size: 0.977 TiB (1073741824000 bytes)
> -disk size: 16 KiB
> +disk size: 517 KiB
>  Format specific information:
>  cid: 
>  parent cid: 
> Failures: 059
> Failed 1 of 1 tests
> 
> ... I think this was working fine for me a couple of weeks ago, so I
> assume this is a rather new bug?
> 
>  Thomas
> 

I know this is a pretty late response, but "worksforme" -- did someone
address this in the meantime? I don't see any commits to 059 in some
time. (Not since March.)

I'm testing on x86-64, fedora 30, with rc4:

../../configure --target-list="x86_64-softmmu" --enable-debug
--extra-cflags='-ggdb -O0'; and make -j9

./check -v -vmdk 059

filesystem is ext4.

--js

Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again

2019-08-12 Thread Max Reitz

On 12.08.19 20:39, John Snow wrote:
> 
> 
> On 7/25/19 11:55 AM, Max Reitz wrote:
>> Hi,
>>
>> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
>> would only go down to the protocol layer if the format layer returned
>> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
>> information whether a given range in the image is zero or not.
>> Generally, this is because the image is preallocated and thus all ranges
>> appear as zeroes.
>>
>> However, it only implemented this preallocation detection for qcow2.
>> There are more formats that support preallocation, though: vdi, vhdx,
>> vmdk, vpc.  (Funny how they all start with “v”.)
>>
>> For vdi, vmdk, and vpc, the fix is rather simple, because they really
>> have different subformats depending on whether an image is preallocated
>> or not.  This makes the check very simple.
>>
>> vhdx is more like qcow2, where after the image has been created, it
>> isn’t clear whether it’s been preallocated or everything is allocated
>> because everything was already written to.  69f47505ee added a heuristic
>> to qcow2 to get around this, but I think that’s too much for vhdx.  I
>> just left it unfixed, because I don’t care that much, honestly (and I
>> don’t think anyone else does).
>>
> 
> What's the practical outcome of that, and is the limitation documented
> somewhere?

The outcome is that it if you preallocate a vhdx image
(subformat=fixed), you’ll see that all sectors contain data, even if
they may be zero sectors on the filesystem level.

I don’t think it’s user-visible whatsoever.

> (I'm fine with not fixing it, I just want it documented somehow.)

I am really not inclined to start any documentation on the
particularities with which qemu handles vhdx images.

(Especially so considering we don’t even have any documentation on the
qcow2 case.  The stress in my paragraph was “heuristic”.  If you
preallocate a qcow2 image, but then discard enough sectors that the
heuristic thinks you didn’t, you’ll have the same effect.  Or if you
grow a preallocated image without preallocating the new area.)

Max

>>
>> Max Reitz (3):
>>   vdi: Make block_status recurse for fixed images
>>   vmdk: Make block_status recurse for flat extents
>>   vpc: Do not return RAW from block_status
>>
>>  block/vdi.c  | 3 ++-
>>  block/vmdk.c | 3 +++
>>  block/vpc.c  | 2 +-
>>  3 files changed, 6 insertions(+), 2 deletions(-)
>>




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 3/3] vpc: Do not return RAW from block_status

2019-08-12 Thread Max Reitz

On 12.08.19 18:50, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 18:56, Max Reitz wrote:
>> On 12.08.19 17:33, Vladimir Sementsov-Ogievskiy wrote:
>>> 25.07.2019 18:55, Max Reitz wrote:
 vpc is not really a passthrough driver, even when using the fixed
 subformat (where host and guest offsets are equal).  It should handle
 preallocation like all other drivers do, namely by returning
 DATA | RECURSE instead of RAW.

 There is no tangible difference but the fact that bdrv_is_allocated() no
 longer falls through to the protocol layer.
>>>
>>> Hmm. Isn't a real bug (fixed by this patch) ?
>>>
>>> Assume vpc->file is qcow2 with backing, which have "unallocated" region, 
>>> which is
>>> backed by actual data in backing file.
>>
>> Come on now.
>>
>>> So, this region will be reported as not allocated and will be skipped by 
>>> any copying
>>> loop using block-status? Is it a bug of BDRV_BLOCK_RAW itself? Or I don't 
>>> understand
>>> something..
>>
>> I think what you don’t understand is that if you have a vpc file inside
>> of a qcow2 file, you’re doing basically everything wrong. ;-)
>>
>> But maybe we should drop BDRV_BLOCK_RAW...  Does it do anything good for
>> us in the raw driver?  Shouldn’t it too just return DATA | RECURSE?
>>
> 
> And if I have raw driver above qcow2, it will not work, like I've described 
> above..

Yep.  That’s why I was wondering.  (This is a more likely case, because
maybe you really want to use raw’s offset capability on top of qcow2.)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v8 9/9] qapi: query-blockstat: add driver specific file-posix stats

2019-08-12 Thread Max Reitz

On 16.05.19 16:33, Anton Nefedov wrote:
> A block driver can provide a callback to report driver-specific
> statistics.
> 
> file-posix driver now reports discard statistics
> 
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Acked-by: Markus Armbruster 
> ---
>  qapi/block-core.json  | 38 ++
>  include/block/block.h |  1 +
>  include/block/block_int.h |  1 +
>  block.c   |  9 +
>  block/file-posix.c| 38 +++---
>  block/qapi.c  |  5 +
>  6 files changed, 89 insertions(+), 3 deletions(-)

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 55194f84ce..368e09ae37 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -956,6 +956,41 @@
> '*wr_latency_histogram': 'BlockLatencyHistogramInfo',
> '*flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
>  
> +##
> +# @BlockStatsSpecificFile:
> +#
> +# File driver statistics
> +#
> +# @discard-nb-ok: The number of successful discard operations performed by
> +# the driver.
> +#
> +# @discard-nb-failed: The number of failed discard operations performed by
> +# the driver.
> +#
> +# @discard-bytes-ok: The number of bytes discarded by the driver.
> +#
> +# Since: 4.1
> +##
> +{ 'struct': 'BlockStatsSpecificFile',
> +  'data': {
> +  'discard-nb-ok': 'uint64',
> +  'discard-nb-failed': 'uint64',
> +  'discard-bytes-ok': 'uint64' } }
> +
> +##
> +# @BlockStatsSpecific:
> +#
> +# Block driver specific statistics
> +#
> +# Since: 4.1
> +##
> +{ 'union': 'BlockStatsSpecific',
> +  'base': { 'driver': 'BlockdevDriver' },
> +  'discriminator': 'driver',
> +  'data': {
> +  'file': 'BlockStatsSpecificFile',
> +  'host_device': 'BlockStatsSpecificFile' } }

I would like to use these chance to complain that I find this awkward.
My problem is that I don’t know how any management application is
supposed to reasonably consume this.  It feels weird to potentially have
to recognize the result for every block driver.

I would now like to note that I’m clearly not in a position to block
this at this point, because I’ve had a year to do so, I didn’t, so it
would be unfair to do it now.

(Still, I feel like if I have a concern, I should raise it, even if it’s
too late.)

I know Markus has proposed this, but I don’t understand why.  He set
ImageInfoSpecific as a precedence, but that has a different reasoning
behind it.  The point for that is that it simply doesn’t work any other
way, because it is clearly format-specific information that cannot be
shared between drivers.  Anything that can be shared is put into
ImageInfo (like the cluster size).

We have the same constellation here, BlockStats contains common stuff,
and BlockStatsSpecific would contain driver-specific stuff.  But to me,
BlockStatsSpecificFile doesn’t look very special.  It looks like it just
duplicates fields that already exist in BlockDeviceStats.

(Furthermore, most of ImageInfoSpecific is actually not useful to
management software, but only as an information for humans (and having
such a structure for that is perfectly fine).  But these stats don’t
really look like something for immediate human consumption.)

So I wonder why you don’t just put this information into
BlockDeviceStats.  From what I can tell looking at
bdrv_query_bds_stats() and qmp_query_blockstats(), the @stats field is
currently completely 0 if @query-nodes is true.

(Furthermore, I wonder whether it would make sense to re-add
BlockAcctStats to each BDS and then let the generic block code do the
accounting on it.  I moved it to the BB in 7f0e9da6f13 because we didn’t
care about node-level information at the time, but maybe it’s time to
reconsider.)

Anyway, as I’ve said, I fully understand that complaining about a design
decision is just unfair at this point, so this is not a veto.

> +
>  ##
>  # @BlockStats:
>  #
> @@ -971,6 +1006,8 @@
>  #
>  # @stats:  A @BlockDeviceStats for the device.
>  #
> +# @driver-specific: Optional driver-specific stats. (Since 4.1)
> +#
>  # @parent: This describes the file block device if it has one.
>  #  Contains recursively the statistics of the underlying
>  #  protocol (e.g. the host file for a qcow2 image). If there is
> @@ -984,6 +1021,7 @@
>  { 'struct': 'BlockStats',
>'data': {'*device': 'str', '*qdev': 'str', '*node-name': 'str',
> 'stats': 'BlockDeviceStats',
> +   '*driver-specific': 'BlockStatsSpecific',
> '*parent': 'BlockStats',
> '*backing': 'BlockStats'} }
>  

[...]

> diff --git a/block/file-posix.c b/block/file-posix.c
> index 76d54b3a85..a2f01cfe10 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -160,9 +160,9 @@ typedef struct BDRVRawState {
>  bool drop_cache;
>  bool check_cache_dropped;
>  struct {
> -int64_t discard_nb_ok;
> -

Re: [Qemu-devel] [Qemu-block] [PATCH v2 3/5] iotests: Allow skipping test cases

2019-08-12 Thread John Snow




On 6/25/19 5:19 PM, Max Reitz wrote:
> case_notrun() does not actually skip the current test case.  It just
> adds a "notrun" note and then returns to the caller, who manually has to
> skip the test.  Generally, skipping a test case is as simple as
> returning from the current function, but not always: For example, this
> model does not allow skipping tests already in the setUp() function.
> 
> Thus, add a QMPTestCase.case_skip() function that invokes case_notrun()
> and then self.skipTest().  To make this work, we need to filter the
> information on how many test cases were skipped from the unittest
> output.
> 

I feel like skipping sub-tests has been a point of contention before,
but I don't recall the outcome.

Cleber, can you help jog my memory?

> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/iotests.py | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 3ecef5bc90..1b0ac50153 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -741,6 +741,11 @@ class QMPTestCase(unittest.TestCase):
>  return self.pause_wait(job_id)
>  return result
>  
> +def case_skip(self, reason):
> +'''Skip this test case'''
> +case_notrun(reason)
> +self.skipTest(reason)
> +
>  
>  def notrun(reason):
>  '''Skip this test suite'''
> @@ -752,7 +757,10 @@ def notrun(reason):
>  sys.exit(0)
>  
>  def case_notrun(reason):
> -'''Skip this test case'''
> +'''Mark this test case as not having been run, but do not actually
> +skip it; that is left to the caller.  See QMPTestCase.case_skip()
> +for a variant that actually skips the current test case.'''
> +
>  # Each test in qemu-iotests has a number ("seq")
>  seq = os.path.basename(sys.argv[0])
>  
> @@ -879,4 +887,12 @@ def main(supported_fmts=[], supported_oses=['linux'], 
> supported_cache_modes=[],
>  unittest.main(testRunner=MyTestRunner)
>  finally:
>  if not debug:
> -sys.stderr.write(re.sub(r'Ran (\d+) tests? in [\d.]+s', r'Ran \1 
> tests', output.getvalue()))
> +out = output.getvalue()
> +out = re.sub(r'Ran (\d+) tests? in [\d.]+s', r'Ran \1 tests', 
> out)
> +
> +# Hide skipped tests from the reference output
> +out = re.sub(r'OK \(skipped=\d+\)', 'OK', out)
> +out_first_line, out_rest = out.split('\n', 1)
> +out = out_first_line.replace('s', '.') + '\n' + out_rest
> +
> +sys.stderr.write(out)
> 

-- 
—js

Re: [Qemu-devel] [Qemu-block] [PATCH v2 2/5] iotests: Prefer null-co over null-aio

2019-08-12 Thread John Snow




On 6/25/19 5:19 PM, Max Reitz wrote:
> We use null-co basically everywhere in the iotests.  Unless we want to
> test null-aio specifically, we should use it instead (for consistency).
> 
> Signed-off-by: Max Reitz 

Probably fine.

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-block] [PATCH] qemu-img convert: Deprecate using -n and -o together

2019-08-12 Thread John Snow




On 8/9/19 5:11 AM, Kevin Wolf wrote:
> bdrv_create options specified with -o have no effect when skipping image
> creation with -n, so this doesn't make sense. Warn against the misuse
> and deprecate the combination so we can make it a hard error later.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img.c   | 5 +
>  qemu-deprecated.texi | 7 +++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 79983772de..d9321f6418 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2231,6 +2231,11 @@ static int img_convert(int argc, char **argv)
>  goto fail_getopt;
>  }
>  
> +if (skip_create && options) {
> +warn_report("-o has no effect when skipping image creation");
> +warn_report("This will become an error in future QEMU versions.");
> +}
> +
>  s.src_num = argc - optind - 1;
>  out_filename = s.src_num >= 1 ? argv[argc - 1] : NULL;
>  
> diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> index fff07bb2a3..7673d079c5 100644
> --- a/qemu-deprecated.texi
> +++ b/qemu-deprecated.texi
> @@ -305,6 +305,13 @@ to just export the entire image and then mount only 
> /dev/nbd0p1 than
>  it is to reinvoke @command{qemu-nbd -c /dev/nbd0} limited to just a
>  subset of the image.
>  
> +@subsection qemu-img convert -n -o (since 4.2.0)
> +
> +All options specified in @option{-o} are image creation options, so they
> +have no effect when used with @option{-n} to skip image creation. This
> +combination never made sense and shows that the user misunderstood the
> +effect of the options, so this will be made an error in future versions.
> +

I would avoid too much finger-wagging here. We can just say that the
combination never had a well-defined behavior, so it will now be treated
as an error.

Otherwise:

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-block] [PATCH] iotests: Fix 141 when run with qed

2019-08-12 Thread John Snow




On 8/9/19 2:52 PM, Max Reitz wrote:
> 69f47505ee has changed qcow2 in such a way that the commit job run in
> test 141 (and 144[1]) returns before it emits the READY event.  However,
> 141 also runs with qed, where the order is still the other way around.
> Just filter out the {"return": {}} so the test passes for qed again.
> 
> [1] 144 only runs with qcow2, so it is fine as it is.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
> Signed-off-by: Max Reitz 

Reviewed-by: John Snow

Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] block: Make various formats' block_status recurse again

2019-08-12 Thread John Snow




On 7/25/19 11:55 AM, Max Reitz wrote:
> Hi,
> 
> 69f47505ee66afaa513305de0c1895a224e52c45 changed block_status so that it
> would only go down to the protocol layer if the format layer returned
> BDRV_BLOCK_RECURSE, thus indicating that it has no sufficient
> information whether a given range in the image is zero or not.
> Generally, this is because the image is preallocated and thus all ranges
> appear as zeroes.
> 
> However, it only implemented this preallocation detection for qcow2.
> There are more formats that support preallocation, though: vdi, vhdx,
> vmdk, vpc.  (Funny how they all start with “v”.)
> 
> For vdi, vmdk, and vpc, the fix is rather simple, because they really
> have different subformats depending on whether an image is preallocated
> or not.  This makes the check very simple.
> 
> vhdx is more like qcow2, where after the image has been created, it
> isn’t clear whether it’s been preallocated or everything is allocated
> because everything was already written to.  69f47505ee added a heuristic
> to qcow2 to get around this, but I think that’s too much for vhdx.  I
> just left it unfixed, because I don’t care that much, honestly (and I
> don’t think anyone else does).
> 

What's the practical outcome of that, and is the limitation documented
somewhere?

(I'm fine with not fixing it, I just want it documented somehow.)

> 
> Max Reitz (3):
>   vdi: Make block_status recurse for fixed images
>   vmdk: Make block_status recurse for flat extents
>   vpc: Do not return RAW from block_status
> 
>  block/vdi.c  | 3 ++-
>  block/vmdk.c | 3 +++
>  block/vpc.c  | 2 +-
>  3 files changed, 6 insertions(+), 2 deletions(-)
>

[Qemu-devel] Fwd: Re: [PATCH 4/7] ati-vga: Fix cursor color with guest_hwcursor=true

2019-08-12 Thread Andrew Randrianasulu



--  Пересланное сообщение  --

Тема: Re: [Qemu-devel] [PATCH 4/7] ati-vga: Fix cursor color with 
guest_hwcursor=true
Дата: Понедельник 12 августа 2019
Отправитель: Andrew Randrianasulu 
Получатель:  BALATON Zoltan 

В сообщении от Monday 12 August 2019 13:55:45 BALATON Zoltan написал(а):
> On Mon, 12 Aug 2019, Philippe Mathieu-Daudé wrote:
> > On 8/12/19 12:28 PM, BALATON Zoltan wrote:
> >> On Mon, 12 Aug 2019, Philippe Mathieu-Daudé wrote:
> >>> On 8/11/19 11:14 PM, BALATON Zoltan wrote:
>  Fixes: a38127414bd007c5b6ae64c664d9e8839393277e
>  Signed-off-by: BALATON Zoltan 
>  ---
>  ?hw/display/ati.c | 2 +-
>  ?1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/hw/display/ati.c b/hw/display/ati.c
>  index 699f38223b..b849f5d510 100644
>  --- a/hw/display/ati.c
>  +++ b/hw/display/ati.c
>  @@ -207,7 +207,7 @@ static void ati_cursor_draw_line(VGACommonState
>  *vga, uint8_t *d, int scr_y)
>   }
>   } else {
>   color = (xbits & BIT(7) ? s->regs.cur_color1 :
>  -? s->regs.cur_color0) << 8 |
>  0xff;
>  +? s->regs.cur_color0) |
>  0xff00;
>   }
>   if (vga->hw_cursor_x + i * 8 + j >= h) {
>   return; /* end of screen, don't span to next line */
> 
> >>>
> >>> Reviewed-by: Philippe Mathieu-Daudé 
> >>
> >> Thanks. I've noticed that cursor color may still be wrong with MacOS
> >> that uses big endian frame buffer so maybe this will also need to take
> >> frame buffer endianness into account somehow but this patch corrects a
> >> difference between guest_hwcursor true and false values, reproducible
> >> with some Linux versions (as far as I remember). While the wrong cursor
> >> color with MacOS is now consistent after this patch with both
> >> guest_hwcursor true or false so that likely needs a different fix that
> >> should be applied both to this place and to ati_cursor_define. (MacOS
> >> does not yet boot fully, it needs patches to OpenBIOS to be able to run
> >> an FCode ROM and will probably need the VBlank interrupt implemented in
> >> ati-vga without which it displays a desktop but freezes there).
> >
> > If you remember which Linux version had this problem, or you can link to
> > roms that behave incorrectly, please share, so we can add display
> > regression tests.
> 
> Unfortunately I don't. I think it was Andrew who found this so maybe he 
> can remember.

Blue cursor was seen on specific dvd (x86) image I did for myself, 
because it was  using plain X cursor (arrow or X-shaped), not colored 
versions used by default in many distributions.

may be 'startx -- -retro" will show it briefly too?

from man Xserver (1.19.7):

-retro  starts  the  server  with  the  classic stipple and cursor visible.  
The default is to start with a black root window, and to suppress display of 
the cursor until the
   first time an application calls XDefineCursor(). 


https://yadi.sk/d/F8cbINWzWJ-DkA

users: root and guest
passwords: toor and guest

You need to boot it to syslinux and type 'slax' there, because default will 
boot x86-64 kernel without aty128fb  (my fault)
Unfortunately, i don't have fresh qemu compiled (previous tests were done from 
tmpfs copy of qemu source tree),
 will recompile from git and re-test. from memory, loading aty128fb and
setting config fragment in /etc/X11/xorg.conf.d for EXA AccelMethod and "Option 
"UseFBDev" "1" ' allowed device (ati-vga) to work.

--

Update: qemu commit 864ab314f1d924129d06ac7b571f105a2b76a4b2 (tag: v4.1.0-rc4)
plus patch series by Zoltan 
(https://patchew.org/QEMU/cover.1565558093.git.balaton%40eik.bme.hu/)
actually boots my x86 dvd image with this command:

x86_64-softmmu/qemu-system-x86_64 -m 512 -enable-kvm -device 
ati-vga,guest_hwcursor=true -cdrom /mnt/sdb1/slax-14_06_2019-private0.iso

or
x86_64-softmmu/qemu-system-x86_64 -m 512 -enable-kvm -device 
ati-vga,guest_hwcursor=true -cdrom /mnt/sdb1/slax-14_06_2019-private0.iso 
-display sdl,gl=on

Cursor is normally-colored, but you need something like "xrandr -s 23" because 
default resolution a bit too big. 
(after modprobe aty128fb + usual xorg.conf dance about UseFBDev)

end of update



> 
> (You may also need latest vgabios-ati.bin from Gerd's repo to get Linux 
> drivers load and for rage128p that has to be in pc-bios dir because PCI 
> IDs are only patched in that way.)
> 
> Regards,
> BALATON Zoltan



---

Re: [Qemu-devel] [PATCH v8 4/9] ide: account UNMAP (TRIM) operations

2019-08-12 Thread Max Reitz

On 16.05.19 16:33, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  hw/ide/core.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 6afadf894f..3a7ac93777 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -441,6 +441,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>  TrimAIOCB *iocb = opaque;
>  IDEState *s = iocb->s;
>  
> +if (iocb->i >= 0) {
> +if (ret >= 0) {
> +block_acct_done(blk_get_stats(s->blk), >acct);
> +} else {
> +block_acct_failed(blk_get_stats(s->blk), >acct);

Hmm, in other places (ide_handle_rw_error() here or
scsi_handle_rw_error() in scsi-disk) only report this with
BLOCK_ERROR_ACTION_REPORT.

So I’m wondering whether the same should be done here.

Max

> +}
> +}
> +
>  if (ret >= 0) {
>  while (iocb->j < iocb->qiov->niov) {
>  int j = iocb->j;
> @@ -458,10 +466,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>  }
>  
>  if (!ide_sect_range_ok(s, sector, count)) {
> +block_acct_invalid(blk_get_stats(s->blk), 
> BLOCK_ACCT_UNMAP);
>  iocb->ret = -EINVAL;
>  goto done;
>  }
>  
> +block_acct_start(blk_get_stats(s->blk), >acct,
> + count << BDRV_SECTOR_BITS, 
> BLOCK_ACCT_UNMAP);
> +
>  /* Got an entry! Submit and exit.  */
>  iocb->aiocb = blk_aio_pdiscard(s->blk,
> sector << BDRV_SECTOR_BITS,
> 




signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

Check that it is fixed by previous commit

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/263 | 46 ++
 tests/qemu-iotests/263.out | 12 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 59 insertions(+)
 create mode 100755 tests/qemu-iotests/263
 create mode 100644 tests/qemu-iotests/263.out

diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
new file mode 100755
index 00..dbe38e6c6c
--- /dev/null
+++ b/tests/qemu-iotests/263
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+#
+# Test mirroring qcow2 under raw-format
+#
+# Copyright (c) 2019 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import iotests
+from iotests import qemu_img_create, qemu_io, filter_qemu_io, file_path, log
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+
+base, top, target = file_path('base', 'top', 'target')
+size = 1024 * 1024
+
+qemu_img_create('-f', iotests.imgfmt, base, str(size))
+log(filter_qemu_io(qemu_io('-f', iotests.imgfmt,
+   '-c', 'write -P 1 0 1M', base)))
+qemu_img_create('-f', iotests.imgfmt, '-b', base, top, str(size))
+
+vm = iotests.VM().add_drive(None, opts='driver=raw,size=' + str(size) +
+',file.driver=qcow2,file.file.filename=' + top)
+vm.launch()
+
+vm.qmp_log('drive-mirror', device='drive0', target=target, sync='full',
+   filters=[iotests.filter_qmp_testfiles])
+log(vm.event_wait('BLOCK_JOB_READY'), filters=[iotests.filter_qmp_event])
+vm.qmp_log('block-job-complete', device='drive0')
+vm.shutdown()
+
+log(filter_qemu_io(qemu_io('-f', 'raw', '-c', 'read -P 1 0 1M', target)))
+
+log('-- finish --') # avoid extra new-line at the end of .out file
diff --git a/tests/qemu-iotests/263.out b/tests/qemu-iotests/263.out
new file mode 100644
index 00..65e5c812c6
--- /dev/null
+++ b/tests/qemu-iotests/263.out
@@ -0,0 +1,12 @@
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "drive-mirror", "arguments": {"device": "drive0", "sync": "full", 
"target": "TEST_DIR/PID-target"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 1048576, "offset": 1048576, "speed": 0, 
"type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": 
"USECS", "seconds": "SECS"}}
+{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
+{"return": {}}
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+-- finish --
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index f13e5f2e23..186a0220e5 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -271,3 +271,4 @@
 254 rw backing quick
 255 rw quick
 256 rw quick
+263 rw quick
-- 
2.18.0

[Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
returned file. But is it correct behavior at all? If returned file
itself has a backing file, we may report as totally unallocated and
area which actually has data in bottom backing file.

So, mirroring of qcow2 under raw-format is broken. Which is illustrated
by following commit with a test. Let's make raw-format behave more
correctly returning BDRV_BLOCK_DATA.

Suggested-by: Max Reitz 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/raw-format.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index bffd424dd0..a273ee2387 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -275,7 +275,7 @@ static int coroutine_fn 
raw_co_block_status(BlockDriverState *bs,
 *pnum = bytes;
 *file = bs->file->bs;
 *map = offset + s->offset;
-return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_RECURSE;
 }
 
 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
-- 
2.18.0

[Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

Hi all!

I'm not sure, is it a bug or a feature, but using qcow2 under raw is
broken. It should be either fixed like I propose (by Max's suggestion)
or somehow forbidden (just forbid backing-file supporting node to be
file child of raw-format node).

Vladimir Sementsov-Ogievskiy (2):
  block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  iotests: test mirroring qcow2 under raw format

 block/raw-format.c |  2 +-
 tests/qemu-iotests/263 | 46 ++
 tests/qemu-iotests/263.out | 12 ++
 tests/qemu-iotests/group   |  1 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/263
 create mode 100644 tests/qemu-iotests/263.out

-- 
2.18.0

Re: [Qemu-devel] backup bug or question

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 20:46, John Snow wrote:
> 
> 
> On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 23:13, John Snow wrote:
>>>
>>>
>>> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
 Hi!

 Hmm, hacking around backup I have a question:

 What prevents guest write request after job_start but before setting
 write notifier?

 code path:

 qmp_drive_backup or transaction with backup

   job_start
  aio_co_enter(job_co_entry) /* may only schedule execution, isn't 
 it ? */

 

 job_co_entry
   job_pause_point() /* it definitely yields, isn't it bad? */
   job->driver->run() /* backup_run */

 

 backup_run()
   bdrv_add_before_write_notifier()

 ...

>>>
>>> I think you're right... :(
>>>
>>>
>>> We create jobs like this:
>>>
>>> job->paused= true;
>>> job->pause_count   = 1;
>>>
>>>
>>> And then job_start does this:
>>>
>>> job->co = qemu_coroutine_create(job_co_entry, job);
>>> job->pause_count--;
>>> job->busy = true;
>>> job->paused = false;
>>>
>>>
>>> Which means that job_co_entry is being called before we lift the pause:
>>>
>>> assert(job && job->driver && job->driver->run);
>>> job_pause_point(job);
>>> job->ret = job->driver->run(job, >err);
>>>
>>> ...Which means that we are definitely yielding in job_pause_point.
>>>
>>> Yeah, that's a race condition waiting to happen.
>>>
 And what guarantees we give to the user? Is it guaranteed that write 
 notifier is
 set when qmp command returns?

 And I guess, if we start several backups in a transaction it should be 
 guaranteed
 that the set of backups is consistent and correspond to one point in 
 time...

>>>
>>> I would have hoped that maybe the drain_all coupled with the individual
>>> jobs taking drain_start and drain_end would save us, but I guess we
>>> simply don't have a guarantee that all backup jobs WILL have installed
>>> their handler by the time the transaction ends.
>>>
>>> Or, if there is that guarantee, I don't know what provides it, so I
>>> think we shouldn't count on it accidentally working anymore.
>>>
>>>
>>>
>>> I think we should do two things:
>>>
>>> 1. Move the handler installation to creation time.
>>> 2. Modify backup_before_write_notify to return without invoking
>>> backup_do_cow if the job isn't started yet.
>>>
>>
>> Hmm, I don't see, how it helps.. No-op write-notifier will not save as from
>> guest write, is it?
>>
>>
> 
> The idea is that by installing the write notifier during creation, the
> write notifier can be switched on the instant job_start is created,
> regardless of if we yield in the co_entry shim or not.
> 
> That way, no matter when we yield or when the backup_run coroutine
> actually gets scheduled and executed, the write notifier is active already.
> 
> Or put another way: calling job_start() guarantees that the write
> notifier is active.


Oh, got it, feel stupid)

> 
> I think using filters will save us too, but I don't know how ready those
> are. Do we still want a patch that guarantees this behavior in the meantime?
> 

I think we want of course, will look at it tomorrow.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH] Add git-publish profile for security bugs

2019-08-12 Thread John Snow




On 8/12/19 3:12 AM, Gerd Hoffmann wrote:
> Simplifies sending security patches to all people listed in
> https://wiki.qemu.org/SecurityProcess.  Should also make it
> harder to send a copy to the mailing list by accident.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  .gitpublish | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/.gitpublish b/.gitpublish
> index a13f8c7c0ecd..55750c45ed89 100644
> --- a/.gitpublish
> +++ b/.gitpublish
> @@ -49,3 +49,14 @@ base = master
>  to = qemu-devel@nongnu.org
>  cc = qemu-...@nongnu.org
>  cccmd = scripts/get_maintainer.pl --noroles --norolestats --nogit 
> --nogit-fallback 2>/dev/null
> +
> +# https://wiki.qemu.org/SecurityProcess
> +[gitpublishprofile "security"]
> +base = master
> +to = m...@redhat.com
> +to = pmato...@redhat.com
> +to = sstabell...@kernel.org
> +to = secal...@redhat.com
> +to = mdr...@linux.vnet.ibm.com
> +to = p...@redhat.com
> +suppresscc = all
> 

Should we force inspect-emails = true here due to the nature of the
security list? That way if we accidentally add extra CCs/etc there's a
chance to review 'em.

Also, should we update MAINTAINERS to match this script?

Responsible Disclosure, Reporting Security Issues
-
W: https://wiki.qemu.org/SecurityProcess
M: Michael S. Tsirkin 
L: secal...@redhat.com


With perhaps a footnote encouraging anyone changing this section to also
update the git-publish script and vice-versa?

--js

Re: [Qemu-devel] [PATCH v8 5/9] scsi: store unmap offset and nb_sectors in request struct

2019-08-12 Thread Max Reitz

On 16.05.19 16:33, Anton Nefedov wrote:
> it allows to report it in the error handler
> 
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Alberto Garcia 
> ---
>  hw/scsi/scsi-disk.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)

(Sorry for the late reply :-/)

> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index e7e865ab3b..b43254103c 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -1602,8 +1602,6 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, 
> int ret)
>  {
>  SCSIDiskReq *r = data->r;
>  SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> -uint64_t sector_num;
> -uint32_t nb_sectors;
>  
>  assert(r->req.aiocb == NULL);
>  if (scsi_disk_req_check_error(r, ret, false)) {
> @@ -1611,16 +1609,16 @@ static void scsi_unmap_complete_noio(UnmapCBData 
> *data, int ret)
>  }
>  
>  if (data->count > 0) {
> -sector_num = ldq_be_p(>inbuf[0]);
> -nb_sectors = ldl_be_p(>inbuf[8]) & 0xULL;
> -if (!check_lba_range(s, sector_num, nb_sectors)) {
> +r->sector = ldq_be_p(>inbuf[0]);
> +r->sector_count = ldl_be_p(>inbuf[8]) & 0xULL;
> +if (!check_lba_range(s, r->sector, r->sector_count)) {
>  scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
>  goto done;
>  }
>  
>  r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
> -sector_num * s->qdev.blocksize,
> -nb_sectors * s->qdev.blocksize,
> +r->sector * s->qdev.blocksize,
> +r->sector_count * s->qdev.blocksize,

This looks to me like these are not necessarily in terms of 512-byte
sectors.  It doesn’t seem to make anything technically wrong, because
patch 7 takes that into account.

But it’s still weird if everything else in this file treats these fields
as being in terms of 512 byte sectors (and they are actually defined
this way in SCSIDiskReq).

Max

>  scsi_unmap_complete, data);
>  data->count--;
>  data->inbuf += 16;
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] backup bug or question

2019-08-12 Thread John Snow




On 8/10/19 7:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 23:13, John Snow wrote:
>>
>>
>> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi!
>>>
>>> Hmm, hacking around backup I have a question:
>>>
>>> What prevents guest write request after job_start but before setting
>>> write notifier?
>>>
>>> code path:
>>>
>>> qmp_drive_backup or transaction with backup
>>>
>>>  job_start
>>> aio_co_enter(job_co_entry) /* may only schedule execution, isn't it 
>>> ? */
>>>
>>> 
>>>
>>> job_co_entry
>>>  job_pause_point() /* it definitely yields, isn't it bad? */
>>>  job->driver->run() /* backup_run */
>>>
>>> 
>>>
>>> backup_run()
>>>  bdrv_add_before_write_notifier()
>>>
>>> ...
>>>
>>
>> I think you're right... :(
>>
>>
>> We create jobs like this:
>>
>> job->paused= true;
>> job->pause_count   = 1;
>>
>>
>> And then job_start does this:
>>
>> job->co = qemu_coroutine_create(job_co_entry, job);
>> job->pause_count--;
>> job->busy = true;
>> job->paused = false;
>>
>>
>> Which means that job_co_entry is being called before we lift the pause:
>>
>> assert(job && job->driver && job->driver->run);
>> job_pause_point(job);
>> job->ret = job->driver->run(job, >err);
>>
>> ...Which means that we are definitely yielding in job_pause_point.
>>
>> Yeah, that's a race condition waiting to happen.
>>
>>> And what guarantees we give to the user? Is it guaranteed that write 
>>> notifier is
>>> set when qmp command returns?
>>>
>>> And I guess, if we start several backups in a transaction it should be 
>>> guaranteed
>>> that the set of backups is consistent and correspond to one point in time...
>>>
>>
>> I would have hoped that maybe the drain_all coupled with the individual
>> jobs taking drain_start and drain_end would save us, but I guess we
>> simply don't have a guarantee that all backup jobs WILL have installed
>> their handler by the time the transaction ends.
>>
>> Or, if there is that guarantee, I don't know what provides it, so I
>> think we shouldn't count on it accidentally working anymore.
>>
>>
>>
>> I think we should do two things:
>>
>> 1. Move the handler installation to creation time.
>> 2. Modify backup_before_write_notify to return without invoking
>> backup_do_cow if the job isn't started yet.
>>
> 
> Hmm, I don't see, how it helps.. No-op write-notifier will not save as from
> guest write, is it?
> 
> 

The idea is that by installing the write notifier during creation, the
write notifier can be switched on the instant job_start is created,
regardless of if we yield in the co_entry shim or not.

That way, no matter when we yield or when the backup_run coroutine
actually gets scheduled and executed, the write notifier is active already.

Or put another way: calling job_start() guarantees that the write
notifier is active.

I think using filters will save us too, but I don't know how ready those
are. Do we still want a patch that guarantees this behavior in the meantime?

--js

[Qemu-devel] [Bug 1839807] Re: Snapshots freeze guest Sabrelite IMX.6 board

2019-08-12 Thread Peter Maydell

The underlying cause of this is that we're not migrating the Secure
banked cp15 register contents. So boards which don't enable TrustZone or
where the guest runs in the NonSecure state (like the virt board, etc)
can save/restore fine, but since the imx6 happens to run the guest in
the Secure state it gets hit by the bug and its MMU setup etc is
completely broken on restore.

This is a bug I knew about (I mentioned it in LP:1739378 comment #5),
but I've forgotten the detail of why it happens and don't seem to have
written it down, so I'm going to have to think it through again...


** Changed in: qemu
   Status: New => Confirmed

** Tags added: arm

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1839807

Title:
  Snapshots freeze guest Sabrelite IMX.6 board

Status in QEMU:
  Confirmed

Bug description:
  Hello,

  I'm trying to take and restore  a snapshot with the whole system state of the 
Sabrelite IMX.6 board running on QEMU with commands savevm/loadvm.
  It seems that I am able to take a snapshot but loading the snapshot fails.

  For comparison I checked out snapshots on 32bit ARM Virt with Debian as well 
as on the Versatilepb board with a bare metal application and it works fine.
  The problem occurs only with that one particular board.

  My environment is:
  Ubuntu 18.04
  QEMU 3.0.1 (I see the same issue in QEMU 4.0.0 as well)
  The kernel and device tree used for the board was 5.1.14 version from 
kernel.org

  The file system was build from imx_v6_v7_defconfig config in buildroot
  as and sd card image.

  Problem:

  Loading snapshot stops the whole machine and it's impossible to resume
  it.

  Steps to reproduce problem:

  1.  I converted the sdcard.img built from the buildroot to qcow2
  using command qemu-img convert -f raw -O qcow2 sdcard.img
  sdcard.qcow2, since the raw doesn't support snapshots.

  2.  I start QEMU with a command
  ./arm-softmmu/qemu-system-arm -m 512 -M sabrelite -kernel zImage -append 
"rootfstype=ext4 root=/dev/mmcblk2p2 rw rootwait" -rtc base=localtime,clock=vm 
-dtb imx6dl-sabresd.dtb -drive file=sdcard.qcow2,index=2,format=qcow2,id=mycard 
-device sd-card,drive=mycard -nographic -net nic -net user

  3.  I run a simple program which print characters to the console
  in the background and add some files in user directory, to differ from
  original image.

  4.  I switch to QEMU monitor, and type “savevm ”.
  When I type “info snapshots”, the snapshot is listed.
  So I assume it was saved correctly.

  5.  Then I switch back to Linux console from monitor, remove the
  added files and stop the background printing process.

  6.  I switch back to monitor and I'm trying now to load the
  snapshot by “loadvm ” command.

  That’s where the problem occurs. QEMU stops and I can't switch back from 
monitor to Linux.
  Typing “cont” doesn’t help.
  It seems like the simulation has freezed. CPU usage on my Laptop machine 
equals 100% until I exit QEMU.

  
  What’s interesting when I exit the QEMU and then start it again the Linux 
boots and after it reaches the command prompt I can see the files which were 
removed after saving the snapshot.

  It looks like loading the snapshots works for restoring disk space but
  it fails for restoring the running processes.

  Due to the answer on QEMU mailing list
  (https://lists.nongnu.org/archive/html/qemu-
  discuss/2019-08/msg00016.html) it is QEMUs bug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1839807/+subscriptions

Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 16:09, Max Reitz wrote:
> On 10.08.19 18:41, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 19:13, Max Reitz wrote:
>>> If the driver does not implement bdrv_get_allocated_file_size(), we
>>> should fall back to cumulating the allocated size of all non-COW
>>> children instead of just bs->file.
>>>
>>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>>> Signed-off-by: Max Reitz 
>>> ---
>>>block.c | 22 --
>>>1 file changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 1070aa1ba9..6e1ddab056 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -4650,9 +4650,27 @@ int64_t 
>>> bdrv_get_allocated_file_size(BlockDriverState *bs)
>>>if (drv->bdrv_get_allocated_file_size) {
>>>return drv->bdrv_get_allocated_file_size(bs);
>>>}
>>> -if (bs->file) {
>>> -return bdrv_get_allocated_file_size(bs->file->bs);
>>> +
>>> +if (!QLIST_EMPTY(>children)) {
>>> +BdrvChild *child;
>>> +int64_t child_size, total_size = 0;
>>> +
>>> +QLIST_FOREACH(child, >children, next) {
>>> +if (child == bdrv_filtered_cow_child(bs)) {
>>> +/* Ignore COW backing files */
>>> +continue;
>>> +}
>>> +
>>> +child_size = bdrv_get_allocated_file_size(child->bs);
>>> +if (child_size < 0) {
>>> +return child_size;
>>> +}
>>> +total_size += child_size;
>>> +}
>>> +
>>> +return total_size;
>>>}
>>> +
>>>return -ENOTSUP;
>>>}
>>>
>>>
>>
>> Hmm..
>>
>> 1. No children -> -ENOTSUP
>> 2. Only cow child -> 0
>> 3. Some non-cow children -> SUM
>>
>> It's all arguable (the strictest way is -ENOTSUP in either case),
>> but if we want to fallback to SUM of non-cow children, 1. and 2. should 
>> return
>> the same.
> 
> I don’t think 2 is possible at all.  If you have a COW child, you need
> some other child to COW to.
> 
> And in the weird (and probably impossible) case where a node really only
> has a COW child, I’d say it’s correct that it has a disk size of 0 –
> because it hasn’t COWed anything yet.  (Just like a new qcow2 image with
> a backing file only has its metadata as its disk size.)
> 

Agreed. Then, why not return 0 on [1] ?

Also, another idea: shouldn't we return 0 for filters, i.e. skip 
filtered_rw_child too?
[as filtered-child is more like backing child than file one, it's "less owned" 
by its parent]

with or without any of these suggestions:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir

Re: [Qemu-devel] backup bug or question

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 19:49, Kevin Wolf wrote:
> Am 12.08.2019 um 18:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 12.08.2019 16:23, Kevin Wolf wrote:
>>> Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
 Hi!

 Hmm, hacking around backup I have a question:

 What prevents guest write request after job_start but before setting
 write notifier?

 code path:

 qmp_drive_backup or transaction with backup

   job_start
  aio_co_enter(job_co_entry) /* may only schedule execution, isn't 
 it ? */

 

 job_co_entry
   job_pause_point() /* it definitely yields, isn't it bad? */
   job->driver->run() /* backup_run */

 

 backup_run()
   bdrv_add_before_write_notifier()

 ...

 And what guarantees we give to the user? Is it guaranteed that write 
 notifier is
 set when qmp command returns?

 And I guess, if we start several backups in a transaction it should be 
 guaranteed
 that the set of backups is consistent and correspond to one point in 
 time...
>>>
>>> Do the patches to switch backup to a filter node solve this
>>> automatically because that node would be inserted in
>>> backup_job_create()?
>>>
>>
>> Hmm, great, looks like they should. At least it moves scope of the
>> problem to do_drive_backup and do_blockdev_backup functions..
>>
>> Am I right that aio_context_acquire/aio_context_release guarantees no
>> new request created during the section? Or should we add
>> drained_begin/drained_end pair, or at least drain() at start of
>> qmp_blockdev_backup and qmp_drive_backup?
> 
> Holding the AioContext lock should be enough for this.
> 
> But note that it doesn't make a difference if new requests are actually
> incoming. The timing of the QMP command to start a backup job versus the
> timing of guest requests is essentially random. QEMU doesn't know what
> guest requests you mean to be included in the backup and which you don't
> unless you stop sending new requests well ahead of time.
> 
> If you send a QMP request to start a backup, the backup will be
> consistent for some arbitrary point in time between the time that you
> sent the QMP request and the time that you received the reply to it.
> 
> Draining in the QMP command handler wouldn't change any of this, because
> even the drain section starts at some arbitrary point in time.

Hmm and it don't guarantee even that requests started before qmp command are
taken into backup, as they may be started in guest point of view, but not yet
in QEMU..

> 
>> Assume scenario like the this,
>>
>> 1. fsfreeze
>> 2. qmp backup
>> 3. fsthaw
>>
>> to make sure that backup starting point is consistent. So in our qmp command 
>> we should:
>> 1. complete all current requests to make drives corresponding to fsfreeze 
>> point
>> 2. initialize write-notifiers or filter before any new guest request, i.e. 
>> before fsthaw,
>> i.e. before qmp command return.
> 
> If I understand correctly, fsfreeze only returns success after it has
> made sure that the guest has quiesced the device. So at any point
> between receiving the successful return of the fsfreeze and calling
> fsthaw, the state should be consistent.
> 
>> Transactions should be OK, as they use drained_begin/drained_end
>> pairs, and additional aio_context_acquire/aio_context_release pairs.
> 
> Here, draining is actually important because you don't synchronise
> against something external that you don't control anyway, but you just
> make sure that you start the backup of all disks at the same point in
> time (which is still an arbitrary point between the time that you send
> the transaction QMP command and the time that you receive success), even
> if no fsfreeze/fsthaw was used.
> 
> Kevin
> 

OK, thanks for explanation!

-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH 3/3] vpc: Do not return RAW from block_status

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 18:56, Max Reitz wrote:
> On 12.08.19 17:33, Vladimir Sementsov-Ogievskiy wrote:
>> 25.07.2019 18:55, Max Reitz wrote:
>>> vpc is not really a passthrough driver, even when using the fixed
>>> subformat (where host and guest offsets are equal).  It should handle
>>> preallocation like all other drivers do, namely by returning
>>> DATA | RECURSE instead of RAW.
>>>
>>> There is no tangible difference but the fact that bdrv_is_allocated() no
>>> longer falls through to the protocol layer.
>>
>> Hmm. Isn't a real bug (fixed by this patch) ?
>>
>> Assume vpc->file is qcow2 with backing, which have "unallocated" region, 
>> which is
>> backed by actual data in backing file.
> 
> Come on now.
> 
>> So, this region will be reported as not allocated and will be skipped by any 
>> copying
>> loop using block-status? Is it a bug of BDRV_BLOCK_RAW itself? Or I don't 
>> understand
>> something..
> 
> I think what you don’t understand is that if you have a vpc file inside
> of a qcow2 file, you’re doing basically everything wrong. ;-)
> 
> But maybe we should drop BDRV_BLOCK_RAW...  Does it do anything good for
> us in the raw driver?  Shouldn’t it too just return DATA | RECURSE?
> 

And if I have raw driver above qcow2, it will not work, like I've described 
above..


-- 
Best regards,
Vladimir

Re: [Qemu-devel] backup bug or question

2019-08-12 Thread Kevin Wolf

Am 12.08.2019 um 18:09 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 12.08.2019 16:23, Kevin Wolf wrote:
> > Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> Hi!
> >>
> >> Hmm, hacking around backup I have a question:
> >>
> >> What prevents guest write request after job_start but before setting
> >> write notifier?
> >>
> >> code path:
> >>
> >> qmp_drive_backup or transaction with backup
> >>
> >>  job_start
> >> aio_co_enter(job_co_entry) /* may only schedule execution, isn't 
> >> it ? */
> >>
> >> 
> >>
> >> job_co_entry
> >>  job_pause_point() /* it definitely yields, isn't it bad? */
> >>  job->driver->run() /* backup_run */
> >>
> >> 
> >>
> >> backup_run()
> >>  bdrv_add_before_write_notifier()
> >>
> >> ...
> >>
> >> And what guarantees we give to the user? Is it guaranteed that write 
> >> notifier is
> >> set when qmp command returns?
> >>
> >> And I guess, if we start several backups in a transaction it should be 
> >> guaranteed
> >> that the set of backups is consistent and correspond to one point in 
> >> time...
> > 
> > Do the patches to switch backup to a filter node solve this
> > automatically because that node would be inserted in
> > backup_job_create()?
> > 
> 
> Hmm, great, looks like they should. At least it moves scope of the
> problem to do_drive_backup and do_blockdev_backup functions..
> 
> Am I right that aio_context_acquire/aio_context_release guarantees no
> new request created during the section? Or should we add
> drained_begin/drained_end pair, or at least drain() at start of
> qmp_blockdev_backup and qmp_drive_backup?

Holding the AioContext lock should be enough for this.

But note that it doesn't make a difference if new requests are actually
incoming. The timing of the QMP command to start a backup job versus the
timing of guest requests is essentially random. QEMU doesn't know what
guest requests you mean to be included in the backup and which you don't
unless you stop sending new requests well ahead of time.

If you send a QMP request to start a backup, the backup will be
consistent for some arbitrary point in time between the time that you
sent the QMP request and the time that you received the reply to it.

Draining in the QMP command handler wouldn't change any of this, because
even the drain section starts at some arbitrary point in time.

> Assume scenario like the this,
> 
> 1. fsfreeze
> 2. qmp backup
> 3. fsthaw
> 
> to make sure that backup starting point is consistent. So in our qmp command 
> we should:
> 1. complete all current requests to make drives corresponding to fsfreeze 
> point
> 2. initialize write-notifiers or filter before any new guest request, i.e. 
> before fsthaw,
> i.e. before qmp command return.

If I understand correctly, fsfreeze only returns success after it has
made sure that the guest has quiesced the device. So at any point
between receiving the successful return of the fsfreeze and calling
fsthaw, the state should be consistent.

> Transactions should be OK, as they use drained_begin/drained_end
> pairs, and additional aio_context_acquire/aio_context_release pairs.

Here, draining is actually important because you don't synchronise
against something external that you don't control anyway, but you just
make sure that you start the backup of all disks at the same point in
time (which is still an arbitrary point between the time that you send
the transaction QMP command and the time that you receive success), even
if no fsfreeze/fsthaw was used.

Kevin

Re: [Qemu-devel] [PATCH] linux-user: Add AT_HWCAP2 for aarch64-linux-user

2019-08-12 Thread Peter Maydell

On Fri, 9 Aug 2019 at 18:11, Richard Henderson
 wrote:
>
> Add the HWCAP2_* bits from kernel version v5.3-rc3.
> Enable the bits corresponding to ARMv8.5-CondM and ARMv8.5-FRINT.
>
> Signed-off-by: Richard Henderson 
> ---
>  linux-user/elfload.c | 31 +++
>  1 file changed, 27 insertions(+), 4 deletions(-)
> ---
>
> The HWCAP2_FLAGM2 and HWCAP2_FRINT bits came in during the
> last merge window and will be in the upcoming v5.3 release.
> We don't yet implement any of the other extensions that make
> up the rest of the HWCAP2 bits.

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 19:11, Max Reitz wrote:
> On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 18:10, Max Reitz wrote:
>>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
 backup_cow_with_offload can transfer more than one cluster. Let
 backup_cow_with_bounce_buffer behave similarly. It reduces the number
 of IO requests, since there is no need to copy cluster by cluster.

 Logic around bounce_buffer allocation changed: we can't just allocate
 one-cluster-sized buffer to share for all iterations. We can't also
 allocate buffer of full-request length it may be too large, so
 BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
 is to allocate a buffer sufficient to handle all remaining iterations
 at the point where we need the buffer for the first time.

 Bonus: get rid of pointer-to-pointer.

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 ---
block/backup.c | 65 +++---
1 file changed, 41 insertions(+), 24 deletions(-)

 diff --git a/block/backup.c b/block/backup.c
 index d482d93458..65f7212c85 100644
 --- a/block/backup.c
 +++ b/block/backup.c
 @@ -27,6 +27,7 @@
#include "qemu/error-report.h"

#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)

typedef struct CowRequest {
int64_t start_byte;
 @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
qemu_co_queue_restart_all(>wait_queue);
}

 -/* Copy range to target with a bounce buffer and return the bytes copied. 
 If
 - * error occurred, return a negative error number */
 +/*
 + * Copy range to target with a bounce buffer and return the bytes copied. 
 If
 + * error occurred, return a negative error number
 + *
 + * @bounce_buffer is assumed to enough to store
>>>
>>> s/to/to be/
>>>
 + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
 + */
static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob 
 *job,
  int64_t start,
  int64_t end,
  bool 
 is_write_notifier,
  bool 
 *error_is_read,
 -  void 
 **bounce_buffer)
 +  void *bounce_buffer)
{
int ret;
BlockBackend *blk = job->common.blk;
 -int nbytes;
 +int nbytes, remaining_bytes;
int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;

assert(QEMU_IS_ALIGNED(start, job->cluster_size));
 -bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
 -nbytes = MIN(job->cluster_size, job->len - start);
 -if (!*bounce_buffer) {
 -*bounce_buffer = blk_blockalign(blk, job->cluster_size);
 -}
 +bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
 +nbytes = MIN(end - start, job->len - start);

 -ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
 -if (ret < 0) {
 -trace_backup_do_cow_read_fail(job, start, ret);
 -if (error_is_read) {
 -*error_is_read = true;
 +
 +remaining_bytes = nbytes;
 +while (remaining_bytes) {
 +int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
 +
 +ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
 +if (ret < 0) {
 +trace_backup_do_cow_read_fail(job, start, ret);
 +if (error_is_read) {
 +*error_is_read = true;
 +}
 +goto fail;
}
 -goto fail;
 -}

 -ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
 -job->write_flags);
 -if (ret < 0) {
 -trace_backup_do_cow_write_fail(job, start, ret);
 -if (error_is_read) {
 -*error_is_read = false;
 +ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
 +job->write_flags);
 +if (ret < 0) {
 +trace_backup_do_cow_write_fail(job, start, ret);
 +if (error_is_read) {
 +*error_is_read = false;
 +}
 +goto fail;
}
 -goto fail;
 +
 +start += chunk;
 +remaining_bytes -= chunk;
}

return nbytes;
 @@ -301,9 +313,14 @@ static int

Re: [Qemu-devel] [PATCH 0/1] display/bochs: fix pcie support (qemu security issue)

2019-08-12 Thread Peter Maydell

On Mon, 12 Aug 2019 at 16:48, Alex Williamson
 wrote:
>
> On Mon, 12 Aug 2019 16:38:05 +0100
> Peter Maydell  wrote:
>
> > On Mon, 12 Aug 2019 at 16:35, Alex Williamson
> >  wrote:
> > > Quoting new commit log:
> > >
> > > This makes sure the pci config space allocation is big enough,
> > > so accessing the PCIe extended config space doesn't overflow
> > > the pci config space buffer.
> > >
> > > PCI(e) config space is guest writable.  Writes are limited
> > > bywrite mask (which probably is also filled with random stuff),
> > > so the guest can only flip enabled bits.  But I suspect it
> > > still might be exploitable, so rather serious because it might
> > > be a host escape for the guest.  On the other hand the device
> > > is probably not yet in widespread use.
> > >
> > > Mitigation: use "-device bochs-display" as conventional pci
> > > device only.
> > >
> > > Is it clear to others that this mitigation remark seems to be
> > > referencing an alternative configuration constraint to avoid the issue
> > > rather than what's actually implemented in this patch?  IOW, if we
> > > never place the bochs-display device into a PCIe hierarchy, then
> > > extended config space is never accessible to the guest anyway, and
> > > there is no issue.  I think this was meant to be an alternative to the
> > > patch but the enforcement of that would happen above QEMU, probably why
> > > it was mentioned in the cover letter rather than the original commit
> > > log.  Thanks,
> >
> > Yeah, that's unclear in retrospect. How about:
> >
> > # (For a QEMU version without this commit, a mitigation for the
> > # bug is available: use "-device bochs-display" as a conventional pci
> > # device only.)
>
> Yes, better.  Thanks,

Cool. Updated commit message now pushed to master.

-- PMM

Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier

2019-08-12 Thread Alex Williamson

On Mon, 12 Aug 2019 09:45:27 +0200
Peter Xu  wrote:

> This is a RFC series.
> 
> The VT-d code has some defects, one of them is that we cannot detect
> the misuse of vIOMMU and vfio-pci early enough.
> 
> For example, logically this is not allowed:
> 
>   -device intel-iommu,caching-mode=off \
>   -device vfio-pci,host=05:00.0

Do we require intel-iommu with intremap=on in order to get x2apic for
large vCPU count guests?  If so, wouldn't it be a valid configuration
for the user to specify:

   -device intel-iommu,caching-mode=off,intremap=on \
   -device vfio-pci,host=05:00.0

so long as they never have any intention of the guest enabling DMA
translation?  Would there be any advantage to this config versus
caching-mode=on?  I suspect the overhead of CM=1 when only using
interrupt remapping is small to non-existent, but are there other
reasons for running with CM=0, perhaps guest drivers not supporting it?

I like the idea of being able to nak an incompatible hot-add rather
than kill the VM, we could narrow that even further to look at not only
whether caching-mode support is enabled, but also whether translation
is enabled on the vIOMMU.  Ideally we might disallow the guest from
enabling translation in such a configuration, but the Linux code is not
written with the expectation that the hardware can refuse to enable
translation and there are no capability bits to remove the DMA
translation capability of the IOMMU.  Still, we might want to think
about which is the better user experience, to have the guest panic when
DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have
QEMU exit, or as proposed here, prevent all configurations where this
might occur.  Thanks,

Alex

> Because the caching mode is required to make vfio-pci devices
> functional.
> 
> Previously we did this sanity check in vtd_iommu_notify_flag_changed()
> as when the memory regions change their attributes.  However that's
> too late in most cases!  Because the memory region layouts will only
> change after IOMMU is enabled, and that's in most cases during the
> guest OS boots.  So when the configuration is wrong, we will only bail
> out during the guest boots rather than simply telling the user before
> QEMU starts.
> 
> The same problem happens on device hotplug, say, when we have this:
> 
>   -device intel-iommu,caching-mode=off
> 
> Then we do something like:
> 
>   (HMP) device_add vfio-pci,host=05:00.0,bus=pcie.1
> 
> If at that time the vIOMMU is enabled in the guest then the QEMU
> process will simply quit directly due to this hotplug event.  This is
> a bit insane...
> 
> This series tries to solve above two problems by introducing two
> sanity checks upon these places separately:
> 
>   - machine done
>   - hotplug device
> 
> This is a bit awkward but I hope this could be better than before.
> There is of course other solutions like hard-code the check into
> vfio-pci but I feel it even more unpretty.  I didn't think out any
> better way to do this, if there is please kindly shout out.
> 
> Please have a look to see whether this would be acceptable, thanks.
> 
> Peter Xu (4):
>   intel_iommu: Sanity check vfio-pci config on machine init done
>   qdev/machine: Introduce hotplug_allowed hook
>   pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
>   intel_iommu: Remove the caching-mode check during flag change
> 
>  hw/core/qdev.c | 17 +
>  hw/i386/intel_iommu.c  | 40 ++--
>  hw/i386/pc.c   | 21 +
>  include/hw/boards.h|  9 +
>  include/hw/qdev-core.h |  1 +
>  qdev-monitor.c |  7 +++
>  6 files changed, 89 insertions(+), 6 deletions(-)
>

Re: [Qemu-devel] [PATCH v3 1/2] bitmap: get last word mask from nr directly

2019-08-12 Thread Paolo Bonzini

On 18/07/19 03:04, Wei Yang wrote:
> The value left in nr is the number of bits for the last word, which
> could be calculate the last word mask directly.
> 
> Remove the unnecessary size.

Hi,

the value left in nr is _not_ the number of bits for the last word if
the start and the end are in the same word.  For example, if start %
BITS_PER_LONG was 3 and nr == 1, you'd have:

- before the patch BITMAP_LAST_WORD_MASK(4)

- after the patch BITMAP_LAST_WORD_MASK(1)

Paolo

> Signed-off-by: Wei Yang 
> 
> ---
> v2: refine bitmap_set_atomic too, suggested from Peter
> ---
>  util/bitmap.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/util/bitmap.c b/util/bitmap.c
> index 1753ff7f5b..5b15249796 100644
> --- a/util/bitmap.c
> +++ b/util/bitmap.c
> @@ -160,7 +160,6 @@ int slow_bitmap_andnot(unsigned long *dst, const unsigned 
> long *bitmap1,
>  void bitmap_set(unsigned long *map, long start, long nr)
>  {
>  unsigned long *p = map + BIT_WORD(start);
> -const long size = start + nr;
>  int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
>  unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
>  
> @@ -174,7 +173,7 @@ void bitmap_set(unsigned long *map, long start, long nr)
>  p++;
>  }
>  if (nr) {
> -mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> +mask_to_set &= BITMAP_LAST_WORD_MASK(nr);
>  *p |= mask_to_set;
>  }
>  }
> @@ -182,7 +181,6 @@ void bitmap_set(unsigned long *map, long start, long nr)
>  void bitmap_set_atomic(unsigned long *map, long start, long nr)
>  {
>  unsigned long *p = map + BIT_WORD(start);
> -const long size = start + nr;
>  int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
>  unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
>  
> @@ -208,7 +206,7 @@ void bitmap_set_atomic(unsigned long *map, long start, 
> long nr)
>  
>  /* Last word */
>  if (nr) {
> -mask_to_set &= BITMAP_LAST_WORD_MASK(size);
> +mask_to_set &= BITMAP_LAST_WORD_MASK(nr);
>  atomic_or(p, mask_to_set);
>  } else {
>  /* If we avoided the full barrier in atomic_or(), issue a
> @@ -221,7 +219,6 @@ void bitmap_set_atomic(unsigned long *map, long start, 
> long nr)
>  void bitmap_clear(unsigned long *map, long start, long nr)
>  {
>  unsigned long *p = map + BIT_WORD(start);
> -const long size = start + nr;
>  int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
>  unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
>  
> @@ -235,7 +232,7 @@ void bitmap_clear(unsigned long *map, long start, long nr)
>  p++;
>  }
>  if (nr) {
> -mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
> +mask_to_clear &= BITMAP_LAST_WORD_MASK(nr);
>  *p &= ~mask_to_clear;
>  }
>  }
>

Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-12 Thread Max Reitz

On 12.08.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 18:10, Max Reitz wrote:
>> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>>> backup_cow_with_offload can transfer more than one cluster. Let
>>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>>> of IO requests, since there is no need to copy cluster by cluster.
>>>
>>> Logic around bounce_buffer allocation changed: we can't just allocate
>>> one-cluster-sized buffer to share for all iterations. We can't also
>>> allocate buffer of full-request length it may be too large, so
>>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>>> is to allocate a buffer sufficient to handle all remaining iterations
>>> at the point where we need the buffer for the first time.
>>>
>>> Bonus: get rid of pointer-to-pointer.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup.c | 65 +++---
>>>   1 file changed, 41 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index d482d93458..65f7212c85 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -27,6 +27,7 @@
>>>   #include "qemu/error-report.h"
>>>   
>>>   #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>>   
>>>   typedef struct CowRequest {
>>>   int64_t start_byte;
>>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>>   qemu_co_queue_restart_all(>wait_queue);
>>>   }
>>>   
>>> -/* Copy range to target with a bounce buffer and return the bytes copied. 
>>> If
>>> - * error occurred, return a negative error number */
>>> +/*
>>> + * Copy range to target with a bounce buffer and return the bytes copied. 
>>> If
>>> + * error occurred, return a negative error number
>>> + *
>>> + * @bounce_buffer is assumed to enough to store
>>
>> s/to/to be/
>>
>>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>>> + */
>>>   static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>>> int64_t start,
>>> int64_t end,
>>> bool 
>>> is_write_notifier,
>>> bool *error_is_read,
>>> -  void **bounce_buffer)
>>> +  void *bounce_buffer)
>>>   {
>>>   int ret;
>>>   BlockBackend *blk = job->common.blk;
>>> -int nbytes;
>>> +int nbytes, remaining_bytes;
>>>   int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>>   
>>>   assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>> -bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>>> -nbytes = MIN(job->cluster_size, job->len - start);
>>> -if (!*bounce_buffer) {
>>> -*bounce_buffer = blk_blockalign(blk, job->cluster_size);
>>> -}
>>> +bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>>> +nbytes = MIN(end - start, job->len - start);
>>>   
>>> -ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>>> -if (ret < 0) {
>>> -trace_backup_do_cow_read_fail(job, start, ret);
>>> -if (error_is_read) {
>>> -*error_is_read = true;
>>> +
>>> +remaining_bytes = nbytes;
>>> +while (remaining_bytes) {
>>> +int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>>> +
>>> +ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>>> +if (ret < 0) {
>>> +trace_backup_do_cow_read_fail(job, start, ret);
>>> +if (error_is_read) {
>>> +*error_is_read = true;
>>> +}
>>> +goto fail;
>>>   }
>>> -goto fail;
>>> -}
>>>   
>>> -ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>>> -job->write_flags);
>>> -if (ret < 0) {
>>> -trace_backup_do_cow_write_fail(job, start, ret);
>>> -if (error_is_read) {
>>> -*error_is_read = false;
>>> +ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>>> +job->write_flags);
>>> +if (ret < 0) {
>>> +trace_backup_do_cow_write_fail(job, start, ret);
>>> +if (error_is_read) {
>>> +*error_is_read = false;
>>> +}
>>> +goto fail;
>>>   }
>>> -goto fail;
>>> +
>>> +start += chunk;
>>> +remaining_bytes -= chunk;
>>>   }
>>>   
>>>   return nbytes;
>>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
>>> *job,
>>>   }
>>>   }
>>>   if (!job->use_copy_range) {
>>> +if (!bounce_buffer) {
>>> +size_t len

Re: [Qemu-devel] backup bug or question

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 16:23, Kevin Wolf wrote:
> Am 09.08.2019 um 15:18 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi!
>>
>> Hmm, hacking around backup I have a question:
>>
>> What prevents guest write request after job_start but before setting
>> write notifier?
>>
>> code path:
>>
>> qmp_drive_backup or transaction with backup
>>
>>  job_start
>> aio_co_enter(job_co_entry) /* may only schedule execution, isn't it 
>> ? */
>>
>> 
>>
>> job_co_entry
>>  job_pause_point() /* it definitely yields, isn't it bad? */
>>  job->driver->run() /* backup_run */
>>
>> 
>>
>> backup_run()
>>  bdrv_add_before_write_notifier()
>>
>> ...
>>
>> And what guarantees we give to the user? Is it guaranteed that write 
>> notifier is
>> set when qmp command returns?
>>
>> And I guess, if we start several backups in a transaction it should be 
>> guaranteed
>> that the set of backups is consistent and correspond to one point in time...
> 
> Do the patches to switch backup to a filter node solve this
> automatically because that node would be inserted in
> backup_job_create()?
> 

Hmm, great, looks like they should. At least it moves scope of the problem to 
do_drive_backup
and do_blockdev_backup functions..

Am I right that aio_context_acquire/aio_context_release guarantees no new 
request created during
the section? Or should we add drained_begin/drained_end pair, or at least 
drain() at start of
qmp_blockdev_backup and qmp_drive_backup?

Assume scenario like the this,

1. fsfreeze
2. qmp backup
3. fsthaw

to make sure that backup starting point is consistent. So in our qmp command we 
should:
1. complete all current requests to make drives corresponding to fsfreeze point
2. initialize write-notifiers or filter before any new guest request, i.e. 
before fsthaw,
i.e. before qmp command return.

Transactions should be OK, as they use drained_begin/drained_end pairs, and 
additional
aio_context_acquire/aio_context_release pairs.

-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v4 28/29] sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h

2019-08-12 Thread Philippe Mathieu-Daudé

On 8/12/19 7:23 AM, Markus Armbruster wrote:
> In my "build everything" tree, changing sysemu/sysemu.h triggers a
> recompile of some 1800 out of 6600 objects (not counting tests and
> objects that don't depend on qemu/osdep.h, down from 5400 due to the
> previous commit).
> 
> Several headers include sysemu/sysemu.h just to get typedef
> VMChangeStateEntry.  Move it from sysemu/sysemu.h to qemu/typedefs.h.
> Spell its structure tag the same while there.  Drop the now
> superfluous includes of sysemu/sysemu.h from headers.
> 
> Touching sysemu/sysemu.h now recompiles some 1100 objects.
> qemu/uuid.h also drops from 1800 to 1100, and
> qapi/qapi-types-run-state.h from 5000 to 4400.

Nice :)

> 
> Signed-off-by: Markus Armbruster 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  hw/usb/hcd-ehci.h   | 1 -
>  include/hw/ide/internal.h   | 3 ++-
>  include/hw/ppc/spapr_xive.h | 1 -
>  include/hw/scsi/scsi.h  | 1 -
>  include/hw/virtio/virtio.h  | 1 -
>  include/qemu/typedefs.h | 1 +
>  include/sysemu/sysemu.h | 1 -
>  hw/block/vhost-user-blk.c   | 1 +
>  hw/block/virtio-blk.c   | 1 +
>  hw/display/virtio-gpu.c | 1 +
>  hw/misc/macio/macio.c   | 1 +
>  hw/net/virtio-net.c | 1 +
>  hw/s390x/s390-ccw.c | 1 +
>  hw/s390x/s390-virtio-ccw.c  | 1 +
>  hw/scsi/scsi-bus.c  | 1 +
>  hw/scsi/vhost-scsi.c| 1 +
>  hw/scsi/vhost-user-scsi.c   | 1 +
>  hw/usb/hcd-ehci.c   | 1 +
>  hw/virtio/virtio-rng.c  | 1 +
>  hw/virtio/virtio.c  | 1 +
>  vl.c| 6 +++---
>  21 files changed, 19 insertions(+), 9 deletions(-)

Re: [Qemu-devel] [qemu-s390x] [PATCH-for-4.2 v1 1/6] s390x/mmu: ASC selection in s390_cpu_get_phys_page_debug()

2019-08-12 Thread Thomas Huth

On 8/12/19 5:39 PM, David Hildenbrand wrote:
> On 12.08.19 17:28, David Hildenbrand wrote:
>> On 12.08.19 17:18, Thomas Huth wrote:
>>> On 8/12/19 1:27 PM, David Hildenbrand wrote:
 Let's select the ASC before calling the function. This is a prepararion
 to remove the ASC magic depending on the access mode from mmu_translate.

 There is currently no way to distinguish if we have code or data access.
 For now, we were using code access, because especially when debugging with
 the gdbstub, we want to read and disassemble what we single-step.
>>>
>>> IMHO we should add a "instruction" bit to MemTxAttrs and then use the
>>> ...page_attrs_debug() interface instead. But ok, that's likely really
>>> something for a separate clean-up, so for the time being:
>>>
>>
>> That sounds like a good idea, and then switching over to
>> cc->get_phys_page_attrs()
> 
> But looking at get_phys_page_attrs_debug() again, "MemTxAttrs *attrs" is
> an output value not an input value. So there wouldn't be a way to
> specify "what you want" from the caller. At least unless I am missing
> something :)

Oops, you're right. Too bad :-(

So never mind that idea ... your patch is certainly the best you can do
here right now.

 Thomas

Re: [Qemu-devel] [PATCH v4 18/29] Include hw/hw.h exactly where needed

2019-08-12 Thread Philippe Mathieu-Daudé

On 8/12/19 7:23 AM, Markus Armbruster wrote:
> In my "build everything" tree, changing hw/hw.h triggers a recompile
> of some 2600 out of 6600 objects (not counting tests and objects that
> don't depend on qemu/osdep.h).
> 
> The previous commits have left only the declaration of hw_error() in
> hw/hw.h.  This permits dropping most of its inclusions.  Touching it
> now recompiles less than 200 objects.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Alistair Francis 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  hw/display/qxl.h | 1 -
>  hw/i386/amd_iommu.h  | 1 -
>  hw/microblaze/boot.h | 1 -
>  hw/net/ne2000.h  | 1 -
>  hw/nios2/boot.h  | 1 -
>  hw/usb/hcd-ehci.h| 1 -
>  include/hw/audio/pcspk.h | 1 -
>  include/hw/audio/wm8750.h| 1 -
>  include/hw/char/serial.h | 1 -
>  include/hw/char/stm32f2xx_usart.h| 1 -
>  include/hw/dma/i8257.h   | 1 -
>  include/hw/hw.h  | 1 -
>  include/hw/i386/ich9.h   | 1 -
>  include/hw/i386/ioapic_internal.h| 1 -
>  include/hw/input/i8042.h | 1 -
>  include/hw/isa/apm.h | 1 -
>  include/hw/isa/i8259_internal.h  | 1 -
>  include/hw/misc/stm32f2xx_syscfg.h   | 1 -
>  include/hw/net/ne2000-isa.h  | 1 -
>  include/hw/pci-host/designware.h | 1 -
>  include/hw/pci-host/gpex.h   | 1 -
>  include/hw/pci-host/q35.h| 1 -
>  include/hw/pci-host/uninorth.h   | 1 -
>  include/hw/pci-host/xilinx-pcie.h| 1 -
>  include/hw/pci/pcie.h| 1 -
>  include/hw/pci/pcie_aer.h| 1 -
>  include/hw/qdev.h| 1 -
>  include/hw/riscv/riscv_htif.h| 1 -
>  include/hw/ssi/stm32f2xx_spi.h   | 1 -
>  include/hw/timer/aspeed_rtc.h| 1 -
>  include/hw/timer/i8254.h | 1 -
>  include/hw/timer/i8254_internal.h| 1 -
>  include/hw/virtio/vhost.h| 1 -
>  include/hw/virtio/virtio.h   | 1 -
>  include/hw/xen/xen_common.h  | 1 -
>  include/sysemu/dma.h | 1 -
>  include/sysemu/hax.h | 1 -
>  include/sysemu/hvf.h | 1 -
>  accel/kvm/kvm-all.c  | 1 -
>  audio/audio.c| 1 -
>  audio/spiceaudio.c   | 1 -
>  audio/wavcapture.c   | 1 -
>  cpus.c   | 1 +
>  device-hotplug.c | 1 -
>  exec.c   | 1 -
>  hw/9pfs/xen-9p-backend.c | 1 -
>  hw/acpi/core.c   | 1 -
>  hw/acpi/cpu_hotplug.c| 1 -
>  hw/acpi/ich9.c   | 1 -
>  hw/acpi/pcihp.c  | 1 -
>  hw/acpi/piix4.c  | 1 -
>  hw/adc/stm32f2xx_adc.c   | 1 -
>  hw/alpha/dp264.c | 1 -
>  hw/alpha/typhoon.c   | 1 -
>  hw/arm/boot.c| 1 -
>  hw/arm/collie.c  | 1 -
>  hw/arm/gumstix.c | 1 -
>  hw/arm/integratorcp.c| 1 +
>  hw/arm/mainstone.c   | 1 -
>  hw/arm/musicpal.c| 1 +
>  hw/arm/omap2.c   | 1 -
>  hw/arm/omap_sx1.c| 1 -
>  hw/arm/palm.c| 1 -
>  hw/arm/pxa2xx_pic.c  | 1 -
>  hw/arm/spitz.c   | 1 -
>  hw/arm/tosa.c| 1 -
>  hw/arm/virt-acpi-build.c | 1 -
>  hw/arm/z2.c  | 1 -
>  hw/audio/ac97.c  | 1 -
>  hw/audio/adlib.c | 1 -
>  hw/audio/cs4231a.c   | 1 -
>  hw/audio/es1370.c| 1 -
>  hw/audio/gus.c   | 1 -
>  hw/audio/hda-codec.c | 1 -
>  hw/audio/intel-hda.c | 1 -
>  hw/audio/marvell_88w8618.c   | 1 -
>  hw/audio/milkymist-ac97.c| 1 -
>  hw/audio/pcspk.c | 1 -
>  hw/audio/sb16.c  | 1 -
>  hw/block/dataplane/xen-block.c   | 1 -
>  hw/block/ecc.c   | 1 -
>  hw/block/fdc.c   | 1 -
>  hw/block/m25p80.c| 1 -
>  hw/block/nvme.c  | 1 -
>  hw/block/pflash_cfi01.c  | 1 -
>  hw/block/pflash_cfi02.c  | 1 -
>  hw/block/tc58128.c   | 1 -
>  hw/block/xen-block.c | 1 -
>  hw/char/debugcon.c

Re: [Qemu-devel] [PATCH v4 22/29] Include hw/boards.h a bit less

2019-08-12 Thread Philippe Mathieu-Daudé

On 8/12/19 7:23 AM, Markus Armbruster wrote:
> hw/boards.h pulls in almost 60 headers.  The less we include it into
> headers, the better.  As a first step, drop superfluous inclusions,
> and downgrade some more to what's actually needed.  Gets rid of just
> one inclusion into a header.
> 
> Cc: Eduardo Habkost 
> Cc: Marcel Apfelbaum 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Alistair Francis 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  include/hw/mem/pc-dimm.h| 1 -
>  backends/cryptodev-builtin.c| 1 -
>  backends/cryptodev-vhost-user.c | 1 -
>  backends/cryptodev.c| 1 -
>  hw/acpi/ich9.c  | 1 +
>  hw/alpha/dp264.c| 1 -
>  hw/alpha/typhoon.c  | 1 +
>  hw/arm/boot.c   | 1 -
>  hw/arm/exynos4210.c | 2 +-
>  hw/arm/fsl-imx25.c  | 1 -
>  hw/arm/fsl-imx31.c  | 1 -
>  hw/arm/msf2-soc.c   | 1 -
>  hw/arm/nrf51_soc.c  | 1 -
>  hw/arm/omap1.c  | 1 +
>  hw/arm/omap2.c  | 1 +
>  hw/arm/smmuv3.c | 1 -
>  hw/arm/virt.c   | 1 +
>  hw/core/numa.c  | 2 ++
>  hw/i386/pc_piix.c   | 1 -
>  hw/i386/pc_q35.c| 1 -
>  hw/i386/pc_sysfw.c  | 1 -
>  hw/ppc/e500plat.c   | 1 -
>  hw/ppc/mpc8544ds.c  | 1 -
>  hw/ppc/pnv.c| 1 +
>  hw/ppc/ppc405_uc.c  | 1 -
>  hw/ppc/spapr_cpu_core.c | 1 -
>  hw/ppc/spapr_vio.c  | 1 -
>  hw/riscv/boot.c | 2 +-
>  hw/s390x/s390-stattrib.c| 1 -
>  hw/xtensa/xtensa_memory.c   | 1 -
>  monitor/qmp-cmds.c  | 1 -
>  target/alpha/machine.c  | 1 -
>  target/arm/machine.c| 1 -
>  target/arm/monitor.c| 1 -
>  target/hppa/machine.c   | 1 -
>  target/i386/hvf/hvf.c   | 1 -
>  target/i386/hvf/x86_task.c  | 1 -
>  target/i386/machine.c   | 1 -
>  target/i386/whpx-all.c  | 1 -
>  target/lm32/machine.c   | 1 -
>  target/moxie/machine.c  | 1 -
>  target/openrisc/machine.c   | 1 -
>  target/ppc/machine.c| 1 -
>  target/sparc/machine.c  | 1 -
>  44 files changed, 10 insertions(+), 37 deletions(-)

Re: [Qemu-devel] [PATCH v4 15/29] Include migration/vmstate.h less

2019-08-12 Thread Philippe Mathieu-Daudé

On 8/12/19 7:23 AM, Markus Armbruster wrote:
> In my "build everything" tree, changing migration/vmstate.h triggers a
> recompile of some 2700 out of 6600 objects (not counting tests and
> objects that don't depend on qemu/osdep.h).
> 
> hw/hw.h supposedly includes it for convenience.  Several other headers
> include it just to get VMStateDescription.  The previous commit made
> that unnecessary.
> 
> Include migration/vmstate.h only where it's still needed.  Touching it
> now recompiles only some 1600 objects.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Alistair Francis 

Tested-by: Philippe Mathieu-Daudé 

> ---
>  include/hw/acpi/tco.h  | 1 -
>  include/hw/block/flash.h   | 1 -
>  include/hw/hw.h| 1 -
>  include/hw/input/hid.h | 1 -
>  include/hw/pci/shpc.h  | 1 +
>  include/hw/ppc/spapr_ovec.h| 1 -
>  include/hw/ptimer.h| 1 -
>  include/hw/virtio/virtio.h | 1 +
>  include/migration/cpu.h| 1 +
>  include/net/net.h  | 1 -
>  include/qemu/fifo8.h   | 1 -
>  audio/audio.c  | 1 +
>  cpus.c | 1 +
>  hw/acpi/cpu.c  | 1 +
>  hw/acpi/ich9.c | 1 +
>  hw/acpi/memory_hotplug.c   | 1 +
>  hw/acpi/pcihp.c| 1 +
>  hw/acpi/piix4.c| 1 +
>  hw/acpi/tco.c  | 2 ++
>  hw/acpi/vmgenid.c  | 1 +
>  hw/adc/stm32f2xx_adc.c | 1 +
>  hw/arm/armsse.c| 1 +
>  hw/arm/highbank.c  | 1 +
>  hw/arm/integratorcp.c  | 1 +
>  hw/arm/musicpal.c  | 1 +
>  hw/arm/pxa2xx.c| 1 +
>  hw/arm/pxa2xx_gpio.c   | 1 +
>  hw/arm/pxa2xx_pic.c| 1 +
>  hw/arm/smmuv3.c| 1 +
>  hw/arm/spitz.c | 1 +
>  hw/arm/stellaris.c | 1 +
>  hw/arm/strongarm.c | 1 +
>  hw/arm/versatilepb.c   | 1 +
>  hw/arm/virt-acpi-build.c   | 1 +
>  hw/arm/z2.c| 1 +
>  hw/audio/ac97.c| 1 +
>  hw/audio/cs4231.c  | 1 +
>  hw/audio/cs4231a.c | 1 +
>  hw/audio/es1370.c  | 1 +
>  hw/audio/gus.c | 1 +
>  hw/audio/hda-codec.c   | 1 +
>  hw/audio/intel-hda.c   | 1 +
>  hw/audio/lm4549.c  | 1 +
>  hw/audio/marvell_88w8618.c | 1 +
>  hw/audio/milkymist-ac97.c  | 1 +
>  hw/audio/pcspk.c   | 1 +
>  hw/audio/pl041.c   | 1 +
>  hw/audio/sb16.c| 1 +
>  hw/audio/wm8750.c  | 1 +
>  hw/block/ecc.c | 1 +
>  hw/block/fdc.c | 1 +
>  hw/block/m25p80.c  | 1 +
>  hw/block/nand.c| 1 +
>  hw/block/nvme.c| 1 +
>  hw/block/onenand.c | 1 +
>  hw/block/pflash_cfi01.c| 1 +
>  hw/block/pflash_cfi02.c| 1 +
>  hw/char/bcm2835_aux.c  | 1 +
>  hw/char/cadence_uart.c | 1 +
>  hw/char/cmsdk-apb-uart.c   | 1 +
>  hw/char/digic-uart.c   | 1 +
>  hw/char/escc.c | 1 +
>  hw/char/exynos4210_uart.c  | 1 +
>  hw/char/imx_serial.c   | 1 +
>  hw/char/ipoctal232.c   | 1 +
>  hw/char/lm32_juart.c   | 1 +
>  hw/char/lm32_uart.c| 1 +
>  hw/char/milkymist-uart.c   | 1 +
>  hw/char/nrf51_uart.c   | 1 +
>  hw/char/parallel.c | 1 +
>  hw/char/pl011.c| 1 +
>  hw/char/sclpconsole-lm.c   | 1 +
>  hw/char/sclpconsole.c  | 1 +
>  hw/char/serial-isa.c   | 1 +
>  hw/char/serial-pci-multi.c | 1 +
>  hw/char/serial-pci.c   | 1 +
>  hw/char/serial.c   | 1 +
>  hw/char/spapr_vty.c| 1 +
>  hw/core/loader.c   | 1 +
>  hw/core/numa.c | 1 +
>  hw/core/or-irq.c   | 1 +
>  hw/core/ptimer.c   | 2 ++
>  hw/core/qdev.c | 1 +
>  hw/display/ads7846.c   | 1 +
>  hw/display/bcm2835_fb.c| 1 +
>  hw/display/bochs-display.c | 1 +
>  hw/display/cg3.c   | 1 +
>  hw/display/cirrus_vga.c| 1 +
>  hw/display/dpcd.c  | 1 +
>  hw/display/exynos4210_fimd.c   | 1 +
>  hw/display/g364fb.c| 1 +
>  hw/display/i2c-ddc.c   | 1 +
>  hw/display/jazz_led.c  | 1 +
>  hw/display/milkymist-tmu2.c| 1 +
>  hw/display/milkymist-vgafb.c   | 1 +
>  hw/display/pl110.c | 1 +
>  hw/display/pxa2xx_lcd.c| 1 +
>  hw/display/qxl.c   | 1 +
>  hw/display/sii9022.c   | 1 +
>

Re: [Qemu-devel] [PATCH v4 01/29] include: Make headers more self-contained

2019-08-12 Thread Philippe Mathieu-Daudé

On 8/12/19 7:23 AM, Markus Armbruster wrote:
> Back in 2016, we discussed[1] rules for headers, and these were
> generally liked:
> 
> 1. Have a carefully curated header that's included everywhere first.  We
>got that already thanks to Peter: osdep.h.
> 
> 2. Headers should normally include everything they need beyond osdep.h.
>If exceptions are needed for some reason, they must be documented in
>the header.  If all that's needed from a header is typedefs, put
>those into qemu/typedefs.h instead of including the header.
> 
> 3. Cyclic inclusion is forbidden.
> 
> This patch gets include/ closer to obeying 2.
> 
> It's actually extracted from my "[RFC] Baby steps towards saner
> headers" series[2], which demonstrates a possible path towards
> checking 2 automatically.  It passes the RFC test there.
> 
> [1] Message-ID: <87h9g8j57d@blackfin.pond.sub.org>
> https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03345.html
> [2] Message-Id: <20190711122827.18970-1-arm...@redhat.com>
> https://lists.nongnu.org/archive/html/qemu-devel/2019-07/msg02715.html
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Alistair Francis 

Tested-by: Philippe Mathieu-Daudé 

> ---
>  include/block/raw-aio.h   | 2 ++
>  include/block/write-threshold.h   | 2 ++
>  include/disas/disas.h | 1 +
>  include/exec/cputlb.h | 3 +++
>  include/exec/exec-all.h   | 1 +
>  include/exec/ioport.h | 2 ++
>  include/exec/memory-internal.h| 2 ++
>  include/exec/ram_addr.h   | 1 +
>  include/exec/softmmu-semi.h   | 2 ++
>  include/exec/tb-hash.h| 2 ++
>  include/exec/user/thunk.h | 2 ++
>  include/fpu/softfloat-macros.h| 2 ++
>  include/hw/acpi/pci.h | 3 +++
>  include/hw/acpi/tco.h | 3 +++
>  include/hw/adc/stm32f2xx_adc.h| 2 ++
>  include/hw/arm/allwinner-a10.h| 1 +
>  include/hw/arm/aspeed_soc.h   | 1 +
>  include/hw/arm/bcm2836.h  | 1 +
>  include/hw/arm/exynos4210.h   | 3 +--
>  include/hw/arm/fsl-imx25.h| 1 +
>  include/hw/arm/fsl-imx31.h| 1 +
>  include/hw/arm/sharpsl.h  | 3 +++
>  include/hw/arm/xlnx-zynqmp.h  | 1 +
>  include/hw/block/fdc.h| 2 ++
>  include/hw/block/flash.h  | 1 +
>  include/hw/char/escc.h| 1 +
>  include/hw/char/xilinx_uartlite.h | 2 ++
>  include/hw/core/generic-loader.h  | 1 +
>  include/hw/cris/etraxfs.h | 1 +
>  include/hw/cris/etraxfs_dma.h | 3 +++
>  include/hw/display/i2c-ddc.h  | 1 +
>  include/hw/empty_slot.h   | 2 ++
>  include/hw/gpio/bcm2835_gpio.h| 1 +
>  include/hw/i2c/aspeed_i2c.h   | 2 ++
>  include/hw/i386/apic_internal.h   | 1 +
>  include/hw/i386/ioapic_internal.h | 1 +
>  include/hw/intc/allwinner-a10-pic.h   | 2 ++
>  include/hw/intc/heathrow_pic.h| 2 ++
>  include/hw/intc/mips_gic.h| 1 +
>  include/hw/isa/vt82c686.h | 2 ++
>  include/hw/mips/cps.h | 1 +
>  include/hw/misc/macio/cuda.h  | 2 ++
>  include/hw/misc/macio/gpio.h  | 3 +++
>  include/hw/misc/macio/macio.h | 2 ++
>  include/hw/misc/macio/pmu.h   | 3 +++
>  include/hw/misc/mips_cmgcr.h  | 2 ++
>  include/hw/misc/mips_cpc.h| 2 ++
>  include/hw/misc/pvpanic.h | 3 +++
>  include/hw/net/allwinner_emac.h   | 1 +
>  include/hw/net/lance.h| 1 +
>  include/hw/nvram/chrp_nvram.h | 2 ++
>  include/hw/pci-host/sabre.h   | 2 ++
>  include/hw/pci-host/uninorth.h| 2 +-
>  include/hw/pci/pcie_aer.h | 1 +
>  include/hw/ppc/pnv_core.h | 1 +
>  include/hw/ppc/ppc4xx.h   | 4 
>  include/hw/ppc/spapr_irq.h| 3 +++
>  include/hw/ppc/spapr_vio.h| 1 +
>  include/hw/ppc/spapr_xive.h   | 2 ++
>  include/hw/ppc/xive_regs.h| 3 +++
>  include/hw/riscv/boot.h   | 2 ++
>  include/hw/riscv/riscv_hart.h | 3 +++
>  include/hw/riscv/sifive_clint.h   | 2 ++
>  include/hw/riscv/sifive_e.h   | 1 +
>  include/hw/riscv/sifive_plic.h| 2 +-
>  include/hw/riscv/sifive_prci.h| 2 ++
>  include/hw/riscv/sifive_test.h| 2 ++
>  include/hw/riscv/sifive_u.h   | 1 +
>  include/hw/riscv/sifive_uart.h| 3 +++
>  include/hw/riscv/spike.h  | 3 +++
>  include/hw/riscv/virt.h   | 3 +++
>  include/hw/s390x/ap-device.h  | 3 +++
>  include/hw/s390x/css-bridge.h | 3 ++-
>  include/hw/s390x/css.h| 1 +
>  include/hw/s390x/tod.h| 2 +-
>  include/hw/semihosting/console.h  | 2 ++
>  include/hw/sh4/sh_intc.h  | 1 +
>  include/hw/sparc/sparc64.h| 2 ++
>  include/hw/ssi/aspeed_smc.h   | 1 +
>

[Qemu-devel] [PATCH v4 1/2] main-loop: Fix GSource leak in qio_task_thread_worker()

2019-08-12 Thread Andrey Shinkevich

From: Alberto Garcia 

After g_source_attach() the GMainContext holds a reference to the
GSource, so the caller does not need to keep it.

qio_task_thread_worker() is not releasing its reference so the GSource
is being leaked since a17536c594bfed94d05667b419f747b692f5fc7f.

Signed-off-by: Alberto Garcia 
Reviewed-by: Daniel P. Berrangé 
---
 io/task.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io/task.c b/io/task.c
index 64c4c71..1ae7b86 100644
--- a/io/task.c
+++ b/io/task.c
@@ -136,6 +136,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
   qio_task_thread_result, task, NULL);
 g_source_attach(task->thread->completion,
 task->thread->context);
+g_source_unref(task->thread->completion);
 trace_qio_task_thread_source_attach(task, task->thread->completion);
 
 qemu_cond_signal(>thread_cond);
-- 
1.8.3.1

[Qemu-devel] [PATCH v4 2/2] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()

2019-08-12 Thread Andrey Shinkevich

From: Alberto Garcia 

There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x5558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x5558b84f92db in qio_channel_writev_full (ioc=0x0, 
iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x5558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, 
len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x5558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, 
buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x5558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, 
buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at 
chardev/char.c:112
 #5  0x5558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 
"...", len=138, write_all=false) at chardev/char.c:147
 #6  0x5558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, 
buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x5558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at 
monitor.c:406
 #8  0x5558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 
"") at monitor.c:449
 #9  0x5558b8089178 in qmp_send_response (mon=0x5558ba476610, 
rsp=0x5558bb161600) at monitor.c:498
 #10 0x5558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, 
qdict=0x5558bb161600) at monitor.c:526
 #11 0x5558b8089307 in monitor_qapi_event_queue_no_reenter 
(event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x5558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, 
qdict=0x5558bb161600) at monitor.c:626
 #13 0x5558b855f23b in qapi_event_send_shutdown (guest=false, 
reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x5558b81911ef in qemu_system_shutdown 
(cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x5558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x5558b819140d in main_loop () at vl.c:1924
 #17 0x5558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, 
envp=0x7ffc355df490) at vl.c:4665

This patch adds a lock to protect tcp_chr_disconnect() and
socket_reconnect_timeout()

Signed-off-by: Alberto Garcia 
Signed-off-by: Andrey Shinkevich 
---
 chardev/char-socket.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 7ca5d97..03f0340 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -150,7 +150,7 @@ static void tcp_chr_accept(QIONetListener *listener,
void *opaque);
 
 static int tcp_chr_read_poll(void *opaque);
-static void tcp_chr_disconnect(Chardev *chr);
+static void tcp_chr_disconnect_locked(Chardev *chr);
 
 /* Called with chr_write_lock held.  */
 static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
@@ -174,7 +174,7 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, 
int len)
 
 if (ret < 0 && errno != EAGAIN) {
 if (tcp_chr_read_poll(chr) <= 0) {
-tcp_chr_disconnect(chr);
+tcp_chr_disconnect_locked(chr);
 return len;
 } /* else let the read handler finish it properly */
 }
@@ -469,8 +469,9 @@ static void update_disconnected_filename(SocketChardev *s)
 /* NB may be called even if tcp_chr_connect has not been
  * reached, due to TLS or telnet initialization failure,
  * so can *not* assume s->state == TCP_CHARDEV_STATE_CONNECTED
+ * This must be called with chr->chr_write_lock held.
  */
-static void tcp_chr_disconnect(Chardev *chr)
+static void tcp_chr_disconnect_locked(Chardev *chr)
 {
 SocketChardev *s = SOCKET_CHARDEV(chr);
 bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
@@ -490,6 +491,13 @@ static void tcp_chr_disconnect(Chardev *chr)
 }
 }
 
+static void tcp_chr_disconnect(Chardev *chr)
+{
+qemu_mutex_lock(>chr_write_lock);
+tcp_chr_disconnect_locked(chr);
+qemu_mutex_unlock(>chr_write_lock);
+}
+
 static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
 {
 Chardev *chr = CHARDEV(opaque);
@@ -1131,8 +1139,10 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
 Chardev *chr = CHARDEV(opaque);
 SocketChardev *s = SOCKET_CHARDEV(opaque);
 
+qemu_mutex_lock(>chr_write_lock);
 g_source_unref(s->reconnect_timer);
 s->reconnect_timer = NULL;
+qemu_mutex_unlock(>chr_write_lock);
 
 if (chr->be_open) {
 return false;
-- 
1.8.3.1

[Qemu-devel] [PATCH v4 0/2] char-socket: Fix race condition

2019-08-12 Thread Andrey Shinkevich

This fixes a race condition in which the tcp_chr_read() ioc handler
can close a connection that is being written to from another thread.

v4:
The functions qemu_idle_add() and tcp_chr_be_event_closed() were removed
because the callback is invoked after the call to object_property_del_all()
so, the "chardev" object had been deleted and the segmentation fault occurs.
Let's please apply the Alberto's simplified series to avoid the race 
condition.

v3:
See the email thread with the Message ID


Alberto Garcia (2):
  main-loop: Fix GSource leak in qio_task_thread_worker()
  char-socket: Lock tcp_chr_disconnect()

 chardev/char-socket.c | 16 +---
 io/task.c |  1 +
 2 files changed, 14 insertions(+), 3 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] [PATCH 3/3] vpc: Do not return RAW from block_status

2019-08-12 Thread Max Reitz

On 12.08.19 17:33, Vladimir Sementsov-Ogievskiy wrote:
> 25.07.2019 18:55, Max Reitz wrote:
>> vpc is not really a passthrough driver, even when using the fixed
>> subformat (where host and guest offsets are equal).  It should handle
>> preallocation like all other drivers do, namely by returning
>> DATA | RECURSE instead of RAW.
>>
>> There is no tangible difference but the fact that bdrv_is_allocated() no
>> longer falls through to the protocol layer.
> 
> Hmm. Isn't a real bug (fixed by this patch) ?
> 
> Assume vpc->file is qcow2 with backing, which have "unallocated" region, 
> which is
> backed by actual data in backing file.

Come on now.

> So, this region will be reported as not allocated and will be skipped by any 
> copying
> loop using block-status? Is it a bug of BDRV_BLOCK_RAW itself? Or I don't 
> understand
> something..

I think what you don’t understand is that if you have a vpc file inside
of a qcow2 file, you’re doing basically everything wrong. ;-)

But maybe we should drop BDRV_BLOCK_RAW...  Does it do anything good for
us in the raw driver?  Shouldn’t it too just return DATA | RECURSE?

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 0/1] display/bochs: fix pcie support (qemu security issue)

2019-08-12 Thread Alex Williamson

On Mon, 12 Aug 2019 16:38:05 +0100
Peter Maydell  wrote:

> On Mon, 12 Aug 2019 at 16:35, Alex Williamson
>  wrote:
> > Quoting new commit log:
> >
> > This makes sure the pci config space allocation is big enough,
> > so accessing the PCIe extended config space doesn't overflow
> > the pci config space buffer.
> >
> > PCI(e) config space is guest writable.  Writes are limited
> > bywrite mask (which probably is also filled with random stuff),
> > so the guest can only flip enabled bits.  But I suspect it
> > still might be exploitable, so rather serious because it might
> > be a host escape for the guest.  On the other hand the device
> > is probably not yet in widespread use.
> >
> > Mitigation: use "-device bochs-display" as conventional pci
> > device only.
> >
> > Is it clear to others that this mitigation remark seems to be
> > referencing an alternative configuration constraint to avoid the issue
> > rather than what's actually implemented in this patch?  IOW, if we
> > never place the bochs-display device into a PCIe hierarchy, then
> > extended config space is never accessible to the guest anyway, and
> > there is no issue.  I think this was meant to be an alternative to the
> > patch but the enforcement of that would happen above QEMU, probably why
> > it was mentioned in the cover letter rather than the original commit
> > log.  Thanks,  
> 
> Yeah, that's unclear in retrospect. How about:
> 
> # (For a QEMU version without this commit, a mitigation for the
> # bug is available: use "-device bochs-display" as a conventional pci
> # device only.)

Yes, better.  Thanks,

Alex

Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

12.08.2019 18:10, Max Reitz wrote:
> On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
>> backup_cow_with_offload can transfer more than one cluster. Let
>> backup_cow_with_bounce_buffer behave similarly. It reduces the number
>> of IO requests, since there is no need to copy cluster by cluster.
>>
>> Logic around bounce_buffer allocation changed: we can't just allocate
>> one-cluster-sized buffer to share for all iterations. We can't also
>> allocate buffer of full-request length it may be too large, so
>> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
>> is to allocate a buffer sufficient to handle all remaining iterations
>> at the point where we need the buffer for the first time.
>>
>> Bonus: get rid of pointer-to-pointer.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/backup.c | 65 +++---
>>   1 file changed, 41 insertions(+), 24 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index d482d93458..65f7212c85 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -27,6 +27,7 @@
>>   #include "qemu/error-report.h"
>>   
>>   #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
>> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>>   
>>   typedef struct CowRequest {
>>   int64_t start_byte;
>> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>>   qemu_co_queue_restart_all(>wait_queue);
>>   }
>>   
>> -/* Copy range to target with a bounce buffer and return the bytes copied. If
>> - * error occurred, return a negative error number */
>> +/*
>> + * Copy range to target with a bounce buffer and return the bytes copied. If
>> + * error occurred, return a negative error number
>> + *
>> + * @bounce_buffer is assumed to enough to store
> 
> s/to/to be/
> 
>> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
>> + */
>>   static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>> int64_t start,
>> int64_t end,
>> bool 
>> is_write_notifier,
>> bool *error_is_read,
>> -  void **bounce_buffer)
>> +  void *bounce_buffer)
>>   {
>>   int ret;
>>   BlockBackend *blk = job->common.blk;
>> -int nbytes;
>> +int nbytes, remaining_bytes;
>>   int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>   
>>   assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>> -bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>> -nbytes = MIN(job->cluster_size, job->len - start);
>> -if (!*bounce_buffer) {
>> -*bounce_buffer = blk_blockalign(blk, job->cluster_size);
>> -}
>> +bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
>> +nbytes = MIN(end - start, job->len - start);
>>   
>> -ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
>> -if (ret < 0) {
>> -trace_backup_do_cow_read_fail(job, start, ret);
>> -if (error_is_read) {
>> -*error_is_read = true;
>> +
>> +remaining_bytes = nbytes;
>> +while (remaining_bytes) {
>> +int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
>> +
>> +ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
>> +if (ret < 0) {
>> +trace_backup_do_cow_read_fail(job, start, ret);
>> +if (error_is_read) {
>> +*error_is_read = true;
>> +}
>> +goto fail;
>>   }
>> -goto fail;
>> -}
>>   
>> -ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
>> -job->write_flags);
>> -if (ret < 0) {
>> -trace_backup_do_cow_write_fail(job, start, ret);
>> -if (error_is_read) {
>> -*error_is_read = false;
>> +ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
>> +job->write_flags);
>> +if (ret < 0) {
>> +trace_backup_do_cow_write_fail(job, start, ret);
>> +if (error_is_read) {
>> +*error_is_read = false;
>> +}
>> +goto fail;
>>   }
>> -goto fail;
>> +
>> +start += chunk;
>> +remaining_bytes -= chunk;
>>   }
>>   
>>   return nbytes;
>> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
>> *job,
>>   }
>>   }
>>   if (!job->use_copy_range) {
>> +if (!bounce_buffer) {
>> +size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
>> + MAX(dirty_end - start, end - dirty_end));
>> +bounce_buffer = blk_try_blockalign(job->common.blk,

Re: [Qemu-devel] [qemu-s390x] [PATCH-for-4.2 v1 1/6] s390x/mmu: ASC selection in s390_cpu_get_phys_page_debug()

2019-08-12 Thread David Hildenbrand

On 12.08.19 17:28, David Hildenbrand wrote:
> On 12.08.19 17:18, Thomas Huth wrote:
>> On 8/12/19 1:27 PM, David Hildenbrand wrote:
>>> Let's select the ASC before calling the function. This is a prepararion
>>> to remove the ASC magic depending on the access mode from mmu_translate.
>>>
>>> There is currently no way to distinguish if we have code or data access.
>>> For now, we were using code access, because especially when debugging with
>>> the gdbstub, we want to read and disassemble what we single-step.
>>
>> IMHO we should add a "instruction" bit to MemTxAttrs and then use the
>> ...page_attrs_debug() interface instead. But ok, that's likely really
>> something for a separate clean-up, so for the time being:
>>
> 
> That sounds like a good idea, and then switching over to
> cc->get_phys_page_attrs()

But looking at get_phys_page_attrs_debug() again, "MemTxAttrs *attrs" is
an output value not an input value. So there wouldn't be a way to
specify "what you want" from the caller. At least unless I am missing
something :)

> 
> Thanks!
> 
>> Reviewed-by: Thomas Huth 
>>
> 
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH 0/1] display/bochs: fix pcie support (qemu security issue)

2019-08-12 Thread Peter Maydell

On Mon, 12 Aug 2019 at 16:35, Alex Williamson
 wrote:
> Quoting new commit log:
>
> This makes sure the pci config space allocation is big enough,
> so accessing the PCIe extended config space doesn't overflow
> the pci config space buffer.
>
> PCI(e) config space is guest writable.  Writes are limited
> bywrite mask (which probably is also filled with random stuff),
> so the guest can only flip enabled bits.  But I suspect it
> still might be exploitable, so rather serious because it might
> be a host escape for the guest.  On the other hand the device
> is probably not yet in widespread use.
>
> Mitigation: use "-device bochs-display" as conventional pci
> device only.
>
> Is it clear to others that this mitigation remark seems to be
> referencing an alternative configuration constraint to avoid the issue
> rather than what's actually implemented in this patch?  IOW, if we
> never place the bochs-display device into a PCIe hierarchy, then
> extended config space is never accessible to the guest anyway, and
> there is no issue.  I think this was meant to be an alternative to the
> patch but the enforcement of that would happen above QEMU, probably why
> it was mentioned in the cover letter rather than the original commit
> log.  Thanks,

Yeah, that's unclear in retrospect. How about:

# (For a QEMU version without this commit, a mitigation for the
# bug is available: use "-device bochs-display" as a conventional pci
# device only.)

?

thanks
-- PMM

Re: [Qemu-devel] [Qemu-arm] [PATCH] elf: Allow loading AArch64 ELF files

2019-08-12 Thread Aaron Lindsay OS via Qemu-devel

On Aug 12 16:02, Peter Maydell wrote:
> On Mon, 12 Aug 2019 at 15:46, Aaron Lindsay OS via Qemu-arm
>  wrote:
> >
> > Treat EM_AARCH64 as a valid value when checking the ELF's machine-type
> > header.
> >
> > Signed-off-by: Aaron Lindsay 
> > ---
> >  include/hw/elf_ops.h | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
> > index 690f9238c8..f12faa90a1 100644
> > --- a/include/hw/elf_ops.h
> > +++ b/include/hw/elf_ops.h
> > @@ -381,6 +381,12 @@ static int glue(load_elf, SZ)(const char *name, int fd,
> >  goto fail;
> >  }
> >  break;
> > +case EM_AARCH64:
> > +if (ehdr.e_machine != EM_AARCH64) {
> > +ret = ELF_LOAD_WRONG_ARCH;
> > +goto fail;
> > +}
> > +break;
> >  default:
> >  if (elf_machine != ehdr.e_machine) {
> >  ret = ELF_LOAD_WRONG_ARCH;
> > --
> > 2.17.1
> 
> What problem are we trying to solve here ? If I'm reading your patch
> correctly then it makes no difference to execution, because we're
> switching on 'elf_machine', and so this extra case is saying
> "if elf_machine is EM_AARCH64, insist that ehdr.e_machine
> is also EM_AARCH64", which is exactly what the default
> case would do anyway. The only reason to add extra cases in
> this switch is to handle the situation where a target's board/loader
> code says "I can handle elf files of type X" but actually this
> implicitly means it can handle both X and Y (so for EM_X86_64 we
> need to accept both EM_X86_64 and EM_386, for EM_PPC64 we need to
> accept both EM_PPC64 and EM_PPC, and so on). We don't have a
> case like that for aarch64/arm because the boot loader code has
> to deal with the 32 bit and 64 bit code separately anyway, so
> we can pass in the correct value depending on whether the CPU
> is 32-bit or 64-bit.
> 
> The code in hw/arm/boot.c passes in an elf_machine value of
> EM_AARCH64 when it wants to load an AArch64 ELF file, so
> for that the default case is OK. The code in hw/core/generic-loader.c
> passes in 0 (EM_NONE) which will be handled by the check just
> before this switch statement, so the default case should
> work there too. Presumably there's some other code path
> for ELF file loading that doesn't work that you're trying to fix?

Please disregard this patch.

I'm sorry, upon closer inspection you are obviously correct... and I
have no earthly idea what happened here. I hit the "goto fail" in the
"default" case when debugging why I couldn't load an ELF on AArch64 last
week. I was in a hurry and saw that there were other architectures in
the switch/case and threw this code in there quickly without much
thought, and after re-compiling, it worked!

...But after your email this morning, I'm completely unable to reproduce
the failure case. I must have had another local issue which was
coincidentally resolved at the same time, unbeknownst to me.

-Aaron

Re: [Qemu-devel] [PATCH 0/1] display/bochs: fix pcie support (qemu security issue)

2019-08-12 Thread Alex Williamson

On Mon, 12 Aug 2019 14:39:53 +0100
Peter Maydell  wrote:

> On Mon, 12 Aug 2019 at 13:51, Philippe Mathieu-Daudé  
> wrote:
> >
> > On 8/12/19 2:45 PM, Paolo Bonzini wrote:  
> > > On 12/08/19 08:52, Gerd Hoffmann wrote:  
> > >> Just found while investigating
> > >>   https://bugzilla.redhat.com/show_bug.cgi?id=1707118
> > >>
> > >> Found PCIe extended config space filled with random crap due to
> > >> allocation being too small (conventional pci config space only).
> > >>  
> >
> > Can you amend this information to the commit description?
> >
> > <...
> >  
> > >> PCI(e) config space is guest writable.  Writes are limited by
> > >> write mask (which probably is also filled with random stuff),  
> > >
> > > Yes, it is also allocated with 256 bytes only.
> > >  
> > >> so the guest can only flip enabled bits.  But I suspect it
> > >> still might be exploitable, so rather serious because it might
> > >> be a host escape for the guest.  On the other hand the device
> > >> is probably not yet in widespread use.  
> >  
> > ...>  
> 
> I can add to the commit this paragraph of the cover letter,
> and I think also the 'mitigation' note might as well go in.
> 
> I've also put the cc:stable into the commit message.
> 
> Updated commit, ready to apply to master if we're OK with it:
> 
> https://git.linaro.org/people/peter.maydell/qemu-arm.git/commit/?h=staging=c075b5f318a8be628ab8edf93be33f5a93a4aacd

Quoting new commit log:

This makes sure the pci config space allocation is big enough,
so accessing the PCIe extended config space doesn't overflow
the pci config space buffer.

PCI(e) config space is guest writable.  Writes are limited
bywrite mask (which probably is also filled with random stuff),
so the guest can only flip enabled bits.  But I suspect it
still might be exploitable, so rather serious because it might
be a host escape for the guest.  On the other hand the device
is probably not yet in widespread use.

Mitigation: use "-device bochs-display" as conventional pci
device only.

Is it clear to others that this mitigation remark seems to be
referencing an alternative configuration constraint to avoid the issue
rather than what's actually implemented in this patch?  IOW, if we
never place the bochs-display device into a PCIe hierarchy, then
extended config space is never accessible to the guest anyway, and
there is no issue.  I think this was meant to be an alternative to the
patch but the enforcement of that would happen above QEMU, probably why
it was mentioned in the cover letter rather than the original commit
log.  Thanks,

Alex

Re: [Qemu-devel] [PATCH 3/3] vpc: Do not return RAW from block_status

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

25.07.2019 18:55, Max Reitz wrote:
> vpc is not really a passthrough driver, even when using the fixed
> subformat (where host and guest offsets are equal).  It should handle
> preallocation like all other drivers do, namely by returning
> DATA | RECURSE instead of RAW.
> 
> There is no tangible difference but the fact that bdrv_is_allocated() no
> longer falls through to the protocol layer.

Hmm. Isn't a real bug (fixed by this patch) ?

Assume vpc->file is qcow2 with backing, which have "unallocated" region, which 
is
backed by actual data in backing file.

So, this region will be reported as not allocated and will be skipped by any 
copying
loop using block-status? Is it a bug of BDRV_BLOCK_RAW itself? Or I don't 
understand
something..

> 
> Signed-off-by: Max Reitz 
> ---
>   block/vpc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/vpc.c b/block/vpc.c
> index d4776ee8a5..b25aab0425 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -737,7 +737,7 @@ static int coroutine_fn 
> vpc_co_block_status(BlockDriverState *bs,
>   *pnum = bytes;
>   *map = offset;
>   *file = bs->file->bs;
> -return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
> +return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | 
> BDRV_BLOCK_RECURSE;
>   }
>   
>   qemu_co_mutex_lock(>lock);
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [Qemu-arm] Beagle Board support

2019-08-12 Thread Peter Maydell

(I've added qemu-devel to the cc list; some people don't
read the qemu-arm list.)

On Sat, 10 Aug 2019 at 16:24, Esteban Bosse  wrote:
> El sáb., 10 ago. 2019 23:01, Peter Maydell  
> escribió:
>> On Sat, 10 Aug 2019 at 04:39, Esteban Bosse  wrote:
>> > I am new in this world, but I want to port the old beagle support for 
>> > qemu-linaro to mainstream.
>>
>> That would certainly be nice. The major issue is that the
>> patchset in that tree is (a) often in an older style
>> of API that would need to be updated to use the current
>> recommended-practice APIs to go upstream, and (b) in
>> some places still has multiple changes tangled together
>> that would need to be disentangled to form a clean
>> reviewable patchset.
>
> (a) Sounds "not impossible" then for a beginer, hahaha.
> I am looking other boards like Musca to use it as example
>
> (b) I don't know yet how to make the patches I have to learn everything.
>
>>
>> I'm happy to provide advice and code review if you're
>> interested in doing this work and helping to maintain
>> it in the upstream tree.
>
>
> Yes! I am interested, I have a repo where I am doing my firsts trys to 
> understand how qemu and the beagle are.
>
> How do you recommend to start?

Here's something I wrote up in 2015 the last time somebody
talked about maybe trying to get the beagle/omap3 changes into
master:

The initial steps here would be:
1) rebase the patchset on to qemu master and fix up
  the breakage due to API changes in qemu
[this will be moderately painful if you haven't been
following the API changes as they happened; if you're
interested in taking on the omap3 patches then I could
maybe do this step for you]
2) add support for direct boot of a guest kernel via
  "-kernel" (currently only "boot via an SD card image"
  is supported, which is awkward because the u-boot
  image insists on checking every device on the board
  and won't boot if any are missing)
3) build a cut-down minimally configured kernel that
  only needs the smallest possible set of devices
  [in particular, no SPI, I2C, MMC or graphics]
4) reorder the patchstack so that it starts with
  those relating to the required minimal device set
  and the SoC model and the board model, and the
  complicated ones to do with I2C etc are afterwards;
  check the kernel boots on this initial set
5) update the patches to correspond to current QEMU
  practice for QOM device modelling (the code in that
  tree is somewhat old and does many things in out
  of date ways)
6) submit the minimal-device patches and get them
  through code review and into QEMU
7) then start trying to clean up the remaining
  patches one device at a time

You'll need (or will learn :-)) familiarity with
handling stacks of patches in git, rebasing them,
reordering them, splitting them, and so on. Personally
I use 'stgit' as a frontend to doing that, but you can
do it all with raw git too.

Back in 2014 or whenever it was I abandoned the idea of
doing this upstreaming work I reckoned it was probably
a couple of months worth of (full-time) work to get
everything upstream.

thanks
-- PMM

[Qemu-devel] [PATCH] memory: Correct access mask generation in access_with_adjusted_size

2019-08-12 Thread Francisco Iglesias

Also consider the requested transaction size when generating the access
mask (so that only the requested bytes are returned when those are less
than the memory region's minimum access size).

Signed-off-by: Francisco Iglesias 
---
 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/memory.c b/memory.c
index 5d8c9a9234..56a2510836 100644
--- a/memory.c
+++ b/memory.c
@@ -563,7 +563,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 
 /* FIXME: support unaligned access? */
 access_size = MAX(MIN(size, access_size_max), access_size_min);
-access_mask = MAKE_64BIT_MASK(0, access_size * 8);
+access_mask = MAKE_64BIT_MASK(0, MIN(size, access_size) * 8);
 if (memory_region_big_endian(mr)) {
 for (i = 0; i < size; i += access_size) {
 r |= access_fn(mr, addr + i, value, access_size,
-- 
2.11.0

Re: [Qemu-devel] [qemu-s390x] [PATCH-for-4.2 v1 1/6] s390x/mmu: ASC selection in s390_cpu_get_phys_page_debug()

2019-08-12 Thread David Hildenbrand

On 12.08.19 17:18, Thomas Huth wrote:
> On 8/12/19 1:27 PM, David Hildenbrand wrote:
>> Let's select the ASC before calling the function. This is a prepararion
>> to remove the ASC magic depending on the access mode from mmu_translate.
>>
>> There is currently no way to distinguish if we have code or data access.
>> For now, we were using code access, because especially when debugging with
>> the gdbstub, we want to read and disassemble what we single-step.
> 
> IMHO we should add a "instruction" bit to MemTxAttrs and then use the
> ...page_attrs_debug() interface instead. But ok, that's likely really
> something for a separate clean-up, so for the time being:
> 

That sounds like a good idea, and then switching over to
cc->get_phys_page_attrs()

Thanks!

> Reviewed-by: Thomas Huth 
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH 3/9] vfio: unplug failover primary device before migration

2019-08-12 Thread Cornelia Huck

On Fri,  2 Aug 2019 17:05:59 +0200
Jens Freimann  wrote:

> As usual block all vfio-pci devices from being migrated, but make an
> exception for failover primary devices. This is achieved by setting
> unmigratable to 0 but also add a migration blocker for all vfio-pci
> devices except failover primary devices. These will be unplugged before
> migration happens by the migration handler of the corresponding
> virtio-net standby device.
> 
> Signed-off-by: Jens Freimann 
> ---
>  hw/vfio/pci.c | 24 +++-
>  hw/vfio/pci.h |  1 +
>  2 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index d6ae9bd4ac..398d26669b 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -35,6 +35,9 @@
>  #include "pci.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "migration/blocker.h"
> +#include "qemu/option.h"
> +#include "qemu/option_int.h"
>  
>  #define TYPE_VFIO_PCI "vfio-pci"
>  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
> @@ -2693,6 +2696,12 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
> *vdev)
>  vdev->req_enabled = false;
>  }
>  
> +static int has_standby_arg(void *opaque, const char *name,
> +   const char *value, Error **errp)
> +{
> +return strcmp(name, "standby") == 0;
> +}
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -2706,6 +2715,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  int i, ret;
>  bool is_mdev;
>  
> +if (qemu_opt_foreach(pdev->qdev.opts, has_standby_arg,
> + (void *) pdev->qdev.opts, ) == 0) {
> +error_setg(>migration_blocker,
> +"VFIO device doesn't support migration");
> +ret = migrate_add_blocker(vdev->migration_blocker, );
> +if (err) {
> +error_propagate(errp, err);
> +error_free(vdev->migration_blocker);
> +}
> +} else {
> +pdev->qdev.allow_unplug_during_migration = true;

I think you add this only in the next patch?

> +}
> +
>  if (!vdev->vbasedev.sysfsdev) {
>  if (!(~vdev->host.domain || ~vdev->host.bus ||
>~vdev->host.slot || ~vdev->host.function)) {
> @@ -3148,7 +3170,7 @@ static Property vfio_pci_dev_properties[] = {
>  
>  static const VMStateDescription vfio_pci_vmstate = {
>  .name = "vfio-pci",
> -.unmigratable = 1,
> +.unmigratable = 0,
>  };
>  
>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 27d58fc55b..0f6f8cb395 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -169,6 +169,7 @@ typedef struct VFIOPCIDevice {
>  bool no_vfio_ioeventfd;
>  bool enable_ramfb;
>  VFIODisplay *dpy;
> +Error *migration_blocker;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);

This patch interacts with support for vfio migration (last posted in
<1562665760-26158-1-git-send-email-kwankh...@nvidia.com>, I've not seen
a later version yet.)

With that, we'd have three cases to consider:
1) device is a failover primary
2) device has a migration region
3) none of the above

Can 1) and 2) happen simultaneously? If yes, what should take
precedence?

Re: [Qemu-devel] [qemu-s390x] [PATCH-for-4.2 v1 1/6] s390x/mmu: ASC selection in s390_cpu_get_phys_page_debug()

2019-08-12 Thread Thomas Huth

On 8/12/19 1:27 PM, David Hildenbrand wrote:
> Let's select the ASC before calling the function. This is a prepararion
> to remove the ASC magic depending on the access mode from mmu_translate.
> 
> There is currently no way to distinguish if we have code or data access.
> For now, we were using code access, because especially when debugging with
> the gdbstub, we want to read and disassemble what we single-step.

IMHO we should add a "instruction" bit to MemTxAttrs and then use the
...page_attrs_debug() interface instead. But ok, that's likely really
something for a separate clean-up, so for the time being:

Reviewed-by: Thomas Huth 

> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/helper.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/target/s390x/helper.c b/target/s390x/helper.c
> index 13ae9909ad..c5fb8966b6 100644
> --- a/target/s390x/helper.c
> +++ b/target/s390x/helper.c
> @@ -58,6 +58,11 @@ hwaddr s390_cpu_get_phys_page_debug(CPUState *cs, vaddr 
> vaddr)
>  vaddr &= 0x7fff;
>  }
>  
> +/* We want to read the code (e.g., see what we are single-stepping).*/
> +if (asc != PSW_ASC_HOME) {
> +asc = PSW_ASC_PRIMARY;
> +}
> +
>  if (mmu_translate(env, vaddr, MMU_INST_FETCH, asc, , , 
> false)) {
>  return -1;
>  }
>

Re: [Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-12 Thread Max Reitz

On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
> backup_cow_with_offload can transfer more than one cluster. Let
> backup_cow_with_bounce_buffer behave similarly. It reduces the number
> of IO requests, since there is no need to copy cluster by cluster.
> 
> Logic around bounce_buffer allocation changed: we can't just allocate
> one-cluster-sized buffer to share for all iterations. We can't also
> allocate buffer of full-request length it may be too large, so
> BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
> is to allocate a buffer sufficient to handle all remaining iterations
> at the point where we need the buffer for the first time.
> 
> Bonus: get rid of pointer-to-pointer.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/backup.c | 65 +++---
>  1 file changed, 41 insertions(+), 24 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index d482d93458..65f7212c85 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -27,6 +27,7 @@
>  #include "qemu/error-report.h"
>  
>  #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
> +#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
>  
>  typedef struct CowRequest {
>  int64_t start_byte;
> @@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
>  qemu_co_queue_restart_all(>wait_queue);
>  }
>  
> -/* Copy range to target with a bounce buffer and return the bytes copied. If
> - * error occurred, return a negative error number */
> +/*
> + * Copy range to target with a bounce buffer and return the bytes copied. If
> + * error occurred, return a negative error number
> + *
> + * @bounce_buffer is assumed to enough to store

s/to/to be/

> + * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
> + */
>  static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
>int64_t start,
>int64_t end,
>bool is_write_notifier,
>bool *error_is_read,
> -  void **bounce_buffer)
> +  void *bounce_buffer)
>  {
>  int ret;
>  BlockBackend *blk = job->common.blk;
> -int nbytes;
> +int nbytes, remaining_bytes;
>  int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>  
>  assert(QEMU_IS_ALIGNED(start, job->cluster_size));
> -bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
> -nbytes = MIN(job->cluster_size, job->len - start);
> -if (!*bounce_buffer) {
> -*bounce_buffer = blk_blockalign(blk, job->cluster_size);
> -}
> +bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
> +nbytes = MIN(end - start, job->len - start);
>  
> -ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
> -if (ret < 0) {
> -trace_backup_do_cow_read_fail(job, start, ret);
> -if (error_is_read) {
> -*error_is_read = true;
> +
> +remaining_bytes = nbytes;
> +while (remaining_bytes) {
> +int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
> +
> +ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
> +if (ret < 0) {
> +trace_backup_do_cow_read_fail(job, start, ret);
> +if (error_is_read) {
> +*error_is_read = true;
> +}
> +goto fail;
>  }
> -goto fail;
> -}
>  
> -ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
> -job->write_flags);
> -if (ret < 0) {
> -trace_backup_do_cow_write_fail(job, start, ret);
> -if (error_is_read) {
> -*error_is_read = false;
> +ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
> +job->write_flags);
> +if (ret < 0) {
> +trace_backup_do_cow_write_fail(job, start, ret);
> +if (error_is_read) {
> +*error_is_read = false;
> +}
> +goto fail;
>  }
> -goto fail;
> +
> +start += chunk;
> +remaining_bytes -= chunk;
>  }
>  
>  return nbytes;
> @@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob 
> *job,
>  }
>  }
>  if (!job->use_copy_range) {
> +if (!bounce_buffer) {
> +size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
> + MAX(dirty_end - start, end - dirty_end));
> +bounce_buffer = blk_try_blockalign(job->common.blk, len);
> +}

If you use _try_, you should probably also check whether it succeeded.

Anyway, I wonder whether it’d be better to just allocate this buffer
once per job (the first time we

Re: [Qemu-devel] [Qemu-arm] [PATCH] elf: Allow loading AArch64 ELF files

2019-08-12 Thread Peter Maydell

On Mon, 12 Aug 2019 at 15:46, Aaron Lindsay OS via Qemu-arm
 wrote:
>
> Treat EM_AARCH64 as a valid value when checking the ELF's machine-type
> header.
>
> Signed-off-by: Aaron Lindsay 
> ---
>  include/hw/elf_ops.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
> index 690f9238c8..f12faa90a1 100644
> --- a/include/hw/elf_ops.h
> +++ b/include/hw/elf_ops.h
> @@ -381,6 +381,12 @@ static int glue(load_elf, SZ)(const char *name, int fd,
>  goto fail;
>  }
>  break;
> +case EM_AARCH64:
> +if (ehdr.e_machine != EM_AARCH64) {
> +ret = ELF_LOAD_WRONG_ARCH;
> +goto fail;
> +}
> +break;
>  default:
>  if (elf_machine != ehdr.e_machine) {
>  ret = ELF_LOAD_WRONG_ARCH;
> --
> 2.17.1

What problem are we trying to solve here ? If I'm reading your patch
correctly then it makes no difference to execution, because we're
switching on 'elf_machine', and so this extra case is saying
"if elf_machine is EM_AARCH64, insist that ehdr.e_machine
is also EM_AARCH64", which is exactly what the default
case would do anyway. The only reason to add extra cases in
this switch is to handle the situation where a target's board/loader
code says "I can handle elf files of type X" but actually this
implicitly means it can handle both X and Y (so for EM_X86_64 we
need to accept both EM_X86_64 and EM_386, for EM_PPC64 we need to
accept both EM_PPC64 and EM_PPC, and so on). We don't have a
case like that for aarch64/arm because the boot loader code has
to deal with the 32 bit and 64 bit code separately anyway, so
we can pass in the correct value depending on whether the CPU
is 32-bit or 64-bit.

The code in hw/arm/boot.c passes in an elf_machine value of
EM_AARCH64 when it wants to load an AArch64 ELF file, so
for that the default case is OK. The code in hw/core/generic-loader.c
passes in 0 (EM_NONE) which will be handled by the check just
before this switch statement, so the default case should
work there too. Presumably there's some other code path
for ELF file loading that doesn't work that you're trying to fix?

thanks
-- PMM

Re: [Qemu-devel] [PATCH v4 28/29] sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h

2019-08-12 Thread Alex Bennée



Markus Armbruster  writes:

> Alex Bennée  writes:
>
>> Markus Armbruster  writes:
>>
>>> In my "build everything" tree, changing sysemu/sysemu.h triggers a
>>> recompile of some 1800 out of 6600 objects (not counting tests and
>>> objects that don't depend on qemu/osdep.h, down from 5400 due to the
>>> previous commit).
>>>
>>> Several headers include sysemu/sysemu.h just to get typedef
>>> VMChangeStateEntry.  Move it from sysemu/sysemu.h to qemu/typedefs.h.
>>> Spell its structure tag the same while there.  Drop the now
>>> superfluous includes of sysemu/sysemu.h from headers.
>>
>> You should probably mention you also fix the struct definition to meet
>> our coding style. Otherwise:
>
> I did: "Spell its structure tag the same while there."  Would you like
> to suggest a better wording?

Apologies - my eyes obviously glazed over that part...

>
>> Reviewed-by: Alex Bennée 
>
> Thanks!


--
Alex Bennée

Re: [Qemu-devel] [PATCH 2/3] vmdk: Make block_status recurse for flat extents

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

25.07.2019 18:55, Max Reitz wrote:
> Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
> Signed-off-by: Max Reitz 

Again, don't know vmdk code, but briefly looking at it (and at vmdk spec) I
see that "extents" are files, and flat extent is a raw file without any special
format. And it is allocated by blk_truncate(.. PREALLOC_MODE_OFF ..), so really
looks like metadata preallocation.

And, any way, there should not be real damage, as patch simply brings back old 
behavior
for one case.

Reviewed-by: Vladimir Sementsov-Ogievskiy 

> ---
>   block/vmdk.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index bd36ece125..fd78fd0ccf 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -1692,6 +1692,9 @@ static int coroutine_fn 
> vmdk_co_block_status(BlockDriverState *bs,
>   if (!extent->compressed) {
>   ret |= BDRV_BLOCK_OFFSET_VALID;
>   *map = cluster_offset + index_in_cluster;
> +if (extent->flat) {
> +ret |= BDRV_BLOCK_RECURSE;
> +}
>   }
>   *file = extent->file->bs;
>   break;
> 

-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH] elf: Allow loading AArch64 ELF files

2019-08-12 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/2019081212.30027-1-aa...@os.amperecomputing.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH] elf: Allow loading AArch64 ELF files
Message-id: 2019081212.30027-1-aa...@os.amperecomputing.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/2019081212.30027-1-aa...@os.amperecomputing.com -> 
patchew/2019081212.30027-1-aa...@os.amperecomputing.com
Submodule 'capstone' (https://git.qemu.org/git/capstone.git) registered for 
path 'capstone'
Submodule 'dtc' (https://git.qemu.org/git/dtc.git) registered for path 'dtc'
Submodule 'roms/QemuMacDrivers' (https://git.qemu.org/git/QemuMacDrivers.git) 
registered for path 'roms/QemuMacDrivers'
Submodule 'roms/SLOF' (https://git.qemu.org/git/SLOF.git) registered for path 
'roms/SLOF'
Submodule 'roms/edk2' (https://git.qemu.org/git/edk2.git) registered for path 
'roms/edk2'
Submodule 'roms/ipxe' (https://git.qemu.org/git/ipxe.git) registered for path 
'roms/ipxe'
Submodule 'roms/openbios' (https://git.qemu.org/git/openbios.git) registered 
for path 'roms/openbios'
Submodule 'roms/openhackware' (https://git.qemu.org/git/openhackware.git) 
registered for path 'roms/openhackware'
Submodule 'roms/opensbi' (https://git.qemu.org/git/opensbi.git) registered for 
path 'roms/opensbi'
Submodule 'roms/qemu-palcode' (https://git.qemu.org/git/qemu-palcode.git) 
registered for path 'roms/qemu-palcode'
Submodule 'roms/seabios' (https://git.qemu.org/git/seabios.git/) registered for 
path 'roms/seabios'
Submodule 'roms/seabios-hppa' (https://git.qemu.org/git/seabios-hppa.git) 
registered for path 'roms/seabios-hppa'
Submodule 'roms/sgabios' (https://git.qemu.org/git/sgabios.git) registered for 
path 'roms/sgabios'
Submodule 'roms/skiboot' (https://git.qemu.org/git/skiboot.git) registered for 
path 'roms/skiboot'
Submodule 'roms/u-boot' (https://git.qemu.org/git/u-boot.git) registered for 
path 'roms/u-boot'
Submodule 'roms/u-boot-sam460ex' (https://git.qemu.org/git/u-boot-sam460ex.git) 
registered for path 'roms/u-boot-sam460ex'
Submodule 'slirp' (https://git.qemu.org/git/libslirp.git) registered for path 
'slirp'
Submodule 'tests/fp/berkeley-softfloat-3' 
(https://git.qemu.org/git/berkeley-softfloat-3.git) registered for path 
'tests/fp/berkeley-softfloat-3'
Submodule 'tests/fp/berkeley-testfloat-3' 
(https://git.qemu.org/git/berkeley-testfloat-3.git) registered for path 
'tests/fp/berkeley-testfloat-3'
Submodule 'ui/keycodemapdb' (https://git.qemu.org/git/keycodemapdb.git) 
registered for path 'ui/keycodemapdb'
Cloning into 'capstone'...
Submodule path 'capstone': checked out 
'22ead3e0bfdb87516656453336160e0a37b066bf'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '88f18909db731a627456f26d779445f84e449536'
Cloning into 'roms/QemuMacDrivers'...
Submodule path 'roms/QemuMacDrivers': checked out 
'90c488d5f4a407342247b9ea869df1c2d9c8e266'
Cloning into 'roms/SLOF'...
Submodule path 'roms/SLOF': checked out 
'ba1ab360eebe6338bb8d7d83a9220ccf7e213af3'
Cloning into 'roms/edk2'...
Submodule path 'roms/edk2': checked out 
'20d2e5a125e34fc8501026613a71549b2a1a3e54'
Submodule 'SoftFloat' (https://github.com/ucb-bar/berkeley-softfloat-3.git) 
registered for path 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'
Submodule 'CryptoPkg/Library/OpensslLib/openssl' 
(https://github.com/openssl/openssl) registered for path 
'CryptoPkg/Library/OpensslLib/openssl'
Cloning into 'ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3'...
Submodule path 'roms/edk2/ArmPkg/Library/ArmSoftFloatLib/berkeley-softfloat-3': 
checked out 'b64af41c3276f97f0e181920400ee056b9c88037'
Cloning into 'CryptoPkg/Library/OpensslLib/openssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl': checked out 
'50eaac9f3337667259de725451f201e784599687'
Submodule 'boringssl' (https://boringssl.googlesource.com/boringssl) registered 
for path 'boringssl'
Submodule 'krb5' (https://github.com/krb5/krb5) registered for path 'krb5'
Submodule 'pyca.cryptography' (https://github.com/pyca/cryptography.git) 
registered for path 'pyca-cryptography'
Cloning into 'boringssl'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/boringssl': 
checked out '2070f8ad9151dc8f3a73bffaa146b5e6937a583f'
Cloning into 'krb5'...
Submodule path 'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/krb5': checked 
out 'b9ad6c49505c96a088326b62a52568e3484f2168'
Cloning into 'pyca-cryptography'...
Submodule path 
'roms/edk2/CryptoPkg/Library/OpensslLib/openssl/pyca-cryptography': checked out 
'09403100de2f6f1cdd0d484dcb8e620f1c335c8f'

[Qemu-devel] [Bug 1796520] Re: autogen crashes on qemu-sh4-user after 61dedf2af7

2019-08-12 Thread Peter Maydell

(Edit to note that "that hardware" is an SH7785LCR with an SH7785 CPU.)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1796520

Title:
  autogen crashes on qemu-sh4-user after 61dedf2af7

Status in QEMU:
  Confirmed

Bug description:
  Running "autogen --help" crashes on qemu-sh4-user with:

  (sid-sh4-sbuild)root@nofan:/# autogen --help
  Unhandled trap: 0x180
  pc=0xf64dd2de sr=0x pr=0xf63b9c74 fpscr=0x0008
  spc=0x ssr=0x gbr=0xf61102a8 vbr=0x
  sgr=0x dbr=0x delayed_pc=0xf64dd2a0 fpul=0x0003
  r0=0xf6fc1320 r1=0x r2=0x5dc4 r3=0xf67bfb50
  r4=0xf6fc1230 r5=0xf6fc141c r6=0x03ff r7=0x
  r8=0x0004 r9=0xf63e20bc r10=0xf6fc141c r11=0xf63e28f0
  r12=0xf63e2258 r13=0xf63eae1c r14=0x0804 r15=0xf6fc1220
  r16=0x r17=0x r18=0x r19=0x
  r20=0x r21=0x r22=0x r23=0x
  (sid-sh4-sbuild)root@nofan:/#

  Bi-secting found this commit to be the culprit:

  61dedf2af79fb5866dc7a0f972093682f2185e17 is the first bad commit
  commit 61dedf2af79fb5866dc7a0f972093682f2185e17
  Author: Richard Henderson 
  Date:   Tue Jul 18 10:02:50 2017 -1000

  target/sh4: Add missing FPSCR.PR == 0 checks
  
  Both frchg and fschg require PR == 0, otherwise undefined_operation.
  
  Reviewed-by: Aurelien Jarno 
  Signed-off-by: Richard Henderson 
  Message-Id: <20170718200255.31647-26-...@twiddle.net>
  Signed-off-by: Aurelien Jarno 

  :04 04 980d79b69ae712f23a1e4c56983e97a843153b4a
  1024c109f506c7ad57367c63bc8bbbc8a7a36cd7 M  target

  Reverting 61dedf2af79fb5866dc7a0f972093682f2185e17 fixes the problem
  for me.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1796520/+subscriptions

Re: [Qemu-devel] [PATCH v2 0/9] add failover feature for assigned network devices

2019-08-12 Thread Michael S. Tsirkin

On Fri, Aug 02, 2019 at 05:05:56PM +0200, Jens Freimann wrote:
> This is implementing the host side of the net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Virtio bits look fine. Anyone else wants to comment on
management/migration aspects?
Which tree should this be merged in? Mine?

> Changes since v1:
> - add new QMP events,
>- one is send when the primary device is unplugged
>- one is send when VIRTIO_NET_F_STANDBY is not negotiated.
>  This is needed because we hide the primary device until
>  the feature bit is negotiated and then add it with
>  qdev_add_device(). The event is for libvirt to be aware of that.
> - patch 7/9: a new migration state, called wait-unplug.
>   It is entered after SETUP and before ACTIVE. In this phase we check
>   devices for pending guest unplug complete event.
>   This patch still has a problem. It checks in a loop if there are still
>   devices that are not unplugged by the guest. If the guest never
>   returns it will run forever. How to terminate this loop? I thought
>   about a timed wait semaphore or just spinning for a certain amount of time,
>   but nothing seems good. Any ideas here?
> - patch 2/9: When unplugging the primary devices, only do the guest part i.e.
>   call hotplug_handler_unplug_request() which calls the pcie attention
>   button code. The unrealize part is not done so the ressources of the
>   device are not freed.  In case of migration failure we can re-plug the 
> device to
>   the guest with hotplug_handler_hotplug(). I tested migration failure and
>   a following second attempt to migrate that doesn't fail.
> - add the primary device on the target VM, done in runstate change
>   handler.
> - fix error reporting (dgilbert)
> - get rid of timer to add device after feature negotiation
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> virtio-net device. Before migration the vfio device is unplugged and data
> flows to the virtio-net device, on the target side another vfio-pci device
> is plugged in to take over the data-path. In the guest the net_failover
> module will pair net devices with the same MAC address.
> 
> * Patch 1 adds the infrastructure to hide the device for the qbus and qdev 
> APIs
> 
> * Patch 2 In the second patch the virtio-net uses the API to defer adding the 
> vfio
>   device until the VIRTIO_NET_F_STANDBY feature is acked. It also
>   implements the migration handler to unplug the device from the guest and
>   re-plug in case of migration failure
> 
> * Patch 3 and 4 make sure that we can unplug the vfio-device before
>   migration starts
> 
> * Patch 5 and 6 add new qmp events, one sends the device id of a device
>   that was just requested to be unplugged from the guest and another one
>   to let libvirt know if VIRTIO_NET_F_STANDBY was negotiated
> 
> * Patch 7 adds a new migration state that is entered while we wait for
>   devices to be unplugged by guest OS
> 
> * Patch 8 sets a new flag for PCIDevice 'partially_hotplugged' which we
>   use to skip the unrealize code path when doing a unplug of the primary
>   device
> 
> * Patch 9 sets the pending_deleted_event before triggering the guest
>   unplug request
> 
> Previous discussion:
>   RFC v1 https://patchwork.ozlabs.org/cover/989098/
>   RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
> 
> To summarize concerns/feedback from previous discussion:
> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>   Migration might get stuck for unpredictable time with unclear reason.
>   This approach combines two tricky things, hot/unplug and migration.
>   -> We need to let libvirt know what's happening. Add new qmp events
>  and a new migration state. When a primary device is (partially)
>  unplugged (only from guest) we send a qmp event with the device id. When
>  it is unplugged from the guest the DEVICE_DELETED event is sent.
>  Migration will enter the wait-unplug state while waiting for the guest
>  os to unplug all primary devices and then move on with migration.
> 2. PCI devices are a precious ressource. The primary device should never
>   be added to QEMU if it won't be used by guest instead of hiding it in
>   QEMU.
>   -> We only hotplug the device when the standby feature bit was
>  negotiated. We save the device cmdline options until we need it for
>  qdev_device_add()
>  Hiding a device can be a useful concept to model. For example a
>  pci device in a powered-off slot could be marked as hidden until the 
> slot is
>  powered on (mst).
> 3. Management layer software should handle this. Open Stack already has
>   components/code to handle unplug/replug VFIO devices and metadata to
>   provide to the guest for detecting which devices should be paired.
>   -> An approach that includes all software from firmware to
>

Re: [Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range

2019-08-12 Thread Max Reitz

On 10.08.19 21:31, Vladimir Sementsov-Ogievskiy wrote:
> copy_range ignores these limitations, let's improve it.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/io.c | 44 
>  1 file changed, 36 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/3] vdi: Make block_status recurse for fixed images

2019-08-12 Thread Vladimir Sementsov-Ogievskiy

25.07.2019 18:55, Max Reitz wrote:
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
> Signed-off-by: Max Reitz 

Sorry for a delay, I thought that maintainers of the formats will approve these 
patches ;)

Don't know vdi code, but it is what I suggested and seems to be the right thing 
to do:

Reviewed-by: Vladimir Sementsov-Ogievskiy 

> ---
>   block/vdi.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index b9845a4cbd..40d40c34d5 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -542,7 +542,8 @@ static int coroutine_fn 
> vdi_co_block_status(BlockDriverState *bs,
>   *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
>   index_in_block;
>   *file = bs->file->bs;
> -return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID |
> +(s->header.image_type == VDI_TYPE_STATIC ? BDRV_BLOCK_RECURSE : 0);
>   }
>   
>   static int coroutine_fn
> 


-- 
Best regards,
Vladimir

[Qemu-devel] [PATCH] elf: Allow loading AArch64 ELF files

2019-08-12 Thread Aaron Lindsay OS via Qemu-devel

Treat EM_AARCH64 as a valid value when checking the ELF's machine-type
header.

Signed-off-by: Aaron Lindsay 
---
 include/hw/elf_ops.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h
index 690f9238c8..f12faa90a1 100644
--- a/include/hw/elf_ops.h
+++ b/include/hw/elf_ops.h
@@ -381,6 +381,12 @@ static int glue(load_elf, SZ)(const char *name, int fd,
 goto fail;
 }
 break;
+case EM_AARCH64:
+if (ehdr.e_machine != EM_AARCH64) {
+ret = ELF_LOAD_WRONG_ARCH;
+goto fail;
+}
+break;
 default:
 if (elf_machine != ehdr.e_machine) {
 ret = ELF_LOAD_WRONG_ARCH;
-- 
2.17.1

[Qemu-devel] [Bug 1796520] Re: autogen crashes on qemu-sh4-user after 61dedf2af7

2019-08-12 Thread Peter Maydell

On that hardware, at least, the user-space visible FPSCR value is indeed
0x00080. Execution of the 'frchg' insn either doesn't trap, or the
trap is caught by the kernel and emulated. I think it is not being
emulated because CONFIG_SH_FPU_EMU is not set.

The comment at the top of arch/sh/kernel/cpu/sh4/fpu.c:
https://elixir.bootlin.com/linux/latest/source/arch/sh/kernel/cpu/sh4/fpu.c

says that on "some SH4 implementations" executing frchg with the
FPSCR.PR bit set causes a trap; the "SH-4 Software Manual" says it is
"undefined_operation()" which I think means "implementation could do
anything" (though the manual doesn't clearly define its terms here). The
kernel code in fpu.c carefully sets the FPSCR value to clear the PR bit
before performing the frchg instruction.

My best guess is that this is a glibc bug, where its getcontext etc
implementations should be doing the same set-fpscr-first work that the
kernel code is doing, and so the glibc code happens to work OK on
implementations like the SH7785 that seem to not trap here, but will
fail on whatever the SH4 variants are that do trap (as well as on QEMU).
But this needs an SH4 expert to clarify (for instance I might well have
misread the kernel code and maybe it does trap-and-emulate here so that
userspace can rely on the frchg behaviour with FPSCR.PR=1 ?).

Is there somebody we can ask for clarification on what the defined
behaviour of the architecture is here and what the impdef behaviour of
the 7750/7751/7785 happen to be ?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1796520

Title:
  autogen crashes on qemu-sh4-user after 61dedf2af7

Status in QEMU:
  Confirmed

Bug description:
  Running "autogen --help" crashes on qemu-sh4-user with:

  (sid-sh4-sbuild)root@nofan:/# autogen --help
  Unhandled trap: 0x180
  pc=0xf64dd2de sr=0x pr=0xf63b9c74 fpscr=0x0008
  spc=0x ssr=0x gbr=0xf61102a8 vbr=0x
  sgr=0x dbr=0x delayed_pc=0xf64dd2a0 fpul=0x0003
  r0=0xf6fc1320 r1=0x r2=0x5dc4 r3=0xf67bfb50
  r4=0xf6fc1230 r5=0xf6fc141c r6=0x03ff r7=0x
  r8=0x0004 r9=0xf63e20bc r10=0xf6fc141c r11=0xf63e28f0
  r12=0xf63e2258 r13=0xf63eae1c r14=0x0804 r15=0xf6fc1220
  r16=0x r17=0x r18=0x r19=0x
  r20=0x r21=0x r22=0x r23=0x
  (sid-sh4-sbuild)root@nofan:/#

  Bi-secting found this commit to be the culprit:

  61dedf2af79fb5866dc7a0f972093682f2185e17 is the first bad commit
  commit 61dedf2af79fb5866dc7a0f972093682f2185e17
  Author: Richard Henderson 
  Date:   Tue Jul 18 10:02:50 2017 -1000

  target/sh4: Add missing FPSCR.PR == 0 checks
  
  Both frchg and fschg require PR == 0, otherwise undefined_operation.
  
  Reviewed-by: Aurelien Jarno 
  Signed-off-by: Richard Henderson 
  Message-Id: <20170718200255.31647-26-...@twiddle.net>
  Signed-off-by: Aurelien Jarno 

  :04 04 980d79b69ae712f23a1e4c56983e97a843153b4a
  1024c109f506c7ad57367c63bc8bbbc8a7a36cd7 M  target

  Reverting 61dedf2af79fb5866dc7a0f972093682f2185e17 fixes the problem
  for me.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1796520/+subscriptions

1 2 >

1 - 100 of 195 matches

Mail list logo