Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX

2020-04-24 Thread Richard Henderson
On 4/24/20 3:47 PM, Stephen Long wrote:
> Oh, maybe I misread the manual description for SVE2 TBL, but I thought Zm was 
> the indexes register and the loop compares the index from Zm with the total 
> number of elems, table_elems.

That's right.  You take the index from Zm just fine, but fail to apply that
index properly across Zn and Zn+1.


r~

> 
> -Original Message-
> From: Richard Henderson 
> Sent: Friday, April 24, 2020 2:37 PM
> To: Stephen Long ; qemu-devel@nongnu.org
> Cc: qemu-...@nongnu.org; Ana Pazos 
> Subject: [EXT] Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX
> 
> On 4/23/20 9:42 AM, Stephen Long wrote:
>> Signed-off-by: Stephen Long 
>>
>> These insns don't show up under any SVE2 categories in the manual. But
>> if you lookup each insn, you'll find they have SVE2 variants.
>> ---
>>  target/arm/helper-sve.h| 10 +++
>>  target/arm/sve.decode  |  5 
>>  target/arm/sve_helper.c| 53 ++
>>  target/arm/translate-sve.c | 20 ++
>>  4 files changed, 88 insertions(+)
>>
>> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index
>> f6ae814021..54d20575e8 100644
>> --- a/target/arm/helper-sve.h
>> +++ b/target/arm/helper-sve.h
>> @@ -2687,3 +2687,13 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, 
>> TCG_CALL_NO_RWG,
>> void, ptr, ptr, ptr, ptr, i32)
>> DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG,
>> void, ptr, ptr, ptr, ptr, i32)
>> +
>> +DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr,
>> +ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr,
>> +ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG,
>> +void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d,
>> +TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
>> +
>> +DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr,
>> +i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr,
>> +ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr,
>> +ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void,
>> +ptr, ptr, ptr, i32)
>> diff --git a/target/arm/sve.decode b/target/arm/sve.decode index
>> 3a2a4a7f1c..483fbf0dcc 100644
>> --- a/target/arm/sve.decode
>> +++ b/target/arm/sve.decode
>> @@ -1387,3 +1387,8 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . 
>> .  @rda_rn_rm
>>
>>  CMLA_   01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5  ra=%reg_movprfx
>>  SQRDCMLAH_  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5
>> ra=%reg_movprfx
>> +
>> +### SVE2 Table Lookup (three sources)
>> +
>> +TBL_zzz 0101 .. 1 . 00101 0 . .  @rd_rn_rm
>> +TBX_zzz 0101 .. 1 . 00101 1 . .  @rd_rn_rm
>> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index
>> 55e2c32f03..d1e91da02a 100644
>> --- a/target/arm/sve_helper.c
>> +++ b/target/arm/sve_helper.c
>> @@ -2968,6 +2968,59 @@ DO_TBL(sve_tbl_d, uint64_t, )
>>
>>  #undef TBL
>>
>> +#define DO_SVE2_TBL(NAME, TYPE, H) \
>> +void HELPER(NAME)(void *vd, void *vn1, void *vm, void *vn2, uint32_t desc)  
>> \
>> +{   
>> \
>> +intptr_t i, opr_sz = simd_oprsz(desc);  
>> \
>> +uintptr_t elem = opr_sz / sizeof(TYPE); 
>> \
>> +TYPE *d = vd, *n1 = vn1, *n2 = vn2, *m = vm;
>> \
>> +ARMVectorReg tmp1, tmp2;
>> \
> 
> Only one temp needed.
> 
>> +if (unlikely(vd == vn1)) {  
>> \
>> +n1 = memcpy(, vn1, opr_sz);
>> \
>> +}   
>> \
>> +if (unlikely(vd == vn2)) {  
>> \
>> +n2 = memcpy(, vn2, opr_sz);
>> \
>> +}
> 
> Better with else if here.
> Because vd cannot overlap both vn1 or vn2, only one of them.
>  \
>> +for (i = 0; i < elem; i++) {
>> \
>> +TYPE j = m[H(i)];   
>> \
>> +d[H(i)] = j < (elem * 2) ? n1[H(j)] : 0;
>> \
>> +
>> \
>> +TYPE k = m[H(elem + i)];
>> \
>> +d[H(elem + i)] = k < (elem * 2) ? n2[H(k)] : 0; 
>> \
>> +}
> 
> First, the indexing is wrong.
> 
> Note that you're range checking vs elem * 2, but only indexing a single 
> vector.
>  Thus you must be indexing into the next vector.
> 
> This should look more like
> 
> TYPE j = m[H(i)];
> TYPE r = 0;
> 
> if (j < elem) {
> 

RE: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX

2020-04-24 Thread Stephen Long
Oh, maybe I misread the manual description for SVE2 TBL, but I thought Zm was 
the indexes register and the loop compares the index from Zm with the total 
number of elems, table_elems.

-Original Message-
From: Richard Henderson 
Sent: Friday, April 24, 2020 2:37 PM
To: Stephen Long ; qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org; Ana Pazos 
Subject: [EXT] Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX

On 4/23/20 9:42 AM, Stephen Long wrote:
> Signed-off-by: Stephen Long 
>
> These insns don't show up under any SVE2 categories in the manual. But
> if you lookup each insn, you'll find they have SVE2 variants.
> ---
>  target/arm/helper-sve.h| 10 +++
>  target/arm/sve.decode  |  5 
>  target/arm/sve_helper.c| 53 ++
>  target/arm/translate-sve.c | 20 ++
>  4 files changed, 88 insertions(+)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h index
> f6ae814021..54d20575e8 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -2687,3 +2687,13 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, 
> TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, i32)
> DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr,
> +ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr,
> +ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG,
> +void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sve2_tbl_d,
> +TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr,
> +i32) DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr,
> +ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr,
> +ptr, ptr, i32) DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void,
> +ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode index
> 3a2a4a7f1c..483fbf0dcc 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -1387,3 +1387,8 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . 
>  @rda_rn_rm
>
>  CMLA_   01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5  ra=%reg_movprfx
>  SQRDCMLAH_  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5
> ra=%reg_movprfx
> +
> +### SVE2 Table Lookup (three sources)
> +
> +TBL_zzz 0101 .. 1 . 00101 0 . .  @rd_rn_rm
> +TBX_zzz 0101 .. 1 . 00101 1 . .  @rd_rn_rm
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c index
> 55e2c32f03..d1e91da02a 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2968,6 +2968,59 @@ DO_TBL(sve_tbl_d, uint64_t, )
>
>  #undef TBL
>
> +#define DO_SVE2_TBL(NAME, TYPE, H) \
> +void HELPER(NAME)(void *vd, void *vn1, void *vm, void *vn2, uint32_t desc)  \
> +{   \
> +intptr_t i, opr_sz = simd_oprsz(desc);  \
> +uintptr_t elem = opr_sz / sizeof(TYPE); \
> +TYPE *d = vd, *n1 = vn1, *n2 = vn2, *m = vm;\
> +ARMVectorReg tmp1, tmp2;\

Only one temp needed.

> +if (unlikely(vd == vn1)) {  \
> +n1 = memcpy(, vn1, opr_sz);\
> +}   \
> +if (unlikely(vd == vn2)) {  \
> +n2 = memcpy(, vn2, opr_sz);\
> +}

Better with else if here.
Because vd cannot overlap both vn1 or vn2, only one of them.
 \
> +for (i = 0; i < elem; i++) {\
> +TYPE j = m[H(i)];   \
> +d[H(i)] = j < (elem * 2) ? n1[H(j)] : 0;\
> +\
> +TYPE k = m[H(elem + i)];\
> +d[H(elem + i)] = k < (elem * 2) ? n2[H(k)] : 0; \
> +}

First, the indexing is wrong.

Note that you're range checking vs elem * 2, but only indexing a single vector.
 Thus you must be indexing into the next vector.

This should look more like

TYPE j = m[H(i)];
TYPE r = 0;

if (j < elem) {
r = n1[H(j)];
} else if (j < 2 * elem) {
r = n2[H(j - elem)];
}
d[H(i)] = r;

Second, this is one case where I'd prefer to share code with AArch64.  It would 
be worthwhile to rearrange both sve1 and advsimd to use a common set of helpers.

> +static bool trans_TBL_zzz(DisasContext *s, arg_rrr_esz *a)

_zzz is not helpful here.  The SVE1 

Re: [PATCH RFC] target/arm: Implement SVE2 SPLICE, EXT

2020-04-24 Thread Richard Henderson
On 4/23/20 11:03 AM, Stephen Long wrote:
> Signed-off-by: Stephen Long 
> 
> I'm not sure I can just use the SVE helper functions for the SVE2
> variants of EXT and SPLICE.

Absolutely.   The new insns are functionally identical to movprfx + the old
insn.  I presume the new insns were added for code density reasons, but I don't
know for sure.

> +SPLICE_zpz  0101 .. 101 101 100 ... . . @rd_pg_rn
> +EXT_zzi 0101 011 . 000 ... rn:5 rd:5 \

I have renamed these to SPLICE_sve2 and EXT_sve2, since the same suffix applies
to the original too.

Applied to my SVE2 branch.  Thanks!


r~



[PATCH] linux-user: return target error codes for socket() and prctl()

2020-04-24 Thread Helge Deller
Return target error codes instead of host error codes.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 05f03919ff..655a86fa45 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2987,7 +2987,7 @@ static abi_long do_socket(int domain, int type, int 
protocol)
 #endif
  protocol == NETLINK_KOBJECT_UEVENT ||
  protocol == NETLINK_AUDIT)) {
-return -EPFNOSUPPORT;
+return -TARGET_EPFNOSUPPORT;
 }

 if (domain == AF_PACKET ||
@@ -5856,7 +5856,7 @@ static abi_long do_get_thread_area(CPUX86State *env, 
abi_ulong ptr)

 abi_long do_arch_prctl(CPUX86State *env, int code, abi_ulong addr)
 {
-return -ENOSYS;
+return -TARGET_ENOSYS;
 }
 #else
 abi_long do_arch_prctl(CPUX86State *env, int code, abi_ulong addr)



Re: [PATCH] linux-user: Drop unnecessary check in dup3 syscall

2020-04-24 Thread Eric Blake

On 4/24/20 4:47 PM, Helge Deller wrote:


-    host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
+    int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);


I don't think this is quite correct.  target_to_host_bitmask()
silently ignores unknown bits, and a user that was relying on bit
0x4000 to cause an EINVAL will not fail with this change (unless
bit 0x4000 happens to be one of the bits translated by
fcntl_flags_tbl).


True.


The open() syscall is notorious for ignoring unknown bits rather than
failing with EINVAL, and it is has come back to haunt kernel
developers; newer syscalls like dup3() learned from the mistake, and
we really do want to catch unsupported bits up to make it easier for
future kernels to define meanings to those bits without them being
silently swallowed when run on older systems that did not know what
those bits meant.

Ok, I wasn't aware that it's a design goal to manually find such
cases of wrong userspace applications. But in this case, you're right
that my patch shouldn't be applied.


This, and several similar ones that you also posted.

Maybe you could add a new int target_to_host_bitmask_strict(int src, 
translate_tbl, int *dst), which returns 0 when *dst is bit-for-bit 
translated from src, and returns -1 if src had bits not specified by 
translate_tbl.  In that case, the caller can then translate all usual 
bits and rely on the syscall() failure (as you tried here), but you can 
also flag -TARGET_EINVAL up front for bits not covered by the table.




While looking at the code I just noticed another bug too, which needs
fixing then:

-if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
-return -EINVAL;

this needs to be:

-return -TARGET_EINVAL;


Indeed.  Good catch.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] linux-user: Drop open-coded fcntl flags conversion in eventfd2 syscall

2020-04-24 Thread Richard Henderson
On 4/24/20 1:48 PM, Helge Deller wrote:
> Drop the open-coded fcntl flags conversion in the eventfd2 syscall and
> replace it with the built-in conversion with fcntl_flags_tbl.
> 
> Signed-off-by: Helge Deller 

Reviewed-by: Richard Henderson 


r~



Re: [PATCH] linux-user: Add support for /proc/cpuinfo on hppa platform

2020-04-24 Thread Richard Henderson
On 4/24/20 2:06 PM, Helge Deller wrote:
> Provide an own /proc/cpuinfo file for the hppa (parisc) platform.

"our own" perhaps?

> Signed-off-by: Helge Deller 

Reviewed-by: Richard Henderson 


r~



Re: [PATCH] linux-user: Drop unnecessary check in dup3 syscall

2020-04-24 Thread Helge Deller
On 24.04.20 23:32, Eric Blake wrote:
> On 4/24/20 3:57 PM, Helge Deller wrote:
>> Drop the extra check in dup3() if anything other than FD_CLOEXEC (aka
>> O_CLOEXEC) was given. Instead simply rely on any error codes returned by
>> the host dup3() syscall.
>>
>> Signed-off-by: Helge Deller 
>>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index 05f03919ff..ebf0d38321 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -8301,12 +8310,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
>> abi_long arg1,
>>   #if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3)
>>   case TARGET_NR_dup3:
>>   {
>> -    int host_flags;
>> -
>> -    if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
>> -    return -EINVAL;
>> -    }
>> -    host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
>> +    int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
>
> I don't think this is quite correct.  target_to_host_bitmask()
> silently ignores unknown bits, and a user that was relying on bit
> 0x4000 to cause an EINVAL will not fail with this change (unless
> bit 0x4000 happens to be one of the bits translated by
> fcntl_flags_tbl).

True.

> The open() syscall is notorious for ignoring unknown bits rather than
> failing with EINVAL, and it is has come back to haunt kernel
> developers; newer syscalls like dup3() learned from the mistake, and
> we really do want to catch unsupported bits up to make it easier for
> future kernels to define meanings to those bits without them being
> silently swallowed when run on older systems that did not know what
> those bits meant.
Ok, I wasn't aware that it's a design goal to manually find such
cases of wrong userspace applications. But in this case, you're right
that my patch shouldn't be applied.

While looking at the code I just noticed another bug too, which needs
fixing then:
>> -if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
>> -return -EINVAL;
this needs to be:
>> -return -TARGET_EINVAL;

Helge



Re: [PATCH RFC 1/3] target/arm: Implement SVE2 AESMC, AESIMC

2020-04-24 Thread Richard Henderson
On 4/23/20 3:37 PM, Stephen Long wrote:
> +#define DO_CRYPTO_AESMC(NAME, DECRYPT)  \
> +void HELPER(NAME)(void *vd, void *vn, uint32_t desc)\
> +{   \
> +intptr_t i, opr_sz = simd_oprsz(desc);  \
> +void *d = vd, *n = vn;  \
> +for (i = 0; i < opr_sz; i += 16) {  \
> +HELPER(crypto_aesmc)(vd + i, vn + i, DECRYPT);  \
> +}
> +}

Better, IMO, is to add a "desc" parameter to crypto_aesmc and move the loop
there.  Then all variants of this operation use the exact same helper.  The
separate decrypt parameter would become simd_data().

Same for the other two patches in this series.


r~



Re: [PATCH RFC] target/arm: Implement SVE2 TBL, TBX

2020-04-24 Thread Richard Henderson
On 4/23/20 9:42 AM, Stephen Long wrote:
> Signed-off-by: Stephen Long 
> 
> These insns don't show up under any SVE2 categories in the manual. But
> if you lookup each insn, you'll find they have SVE2 variants.
> ---
>  target/arm/helper-sve.h| 10 +++
>  target/arm/sve.decode  |  5 
>  target/arm/sve_helper.c| 53 ++
>  target/arm/translate-sve.c | 20 ++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index f6ae814021..54d20575e8 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -2687,3 +2687,13 @@ DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__s, 
> TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, i32)
>  DEF_HELPER_FLAGS_5(sve2_sqrdcmlah__d, TCG_CALL_NO_RWG,
> void, ptr, ptr, ptr, ptr, i32)
> +
> +DEF_HELPER_FLAGS_5(sve2_tbl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
> i32)
> +DEF_HELPER_FLAGS_5(sve2_tbl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
> i32)
> +DEF_HELPER_FLAGS_5(sve2_tbl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
> i32)
> +DEF_HELPER_FLAGS_5(sve2_tbl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
> i32)
> +
> +DEF_HELPER_FLAGS_4(sve2_tbx_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(sve2_tbx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(sve2_tbx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> +DEF_HELPER_FLAGS_4(sve2_tbx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
> diff --git a/target/arm/sve.decode b/target/arm/sve.decode
> index 3a2a4a7f1c..483fbf0dcc 100644
> --- a/target/arm/sve.decode
> +++ b/target/arm/sve.decode
> @@ -1387,3 +1387,8 @@ UMLSLT_zzzw 01000100 .. 0 . 010 111 . . 
>  @rda_rn_rm
>  
>  CMLA_   01000100 esz:2 0 rm:5 0010 rot:2 rn:5 rd:5  ra=%reg_movprfx
>  SQRDCMLAH_  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
> +
> +### SVE2 Table Lookup (three sources)
> +
> +TBL_zzz 0101 .. 1 . 00101 0 . .  @rd_rn_rm
> +TBX_zzz 0101 .. 1 . 00101 1 . .  @rd_rn_rm
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 55e2c32f03..d1e91da02a 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -2968,6 +2968,59 @@ DO_TBL(sve_tbl_d, uint64_t, )
>  
>  #undef TBL
>  
> +#define DO_SVE2_TBL(NAME, TYPE, H) \
> +void HELPER(NAME)(void *vd, void *vn1, void *vm, void *vn2, uint32_t desc)  \
> +{   \
> +intptr_t i, opr_sz = simd_oprsz(desc);  \
> +uintptr_t elem = opr_sz / sizeof(TYPE); \
> +TYPE *d = vd, *n1 = vn1, *n2 = vn2, *m = vm;\
> +ARMVectorReg tmp1, tmp2;\

Only one temp needed.

> +if (unlikely(vd == vn1)) {  \
> +n1 = memcpy(, vn1, opr_sz);\
> +}   \
> +if (unlikely(vd == vn2)) {  \
> +n2 = memcpy(, vn2, opr_sz);\
> +}  

Better with else if here.
Because vd cannot overlap both vn1 or vn2, only one of them.
 \
> +for (i = 0; i < elem; i++) {\
> +TYPE j = m[H(i)];   \
> +d[H(i)] = j < (elem * 2) ? n1[H(j)] : 0;\
> +\
> +TYPE k = m[H(elem + i)];\
> +d[H(elem + i)] = k < (elem * 2) ? n2[H(k)] : 0; \
> +}

First, the indexing is wrong.

Note that you're range checking vs elem * 2, but only indexing a single vector.
 Thus you must be indexing into the next vector.

This should look more like

TYPE j = m[H(i)];
TYPE r = 0;

if (j < elem) {
r = n1[H(j)];
} else if (j < 2 * elem) {
r = n2[H(j - elem)];
}
d[H(i)] = r;

Second, this is one case where I'd prefer to share code with AArch64.  It would
be worthwhile to rearrange both sve1 and advsimd to use a common set of helpers.

> +static bool trans_TBL_zzz(DisasContext *s, arg_rrr_esz *a)

_zzz is not helpful here.  The SVE1 insn also operates on 3 registers, and thus
could logically be _zzz too.

Better might be _double, after double_table = TRUE, or maybe just _2 just in
case SVE3 adds a variant with more table registers.


r~



Re: [PATCH] linux-user: Drop unnecessary check in dup3 syscall

2020-04-24 Thread Eric Blake

On 4/24/20 3:57 PM, Helge Deller wrote:

Drop the extra check in dup3() if anything other than FD_CLOEXEC (aka
O_CLOEXEC) was given. Instead simply rely on any error codes returned by
the host dup3() syscall.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 05f03919ff..ebf0d38321 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8301,12 +8310,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
  #if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3)
  case TARGET_NR_dup3:
  {
-int host_flags;
-
-if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
-return -EINVAL;
-}
-host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
+int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);


I don't think this is quite correct.  target_to_host_bitmask() silently 
ignores unknown bits, and a user that was relying on bit 0x4000 to 
cause an EINVAL will not fail with this change (unless bit 0x4000 
happens to be one of the bits translated by fcntl_flags_tbl).  The 
open() syscall is notorious for ignoring unknown bits rather than 
failing with EINVAL, and it is has come back to haunt kernel developers; 
newer syscalls like dup3() learned from the mistake, and we really do 
want to catch unsupported bits up to make it easier for future kernels 
to define meanings to those bits without them being silently swallowed 
when run on older systems that did not know what those bits meant.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC patch v1 2/3] qemu-file: add buffered mode

2020-04-24 Thread Eric Blake

On 4/13/20 6:12 AM, Denis Plotnikov wrote:

The patch adds ability to qemu-file to write the data
asynchronously to improve the performance on writing.
Before, only synchronous writing was supported.

Enabling of the asyncronous mode is managed by new


asynchronous


"enabled_buffered" callback.


The term "enabled_buffered" does not appear in the patch.  Did you mean...



Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 351 +---
  migration/qemu-file.h   |   9 ++
  3 files changed, 339 insertions(+), 22 deletions(-)




@@ -60,6 +66,22 @@ struct QEMUFile {
  bool shutdown;
  /* currently used buffer */
  QEMUFileBuffer *current_buf;
+/*
+ * with buffered_mode enabled all the data copied to 512 byte
+ * aligned buffer, including iov data. Then the buffer is passed
+ * to writev_buffer callback.
+ */
+bool buffered_mode;


..."Asynchronous mode is managed by setting the new buffered_mode flag"? 
 ...




+/* for async buffer writing */
+AioTaskPool *pool;
+/* the list of free buffers, currently used on is NOT there */


s/on/one/


+QLIST_HEAD(, QEMUFileBuffer) free_buffers;
+};
+
+struct QEMUFileAioTask {
+AioTask task;
+QEMUFile *f;
+QEMUFileBuffer *fb;
  };
  
  /*

@@ -115,10 +137,42 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
  f->opaque = opaque;
  f->ops = ops;
  
-f->current_buf = g_new0(QEMUFileBuffer, 1);

-f->current_buf->buf = g_malloc(IO_BUF_SIZE);
-f->current_buf->iov = g_new0(struct iovec, MAX_IOV_SIZE);
-f->current_buf->may_free = bitmap_new(MAX_IOV_SIZE);
+if (f->ops->enable_buffered) {
+f->buffered_mode = f->ops->enable_buffered(f->opaque);


...ah, you meant 'enable_buffered'.  But still, why do we need a 
callback function?  Is it not sufficient to just have a bool flag?




+static size_t get_buf_free_size(QEMUFile *f)
+{
+QEMUFileBuffer *fb = f->current_buf;
+/* buf_index can't be greated than buf_size */


greater


+assert(fb->buf_size >= fb->buf_index);
+return fb->buf_size - fb->buf_index;
+}
+



+static int write_task_fn(AioTask *task)
+{



+/*
+ * Increment file position.
+ * This needs to be here before calling writev_buffer, because
+ * writev_buffer is asynchronous and there could be more than one
+ * writev_buffer started simultaniously. Each writev_buffer should


simultaneously


+ * use its own file pos to write to. writev_buffer may write less
+ * than buf_index bytes but we treat this situation as an error.
+ * If error appeared, further file using is meaningless.


s/using/use/


+ * We expect that, the most of the time the full buffer is written,
+ * (when buf_size == buf_index). The only case when the non-full
+ * buffer is written (buf_size != buf_index) is file close,
+ * when we need to flush the rest of the buffer content.


We expect that most of the time, the full buffer will be written 
(buf_size == buf_index), with the exception at file close where we need 
to flush the final partial buffer.



+ */
+f->pos += fb->buf_index;
+
+ret = f->ops->writev_buffer(f->opaque, , 1, pos, _error);
+
+/* return the just written buffer to the free list */
+QLIST_INSERT_HEAD(>free_buffers, fb, link);
+
+/* check that we have written everything */
+if (ret != fb->buf_index) {
+qemu_file_set_error_obj(f, ret < 0 ? ret : -EIO, local_error);
+}
+
+/*
+ * always return 0 - don't use task error handling, relay on


rely


+ * qemu file error handling
+ */
+return 0;
+}
+
+static void qemu_file_switch_current_buf(QEMUFile *f)
+{
+/*
+ * if the list is empty, wait until some task returns a buffer
+ * to the list of free buffers.
+ */
+if (QLIST_EMPTY(>free_buffers)) {
+aio_task_pool_wait_slot(f->pool);
+}
+
+/*
+ * sanity check that the list isn't empty
+ * if the free list was empty, we waited for a task complition,


completion


+ * and the pompleted task must return a buffer to a list of free buffers


completed


+ */
+assert(!QLIST_EMPTY(>free_buffers));
+
+/* set the current buffer for using from the free list */
+f->current_buf = QLIST_FIRST(>free_buffers);
+reset_buf(f);
+
+QLIST_REMOVE(f->current_buf, link);
+}
+


  
  /*

+ * Copy an external buffer to the intenal current buffer.


internal


+ */
+static void copy_buf(QEMUFile *f, const uint8_t *buf, size_t size,
+ bool may_free)
+{



+++ b/migration/qemu-file.h
@@ -103,6 +103,14 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
  typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr,
 Error **errp);
  
+/*

+ * Enables or disables the buffered mode
+ * Existing blocking reads/writes must be woken
+ * Returns true if the buffered mode has 

Re: [RFC patch v1 1/3] qemu-file: introduce current buffer

2020-04-24 Thread Eric Blake

On 4/13/20 6:12 AM, Denis Plotnikov wrote:

To approach async wrtiting in the further commits, the buffer


writing


allocated in QEMUFile struct is replaced with the link to the
current buffer. We're going to use many buffers to write the
qemu file stream to the unerlying storage asynchronously. The


underlying


current buffer points out to the buffer is currently filled
with data.


This sentence is confusing.  I'd suggest: The current_buf pointer tracks 
which buffer is currently filled with data.




This patch doesn't add any features to qemu-file and doesn't
change any qemu-file behavior.

Signed-off-by: Denis Plotnikov 
---
  include/qemu/typedefs.h |   1 +
  migration/qemu-file.c   | 156 +---
  2 files changed, 95 insertions(+), 62 deletions(-)

diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 375770a..88dce54 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -97,6 +97,7 @@ typedef struct QDict QDict;
  typedef struct QEMUBH QEMUBH;
  typedef struct QemuConsole QemuConsole;
  typedef struct QEMUFile QEMUFile;
+typedef struct QEMUFileBuffer QEMUFileBuffer;
  typedef struct QemuLockable QemuLockable;
  typedef struct QemuMutex QemuMutex;
  typedef struct QemuOpt QemuOpt;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1c3a358..285c6ef 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -33,6 +33,17 @@
  #define IO_BUF_SIZE 32768
  #define MAX_IOV_SIZE MIN(IOV_MAX, 64)
  
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(IO_BUF_SIZE, 512));

+
+struct QEMUFileBuffer {
+int buf_index;
+int buf_size; /* 0 when writing */
+uint8_t *buf;
+unsigned long *may_free;


Do we really want something that compiles differently on 32- vs. 64-bit?
/me reads ahead...


+struct iovec *iov;
+unsigned int iovcnt;
+};
+
  struct QEMUFile {
  const QEMUFileOps *ops;
  const QEMUFileHooks *hooks;
@@ -43,18 +54,12 @@ struct QEMUFile {
  
  int64_t pos; /* start of buffer when writing, end of buffer

  when reading */
-int buf_index;
-int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
-
-DECLARE_BITMAP(may_free, MAX_IOV_SIZE);


...ah, you're replacing a bitmap, and our bitmap code _does_ use 'long' 
as its core for optimum speed (which overcomes the fact that it does 
mean annoying differences on 32- vs. 64-bit).



-struct iovec iov[MAX_IOV_SIZE];
-unsigned int iovcnt;
-
  int last_error;
  Error *last_error_obj;
  /* has the file has been shutdown */
  bool shutdown;
+/* currently used buffer */
+QEMUFileBuffer *current_buf;
  };
  


Most of the patch is mechanical conversion to the rearranged struct.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH] linux-user: Add support for /proc/cpuinfo on hppa platform

2020-04-24 Thread Helge Deller
Provide an own /proc/cpuinfo file for the hppa (parisc) platform.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 05f03919ff..ebf0d38321 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7438,6 +7435,18 @@ static int open_cpuinfo(void *cpu_env, int fd)
 }
 #endif

+#if defined(TARGET_HPPA)
+static int open_cpuinfo(void *cpu_env, int fd)
+{
+dprintf(fd, "cpu family\t: PA-RISC 1.1e\n");
+dprintf(fd, "cpu\t\t: PA7300LC (PCX-L2)\n");
+dprintf(fd, "capabilities\t: os32\n");
+dprintf(fd, "model\t\t: 9000/778/B160L\n");
+dprintf(fd, "model name\t: Merlin L2 160 QEMU (9000/778/B160L)\n");
+return 0;
+}
+#endif
+
 #if defined(TARGET_M68K)
 static int open_hardware(void *cpu_env, int fd)
 {
@@ -7462,7 +7471,7 @@ static int do_openat(void *cpu_env, int dirfd, const char 
*pathname, int flags,
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
 { "/proc/net/route", open_net_route, is_proc },
 #endif
-#if defined(TARGET_SPARC)
+#if defined(TARGET_SPARC) || defined(TARGET_HPPA)
 { "/proc/cpuinfo", open_cpuinfo, is_proc },
 #endif
 #if defined(TARGET_M68K)



[PATCH] linux-user: Drop unnecessary check in signalfd4 syscall

2020-04-24 Thread Helge Deller
The signalfd4() syscall takes optional O_NONBLOCK and O_CLOEXEC fcntl
flags.  If the user gave any other invalid flags, the host syscall will
return correct error codes, so simply drop the extra check here.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 05f03919ff..ebf0d38321 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7176,9 +7176,6 @@ static abi_long do_signalfd4(int fd, abi_long mask, int 
flags)
 sigset_t host_mask;
 abi_long ret;

-if (flags & ~(TARGET_O_NONBLOCK | TARGET_O_CLOEXEC)) {
-return -TARGET_EINVAL;
-}
 if (!lock_user_struct(VERIFY_READ, target_mask, mask, 1)) {
 return -TARGET_EFAULT;
 }



[PATCH] linux-user: Drop unnecessary check in dup3 syscall

2020-04-24 Thread Helge Deller
Drop the extra check in dup3() if anything other than FD_CLOEXEC (aka
O_CLOEXEC) was given. Instead simply rely on any error codes returned by
the host dup3() syscall.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 05f03919ff..ebf0d38321 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8301,12 +8310,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #if defined(CONFIG_DUP3) && defined(TARGET_NR_dup3)
 case TARGET_NR_dup3:
 {
-int host_flags;
-
-if ((arg3 & ~TARGET_O_CLOEXEC) != 0) {
-return -EINVAL;
-}
-host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
+int host_flags = target_to_host_bitmask(arg3, fcntl_flags_tbl);
 ret = get_errno(dup3(arg1, arg2, host_flags));
 if (ret >= 0) {
 fd_trans_dup(arg1, arg2);



[PATCH] linux-user: Drop open-coded fcntl flags conversion in eventfd2 syscall

2020-04-24 Thread Helge Deller
Drop the open-coded fcntl flags conversion in the eventfd2 syscall and
replace it with the built-in conversion with fcntl_flags_tbl.

Signed-off-by: Helge Deller 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 05f03919ff..ebf0d38321 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -11938,13 +11942,7 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 #if defined(TARGET_NR_eventfd2)
 case TARGET_NR_eventfd2:
 {
-int host_flags = arg2 & (~(TARGET_O_NONBLOCK | TARGET_O_CLOEXEC));
-if (arg2 & TARGET_O_NONBLOCK) {
-host_flags |= O_NONBLOCK;
-}
-if (arg2 & TARGET_O_CLOEXEC) {
-host_flags |= O_CLOEXEC;
-}
+int host_flags = target_to_host_bitmask(arg2, fcntl_flags_tbl);
 ret = get_errno(eventfd(arg1, host_flags));
 if (ret >= 0) {
 fd_trans_register(ret, _eventfd_trans);



[Bug 1874904] [NEW] qemu windows with gtk and french keypad not working as expected

2020-04-24 Thread Jean-Yves BAUDY
Public bug reported:

When I use qemu on Windows 10 with a French AZERTY keypad with the
correct options the emulated system keypad still QWERTY. If I use sdl it
works fine the emulated keypad is AZERTY.

Example of command with ubuntu live cd:
-> qemu-system-x86_64.exe  -m 4G ubuntu-18.04.4-desktop-amd64.iso -display gtk 
-k fr

NOTES:
 - Using the same command on Linux works fine. The emulated keypad is AZERTY.

Qemu version : 4.2.0 on Windows 10

** Affects: qemu
 Importance: Undecided
 Status: New

** Description changed:

  When I use qemu on Windows 10 with a French AZERTY keypad with the
- correct options the emulated system keypad still QWERTY. If we use sdl
- it works fine the emulated keypad is AZERTY.
+ correct options the emulated system keypad still QWERTY. If I use sdl it
+ works fine the emulated keypad is AZERTY.
  
  Example of command with ubuntu live cd:
  -> qemu-system-x86_64.exe  -m 4G ubuntu-18.04.4-desktop-amd64.iso -display 
gtk -k fr
  
  NOTES:
-  - Using the same command on Linux works fine. The emulated keypad is AZERTY.
+  - Using the same command on Linux works fine. The emulated keypad is AZERTY.
  
  Qemu version : 4.2.0 on Windows 10

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1874904

Title:
  qemu windows with gtk and french keypad not working as expected

Status in QEMU:
  New

Bug description:
  When I use qemu on Windows 10 with a French AZERTY keypad with the
  correct options the emulated system keypad still QWERTY. If I use sdl
  it works fine the emulated keypad is AZERTY.

  Example of command with ubuntu live cd:
  -> qemu-system-x86_64.exe  -m 4G ubuntu-18.04.4-desktop-amd64.iso -display 
gtk -k fr

  NOTES:
   - Using the same command on Linux works fine. The emulated keypad is AZERTY.

  Qemu version : 4.2.0 on Windows 10

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1874904/+subscriptions



Re: [PATCH v3] target/arm: Implement SVE2 FMMLA

2020-04-24 Thread Richard Henderson
On 4/22/20 9:55 AM, Stephen Long wrote:
> +intptr_t opr_sz = simd_oprsz(desc) / (sizeof(TYPE) >> 2);   \
> +\
> +for (s = 0; s < opr_sz; ++s) {  \
> +TYPE *n = vn + s * (sizeof(TYPE) >> 2); \
> +TYPE *m = vm + s * (sizeof(TYPE) >> 2); \
> +TYPE *a = va + s * (sizeof(TYPE) >> 2); \
> +TYPE *d = vd + s * (sizeof(TYPE) >> 2); \

Shifting the wrong way.  Need to multiply by 4 not divide.

I've fixed this up, and also expanded the macro to two functions; I think it's
clearer that way in this case.

Applied to my SVE2 branch.  Thanks,


r~



Re: [PATCH RFC] target/arm: Implement SVE2 gather load insns

2020-04-24 Thread Richard Henderson
On 4/22/20 8:23 AM, Stephen Long wrote:
> Add decoding logic for SVE2 64-bit/32-bit gather non-temporal load
> insns.
> 
> 64-bit
> * LDNT1SB
> * LDNT1B (vector plus scalar)
> * LDNT1SH
> * LDNT1H (vector plus scalar)
> * LDNT1SW
> * LDNT1W (vector plus scalar)
> * LDNT1D (vector plus scalar)
> 
> 32-bit
> * LDNT1SB
> * LDNT1B (vector plus scalar)
> * LDNT1SH
> * LDNT1H (vector plus scalar)
> * LDNT1W (vector plus scalar)
> 
> Signed-off-by: Stephen Long 
> 
> I'm not sure I'm initializing xs correctly. This also goes for the
> scatter store insns in the previous patch.

You did.  xs=0 is 32-bit unsigned offset, xs=1 is 32-bit signed offset
(directly from the SVE encoding); I repurpose xs=2 as 64-bit offset.  There's a
comment in there next to the load/store helper array to that effect.

> ---
>  target/arm/sve.decode  | 11 +++
>  target/arm/translate-sve.c |  8 
>  2 files changed, 19 insertions(+)

Applied to my SVE2 branch.  Thanks!


r~



Re: [RFC PATCH] translate-all: include guest address in out_asm output

2020-04-24 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200424173914.2957-1-alex.ben...@linaro.org/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  aarch64-softmmu/hw/arm/pxa2xx_pic.o
  CC  aarch64-softmmu/hw/arm/digic.o
/tmp/qemu-test/src/accel/tcg/translate-all.c: In function 'tb_gen_code':
/tmp/qemu-test/src/accel/tcg/translate-all.c:1806:43: error: format '%ld' 
expects argument of type 'long int', but argument 2 has type 'size_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("  prologue: [size=%ld]\n", insn_start);
 ~~^  ~~
 %lld
/tmp/qemu-test/src/accel/tcg/translate-all.c:1827:39: error: format '%ld' 
expects argument of type 'long int', but argument 2 has type 'size_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("  data: [size=%ld]\n", data_size);
 ~~^  ~
 %lld
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: accel/tcg/translate-all.o] Error 
1
make[1]: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/hw/arm/omap1.o
  CC  x86_64-softmmu/gdbstub-xml.o
/tmp/qemu-test/src/accel/tcg/translate-all.c: In function 'tb_gen_code':
/tmp/qemu-test/src/accel/tcg/translate-all.c:1806:43: error: format '%ld' 
expects argument of type 'long int', but argument 2 has type 'size_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("  prologue: [size=%ld]\n", insn_start);
 ~~^  ~~
 %lld
/tmp/qemu-test/src/accel/tcg/translate-all.c:1827:39: error: format '%ld' 
expects argument of type 'long int', but argument 2 has type 'size_t' {aka 
'long long unsigned int'} [-Werror=format=]
 qemu_log("  data: [size=%ld]\n", data_size);
 ~~^  ~
 %lld
cc1: all warnings being treated as errors
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: accel/tcg/translate-all.o] Error 
1
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:527: x86_64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs
make: *** [Makefile:527: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=710671f8720441949df423784981c162', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-frsoowsh/src/docker-src.2020-04-24-16.07.10.20709:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=710671f8720441949df423784981c162
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-frsoowsh/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real2m52.582s
user0m8.929s


The full log is available at
http://patchew.org/logs/20200424173914.2957-1-alex.ben...@linaro.org/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 06/11] error: Use error_reportf_err() where appropriate

2020-04-24 Thread Eric Blake

On 4/24/20 2:20 PM, Markus Armbruster wrote:

Replace

 error_report("...: %s", ..., error_get_pretty(err));

by

 error_reportf_err(err, "...: ", ...);


Reviewed-by: Eric Blake 



Signed-off-by: Markus Armbruster 
---
  chardev/char-socket.c | 5 +++--
  hw/sd/pxa2xx_mmci.c   | 4 ++--
  hw/sd/sd.c| 4 ++--
  hw/usb/dev-mtp.c  | 9 +
  qemu-nbd.c| 7 +++
  scsi/qemu-pr-helper.c | 4 ++--
  6 files changed, 17 insertions(+), 16 deletions(-)


Although it touches NBD, I'm happy for this to go through your tree with 
the larger series.



+++ b/qemu-nbd.c
@@ -856,8 +856,7 @@ int main(int argc, char **argv)
  }
  tlscreds = nbd_get_tls_creds(tlscredsid, list, _err);
  if (local_err) {
-error_report("Failed to get TLS creds %s",
- error_get_pretty(local_err));
+error_reportf_err(local_err, "Failed to get TLS creds ");


Odd one out for not using ':' in the message, but that's independent of 
this patch.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2] target/arm: Implement SVE2 scatter store insns

2020-04-24 Thread Richard Henderson
On 4/22/20 7:15 AM, Stephen Long wrote:
> Add decoding logic for SVE2 64-bit/32-bit scatter non-temporal store
> insns.
> 
> 64-bit
> * STNT1B (vector plus scalar)
> * STNT1H (vector plus scalar)
> * STNT1W (vector plus scalar)
> * STNT1D (vector plus scalar)
> 
> 32-bit
> * STNT1B (vector plus scalar)
> * STNT1H (vector plus scalar)
> * STNT1W (vector plus scalar)
> 
> Signed-off-by: Stephen Long 
> 
> Cool, it seemed to typedef correctly.
> ---
>  target/arm/sve.decode  | 10 ++
>  target/arm/translate-sve.c |  8 
>  2 files changed, 18 insertions(+)

Applied to my SVE2 branch.  Thanks!


r~



Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type

2020-04-24 Thread Alistair Francis
On Fri, Apr 24, 2020 at 12:12 PM Corey Wharton  wrote:
>
>
>
> > -Original Message-
> > From: Alistair Francis 
> > Sent: Friday, April 24, 2020 9:04 AM
> > To: Corey Wharton 
> > Cc: qemu-devel@nongnu.org Developers ; open
> > list:RISC-V ; Sagar Karandikar
> > ; Bastian Koppelmann  > paderborn.de>; Alistair Francis ; Palmer Dabbelt
> > ; Bin Meng 
> > Subject: Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type
> >
> > On Fri, Mar 13, 2020 at 12:35 PM Corey Wharton  wrote:
> > >
> > > Allows the CPU to be changed from the default via the -cpu command
> > > line option.
> > >
> > > Signed-off-by: Corey Wharton 
> > > Reviewed-by: Bin Meng 
> > > Reviewed-by: Alistair Francis 
> >
> > Thanks for the patch.
> >
> > Unfortunately this fails `make check`.
> >
> > The problem is that the machine has the `default_cpu_type` set but then you
> > set "cpu-type" from the SoC.
> >
> > This diff fixes the make check failure for me:
> >
> > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index
> > 1fd08f325c..b53109521e 100644
> > --- a/hw/riscv/sifive_e.c
> > +++ b/hw/riscv/sifive_e.c
> > @@ -123,8 +123,6 @@ static void riscv_sifive_e_soc_init(Object *obj)
> >  object_initialize_child(obj, "cpus", >cpus,
> >  sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
> >  _abort, NULL);
> > -object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
> > -_abort);
> >  object_property_set_int(OBJECT(>cpus), ms->smp.cpus, "num-harts",
> >  _abort);
> >  sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0", @@ -141,6 +139,8 @@
> > static void riscv_sifive_e_soc_realize(DeviceState
> > *dev, Error **errp)
> >  SiFiveESoCState *s = RISCV_E_SOC(dev);
> >  MemoryRegion *sys_mem = get_system_memory();
> >
> > +object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
> > +_abort);
> >  object_property_set_bool(OBJECT(>cpus), true, "realized",
> >  _abort);
> >
> >
> > I'm happy to just squash that into the patch. Let me know if you don't want
> > me to do that and I'll drop these patches and let you re-send them.
> >
> > Alistair
> >
>
> Thanks for fixing this issue. I tested your patch and it seems to work as
> Intended and  I'm fine with you squashing it into the patch.

Great!

I'll send this patch as part of the PR after 5.0 then.

I also realised that my justification was wrong. It's not because of
the machine/SoC split, but because of the order between init/realise.

Alistair

>
> Corey
>
> > > ---
> > >  hw/riscv/sifive_e.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index
> > > a254cad489..b0a611adb9 100644
> > > --- a/hw/riscv/sifive_e.c
> > > +++ b/hw/riscv/sifive_e.c
> > > @@ -123,7 +123,7 @@ static void riscv_sifive_e_soc_init(Object *obj)
> > >  object_initialize_child(obj, "cpus", >cpus,
> > >  sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
> > >  _abort, NULL);
> > > -object_property_set_str(OBJECT(>cpus), SIFIVE_E_CPU, "cpu-type",
> > > +object_property_set_str(OBJECT(>cpus), ms->cpu_type,
> > > + "cpu-type",
> > >  _abort);
> > >  object_property_set_int(OBJECT(>cpus), ms->smp.cpus, "num-
> > harts",
> > >  _abort); @@ -220,6 +220,7 @@ static
> > > void riscv_sifive_e_machine_init(MachineClass *mc)
> > >  mc->desc = "RISC-V Board compatible with SiFive E SDK";
> > >  mc->init = riscv_sifive_e_init;
> > >  mc->max_cpus = 1;
> > > +mc->default_cpu_type = SIFIVE_E_CPU;
> > >  }
> > >
> > >  DEFINE_MACHINE("sifive_e", riscv_sifive_e_machine_init)
> > > --
> > > 2.21.1
> > >
> > >



Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions

2020-04-24 Thread Richard Henderson
On 4/21/20 9:39 PM, Richard Henderson wrote:
> On 4/20/20 3:29 AM, Szabolcs Nagy wrote:
>> i'm using the branch at
>>
>> https://github.com/rth7680/qemu/tree/tgt-arm-mte
>>
>> to test armv8.5-a mte and hope this is ok to report bugs here.
>>
>> i'm doing tests in qemu-system-aarch64 with linux userspace
>> code and it seems TCO bit gets cleared after syscalls or other
>> kernel entry, but PSTATE is expected to be restored, so i
>> suspect it is a qemu bug.
>>
>> i think the architecture saves/restores PSTATE using SPSR_ELx
>> on exceptions.
> 
> Yep.  I failed to update aarch64_pstate_valid_mask for TCO.
> Will fix.  Thanks,

Fixed on the branch.

I still need to work out how best to plumb the arm,armv8.5-memtag property so
the devel/mte-v3 kernel branch isn't usable as-is for the moment.  For myself,
I've just commented that test out for now.


r~



Re: [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()

2020-04-24 Thread Eric Blake

On 3/17/20 1:16 PM, Alberto Garcia wrote:

The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
---
  block/qcow2-cluster.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ceacd91ea3..dfd8b66958 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1006,6 +1006,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
  assert((offset & L2E_OFFSET_MASK) == offset);
  
  set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);

+
+/* Update bitmap with the subclusters that were just written */
+if (has_subclusters(s)) {
+unsigned written_from = m->cow_start.offset;
+unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+m->nb_clusters << s->cluster_bits;
+uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+int sc;
+for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+int sc_off = i * s->cluster_size + sc * s->subcluster_size;
+if (sc_off >= written_from && sc_off < written_to) {
+l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+}
+}


Are there more efficient ways to set this series of bits than iterating 
one bit at a time, while still remaining legible?  For example, what if 
we had something like:


l2_bitmap = get_l2_bitmap(...);
int sc_from = OFFSET_TO_SC(written_from);
int sc_to = OFFSET_TO_SC(written_to - 1);
l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(sc_from, sc_to);
l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(sc_from, sc_to);

which would require macros:

#define OFFSET_TO_SC(offset) (offset >> (s->cluster_bits - 6))
#define QCOW_OFLAG_SUB_ALLOC_RANGE(from, to) \
  deposit64(0, (from), (len) - (from), -1)
#define QCOW_OFLAG_SUB_ZERO_RANGE(from, to) \
  deposit64(0, (from) + 32, (len) - (from) + 32, -1)



+set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);


I'm hoping this function doesn't cause redundant I/O if the L2 entry 
didn't actually change.  But that's not the concern for this patch.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC PATCH] translate-all: include guest address in out_asm output

2020-04-24 Thread Richard Henderson
On 4/24/20 10:39 AM, Alex Bennée wrote:
> +/* first dump prologue */
> +insn_start = tcg_ctx->gen_insn_end_off[0];
> +if (insn_start > 0) {
> +qemu_log("  prologue: [size=%ld]\n", insn_start);
> +log_disas(tb->tc.ptr, insn_start);
> +}
> +
> +do {
> +size_t insn_end;
> +if (insn < (tb->icount - 1)) {
> +insn_end = tcg_ctx->gen_insn_end_off[insn + 1];
> +} else {
> +insn_end = code_size;
> +}
> +qemu_log("  for guest addr: " TARGET_FMT_lx ":\n",
> + tcg_ctx->gen_insn_data[insn][0]);

The one thing you're missing here is when a given guest insn emits no host
insns.  E.g. an actual guest nop, or if two guest insns are optimized together.

So you need to search forward through empty insns til you find one that has
contents.  E.g. the very first TB that alpha-softmmu executes in its bios:

OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xfff0  dead: 1  pref=0x
 movi_i32 tmp1,$0x0   pref=0x
 brcond_i32 tmp0,tmp1,lt,$L0  dead: 0 1

  fc00

  fc04

  fc08
 movi_i64 gp,$0xfc012f50  sync: 0  pref=0x



OUT: [size=280]
  prologue: [size=11]
0x7fffa100:  8b 5d f0 movl -0x10(%rbp), %ebx
0x7fffa103:  85 dbtestl%ebx, %ebx
0x7fffa105:  0f 8c d6 00 00 00jl   0x7fffa1e1
  for guest addr: fc00:
  for guest addr: fc04:
0x7fffa10b:  48 bb 50 2f 01 00 00 fc  movabsq  $0xfc012f50, %rbx
0x7fffa113:  ff ff
0x7fffa115:  48 89 9d e8 00 00 00 movq %rbx, 0xe8(%rbp)
  for guest addr: fc08:

So you've attributed to ...04 what actually belongs to ...08.


But it's a good idea.


r~



[PATCH 02/11] xen: Fix and improve handling of device_add usb-host errors

2020-04-24 Thread Markus Armbruster
usbback_portid_add() leaks the error when qdev_device_add() fails.
Fix that.  While there, use the error to improve the error message.

The qemu_opts_from_qdict() similarly leaks on failure.  But any
failure there is a programming error.  Pass _abort.

Fixes: 816ac92ef769f9ffc534e49a1bb6177bddce7aa2
Cc: Stefano Stabellini 
Cc: Anthony Perard 
Cc: Paul Durrant 
Cc: Gerd Hoffmann 
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Markus Armbruster 
---
 hw/usb/xen-usb.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 961190d0f7..42643c3390 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -30,6 +30,7 @@
 #include "hw/usb.h"
 #include "hw/xen/xen-legacy-backend.h"
 #include "monitor/qdev.h"
+#include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 
@@ -755,13 +756,15 @@ static void usbback_portid_add(struct usbback_info 
*usbif, unsigned port,
 qdict_put_int(qdict, "port", port);
 qdict_put_int(qdict, "hostbus", atoi(busid));
 qdict_put_str(qdict, "hostport", portname);
-opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, _err);
-if (local_err) {
-goto err;
-}
+opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict,
+_abort);
 usbif->ports[port - 1].dev = USB_DEVICE(qdev_device_add(opts, _err));
 if (!usbif->ports[port - 1].dev) {
-goto err;
+qobject_unref(qdict);
+xen_pv_printf(>xendev, 0,
+  "device %s could not be opened: %s\n",
+  busid, error_get_pretty(local_err));
+error_free(local_err);
 }
 qobject_unref(qdict);
 speed = usbif->ports[port - 1].dev->speed;
@@ -793,11 +796,6 @@ static void usbback_portid_add(struct usbback_info *usbif, 
unsigned port,
 usbback_hotplug_enq(usbif, port);
 
 TR_BUS(>xendev, "port %d attached\n", port);
-return;
-
-err:
-qobject_unref(qdict);
-xen_pv_printf(>xendev, 0, "device %s could not be opened\n", busid);
 }
 
 static void usbback_process_port(struct usbback_info *usbif, unsigned port)
-- 
2.21.1




[PATCH 10/11] arm/sabrelite: Consistently use _fatal in sabrelite_init()

2020-04-24 Thread Markus Armbruster
Cc: Peter Maydell 
Cc: Jean-Christophe Dubois 
Signed-off-by: Markus Armbruster 
---
 hw/arm/sabrelite.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
index e31694bb92..04f4b96591 100644
--- a/hw/arm/sabrelite.c
+++ b/hw/arm/sabrelite.c
@@ -41,7 +41,6 @@ static void sabrelite_reset_secondary(ARMCPU *cpu,
 static void sabrelite_init(MachineState *machine)
 {
 FslIMX6State *s;
-Error *err = NULL;
 
 /* Check the amount of memory is compatible with the SOC */
 if (machine->ram_size > FSL_IMX6_MMDC_SIZE) {
@@ -52,11 +51,7 @@ static void sabrelite_init(MachineState *machine)
 
 s = FSL_IMX6(object_new(TYPE_FSL_IMX6));
 object_property_add_child(OBJECT(machine), "soc", OBJECT(s), _fatal);
-object_property_set_bool(OBJECT(s), true, "realized", );
-if (err != NULL) {
-error_report("%s", error_get_pretty(err));
-exit(1);
-}
+object_property_set_bool(OBJECT(s), true, "realized", _fatal);
 
 memory_region_add_subregion(get_system_memory(), FSL_IMX6_MMDC_ADDR,
 machine->ram);
-- 
2.21.1




[PATCH 09/11] mips/boston: Plug memory leak in boston_mach_init()

2020-04-24 Thread Markus Armbruster
Fixes: df1d8a1f29f567567b9d20be685a4241282e7005
Cc: Paul Burton 
Cc: Aleksandar Rikalo 
Signed-off-by: Markus Armbruster 
---
 hw/mips/boston.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index 2832dfa6ae..a896056be1 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -426,7 +426,6 @@ static void boston_mach_init(MachineState *machine)
 {
 DeviceState *dev;
 BostonState *s;
-Error *err = NULL;
 MemoryRegion *flash, *ddr_low_alias, *lcd, *platreg;
 MemoryRegion *sys_mem = get_system_memory();
 XilinxPCIEHost *pcie2;
@@ -467,7 +466,8 @@ static void boston_mach_init(MachineState *machine)
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(>cps), 0, 0, 1);
 
 flash =  g_new(MemoryRegion, 1);
-memory_region_init_rom(flash, NULL, "boston.flash", 128 * MiB, );
+memory_region_init_rom(flash, NULL, "boston.flash", 128 * MiB,
+   _fatal);
 memory_region_add_subregion_overlap(sys_mem, 0x1800, flash, 0);
 
 memory_region_add_subregion_overlap(sys_mem, 0x8000, machine->ram, 0);
-- 
2.21.1




[PATCH 03/11] s390x/cpumodel: Fix harmless misuse of visit_check_struct()

2020-04-24 Thread Markus Armbruster
Commit e47970f51d "s390x/cpumodel: Fix query-cpu-model-FOO error API
violations" neglected to change visit_end_struct()'s Error ** argument
along with the others.  If visit_end_struct() failed, we'd take the
success path.  Fortunately, it can't fail here:
qobject_input_check_struct() checks we consumed the whole dictionary,
and to get here, we did.  Fix it anyway.

Cc: David Hildenbrand 
Cc: Cornelia Huck 
Signed-off-by: Markus Armbruster 
---
 target/s390x/cpu_models.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 7c32180269..87a8028ad3 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -524,7 +524,7 @@ static void cpu_model_from_info(S390CPUModel *model, const 
CpuModelInfo *info,
 }
 }
 if (!err) {
-visit_check_struct(visitor, errp);
+visit_check_struct(visitor, );
 }
 visit_end_struct(visitor, NULL);
 visit_free(visitor);
-- 
2.21.1




[PATCH 08/11] mips/boston: Fix boston_mach_init() error handling

2020-04-24 Thread Markus Armbruster
The Error ** argument must be NULL, _abort, _fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second

boston_mach_init() is wrong that way.  The last calls treats an error
as fatal.  Do that for the prior ones, too.

Fixes: df1d8a1f29f567567b9d20be685a4241282e7005
Cc: Paul Burton 
Cc: Aleksandar Rikalo 
Signed-off-by: Markus Armbruster 
---
 hw/mips/boston.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index 98ecd25e8e..2832dfa6ae 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -458,14 +458,11 @@ static void boston_mach_init(MachineState *machine)
 sysbus_init_child_obj(OBJECT(machine), "cps", OBJECT(>cps),
   sizeof(s->cps), TYPE_MIPS_CPS);
 object_property_set_str(OBJECT(>cps), machine->cpu_type, "cpu-type",
-);
-object_property_set_int(OBJECT(>cps), machine->smp.cpus, "num-vp", 
);
-object_property_set_bool(OBJECT(>cps), true, "realized", );
-
-if (err != NULL) {
-error_report("%s", error_get_pretty(err));
-exit(1);
-}
+_fatal);
+object_property_set_int(OBJECT(>cps), machine->smp.cpus, "num-vp",
+_fatal);
+object_property_set_bool(OBJECT(>cps), true, "realized",
+ _fatal);
 
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(>cps), 0, 0, 1);
 
-- 
2.21.1




[PATCH 06/11] error: Use error_reportf_err() where appropriate

2020-04-24 Thread Markus Armbruster
Replace

error_report("...: %s", ..., error_get_pretty(err));

by

error_reportf_err(err, "...: ", ...);

Signed-off-by: Markus Armbruster 
---
 chardev/char-socket.c | 5 +++--
 hw/sd/pxa2xx_mmci.c   | 4 ++--
 hw/sd/sd.c| 4 ++--
 hw/usb/dev-mtp.c  | 9 +
 qemu-nbd.c| 7 +++
 scsi/qemu-pr-helper.c | 4 ++--
 6 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 185fe38dda..e5ee685f8c 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -138,8 +138,9 @@ static void check_report_connect_error(Chardev *chr,
 SocketChardev *s = SOCKET_CHARDEV(chr);
 
 if (!s->connect_err_reported) {
-error_report("Unable to connect character device %s: %s",
- chr->label, error_get_pretty(err));
+error_reportf_err(err,
+  "Unable to connect character device %s: ",
+  chr->label);
 s->connect_err_reported = true;
 }
 qemu_chr_socket_restart_timer(chr);
diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c
index 8f9ab0ec16..f9c50ddda5 100644
--- a/hw/sd/pxa2xx_mmci.c
+++ b/hw/sd/pxa2xx_mmci.c
@@ -497,12 +497,12 @@ PXA2xxMMCIState *pxa2xx_mmci_init(MemoryRegion *sysmem,
 carddev = qdev_create(qdev_get_child_bus(dev, "sd-bus"), TYPE_SD_CARD);
 qdev_prop_set_drive(carddev, "drive", blk, );
 if (err) {
-error_report("failed to init SD card: %s", error_get_pretty(err));
+error_reportf_err(err, "failed to init SD card: ");
 return NULL;
 }
 object_property_set_bool(OBJECT(carddev), true, "realized", );
 if (err) {
-error_report("failed to init SD card: %s", error_get_pretty(err));
+error_reportf_err(err, "failed to init SD card: ");
 return NULL;
 }
 
diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 71a9af09ab..3c06a0ac6d 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -703,13 +703,13 @@ SDState *sd_init(BlockBackend *blk, bool is_spi)
 dev = DEVICE(obj);
 qdev_prop_set_drive(dev, "drive", blk, );
 if (err) {
-error_report("sd_init failed: %s", error_get_pretty(err));
+error_reportf_err(err, "sd_init failed: ");
 return NULL;
 }
 qdev_prop_set_bit(dev, "spi", is_spi);
 object_property_set_bool(obj, true, "realized", );
 if (err) {
-error_report("sd_init failed: %s", error_get_pretty(err));
+error_reportf_err(err, "sd_init failed: ");
 return NULL;
 }
 
diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 20717f026b..168428156b 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -631,8 +631,9 @@ static void usb_mtp_object_readdir(MTPState *s, MTPObject 
*o)
 int64_t id = qemu_file_monitor_add_watch(s->file_monitor, o->path, 
NULL,
  file_monitor_event, s, );
 if (id == -1) {
-error_report("usb-mtp: failed to add watch for %s: %s", o->path,
- error_get_pretty(err));
+error_reportf_err(err,
+  "usb-mtp: failed to add watch for %s: ",
+  o->path);
 error_free(err);
 } else {
 trace_usb_mtp_file_monitor_event(s->dev.addr, o->path,
@@ -1276,8 +1277,8 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 
 s->file_monitor = qemu_file_monitor_new();
 if (err) {
-error_report("usb-mtp: file monitoring init failed: %s",
- error_get_pretty(err));
+error_reportf_err(err,
+  "usb-mtp: file monitoring init failed: ");
 error_free(err);
 } else {
 QTAILQ_INIT(>events);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 4aa005004e..30deb5d9e6 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -856,8 +856,7 @@ int main(int argc, char **argv)
 }
 tlscreds = nbd_get_tls_creds(tlscredsid, list, _err);
 if (local_err) {
-error_report("Failed to get TLS creds %s",
- error_get_pretty(local_err));
+error_reportf_err(local_err, "Failed to get TLS creds ");
 exit(EXIT_FAILURE);
 }
 } else {
@@ -979,8 +978,8 @@ int main(int argc, char **argv)
  _err);
 if (sioc == NULL) {
 object_unref(OBJECT(server));
-error_report("Failed to use socket activation: %s",
- error_get_pretty(local_err));
+error_reportf_err(local_err,
+  "Failed to use socket activation: ");
 exit(EXIT_FAILURE);
 }
 qio_net_listener_add(server, sioc);
diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
index 181ed4a186..57ad830d54 100644
--- a/scsi/qemu-pr-helper.c
+++ b/scsi/qemu-pr-helper.c
@@ -1030,8 +1030,8 @@ 

[Bug 1874674] Re: [Feature request] acceptance test class to run user-mode binaries

2020-04-24 Thread Richard Henderson
What user-mode testing do you think might be improved by using avocado?

IMO at present we have a fairly comprehensive testing infrastructure for
user-mode that is simply underused.  With docker, we have a set of
cross-compilers for most guest architectures, and we are able to build
statically linked binaries that are copied out of the container for
testing by the just-built qemu binaries on the host.  This
infrastructure is used by check-tcg.  It's fairly easy to add new test
cases to be run on one or all guests.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1874674

Title:
  [Feature request] acceptance test class to run user-mode binaries

Status in QEMU:
  New

Bug description:
  Currently the acceptance test framework only target system-mode emulation.
  It would be useful to test user-mode too.

  Ref: https://www.mail-archive.com/qemu-devel@nongnu.org/msg626610.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1874674/+subscriptions



[PATCH 07/11] mips/malta: Fix create_cps() error handling

2020-04-24 Thread Markus Armbruster
The Error ** argument must be NULL, _abort, _fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second

create_cps() is wrong that way.  The last calls treats an error as
fatal.  Do that for the prior ones, too.

Fixes: bff384a4fbd5d0e86939092e74e766ef0f5f592c
Cc: Aleksandar Markovic 
Cc: "Philippe Mathieu-Daudé" 
Cc: Aurelien Jarno 
Signed-off-by: Markus Armbruster 
---
 hw/mips/mips_malta.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index e4c4de1b4e..17bf41616b 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -1185,17 +1185,14 @@ static void create_cpu_without_cps(MachineState *ms,
 static void create_cps(MachineState *ms, MaltaState *s,
qemu_irq *cbus_irq, qemu_irq *i8259_irq)
 {
-Error *err = NULL;
-
 sysbus_init_child_obj(OBJECT(s), "cps", OBJECT(>cps), sizeof(s->cps),
   TYPE_MIPS_CPS);
-object_property_set_str(OBJECT(>cps), ms->cpu_type, "cpu-type", );
-object_property_set_int(OBJECT(>cps), ms->smp.cpus, "num-vp", );
-object_property_set_bool(OBJECT(>cps), true, "realized", );
-if (err != NULL) {
-error_report("%s", error_get_pretty(err));
-exit(1);
-}
+object_property_set_str(OBJECT(>cps), ms->cpu_type, "cpu-type",
+_fatal);
+object_property_set_int(OBJECT(>cps), ms->smp.cpus, "num-vp",
+_fatal);
+object_property_set_bool(OBJECT(>cps), true, "realized",
+ _fatal);
 
 sysbus_mmio_map_overlap(SYS_BUS_DEVICE(>cps), 0, 0, 1);
 
-- 
2.21.1




[PATCH 05/11] tests/migration: Tighten error checking

2020-04-24 Thread Markus Armbruster
migrate_get_socket_address() neglects to check
visit_type_SocketAddressList() failure.  This smells like a leak, but
it actually will crash dereferencing @addrs.  Pass _abort to
remove the code smell.

Signed-off-by: Markus Armbruster 
---
 tests/qtest/migration-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 2568c9529c..dc3490c9fa 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 
 #include "libqtest.h"
+#include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
@@ -301,7 +302,6 @@ static char *migrate_get_socket_address(QTestState *who, 
const char *parameter)
 {
 QDict *rsp;
 char *result;
-Error *local_err = NULL;
 SocketAddressList *addrs;
 Visitor *iv = NULL;
 QObject *object;
@@ -310,7 +310,7 @@ static char *migrate_get_socket_address(QTestState *who, 
const char *parameter)
 object = qdict_get(rsp, parameter);
 
 iv = qobject_input_visitor_new(object);
-visit_type_SocketAddressList(iv, NULL, , _err);
+visit_type_SocketAddressList(iv, NULL, , _abort);
 visit_free(iv);
 
 /* we are only using a single address */
-- 
2.21.1




[PATCH 01/11] nvdimm: Plug memory leak in uuid property setter

2020-04-24 Thread Markus Armbruster
nvdimm_set_uuid() leaks memory on qemu_uuid_parse() failure.  Fix
that.

Fixes: 6c5627bb24dcd68c997857a8b671617333b1289f
Cc: Xiao Guangrong 
Cc: Shivaprasad G Bhat 
Signed-off-by: Markus Armbruster 
---
 hw/mem/nvdimm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 8e426d24bb..d5752f7bf6 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -97,7 +97,6 @@ static void nvdimm_set_uuid(Object *obj, Visitor *v, const 
char *name,
 if (qemu_uuid_parse(value, >uuid) != 0) {
 error_setg(errp, "Property '%s.%s' has invalid value",
object_get_typename(obj), name);
-goto out;
 }
 g_free(value);
 
-- 
2.21.1




[PATCH 00/11] More miscellaneous error handling fixes

2020-04-24 Thread Markus Armbruster
Markus Armbruster (11):
  nvdimm: Plug memory leak in uuid property setter
  xen: Fix and improve handling of device_add usb-host errors
  s390x/cpumodel: Fix harmless misuse of visit_check_struct()
  s390x/pci: Fix harmless mistake in zpci's property fid's setter
  tests/migration: Tighten error checking
  error: Use error_reportf_err() where appropriate
  mips/malta: Fix create_cps() error handling
  mips/boston: Fix boston_mach_init() error handling
  mips/boston: Plug memory leak in boston_mach_init()
  arm/sabrelite: Consistently use _fatal in sabrelite_init()
  i386: Fix x86_cpu_load_model() error API violation

 chardev/char-socket.c|  5 +++--
 hw/arm/sabrelite.c   |  7 +--
 hw/mem/nvdimm.c  |  1 -
 hw/mips/boston.c | 17 +++--
 hw/mips/mips_malta.c | 15 ++-
 hw/s390x/s390-pci-bus.c  |  4 +++-
 hw/sd/pxa2xx_mmci.c  |  4 ++--
 hw/sd/sd.c   |  4 ++--
 hw/usb/dev-mtp.c |  9 +
 hw/usb/xen-usb.c | 18 --
 qemu-nbd.c   |  7 +++
 scsi/qemu-pr-helper.c|  4 ++--
 target/i386/cpu.c| 25 -
 target/s390x/cpu_models.c|  2 +-
 tests/qtest/migration-test.c |  4 ++--
 15 files changed, 61 insertions(+), 65 deletions(-)

-- 
2.21.1




[PATCH 04/11] s390x/pci: Fix harmless mistake in zpci's property fid's setter

2020-04-24 Thread Markus Armbruster
s390_pci_set_fid() sets zpci->fid_defined to true even when
visit_type_uint32() failed.  Reproducer: "-device zpci,fid=junk".
Harmless in practice, because qdev_device_add() then fails, throwing
away @zpci.  Fix it anyway.

Cc: Matthew Rosato 
Cc: Cornelia Huck 
Signed-off-by: Markus Armbruster 
---
 hw/s390x/s390-pci-bus.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index ed8be124da..19ee1f02bb 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -1276,7 +1276,9 @@ static void s390_pci_set_fid(Object *obj, Visitor *v, 
const char *name,
 return;
 }
 
-visit_type_uint32(v, name, ptr, errp);
+if (!visit_type_uint32(v, name, ptr, errp)) {
+return;
+}
 zpci->fid_defined = true;
 }
 
-- 
2.21.1




[PATCH 11/11] i386: Fix x86_cpu_load_model() error API violation

2020-04-24 Thread Markus Armbruster
The Error ** argument must be NULL, _abort, _fatal, or a
pointer to a variable containing NULL.  Passing an argument of the
latter kind twice without clearing it in between is wrong: if the
first call sets an error, it no longer points to NULL for the second
call.

x86_cpu_load_model() is wrong that way.  Harmless, because its @errp
is always _abort.  To fix, cut out the @errp middleman.

Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Markus Armbruster 
---
 target/i386/cpu.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 90ffc5f3b1..3f09fd2321 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5078,7 +5078,7 @@ static void x86_cpu_apply_version_props(X86CPU *cpu, 
X86CPUModel *model)
 
 /* Load data from X86CPUDefinition into a X86CPU object
  */
-static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model, Error **errp)
+static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel *model)
 {
 X86CPUDefinition *def = model->cpudef;
 CPUX86State *env = >env;
@@ -5092,13 +5092,19 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel 
*model, Error **errp)
  */
 
 /* CPU models only set _minimum_ values for level/xlevel: */
-object_property_set_uint(OBJECT(cpu), def->level, "min-level", errp);
-object_property_set_uint(OBJECT(cpu), def->xlevel, "min-xlevel", errp);
+object_property_set_uint(OBJECT(cpu), def->level, "min-level",
+ _abort);
+object_property_set_uint(OBJECT(cpu), def->xlevel, "min-xlevel",
+ _abort);
 
-object_property_set_int(OBJECT(cpu), def->family, "family", errp);
-object_property_set_int(OBJECT(cpu), def->model, "model", errp);
-object_property_set_int(OBJECT(cpu), def->stepping, "stepping", errp);
-object_property_set_str(OBJECT(cpu), def->model_id, "model-id", errp);
+object_property_set_int(OBJECT(cpu), def->family, "family",
+_abort);
+object_property_set_int(OBJECT(cpu), def->model, "model",
+_abort);
+object_property_set_int(OBJECT(cpu), def->stepping, "stepping",
+_abort);
+object_property_set_str(OBJECT(cpu), def->model_id, "model-id",
+_abort);
 for (w = 0; w < FEATURE_WORDS; w++) {
 env->features[w] = def->features[w];
 }
@@ -5135,7 +5141,8 @@ static void x86_cpu_load_model(X86CPU *cpu, X86CPUModel 
*model, Error **errp)
 vendor = host_vendor;
 }
 
-object_property_set_str(OBJECT(cpu), vendor, "vendor", errp);
+object_property_set_str(OBJECT(cpu), vendor, "vendor",
+_abort);
 
 x86_cpu_apply_version_props(cpu, model);
 }
@@ -6981,7 +6988,7 @@ static void x86_cpu_initfn(Object *obj)
 object_property_add_alias(obj, "sse4_2", obj, "sse4.2", _abort);
 
 if (xcc->model) {
-x86_cpu_load_model(cpu, xcc->model, _abort);
+x86_cpu_load_model(cpu, xcc->model);
 }
 }
 
-- 
2.21.1




RE: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type

2020-04-24 Thread Corey Wharton


> -Original Message-
> From: Alistair Francis 
> Sent: Friday, April 24, 2020 9:04 AM
> To: Corey Wharton 
> Cc: qemu-devel@nongnu.org Developers ; open
> list:RISC-V ; Sagar Karandikar
> ; Bastian Koppelmann  paderborn.de>; Alistair Francis ; Palmer Dabbelt
> ; Bin Meng 
> Subject: Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type
> 
> On Fri, Mar 13, 2020 at 12:35 PM Corey Wharton  wrote:
> >
> > Allows the CPU to be changed from the default via the -cpu command
> > line option.
> >
> > Signed-off-by: Corey Wharton 
> > Reviewed-by: Bin Meng 
> > Reviewed-by: Alistair Francis 
> 
> Thanks for the patch.
> 
> Unfortunately this fails `make check`.
> 
> The problem is that the machine has the `default_cpu_type` set but then you
> set "cpu-type" from the SoC.
> 
> This diff fixes the make check failure for me:
> 
> diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index
> 1fd08f325c..b53109521e 100644
> --- a/hw/riscv/sifive_e.c
> +++ b/hw/riscv/sifive_e.c
> @@ -123,8 +123,6 @@ static void riscv_sifive_e_soc_init(Object *obj)
>  object_initialize_child(obj, "cpus", >cpus,
>  sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
>  _abort, NULL);
> -object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
> -_abort);
>  object_property_set_int(OBJECT(>cpus), ms->smp.cpus, "num-harts",
>  _abort);
>  sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0", @@ -141,6 +139,8 @@
> static void riscv_sifive_e_soc_realize(DeviceState
> *dev, Error **errp)
>  SiFiveESoCState *s = RISCV_E_SOC(dev);
>  MemoryRegion *sys_mem = get_system_memory();
> 
> +object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
> +_abort);
>  object_property_set_bool(OBJECT(>cpus), true, "realized",
>  _abort);
> 
> 
> I'm happy to just squash that into the patch. Let me know if you don't want
> me to do that and I'll drop these patches and let you re-send them.
> 
> Alistair
> 

Thanks for fixing this issue. I tested your patch and it seems to work as
Intended and  I'm fine with you squashing it into the patch.

Corey

> > ---
> >  hw/riscv/sifive_e.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c index
> > a254cad489..b0a611adb9 100644
> > --- a/hw/riscv/sifive_e.c
> > +++ b/hw/riscv/sifive_e.c
> > @@ -123,7 +123,7 @@ static void riscv_sifive_e_soc_init(Object *obj)
> >  object_initialize_child(obj, "cpus", >cpus,
> >  sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
> >  _abort, NULL);
> > -object_property_set_str(OBJECT(>cpus), SIFIVE_E_CPU, "cpu-type",
> > +object_property_set_str(OBJECT(>cpus), ms->cpu_type,
> > + "cpu-type",
> >  _abort);
> >  object_property_set_int(OBJECT(>cpus), ms->smp.cpus, "num-
> harts",
> >  _abort); @@ -220,6 +220,7 @@ static
> > void riscv_sifive_e_machine_init(MachineClass *mc)
> >  mc->desc = "RISC-V Board compatible with SiFive E SDK";
> >  mc->init = riscv_sifive_e_init;
> >  mc->max_cpus = 1;
> > +mc->default_cpu_type = SIFIVE_E_CPU;
> >  }
> >
> >  DEFINE_MACHINE("sifive_e", riscv_sifive_e_machine_init)
> > --
> > 2.21.1
> >
> >


Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration

2020-04-24 Thread Dr. David Alan Gilbert
* Yan Zhao (yan.y.z...@intel.com) wrote:
> On Tue, Apr 21, 2020 at 08:08:49PM +0800, Tian, Kevin wrote:
> > > From: Yan Zhao
> > > Sent: Tuesday, April 21, 2020 10:37 AM
> > > 
> > > On Tue, Apr 21, 2020 at 06:56:00AM +0800, Alex Williamson wrote:
> > > > On Sun, 19 Apr 2020 21:24:57 -0400
> > > > Yan Zhao  wrote:
> > > >
> > > > > On Fri, Apr 17, 2020 at 07:24:57PM +0800, Cornelia Huck wrote:
> > > > > > On Fri, 17 Apr 2020 05:52:02 -0400
> > > > > > Yan Zhao  wrote:
> > > > > >
> > > > > > > On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > > > > > > > On Mon, 13 Apr 2020 01:52:01 -0400
> > > > > > > > Yan Zhao  wrote:
> > > > > > > >
> > > > > > > > > This patchset introduces a migration_version attribute under 
> > > > > > > > > sysfs
> > > of VFIO
> > > > > > > > > Mediated devices.
> > > > > > > > >
> > > > > > > > > This migration_version attribute is used to check migration
> > > compatibility
> > > > > > > > > between two mdev devices.
> > > > > > > > >
> > > > > > > > > Currently, it has two locations:
> > > > > > > > > (1) under mdev_type node,
> > > > > > > > > which can be used even before device creation, but only 
> > > > > > > > > for
> > > mdev
> > > > > > > > > devices of the same mdev type.
> > > > > > > > > (2) under mdev device node,
> > > > > > > > > which can only be used after the mdev devices are 
> > > > > > > > > created, but
> > > the src
> > > > > > > > > and target mdev devices are not necessarily be of the same
> > > mdev type
> > > > > > > > > (The second location is newly added in v5, in order to keep
> > > consistent
> > > > > > > > > with the migration_version node for migratable pass-though
> > > devices)
> > > > > > > >
> > > > > > > > What is the relationship between those two attributes?
> > > > > > > >
> > > > > > > (1) is for mdev devices specifically, and (2) is provided to keep 
> > > > > > > the
> > > same
> > > > > > > sysfs interface as with non-mdev cases. so (2) is for both mdev
> > > devices and
> > > > > > > non-mdev devices.
> > > > > > >
> > > > > > > in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev 
> > > > > > > device
> > > > > > > is binding to vfio-pci, but is able to register migration region 
> > > > > > > and do
> > > > > > > migration transactions from a vendor provided affiliate driver),
> > > > > > > the vendor driver would export (2) directly, under device node.
> > > > > > > It is not able to provide (1) as there're no mdev devices 
> > > > > > > involved.
> > > > > >
> > > > > > Ok, creating an alternate attribute for non-mdev devices makes 
> > > > > > sense.
> > > > > > However, wouldn't that rather be a case (3)? The change here only
> > > > > > refers to mdev devices.
> > > > > >
> > > > > as you pointed below, (3) and (2) serve the same purpose.
> > > > > and I think a possible usage is to migrate between a non-mdev device 
> > > > > and
> > > > > an mdev device. so I think it's better for them both to use (2) rather
> > > > > than creating (3).
> > > >
> > > > An mdev type is meant to define a software compatible interface, so in
> > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > fail the most basic of compatibility tests that we expect userspace to
> > > > perform?  IOW, if two mdev types are migration compatible, it seems a
> > > > prerequisite to that is that they provide the same software interface,
> > > > which means they should be the same mdev type.
> > > >
> > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > management
> > > > tool begin to even guess what might be compatible?  Are we expecting
> > > > libvirt to probe ever device with this attribute in the system?  Is
> > > > there going to be a new class hierarchy created to enumerate all
> > > > possible migrate-able devices?
> > > >
> > > yes, management tool needs to guess and test migration compatible
> > > between two devices. But I think it's not the problem only for
> > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > to
> > > first assume that the two mdevs have the same type of parent devices
> > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > possibilities.
> > > 
> > > on the other hand, for two mdevs,
> > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > mdev1 <-> mdev2.
> > 
> > How could the manage tool figure out that 1/2 of pdev1 is equivalent 
> > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > choice is to report the same mdev type on both pdev1 and pdev2.
> I think that's exactly the value of this migration_version interface.
> the management tool can take advantage of this interface to know if two
> devices are migration compatible, no matter they are mdevs, non-mdevs,
> or mix.
> 
> as I know, (please correct me if not right), current 

[PATCH v3 2/3] qcow2: Allow resize of images with internal snapshots

2020-04-24 Thread Eric Blake
We originally refused to allow resize of images with internal
snapshots because the v2 image format did not require the tracking of
snapshot size, making it impossible to safely revert to a snapshot
with a different size than the current view of the image.  But the
snapshot size tracking was rectified in v3, and our recent fixes to
qemu-img amend (see 0a85af35) guarantee that we always have a valid
snapshot size.  Thus, we no longer need to artificially limit image
resizes, but it does become one more thing that would prevent a
downgrade back to v2.  And now that we support different-sized
snapshots, it's also easy to fix reverting to a snapshot to apply the
new size.

Upgrade iotest 61 to cover this (we previously had NO coverage of
refusal to resize while snapshots exist).  Note that the amend process
can fail but still have effects: in particular, since we break things
into upgrade, resize, downgrade, a failure during resize does not roll
back changes made during upgrade, nor does failure in downgrade roll
back a resize.  But this situation is pre-existing even without this
patch; and without journaling, the best we could do is minimize the
chance of partial failure by collecting all changes prior to doing any
writes - which adds a lot of complexity but could still fail with EIO.
On the other hand, we are careful that even if we have partial
modification but then fail, the image is left viable (that is, we are
careful to sequence things so that after each successful cluster
write, there may be transient leaked clusters but no corrupt
metadata).  And complicating the code to make it more transaction-like
is not worth the effort: a user can always request multiple 'qemu-img
amend' changing one thing each, if they need finer-grained control
over detecting the first failure than what they get by letting qemu
decide how to sequence multiple changes.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2-snapshot.c | 20 
 block/qcow2.c  | 25 ++---
 tests/qemu-iotests/061 | 35 +++
 tests/qemu-iotests/061.out | 28 
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 82c32d4c9b08..df16424fd952 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -23,6 +23,7 @@
  */

 #include "qemu/osdep.h"
+#include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qcow2.h"
 #include "qemu/bswap.h"
@@ -775,10 +776,21 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 }

 if (sn->disk_size != bs->total_sectors * BDRV_SECTOR_SIZE) {
-error_report("qcow2: Loading snapshots with different disk "
-"size is not implemented");
-ret = -ENOTSUP;
-goto fail;
+BlockBackend *blk = blk_new_with_bs(bs, BLK_PERM_RESIZE, BLK_PERM_ALL,
+_err);
+if (!blk) {
+error_report_err(local_err);
+ret = -ENOTSUP;
+goto fail;
+}
+
+ret = blk_truncate(blk, sn->disk_size, true, PREALLOC_MODE_OFF,
+   _err);
+blk_unref(blk);
+if (ret < 0) {
+error_report_err(local_err);
+goto fail;
+}
 }

 /*
diff --git a/block/qcow2.c b/block/qcow2.c
index f56745c56924..bf739cd06be0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3988,9 +3988,12 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,

 qemu_co_mutex_lock(>lock);

-/* cannot proceed if image has snapshots */
-if (s->nb_snapshots) {
-error_setg(errp, "Can't resize an image which has snapshots");
+/*
+ * Even though we store snapshot size for all images, it was not
+ * required until v3, so it is not safe to proceed for v2.
+ */
+if (s->nb_snapshots && s->qcow_version < 3) {
+error_setg(errp, "Can't resize a v2 image which has snapshots");
 ret = -ENOTSUP;
 goto fail;
 }
@@ -4952,6 +4955,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 BDRVQcow2State *s = bs->opaque;
 int current_version = s->qcow_version;
 int ret;
+int i;

 /* This is qcow2_downgrade(), not qcow2_upgrade() */
 assert(target_version < current_version);
@@ -4969,6 +4973,21 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
 return -ENOTSUP;
 }

+/*
+ * If any internal snapshot has a different size than the current
+ * image size, or VM state size that exceeds 32 bits, downgrading
+ * is unsafe.  Even though we would still use v3-compliant output
+ * to preserve that data, other v2 programs might not realize
+ * those optional fields are important.
+ */
+for (i = 0; i < s->nb_snapshots; i++) {
+if (s->snapshots[i].vm_state_size > UINT32_MAX ||
+ 

[PATCH v3 0/3] qcow2: Allow resize of images with internal snapshots

2020-04-24 Thread Eric Blake
In v3:
- patch 1: fix error returns [patchew, Max], R-b dropped
- patch 2,3: unchanged, so add R-b

Eric Blake (3):
  block: Add blk_new_with_bs() helper
  qcow2: Allow resize of images with internal snapshots
  qcow2: Tweak comment about bitmaps vs. resize

 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +
 block/crypto.c | 10 +++-
 block/parallels.c  |  8 +++---
 block/qcow.c   |  8 +++---
 block/qcow2-snapshot.c | 20 ---
 block/qcow2.c  | 45 +++---
 block/qed.c|  9 +++
 block/sheepdog.c   | 11 -
 block/vdi.c|  8 +++---
 block/vhdx.c   |  9 +++
 block/vmdk.c   |  9 +++
 block/vpc.c|  8 +++---
 blockdev.c |  8 +++---
 blockjob.c |  7 ++
 tests/qemu-iotests/061 | 35 ++
 tests/qemu-iotests/061.out | 28 +
 17 files changed, 177 insertions(+), 71 deletions(-)

-- 
2.26.2




[PATCH v3 1/3] block: Add blk_new_with_bs() helper

2020-04-24 Thread Eric Blake
There are several callers that need to create a new block backend from
an existing BDS; make the task slightly easier with a common helper
routine.

Suggested-by: Max Reitz 
Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h |  2 ++
 block/block-backend.c  | 23 +++
 block/crypto.c | 10 --
 block/parallels.c  |  8 
 block/qcow.c   |  8 
 block/qcow2.c  | 18 --
 block/qed.c|  9 -
 block/sheepdog.c   | 11 +--
 block/vdi.c|  8 
 block/vhdx.c   |  9 -
 block/vmdk.c   |  9 -
 block/vpc.c|  8 
 blockdev.c |  8 +++-
 blockjob.c |  7 ++-
 14 files changed, 75 insertions(+), 63 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 9bbdbd63d743..d37c1244dd9c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -77,6 +77,8 @@ typedef struct BlockBackendPublic {
 } BlockBackendPublic;

 BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm);
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp);
 BlockBackend *blk_new_open(const char *filename, const char *reference,
QDict *options, int flags, Error **errp);
 int blk_get_refcnt(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 38ae41382652..3592066b4259 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -355,6 +355,29 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 return blk;
 }

+/*
+ * Create a new BlockBackend connected to an existing BlockDriverState.
+ *
+ * @perm is a bitmasks of BLK_PERM_* constants which describes the
+ * permissions to request for @bs that is attached to this
+ * BlockBackend.  @shared_perm is a bitmask which describes which
+ * permissions may be granted to other users of the attached node.
+ * Both sets of permissions can be changed later using blk_set_perm().
+ *
+ * Return the new BlockBackend on success, null on failure.
+ */
+BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm,
+  uint64_t shared_perm, Error **errp)
+{
+BlockBackend *blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm);
+
+if (blk_insert_bs(blk, bs, errp) < 0) {
+blk_unref(blk);
+return NULL;
+}
+return blk;
+}
+
 /*
  * Creates a new BlockBackend, opens a new BlockDriverState, and connects both.
  * The new BlockBackend is in the main AioContext.
diff --git a/block/crypto.c b/block/crypto.c
index d577f89659fa..a4d77f07fe8c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -256,16 +256,14 @@ static int 
block_crypto_co_create_generic(BlockDriverState *bs,
   PreallocMode prealloc,
   Error **errp)
 {
-int ret;
+int ret = -EPERM;
 BlockBackend *blk;
 QCryptoBlock *crypto = NULL;
 struct BlockCryptoCreateData data;

-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
 goto cleanup;
 }

diff --git a/block/parallels.c b/block/parallels.c
index 6d4ed77f165f..aad07b89c818 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -559,10 +559,10 @@ static int coroutine_fn 
parallels_co_create(BlockdevCreateOptions* opts,
 return -EIO;
 }

-blk = blk_new(bdrv_get_aio_context(bs),
-  BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(blk, bs, errp);
-if (ret < 0) {
+blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL,
+  errp);
+if (!blk) {
+ret = -EPERM;
 goto out;
 }
 blk_set_allow_write_beyond_eof(blk, true);
diff --git a/block/qcow.c b/block/qcow.c
index 8973e4e565a1..3f0c60051933 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -849,10 +849,10 @@ static int coroutine_fn 
qcow_co_create(BlockdevCreateOptions *opts,
 return -EIO;
 }

-qcow_blk = blk_new(bdrv_get_aio_context(bs),
-   BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-ret = blk_insert_bs(qcow_blk, bs, errp);
-if (ret < 0) {
+qcow_blk = blk_new_with_bs(bs, BLK_PERM_WRITE | BLK_PERM_RESIZE,
+   BLK_PERM_ALL, errp);
+if (!qcow_blk) {
+ret = -EPERM;
 goto exit;
 }
 blk_set_allow_write_beyond_eof(qcow_blk, true);
diff --git 

[PATCH v3 3/3] qcow2: Tweak comment about bitmaps vs. resize

2020-04-24 Thread Eric Blake
Our comment did not actually match the code.  Rewrite the comment to
be less sensitive to any future changes to qcow2-bitmap.c that might
implement scenarios that we currently reject.

Signed-off-by: Eric Blake 
Reviewed-by: Max Reitz 
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index bf739cd06be0..81b75401813c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3998,7 +3998,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 goto fail;
 }

-/* cannot proceed if image has bitmaps */
+/* See qcow2-bitmap.c for which bitmap scenarios prevent a resize. */
 if (qcow2_truncate_bitmaps_check(bs, errp)) {
 ret = -ENOTSUP;
 goto fail;
-- 
2.26.2




[Bug 1874888] [NEW] certain programs make QEMU crash with "tcg fatal error"

2020-04-24 Thread Konstantin
Public bug reported:

The following code snippet crashes qemu with

.../tcg/tcg.c:3279: tcg fatal error
qemu-x86_64: 
/usr/local/google/home/kostik/qemu-5.0.0-rc4/accel/tcg/cpu-exec.c:701: int 
cpu_exec(CPUState *): Assertion `!have_mmap_lock()' failed.


int main() {
  /*
 <.data>:
   0:   2e 45 71 ff cs rex.RB jno 0x3
   4:   e9 00 00 f0 00  jmp0xf9
   9:   c4 42 7d 31 3e  vpmovzxbd ymm15,QWORD PTR [r14]
   e:   c4 a3 7d 08 64 82 44vroundps ymm4,YMMWORD PTR [rdx+r8*4+0x44],0x0
  15:   00 
  16:   0f 1e 0anopDWORD PTR [rdx]
  19:   43 0f ec 20 rex.XB paddsb mm4,QWORD PTR [r8]
  1d:   66 47 0f 3a 0c 3d 00rex.RXB blendps xmm15,XMMWORD PTR 
[rip+0x8000],0x0# 0x8028
  24:   80 00 00 00 
  28:   c4 e3 f9 df 5f 86 0dvaeskeygenassist xmm3,XMMWORD PTR [rdi-0x7a],0xd
  2f:   c4 e2 55 92 74 fc 0avgatherdps ymm6,DWORD PTR [rsp+ymm7*8+0xa],ymm5
  36:   c4 e2 f9 17 9a 01 00vptest xmm3,XMMWORD PTR [rdx+0x1]
  3d:   00 00 
*/
  char buf[] = {
0x2E, 0x45, 0x71, 0xFF, 0xE9, 0x00, 0x00, 0xF0, 0x00, 0xC4, 0x42, 0x7D, 
0x31, 0x3E, 0xC4, 0xA3, 0x7D, 0x08, 0x64, 0x82, 0x44, 0x00, 0x0F, 0x1E, 0x0A, 
0x43, 0x0F, 0xEC, 0x20, 0x66, 0x47, 0x0F, 0x3A, 0x0C, 0x3D, 0x00, 0x80, 0x00, 
0x00, 0x00, 0xC4, 0xE3, 0xF9, 0xDF, 0x5F, 0x86, 0x0D, 0xC4, 0xE2, 0x55, 0x92, 
0x74, 0xFC, 0x0A, 0xC4, 0xE2, 0xF9, 0x17, 0x9A, 0x01, 0x00, 0x00, 0x00
  };
  void (*f)(void) = (void (*) (void))buf;
  f();
  return 0;
}

Steps to reproduce:
1) clang -static repro.c -o repro
2) qemu-x86_64-static repro

Tested with 4.2.0 and 5.0.0-rc4. Both -user and -system variants are
affected.

A few more snippets that cause the same sort of behavior:
1) 0x64, 0x46, 0x7D, 0xFF, 0xDF, 0x27, 0x46, 0x0F, 0xD4, 0x83, 0x5E, 0x00, 
0x00, 0x00, 0x3E, 0x0F, 0x6A, 0xEF, 0x0F, 0x05, 0xC4, 0x42, 0xFD, 0x1E, 0xCF, 
0x46, 0x18, 0xE3, 0x47, 0xCD, 0x4E, 0x6E, 0x0F, 0x0F, 0x16, 0x8A

2) 0x67, 0x45, 0xDB, 0xD0, 0xAA, 0xC4, 0xE2, 0xB1, 0x01, 0x57, 0x00,
0xF3, 0x6F, 0xF3, 0x42, 0x0F, 0x1E, 0xFD, 0x64, 0x2E, 0xF2, 0x45, 0xD9,
0xC4, 0x3E, 0xF3, 0x0F, 0xAE, 0xF4, 0x3E, 0x47, 0x0F, 0x1C, 0x22, 0x42,
0x73, 0xFF, 0xD9, 0xFD

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1874888

Title:
  certain programs make QEMU crash with "tcg fatal error"

Status in QEMU:
  New

Bug description:
  The following code snippet crashes qemu with

  .../tcg/tcg.c:3279: tcg fatal error
  qemu-x86_64: 
/usr/local/google/home/kostik/qemu-5.0.0-rc4/accel/tcg/cpu-exec.c:701: int 
cpu_exec(CPUState *): Assertion `!have_mmap_lock()' failed.

  
  int main() {
/*
   <.data>:
 0:   2e 45 71 ff cs rex.RB jno 0x3
 4:   e9 00 00 f0 00  jmp0xf9
 9:   c4 42 7d 31 3e  vpmovzxbd ymm15,QWORD PTR [r14]
 e:   c4 a3 7d 08 64 82 44vroundps ymm4,YMMWORD PTR [rdx+r8*4+0x44],0x0
15:   00 
16:   0f 1e 0anopDWORD PTR [rdx]
19:   43 0f ec 20 rex.XB paddsb mm4,QWORD PTR [r8]
1d:   66 47 0f 3a 0c 3d 00rex.RXB blendps xmm15,XMMWORD PTR 
[rip+0x8000],0x0# 0x8028
24:   80 00 00 00 
28:   c4 e3 f9 df 5f 86 0dvaeskeygenassist xmm3,XMMWORD PTR 
[rdi-0x7a],0xd
2f:   c4 e2 55 92 74 fc 0avgatherdps ymm6,DWORD PTR 
[rsp+ymm7*8+0xa],ymm5
36:   c4 e2 f9 17 9a 01 00vptest xmm3,XMMWORD PTR [rdx+0x1]
3d:   00 00 
  */
char buf[] = {
  0x2E, 0x45, 0x71, 0xFF, 0xE9, 0x00, 0x00, 0xF0, 0x00, 0xC4, 0x42, 0x7D, 
0x31, 0x3E, 0xC4, 0xA3, 0x7D, 0x08, 0x64, 0x82, 0x44, 0x00, 0x0F, 0x1E, 0x0A, 
0x43, 0x0F, 0xEC, 0x20, 0x66, 0x47, 0x0F, 0x3A, 0x0C, 0x3D, 0x00, 0x80, 0x00, 
0x00, 0x00, 0xC4, 0xE3, 0xF9, 0xDF, 0x5F, 0x86, 0x0D, 0xC4, 0xE2, 0x55, 0x92, 
0x74, 0xFC, 0x0A, 0xC4, 0xE2, 0xF9, 0x17, 0x9A, 0x01, 0x00, 0x00, 0x00
};
void (*f)(void) = (void (*) (void))buf;
f();
return 0;
  }
  
  Steps to reproduce:
  1) clang -static repro.c -o repro
  2) qemu-x86_64-static repro

  Tested with 4.2.0 and 5.0.0-rc4. Both -user and -system variants are
  affected.

  A few more snippets that cause the same sort of behavior:
  1) 0x64, 0x46, 0x7D, 0xFF, 0xDF, 0x27, 0x46, 0x0F, 0xD4, 0x83, 0x5E, 0x00, 
0x00, 0x00, 0x3E, 0x0F, 0x6A, 0xEF, 0x0F, 0x05, 0xC4, 0x42, 0xFD, 0x1E, 0xCF, 
0x46, 0x18, 0xE3, 0x47, 0xCD, 0x4E, 0x6E, 0x0F, 0x0F, 0x16, 0x8A

  2) 0x67, 0x45, 0xDB, 0xD0, 0xAA, 0xC4, 0xE2, 0xB1, 0x01, 0x57, 0x00,
  0xF3, 0x6F, 0xF3, 0x42, 0x0F, 0x1E, 0xFD, 0x64, 0x2E, 0xF2, 0x45,
  0xD9, 0xC4, 0x3E, 0xF3, 0x0F, 0xAE, 0xF4, 0x3E, 0x47, 0x0F, 0x1C,
  0x22, 0x42, 0x73, 0xFF, 0xD9, 0xFD

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1874888/+subscriptions



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Eric Blake

On 4/24/20 1:37 PM, Alberto Garcia wrote:

On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy 
 wrote:

Reading the entire cluster will be interesting - you'll have to
decompress the entire memory, then overwrite the zeroed portions.


I don't think so, qcow2_get_host_offset() would detect the number of
contiguous subclusters of the same type at the given offset. In this
case they would be _ZERO subclusters so there's no need to decompress
anything, or even read it (it works the same with uncompressed
clusters).


But if at least one of subclusters to read is not _ZERO, you'll have
to decompress the whole cluster, and after decompression rewrite
zero-subclusters by zeroes, as Eric says.. Or I lost the thread:)


I don't see why you would need to rewrite anything... you do have to
decompress the whole cluster, and the uncompressed cluster in memory
would have stale data, but you never need to use that data for anything,
let alone to return it to the guest.

Even if there's a COW, the new cluster would inherit the compressed
cluster's bitmap so the zeroized subclusters still read as zeroes.

It's the same with normal clusters, 'write -P 0xff 0 64k' followed by
'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the
L2 entry indicates that part of it is just zeroes.


The point is this:  Consider 'write -P 0xff 0 64k', then 'write -z 16k 
16k', then 'read 0 64k'.  For normal clusters, we can just do a 
scatter-gather iov read of read 0-16k and 32-64k, plus a memset of 
16-32k.  But for compressed clusters, we have to read and decompress the 
entire 64k, AND also memset 16k-32k.  But if zeroing after reading is 
not that expensive, then the same technique for normal clusters is fine 
(instead of a scatter-gather read of 48k, just read the whole 64k 
cluster before doing the memset).  So the question at hand is not what 
happens in writing, but in reading, and whether we are penalizing reads 
from a compressed cluster or even from regular clusters, when reading 
from a cluster where subclusters have different status.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-24 Thread Eduardo Habkost
On Fri, Apr 24, 2020 at 03:23:56PM +, Ani Sinha wrote:
> 
> 
> > On Apr 22, 2020, at 4:15 PM, Ani Sinha  wrote:
> > 
> > 
> > 
> >> On Apr 21, 2020, at 8:32 PM, Daniel P. Berrangé  
> >> wrote:
> >> 
> >> On Tue, Apr 21, 2020 at 02:45:04PM +, Ani Sinha wrote:
> >>> 
> >>> 
>  On Apr 20, 2020, at 8:32 PM, Michael S. Tsirkin  wrote:
>  
>  But I for one would like to focus on keeping PIIX stable
>  and focus development on q35.  Not bloating PIIX with lots of new
>  features is IMHO a good way to do that.
> >>> 
> >>> Does this mean this patch is a no-go then? :(
> >> 
> >> I'd support this patch, as I don't think it can really be described as
> >> bloat or destabalizing. It is just adding a simple property to
> >> conditionalize existing functionality.  Telling people to switch to Q35
> >> is unreasonable as it is not a simple 1-1 conversion from existing use
> >> of PIIX. Q35 has much higher complexity in its configuration, has higher
> >> memory overhead per VM too, and lacks certain features of PIIX too.
> > 
> > Cool. How do we go forward from here?
> > 
> 
> We would really appreciate if we can add this extra knob in
> Qemu. Maybe someone else also in the community will find this
> useful. We don’t want to maintain this patch internally forever
> but rather prefer we maintain this as a Qemu community.

Michael, I agree with Daniel here and I don't think we should
start refusing PIIX features if they are useful for a portion of
the QEMU community.

Would you reconsider and merge this patch?

-- 
Eduardo




Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Alberto Garcia
On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy 
 wrote:
> AFAIK, now compressed clusters can't be used in scenarios with guest,
> as qcow2 driver doesn't support rewriting them.

You can write to those images just fine, it's just not efficient because
you have to COW the compressed clusters.

> Or am I wrong? And we normally don't combine normal and compressed
> clusters together in one image.

As soon as you start writing to an image with compressed clusters you'll
have a combination of both.

But it's true that you don't have an image with compressed clusters if
what you're looking for is performance. So I wouldn't add support for
this if it complicates things too much.

Berto



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Alberto Garcia
On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy 
 wrote:
>>> Reading the entire cluster will be interesting - you'll have to
>>> decompress the entire memory, then overwrite the zeroed portions.
>> 
>> I don't think so, qcow2_get_host_offset() would detect the number of
>> contiguous subclusters of the same type at the given offset. In this
>> case they would be _ZERO subclusters so there's no need to decompress
>> anything, or even read it (it works the same with uncompressed
>> clusters).
>
> But if at least one of subclusters to read is not _ZERO, you'll have
> to decompress the whole cluster, and after decompression rewrite
> zero-subclusters by zeroes, as Eric says.. Or I lost the thread:)

I don't see why you would need to rewrite anything... you do have to
decompress the whole cluster, and the uncompressed cluster in memory
would have stale data, but you never need to use that data for anything,
let alone to return it to the guest.

Even if there's a COW, the new cluster would inherit the compressed
cluster's bitmap so the zeroized subclusters still read as zeroes.

It's the same with normal clusters, 'write -P 0xff 0 64k' followed by
'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the
L2 entry indicates that part of it is just zeroes.

Berto



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Vladimir Sementsov-Ogievskiy

24.04.2020 20:56, Alberto Garcia wrote:

On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake  wrote:

at the same time, I can see where you're coming from in stating that
if it makes management of extended L2 easier to allow zero subclusters
on top of a compressed cluster, then there's no reason to forbid it.


I'm not sure if it makes it easier. Some operations are definitely going
to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
subclusters). Or maybe replace QCow2SubclusterType with something
else. I need to evaluate that.


Reviewing your series it already came in my mind, that we are doing too much with the conversion from l2e 
flags to "type". Does it worth it? All these ZERO_PLAIN and UNALLOCATED_ALLOC, and "case 
:" lines combined by three-four into one case, do they help, or is it an extra work? We 
just have to maintain two views of one model.. But I don't suggest to refactor it in these series :)



Reading the entire cluster will be interesting - you'll have to
decompress the entire memory, then overwrite the zeroed portions.


I don't think so, qcow2_get_host_offset() would detect the number of
contiguous subclusters of the same type at the given offset. In this
case they would be _ZERO subclusters so there's no need to decompress
anything, or even read it (it works the same with uncompressed
clusters).



But if at least one of subclusters to read is not _ZERO, you'll have to 
decompress the whole cluster, and after decompression rewrite zero-subclusters 
by zeroes, as Eric says.. Or I lost the thread:)


--
Best regards,
Vladimir



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Vladimir Sementsov-Ogievskiy

24.04.2020 20:44, Eric Blake wrote:

On 4/24/20 12:21 PM, Alberto Garcia wrote:

On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake  wrote:

'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
copy on write. The compressed data would remain untouched on disk but
some of the subclusters would have the 'all zeroes' bit set, exactly
like what happens with normal clusters.


It's a special case that avoids COW for write zeroes, but not for
anything else. The moment you write any data (whether to the
zero-above-compressed or the regular compressed portion), the entire
cluster has to be rewritten.


That's right but you can still write zeroes without having to rewrite
anything, and read back the zeroes without having to decompress the
data.


at the same time, I can see where you're coming from in stating that
if it makes management of extended L2 easier to allow zero subclusters
on top of a compressed cluster, then there's no reason to forbid it.


I'm not sure if it makes it easier. Some operations are definitely going
to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
subclusters). Or maybe replace QCow2SubclusterType with something
else. I need to evaluate that.


Reading the entire cluster will be interesting - you'll have to decompress the 
entire memory, then overwrite the zeroed portions.  The savings in reading 
occur only when your read is limited to just the subclusters that are zeroed.

But then again, even on a regular cluster, read has to pay attention to which 
subclusters are zeroed, so you already have the workhorse in read for detecting 
whether a normal read is sufficient or if you have to follow up with piecing 
together zeroed sections.



AFAIK, now compressed clusters can't be used in scenarios with guest, as qcow2 
driver doesn't support rewriting them. Or am I wrong? And we normally don't 
combine normal and compressed clusters together in one image. So, currently, 
the usual use-case of compressed clusters is a fully compressed image, written 
once.

It means, that with current specification, subclusters adds nothing to this 
case, and no reason to create compressed image with subclusters. And even if we 
allow zero/unallocated subclusters, seems it adds nothing to this use-case.

So, I don't see real benefits of it for now, but neither any problems with it, 
so agree that it's mostly about which way is simpler..

--
Best regards,
Vladimir



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Alberto Garcia
On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake  wrote:
>>> at the same time, I can see where you're coming from in stating that
>>> if it makes management of extended L2 easier to allow zero subclusters
>>> on top of a compressed cluster, then there's no reason to forbid it.
>> 
>> I'm not sure if it makes it easier. Some operations are definitely going
>> to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
>> addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
>> subclusters). Or maybe replace QCow2SubclusterType with something
>> else. I need to evaluate that.
>
> Reading the entire cluster will be interesting - you'll have to
> decompress the entire memory, then overwrite the zeroed portions.

I don't think so, qcow2_get_host_offset() would detect the number of
contiguous subclusters of the same type at the given offset. In this
case they would be _ZERO subclusters so there's no need to decompress
anything, or even read it (it works the same with uncompressed
clusters).

Berto



Re: [RFC patch v1 1/3] qemu-file: introduce current buffer

2020-04-24 Thread Vladimir Sementsov-Ogievskiy

13.04.2020 14:12, Denis Plotnikov wrote:

To approach async wrtiting in the further commits, the buffer
allocated in QEMUFile struct is replaced with the link to the
current buffer. We're going to use many buffers to write the
qemu file stream to the unerlying storage asynchronously. The
current buffer points out to the buffer is currently filled
with data.

This patch doesn't add any features to qemu-file and doesn't
change any qemu-file behavior.

Signed-off-by: Denis Plotnikov


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Eric Blake

On 4/24/20 12:21 PM, Alberto Garcia wrote:

On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake  wrote:

'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
copy on write. The compressed data would remain untouched on disk but
some of the subclusters would have the 'all zeroes' bit set, exactly
like what happens with normal clusters.


It's a special case that avoids COW for write zeroes, but not for
anything else. The moment you write any data (whether to the
zero-above-compressed or the regular compressed portion), the entire
cluster has to be rewritten.


That's right but you can still write zeroes without having to rewrite
anything, and read back the zeroes without having to decompress the
data.


at the same time, I can see where you're coming from in stating that
if it makes management of extended L2 easier to allow zero subclusters
on top of a compressed cluster, then there's no reason to forbid it.


I'm not sure if it makes it easier. Some operations are definitely going
to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
subclusters). Or maybe replace QCow2SubclusterType with something
else. I need to evaluate that.


Reading the entire cluster will be interesting - you'll have to 
decompress the entire memory, then overwrite the zeroed portions.  The 
savings in reading occur only when your read is limited to just the 
subclusters that are zeroed.


But then again, even on a regular cluster, read has to pay attention to 
which subclusters are zeroed, so you already have the workhorse in read 
for detecting whether a normal read is sufficient or if you have to 
follow up with piecing together zeroed sections.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[RFC PATCH] translate-all: include guest address in out_asm output

2020-04-24 Thread Alex Bennée
This is a slightly hackish Friday afternoon attempt to include the
guest address in our out_asm output in an effort to make it a little
easier to see what is generating what final assembly.

Signed-off-by: Alex Bennée 
---
 accel/tcg/translate-all.c | 38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9924e66d1f..31711de938 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1789,14 +1789,42 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
 qemu_log_in_addr_range(tb->pc)) {
 FILE *logfile = qemu_log_lock();
+size_t code_size, data_size = 0;
+size_t insn_start;
+int insn = 0;
 qemu_log("OUT: [size=%d]\n", gen_code_size);
 if (tcg_ctx->data_gen_ptr) {
-size_t code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
-size_t data_size = gen_code_size - code_size;
-size_t i;
+code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
+data_size = gen_code_size - code_size;
+} else {
+code_size = gen_code_size;
+}
+
+/* first dump prologue */
+insn_start = tcg_ctx->gen_insn_end_off[0];
+if (insn_start > 0) {
+qemu_log("  prologue: [size=%ld]\n", insn_start);
+log_disas(tb->tc.ptr, insn_start);
+}
+
+do {
+size_t insn_end;
+if (insn < (tb->icount - 1)) {
+insn_end = tcg_ctx->gen_insn_end_off[insn + 1];
+} else {
+insn_end = code_size;
+}
+qemu_log("  for guest addr: " TARGET_FMT_lx ":\n",
+ tcg_ctx->gen_insn_data[insn][0]);
+
+log_disas(tb->tc.ptr + insn_start, insn_end - insn_start);
 
-log_disas(tb->tc.ptr, code_size);
+insn_start = insn_end;
+} while (++insn < tb->icount && insn_start < code_size);
 
+if (data_size) {
+int i;
+qemu_log("  data: [size=%ld]\n", data_size);
 for (i = 0; i < data_size; i += sizeof(tcg_target_ulong)) {
 if (sizeof(tcg_target_ulong) == 8) {
 qemu_log("0x%08" PRIxPTR ":  .quad  0x%016" PRIx64 "\n",
@@ -1808,8 +1836,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  *(uint32_t *)(tcg_ctx->data_gen_ptr + i));
 }
 }
-} else {
-log_disas(tb->tc.ptr, gen_code_size);
 }
 qemu_log("\n");
 qemu_log_flush();
-- 
2.20.1




Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Alberto Garcia
On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake  wrote:
>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
>> copy on write. The compressed data would remain untouched on disk but
>> some of the subclusters would have the 'all zeroes' bit set, exactly
>> like what happens with normal clusters.
>
> It's a special case that avoids COW for write zeroes, but not for
> anything else. The moment you write any data (whether to the
> zero-above-compressed or the regular compressed portion), the entire
> cluster has to be rewritten.

That's right but you can still write zeroes without having to rewrite
anything, and read back the zeroes without having to decompress the
data.

> at the same time, I can see where you're coming from in stating that
> if it makes management of extended L2 easier to allow zero subclusters
> on top of a compressed cluster, then there's no reason to forbid it.

I'm not sure if it makes it easier. Some operations are definitely going
to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
subclusters). Or maybe replace QCow2SubclusterType with something
else. I need to evaluate that.

Berto



Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Eric Blake

On 4/24/20 12:02 PM, Alberto Garcia wrote:

On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia  wrote:

Compressed clusters always have the bitmap part of the extended L2
entry set to 0.


I was just finishing some improvements to the new code that allows
BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to
entertain the idea of using the L2 bitmap for compressed clusters as
well.

I will make some tests next week, but I would like to know your opinion
in case I'm missing something.

A compressed cluster cannot be divided into subclusters on the image:
you would not be able to allocate or overwrite them separately,
therefore any write request necessarily has to write (or do COW of) the
whole cluster.

However if you consider the uncompressed guest data I don't see any
reason why you wouldn't be able to zeroize or even deallocate individual
subclusters. These operations don't touch the cluster data on disk
anyway, they only touch the L2 metadata in order to change what the
guest sees.

'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
copy on write. The compressed data would remain untouched on disk but
some of the subclusters would have the 'all zeroes' bit set, exactly
like what happens with normal clusters.


It's a special case that avoids COW for write zeroes, but not for 
anything else. The moment you write any data (whether to the 
zero-above-compressed or the regular compressed portion), the entire 
cluster has to be rewritten.  I'm not sure how frequently guests will 
actually have the scenario of doing a zero request on a sub-cluster, but 
at the same time, I can see where you're coming from in stating that if 
it makes management of extended L2 easier to allow zero subclusters on 
top of a compressed cluster, then there's no reason to forbid it.




I think that this would make the on-disk format a bit simpler in general
(no need to treat compressed clusters differently in some cases) and it
would add a new optimization to compressed images. I just need to make
sure that it doesn't complicate the code (my feeling is that it would
actually simplify it, but I have to see).

Opinions?

Berto



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster

2020-04-24 Thread Alberto Garcia
On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia  wrote:
> Compressed clusters always have the bitmap part of the extended L2
> entry set to 0.

I was just finishing some improvements to the new code that allows
BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to
entertain the idea of using the L2 bitmap for compressed clusters as
well.

I will make some tests next week, but I would like to know your opinion
in case I'm missing something.

A compressed cluster cannot be divided into subclusters on the image:
you would not be able to allocate or overwrite them separately,
therefore any write request necessarily has to write (or do COW of) the
whole cluster.

However if you consider the uncompressed guest data I don't see any
reason why you wouldn't be able to zeroize or even deallocate individual
subclusters. These operations don't touch the cluster data on disk
anyway, they only touch the L2 metadata in order to change what the
guest sees.

'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
copy on write. The compressed data would remain untouched on disk but
some of the subclusters would have the 'all zeroes' bit set, exactly
like what happens with normal clusters.

I think that this would make the on-disk format a bit simpler in general
(no need to treat compressed clusters differently in some cases) and it
would add a new optimization to compressed images. I just need to make
sure that it doesn't complicate the code (my feeling is that it would
actually simplify it, but I have to see).

Opinions?

Berto



[PATCH v22 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   67 
 include/hw/virtio/virtio-balloon.h |2 +
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index c1c76ec09c95..2ce56c6c0794 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,67 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+VirtQueueElement *elem;
+
+while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
+unsigned int i;
+
+/*
+ * When we discard the page it has the effect of removing the page
+ * from the hypervisor itself and causing it to be zeroed when it
+ * is returned to us. So we must not discard the page if it is
+ * accessible by another device or process, or if the guest is
+ * expecting it to retain a non-zero value.
+ */
+if (qemu_balloon_is_inhibited() || dev->poison_val) {
+goto skip_element;
+}
+
+for (i = 0; i < elem->in_num; i++) {
+void *addr = elem->in_sg[i].iov_base;
+size_t size = elem->in_sg[i].iov_len;
+ram_addr_t ram_offset;
+RAMBlock *rb;
+
+/*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+rb = qemu_ram_block_from_host(addr, false, _offset);
+if (!rb) {
+trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+continue;
+}
+
+/*
+ * For now we will simply ignore unaligned memory regions, or
+ * regions that overrun the end of the RAMBlock.
+ */
+if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) ||
+(ram_offset + size) > qemu_ram_get_used_length(rb)) {
+continue;
+}
+
+ram_block_discard_range(rb, ram_offset, size);
+}
+
+skip_element:
+virtqueue_push(vq, elem, 0);
+virtio_notify(vdev, vq);
+g_free(elem);
+}
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -818,6 +879,10 @@ static void virtio_balloon_device_realize(DeviceState 
*dev, Error **errp)
 s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
 s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+}
+
 if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
 s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -945,6 +1010,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
 DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_PAGE_POISON, true),
+DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_REPORTING, false),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 3ca2a78e1aca..ac4013d51010 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum 

[PATCH v22 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

In an upcoming patch a feature named Free Page Reporting is about to be
added. In order to avoid any confusion we should drop the use of the word
'report' when referring to Free Page Hinting. So what this patch does is go
through and replace all instances of 'report' with 'hint" when we are
referring to free page hinting.

Acked-by: David Hildenbrand 
Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   74 ++--
 include/hw/virtio/virtio-balloon.h |   20 +-
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..a1d6fb52c876 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 ret = false;
 goto out;
 }
-if (id == dev->free_page_report_cmd_id) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+if (id == dev->free_page_hint_cmd_id) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
 } else {
 /*
  * Stop the optimization only when it has started. This
  * avoids a stale stop sign for the previous command.
  */
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
-dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
+dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 }
 }
 }
 
 if (elem->in_num) {
-if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
 qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
   elem->in_sg[0].iov_len);
 }
@@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
*opaque)
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify(vdev, vq);
   /*
-   * Start to poll the vq once the reporting started. Otherwise, continue
+   * Start to poll the vq once the hinting started. Otherwise, continue
* only when there are entries on the vq, which need to be given back.
*/
 } while (continue_to_get_hints ||
- dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
+ dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
 virtio_queue_set_notification(vq, 1);
 }
 
@@ -531,14 +531,14 @@ static void virtio_balloon_free_page_start(VirtIOBalloon 
*s)
 return;
 }
 
-if (s->free_page_report_cmd_id == UINT_MAX) {
-s->free_page_report_cmd_id =
-   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+if (s->free_page_hint_cmd_id == UINT_MAX) {
+s->free_page_hint_cmd_id =
+   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
 } else {
-s->free_page_report_cmd_id++;
+s->free_page_hint_cmd_id++;
 }
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
 virtio_notify_config(vdev);
 }
 
@@ -546,18 +546,18 @@ static void virtio_balloon_free_page_stop(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
+if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
 /*
  * The lock also guarantees us that the
  * virtio_ballloon_get_free_page_hints exits after the
- * free_page_report_status is set to S_STOP.
+ * free_page_hint_status is set to S_STOP.
  */
 qemu_mutex_lock(>free_page_lock);
 /*
  * The guest hasn't done the reporting, so host sends a notification
  * to the guest to actively stop the reporting.
  */
-s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
 qemu_mutex_unlock(>free_page_lock);
 virtio_notify_config(vdev);
 }
@@ -567,15 +567,15 @@ static void virtio_balloon_free_page_done(VirtIOBalloon 
*s)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
+s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
 virtio_notify_config(vdev);
 }
 
 static int
-virtio_balloon_free_page_report_notify(NotifierWithReturn *n, void *data)
+virtio_balloon_free_page_hint_notify(NotifierWithReturn *n, void *data)
 {
 VirtIOBalloon *dev = container_of(n, VirtIOBalloon,
-  free_page_report_notify);
+  free_page_hint_notify);
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 PrecopyNotifyData *pnd = data;
 
@@ -624,7 +624,7 @@ static size_t 

[PATCH v22 QEMU 1/5] linux-headers: Update to allow renaming of free_page_report_cmd_id

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

Sync to the latest upstream changes for free page hinting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |   11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..af0a6b59dab2 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,8 +47,15 @@ struct virtio_balloon_config {
uint32_t num_pages;
/* Number of pages we've actually got in balloon. */
uint32_t actual;
-   /* Free page report command id, readonly by guest */
-   uint32_t free_page_report_cmd_id;
+   /*
+* Free page hint command id, readonly by guest.
+* Was previously name free_page_report_cmd_id so we
+* need to carry that name for legacy support.
+*/
+   union {
+   uint32_t free_page_hint_cmd_id;
+   uint32_t free_page_report_cmd_id;   /* deprecated */
+   };
/* Stores PAGE_POISON if page poisoning is in use */
uint32_t poison_val;
 };




[PATCH v22 QEMU 4/5] virtio-balloon: Implement support for page poison reporting feature

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

We need to make certain to advertise support for page poison reporting if
we want to actually get data on if the guest will be poisoning pages.

Add a value for reporting the poison value being used if page poisoning is
enabled in the guest. With this we can determine if we will need to skip
free page reporting when it is enabled in the future.

The value currently has no impact on existing balloon interfaces. In the
case of existing balloon interfaces the onus is on the guest driver to
reapply whatever poison is in place.

When we add free page reporting the poison value is used to determine if
we can perform in-place page reporting. The expectation is that a reported
page will already contain the value specified by the poison, and the
reporting of the page should not change that value.

Signed-off-by: Alexander Duyck 
---
 hw/virtio/virtio-balloon.c |   29 +
 include/hw/virtio/virtio-balloon.h |1 +
 2 files changed, 30 insertions(+)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a1d6fb52c876..c1c76ec09c95 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -634,6 +634,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, 
uint8_t *config_data)
 
 config.num_pages = cpu_to_le32(dev->num_pages);
 config.actual = cpu_to_le32(dev->actual);
+config.poison_val = cpu_to_le32(dev->poison_val);
 
 if (dev->free_page_hint_status == FREE_PAGE_HINT_S_REQUESTED) {
 config.free_page_hint_cmd_id =
@@ -683,6 +684,14 @@ static ram_addr_t get_current_ram_size(void)
 return size;
 }
 
+static bool virtio_balloon_page_poison_support(void *opaque)
+{
+VirtIOBalloon *s = opaque;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+}
+
 static void virtio_balloon_set_config(VirtIODevice *vdev,
   const uint8_t *config_data)
 {
@@ -697,6 +706,10 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
 qapi_event_send_balloon_change(vm_ram_size -
 ((ram_addr_t) dev->actual << 
VIRTIO_BALLOON_PFN_SHIFT));
 }
+dev->poison_val = 0;
+if (virtio_balloon_page_poison_support(dev)) {
+dev->poison_val = le32_to_cpu(config.poison_val);
+}
 trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -755,6 +768,17 @@ static const VMStateDescription 
vmstate_virtio_balloon_free_page_hint = {
 }
 };
 
+static const VMStateDescription vmstate_virtio_balloon_page_poison = {
+.name = "vitio-balloon-device/page-poison",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virtio_balloon_page_poison_support,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(poison_val, VirtIOBalloon),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
 .name = "virtio-balloon-device",
 .version_id = 1,
@@ -767,6 +791,7 @@ static const VMStateDescription 
vmstate_virtio_balloon_device = {
 },
 .subsections = (const VMStateDescription * []) {
 _virtio_balloon_free_page_hint,
+_virtio_balloon_page_poison,
 NULL
 }
 };
@@ -854,6 +879,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
 }
+
+s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +943,8 @@ static Property virtio_balloon_properties[] = {
 VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
 DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
 VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+DEFINE_PROP_BIT("page-poison", VirtIOBalloon, host_features,
+VIRTIO_BALLOON_F_PAGE_POISON, true),
 /* QEMU 4.0 accidentally changed the config size even when free-page-hint
  * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
  * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h 
b/include/hw/virtio/virtio-balloon.h
index 108cff97e71a..3ca2a78e1aca 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
 uint32_t host_features;
 
 bool qemu_4_0_config_size;
+uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif




[PATCH v22 QEMU 2/5] linux-headers: update to contain virito-balloon free page reporting

2020-04-24 Thread Alexander Duyck
From: Alexander Duyck 

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck 
---
 include/standard-headers/linux/virtio_balloon.h |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h 
b/include/standard-headers/linux/virtio_balloon.h
index af0a6b59dab2..af3b7a1fa263 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON   4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12




[PATCH v22 QEMU 0/5] virtio-balloon: add support for page poison reporting and free page reporting

2020-04-24 Thread Alexander Duyck
This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be discarded and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

As a part of enabling this feature it was necessary to implement the page
poison reporting feature which had been added to the kernel, but was not
available in QEMU. This patch set adds it as a new device property
"page-poison" which is enabled by default, and adds support for migrating
the reported value.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: 
https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

Changes from v19:
Added std-headers change to match changes pushed for linux kernel headers
Added patch to remove "report" from page hinting code paths
Updated comment to better explain why we disable hints w/ page poisoning
Removed code that was modifying config size for poison vs hinting
Dropped x-page-poison property
Added code to bounds check the reported region vs the RAM block
Dropped patch for ROM devices as that was already pulled in by Paolo

Changes from v20:
Rearranged patches to push Linux header sync patches to front
Removed association between free page hinting and VIRTIO_BALLOON_F_PAGE_POISON
Added code to enable VIRTIO_BALLOON_F_PAGE_POISON if page reporting is enabled
Fixed possible resource leak if poison or qemu_balloon_is_inhibited return true

Changes from v21:
Added ack for patch 3
Rewrote patch description for page poison reporting feature
Made page-poison independent property and set to enabled by default
Added logic to migrate poison_val
Added several comments in code to better explain features
Switched free-page-reporting property to disabled by default

---

Alexander Duyck (5):
  linux-headers: Update to allow renaming of free_page_report_cmd_id
  linux-headers: update to contain virito-balloon free page reporting
  virtio-balloon: Replace free page hinting references to 'report' with 
'hint'
  virtio-balloon: Implement support for page poison reporting feature
  virtio-balloon: Provide an interface for free page reporting


 hw/virtio/virtio-balloon.c  |  170 ++-
 include/hw/virtio/virtio-balloon.h  |   23 ++-
 include/standard-headers/linux/virtio_balloon.h |   12 +-
 3 files changed, 155 insertions(+), 50 deletions(-)

--



Re: [PATCH v2 1/2] riscv: sifive_e: Support changing CPU type

2020-04-24 Thread Alistair Francis
On Fri, Mar 13, 2020 at 12:35 PM Corey Wharton  wrote:
>
> Allows the CPU to be changed from the default via the -cpu command
> line option.
>
> Signed-off-by: Corey Wharton 
> Reviewed-by: Bin Meng 
> Reviewed-by: Alistair Francis 

Thanks for the patch.

Unfortunately this fails `make check`.

The problem is that the machine has the `default_cpu_type` set but
then you set "cpu-type" from the SoC.

This diff fixes the make check failure for me:

diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index 1fd08f325c..b53109521e 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -123,8 +123,6 @@ static void riscv_sifive_e_soc_init(Object *obj)
 object_initialize_child(obj, "cpus", >cpus,
 sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
 _abort, NULL);
-object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
-_abort);
 object_property_set_int(OBJECT(>cpus), ms->smp.cpus, "num-harts",
 _abort);
 sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0",
@@ -141,6 +139,8 @@ static void riscv_sifive_e_soc_realize(DeviceState
*dev, Error **errp)
 SiFiveESoCState *s = RISCV_E_SOC(dev);
 MemoryRegion *sys_mem = get_system_memory();

+object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
+_abort);
 object_property_set_bool(OBJECT(>cpus), true, "realized",
 _abort);


I'm happy to just squash that into the patch. Let me know if you don't
want me to do that and I'll drop these patches and let you re-send
them.

Alistair

> ---
>  hw/riscv/sifive_e.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
> index a254cad489..b0a611adb9 100644
> --- a/hw/riscv/sifive_e.c
> +++ b/hw/riscv/sifive_e.c
> @@ -123,7 +123,7 @@ static void riscv_sifive_e_soc_init(Object *obj)
>  object_initialize_child(obj, "cpus", >cpus,
>  sizeof(s->cpus), TYPE_RISCV_HART_ARRAY,
>  _abort, NULL);
> -object_property_set_str(OBJECT(>cpus), SIFIVE_E_CPU, "cpu-type",
> +object_property_set_str(OBJECT(>cpus), ms->cpu_type, "cpu-type",
>  _abort);
>  object_property_set_int(OBJECT(>cpus), ms->smp.cpus, "num-harts",
>  _abort);
> @@ -220,6 +220,7 @@ static void riscv_sifive_e_machine_init(MachineClass *mc)
>  mc->desc = "RISC-V Board compatible with SiFive E SDK";
>  mc->init = riscv_sifive_e_init;
>  mc->max_cpus = 1;
> +mc->default_cpu_type = SIFIVE_E_CPU;
>  }
>
>  DEFINE_MACHINE("sifive_e", riscv_sifive_e_machine_init)
> --
> 2.21.1
>
>



Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread Alexander Duyck
On Fri, Apr 24, 2020 at 8:34 AM David Hildenbrand  wrote:
>
>
> >> Also, I do wonder if we want to default-enable it. It can still have a
> >> negative performance impact and some people might not want that.
> >
> > The negative performance impact should be minimal. At this point the
> > hinting process ends up being very light since it only processes a
> > small chunk of the memory before it shuts down for a couple seconds.
> > In my original data the only time any significant performance
> > regression was seen was with page shuffling enabled, and that was
> > before I put limits on how many pages we could process per pass and
> > how frequently we would make a pass through memory. My thought is that
> > we are much better having it enabled by default rather than having it
> > disabled by default. In the worst case scenario we have a little bit
> > of extra thread noise from it popping up running for a few
> > milliseconds and then sleeping for about two seconds, but the benefit
> > side is that the VMs will do us a favor and restrict themselves to a
> > much smaller idle footprint until such time as the memory is actually
> > needed in the guest.
>
> While I agree that the impact is small, there *is* a noticeable
> performance impact. One of the main users is memory overcommit.
>
> Also, imagine the following scenarios:
> - RT workload in the guest. Just because you have a virtio-balloon
>   device defined would mean something is suddenly active and
>   discarding/trying to discard pages.
> - vfio: free page reporting is completely useless right now and
>   *only* overhead.
> - prealloc does not expect that pages get suddenly discarded
> - s390x, which has CMM and might not benefit at all (except when using
>   huge pages as backing storage)
>
> No, I don't think it's acceptable to enable this as default. I remember
> that it is quite common to just define a balloon device but never use it.
>
> E.g., "A virtual memory balloon device is added to all Xen and KVM/QEMU
> guest virtual machines. It will be seen as  element" [1].
>
> I think we should let upper layers decide (just as we do for free page
> hinting, for example).

Okay. I guess there are a number of other cases I hand't thought of. I
will switch the default to false.

Thanks.

- Alex



Re: [PATCH 2/3] fuzz: Simplify how we compute available machines and types

2020-04-24 Thread Alexander Bulekov
On 200424 0911, Markus Armbruster wrote:
> apply_to_qlist(), apply_to_node() work with QObjects.  This is
> designed for use by tests/qtest/qos-test.c, which gets the data in
> that form via QMP.  Goes back to commit fc281c8020 "tests: qgraph API
> for the qtest driver framework".
> 
> Commit 275ab39d86 "fuzz: add support for qos-assisted fuzz targets"
> added another user: qtest/fuzz/qos_fuzz.c.  To get the data as
> QObjects, it uses qmp_marshal_query_machines() and
> qmp_marshal_qom_list_types().
> 
> All this code is rather cumbersome.  Switch to working with generated
> QAPI types instead:
> 
> * Replace apply_to_qlist() & friends by machines_apply_to_node() and
>   types_apply_to_node().
> 
> * Have qos_fuzz.c use qmp_query_machines() and qmp_qom_list_types()
>   instead.
> 
> * Have qos_test.c convert from QObject to the QAPI types.
> 
> Signed-off-by: Markus Armbruster 

Thank you for looking at this!
Reviewed-by: Alexander Bulekov 

> ---
>  tests/qtest/libqos/qos_external.h |  8 +++-
>  tests/qtest/fuzz/qos_fuzz.c   | 34 ---
>  tests/qtest/libqos/qos_external.c | 70 +++
>  tests/qtest/qos-test.c| 29 +
>  4 files changed, 59 insertions(+), 82 deletions(-)
> 
> diff --git a/tests/qtest/libqos/qos_external.h 
> b/tests/qtest/libqos/qos_external.h
> index 7b44930c55..f63388cb30 100644
> --- a/tests/qtest/libqos/qos_external.h
> +++ b/tests/qtest/libqos/qos_external.h
> @@ -20,8 +20,12 @@
>  #define QOS_EXTERNAL_H
>  #include "libqos/qgraph.h"
>  
> -void apply_to_node(const char *name, bool is_machine, bool is_abstract);
> -void apply_to_qlist(QList *list, bool is_machine);
> +#include "libqos/malloc.h"
> +#include "qapi/qapi-types-machine.h"
> +#include "qapi/qapi-types-qom.h"
> +
> +void machines_apply_to_node(MachineInfoList *mach_info);
> +void types_apply_to_node(ObjectTypeInfoList *type_info);
>  QGuestAllocator *get_machine_allocator(QOSGraphObject *obj);
>  void *allocate_objects(QTestState *qts, char **path, QGuestAllocator 
> **p_alloc);
>  
> diff --git a/tests/qtest/fuzz/qos_fuzz.c b/tests/qtest/fuzz/qos_fuzz.c
> index af28c92866..87eadb0889 100644
> --- a/tests/qtest/fuzz/qos_fuzz.c
> +++ b/tests/qtest/fuzz/qos_fuzz.c
> @@ -36,7 +36,6 @@
>  
>  #include "qapi/qapi-commands-machine.h"
>  #include "qapi/qapi-commands-qom.h"
> -#include "qapi/qmp/qlist.h"
>  
>  
>  void *fuzz_qos_obj;
> @@ -45,34 +44,19 @@ QGuestAllocator *fuzz_qos_alloc;
>  static const char *fuzz_target_name;
>  static char **fuzz_path_vec;
>  
> -/*
> - * Replaced the qmp commands with direct qmp_marshal calls.
> - * Probably there is a better way to do this
> - */
>  static void qos_set_machines_devices_available(void)
>  {
> -QDict *req = qdict_new();
> -QObject *response;
> -QDict *args = qdict_new();
> -QList *lst;
> +MachineInfoList *mach_info;
> +ObjectTypeInfoList *type_info;
>  
> -qmp_marshal_query_machines(NULL, , _abort);
> -lst = qobject_to(QList, response);
> -apply_to_qlist(lst, true);
> +mach_info = qmp_query_machines(_abort);
> +machines_apply_to_node(mach_info);
> +qapi_free_MachineInfoList(mach_info);
>  
> -qobject_unref(response);
> -
> -
> -qdict_put_str(req, "execute", "qom-list-types");
> -qdict_put_str(args, "implements", "device");
> -qdict_put_bool(args, "abstract", true);
> -qdict_put_obj(req, "arguments", (QObject *) args);
> -
> -qmp_marshal_qom_list_types(args, , _abort);
> -lst = qobject_to(QList, response);
> -apply_to_qlist(lst, false);
> -qobject_unref(response);
> -qobject_unref(req);
> +type_info = qmp_qom_list_types(true, "device", true, true,
> +   _abort);
> +types_apply_to_node(type_info);
> +qapi_free_ObjectTypeInfoList(type_info);
>  }
>  
>  static char **current_path;
> diff --git a/tests/qtest/libqos/qos_external.c 
> b/tests/qtest/libqos/qos_external.c
> index 398556dde0..c707dac3b9 100644
> --- a/tests/qtest/libqos/qos_external.c
> +++ b/tests/qtest/libqos/qos_external.c
> @@ -29,62 +29,40 @@
>  #include "libqos/qgraph_internal.h"
>  #include "libqos/qos_external.h"
>  
> -
> -
> -void apply_to_node(const char *name, bool is_machine, bool is_abstract)
> +static void machine_apply_to_node(const char *name)
>  {
> -char *machine_name = NULL;
> -if (is_machine) {
> -const char *arch = qtest_get_arch();
> -machine_name = g_strconcat(arch, "/", name, NULL);
> -name = machine_name;
> +char *machine_name = g_strconcat(qtest_get_arch(), "/", name, NULL);
> +
> +qos_graph_node_set_availability(machine_name, true);
> +g_free(machine_name);
> +}
> +
> +void machines_apply_to_node(MachineInfoList *mach_info)
> +{
> +MachineInfoList *tail;
> +
> +for (tail = mach_info; tail; tail = tail->next) {
> +machine_apply_to_node(tail->value->name);
> +if (tail->value->alias) {
> +

Re: [PATCH 1/3] Makefile: Drop unused, broken target recurse-fuzz

2020-04-24 Thread Alexander Bulekov
On 200424 0911, Markus Armbruster wrote:
> Target recurse-fuzz depends on pc-bios/optionrom/fuzz, which can't be
> made.  It's not used anywhere.  Added in commit c621dc3e01c, looks
> like cargo cult.  Delete.
> 
> Signed-off-by: Markus Armbruster 
Reviewed-by: Alexander Bulekov 

> ---
>  Makefile | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index 8a9113e666..34275f57c9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -582,7 +582,6 @@ $(ROM_DIRS_RULES):
>  
>  .PHONY: recurse-all recurse-clean recurse-install
>  recurse-all: $(addsuffix /all, $(TARGET_DIRS) $(ROM_DIRS))
> -recurse-fuzz: $(addsuffix /fuzz, $(TARGET_DIRS) $(ROM_DIRS))
>  recurse-clean: $(addsuffix /clean, $(TARGET_DIRS) $(ROM_DIRS))
>  recurse-install: $(addsuffix /install, $(TARGET_DIRS))
>  $(addsuffix /install, $(TARGET_DIRS)): all
> -- 
> 2.21.1
> 



[Bug 1874539] Re: tulip driver broken in v5.0.0-rc4

2020-04-24 Thread Helge Deller
Attached trivial patch fixes it.


** Attachment added: "patch to fix tulip rx hangs"
   https://bugs.launchpad.net/qemu/+bug/1874539/+attachment/5359564/+files/p2

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1874539

Title:
  tulip driver broken in v5.0.0-rc4

Status in QEMU:
  New

Bug description:
  In a qemu-system-hppa system, qemu release v5.0.0-rc, the tulip nic driver is 
broken.
  The tulip nic is detected, even getting DHCP info does work.
  But when trying to download bigger files via network, the tulip driver gets 
stuck.

  For example when trying to download a 100MB file:

  root@debian:~# wget https://speed.hetzner.de/100MB.bin
  --2020-04-23 20:26:43--  https://speed.hetzner.de/100MB.bin
  Resolving speed.hetzner.de (speed.hetzner.de)... 88.198.248.254, 
2a01:4f8:0:59ed::2
  Connecting to speed.hetzner.de (speed.hetzner.de)|88.198.248.254|:443... 
connected.
  

  When reverting this commit, everything works again:
  commit 8ffb7265af64ec81748335ec8f20e7ab542c3850
  Author: Prasad J Pandit 
  Date:   Tue Mar 24 22:57:22 2020 +0530
  PATCH: net: tulip: check frame size and r/w data length

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1874539/+subscriptions



Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread David Hildenbrand


>> Also, I do wonder if we want to default-enable it. It can still have a
>> negative performance impact and some people might not want that.
> 
> The negative performance impact should be minimal. At this point the
> hinting process ends up being very light since it only processes a
> small chunk of the memory before it shuts down for a couple seconds.
> In my original data the only time any significant performance
> regression was seen was with page shuffling enabled, and that was
> before I put limits on how many pages we could process per pass and
> how frequently we would make a pass through memory. My thought is that
> we are much better having it enabled by default rather than having it
> disabled by default. In the worst case scenario we have a little bit
> of extra thread noise from it popping up running for a few
> milliseconds and then sleeping for about two seconds, but the benefit
> side is that the VMs will do us a favor and restrict themselves to a
> much smaller idle footprint until such time as the memory is actually
> needed in the guest.

While I agree that the impact is small, there *is* a noticeable
performance impact. One of the main users is memory overcommit.

Also, imagine the following scenarios:
- RT workload in the guest. Just because you have a virtio-balloon
  device defined would mean something is suddenly active and
  discarding/trying to discard pages.
- vfio: free page reporting is completely useless right now and
  *only* overhead.
- prealloc does not expect that pages get suddenly discarded
- s390x, which has CMM and might not benefit at all (except when using
  huge pages as backing storage)

No, I don't think it's acceptable to enable this as default. I remember
that it is quite common to just define a balloon device but never use it.

E.g., "A virtual memory balloon device is added to all Xen and KVM/QEMU
guest virtual machines. It will be seen as  element" [1].

I think we should let upper layers decide (just as we do for free page
hinting, for example).


[1]
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_administration_guide/section-libvirt-dom-xml-memory-baloon-device

-- 
Thanks,

David / dhildenb




Re: [RFC PATCH 0/3] hw/net/tulip: Fix LP#1874539

2020-04-24 Thread Helge Deller
* Philippe Mathieu-Daudé :
> Attempt to fix the launchpad bug filled by Helge:
>
>   In a qemu-system-hppa system, qemu release v5.0.0-rc,
>   the tulip nic driver is broken.  The tulip nic is detected,
>   even getting DHCP info does work.  But when trying to
>   download bigger files via network, the tulip driver gets
>   stuck.
>
> Philippe Mathieu-Daudé (3):
>   hw/net/tulip: Fix 'Descriptor Error' definition
>   hw/net/tulip: Log descriptor overflows
>   hw/net/tulip: Set descriptor error bit when lenght is incorrect
>
>  hw/net/tulip.h |  2 +-
>  hw/net/tulip.c | 32 
>  2 files changed, 29 insertions(+), 5 deletions(-)

Philippe, thanks for your efforts. Sadly your patch did not fixed the
bug itself, but it had some nice cleanups which should be included at
some point.

Regarding the tulip hang reported by me, the patch below does fix the
issue.

[PATCH] Fix tulip rx hang
Cc: Prasad J Pandit 
Fixes: 8ffb7265af ("check frame size and r/w data length")
Buglink: https://bugs.launchpad.net/bugs/1874539
Signed-off-by: Helge Deller 

Commit 8ffb7265af ("check frame size and r/w data length") introduced
checks to prevent accesses outside of the rx/tx buffers. But the new
checks were plain wrong. rx_frame_len does count backwards, and the
surrounding code ensures that rx_frame_len will not be bigger than
rx_frame_size. Remove those checks again.

diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index 1295f51d07..59d21defcc 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -171,9 +171,6 @@ static void tulip_copy_rx_bytes(TULIPState *s, struct 
tulip_descriptor *desc)
 len = s->rx_frame_len;
 }

-if (s->rx_frame_len + len > sizeof(s->rx_frame)) {
-return;
-}
 pci_dma_write(>dev, desc->buf_addr1, s->rx_frame +
 (s->rx_frame_size - s->rx_frame_len), len);
 s->rx_frame_len -= len;
@@ -186,9 +183,6 @@ static void tulip_copy_rx_bytes(TULIPState *s, struct 
tulip_descriptor *desc)
 len = s->rx_frame_len;
 }

-if (s->rx_frame_len + len > sizeof(s->rx_frame)) {
-return;
-}
 pci_dma_write(>dev, desc->buf_addr2, s->rx_frame +
 (s->rx_frame_size - s->rx_frame_len), len);
 s->rx_frame_len -= len;



Re: [PATCH v7 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation

2020-04-24 Thread Vladimir Sementsov-Ogievskiy

24.04.2020 17:27, Kevin Wolf wrote:

The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
image is possibly preallocated and then the zero flag is added to all
clusters. This means that a copy-on-write operation may be needed when
writing to these clusters, despite having used preallocation, negating
one of the major benefits of preallocation.

Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
and if the protocol driver can ensure that the new area reads as zeros,
we can skip setting the zero flag in the qcow2 layer.

Unfortunately, the same approach doesn't work for metadata
preallocation, so we'll still set the zero flag there.

Signed-off-by: Kevin Wolf
Reviewed-by: Max Reitz


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir



Re: [PATCH] Add a new PIIX option to control PCI hot unplugging of devices on non-root buses

2020-04-24 Thread Ani Sinha


> On Apr 22, 2020, at 4:15 PM, Ani Sinha  wrote:
> 
> 
> 
>> On Apr 21, 2020, at 8:32 PM, Daniel P. Berrangé  wrote:
>> 
>> On Tue, Apr 21, 2020 at 02:45:04PM +, Ani Sinha wrote:
>>> 
>>> 
 On Apr 20, 2020, at 8:32 PM, Michael S. Tsirkin  wrote:
 
 But I for one would like to focus on keeping PIIX stable
 and focus development on q35.  Not bloating PIIX with lots of new
 features is IMHO a good way to do that.
>>> 
>>> Does this mean this patch is a no-go then? :(
>> 
>> I'd support this patch, as I don't think it can really be described as
>> bloat or destabalizing. It is just adding a simple property to
>> conditionalize existing functionality.  Telling people to switch to Q35
>> is unreasonable as it is not a simple 1-1 conversion from existing use
>> of PIIX. Q35 has much higher complexity in its configuration, has higher
>> memory overhead per VM too, and lacks certain features of PIIX too.
> 
> Cool. How do we go forward from here?
> 

We would really appreciate if we can add this extra knob in Qemu. Maybe someone 
else also in the community will find this useful. We don’t want to maintain 
this patch internally forever but rather prefer we maintain this as a Qemu 
community.

ani



Re: [PATCH v21 QEMU 5/5] virtio-balloon: Provide an interface for free page reporting

2020-04-24 Thread Alexander Duyck
On Fri, Apr 24, 2020 at 4:20 AM David Hildenbrand  wrote:
>
> On 22.04.20 20:21, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > Add support for free page reporting. The idea is to function very similar
> > to how the balloon works in that we basically end up madvising the page as
> > not being used. However we don't really need to bother with any deflate
> > type logic since the page will be faulted back into the guest when it is
> > read or written to.
> >
> > This provides a new way of letting the guest proactively report free
> > pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> > inflate/deflate that is triggered via the hypervisor explicitly.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   70 
> > 
> >  include/hw/virtio/virtio-balloon.h |2 +
> >  2 files changed, 71 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 5effc8b4653b..b473ff7f4b88 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,60 @@ static void balloon_stats_set_poll_interval(Object 
> > *obj, Visitor *v,
> >  balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +VirtQueueElement *elem;
> > +
> > +while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement {
> > +unsigned int i;
> > +
>
> Maybe add a comment like
>
> /*
>  * As discarded pages will be zero when re-accessed, all pages either
>  * have the old value, or were zeroed out. In case the guest expects
>  * another value, make sure to never discard.
>  */
>
> Whatever you think is best.

Okay I will add the following comment:
/*
 * When we discard the page it has the effect of removing the page
 * from the hypervisor itself and causing it to be zeroed when it
 * is returned to us. So we must not discard the page if it is
 * accessible by another device or process, or if the guest is
 * expecting it to retain a non-zero value.
 */


> > +if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +goto skip_element;
> > +}
> > +
> > +for (i = 0; i < elem->in_num; i++) {
> > +void *addr = elem->in_sg[i].iov_base;
> > +size_t size = elem->in_sg[i].iov_len;
> > +ram_addr_t ram_offset;
> > +RAMBlock *rb;
> > +
> > +/*
> > + * There is no need to check the memory section to see if
> > + * it is ram/readonly/romd like there is for handle_output
> > + * below. If the region is not meant to be written to then
> > + * address_space_map will have allocated a bounce buffer
> > + * and it will be freed in address_space_unmap and trigger
> > + * and unassigned_mem_write before failing to copy over the
> > + * buffer. If more than one bad descriptor is provided it
> > + * will return NULL after the first bounce buffer and fail
> > + * to map any resources.
> > + */
> > +rb = qemu_ram_block_from_host(addr, false, _offset);
> > +if (!rb) {
> > +trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> > +continue;
> > +}
> > +
> > +/*
> > + * For now we will simply ignore unaligned memory regions, or
> > + * regions that overrun the end of the RAMBlock.
> > + */
> > +if (!QEMU_IS_ALIGNED(ram_offset | size, qemu_ram_pagesize(rb)) 
> > ||
> > +(ram_offset + size) > qemu_ram_get_used_length(rb)) {
> > +continue;
> > +}
> > +
> > +ram_block_discard_range(rb, ram_offset, size);
> > +}
> > +
> > +skip_element:
> > +virtqueue_push(vq, elem, 0);
> > +virtio_notify(vdev, vq);
> > +g_free(elem);
> > +}
> > +}
> > +
> >  static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >  {
> >  VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> > @@ -782,6 +836,16 @@ static void virtio_balloon_device_realize(DeviceState 
> > *dev, Error **errp)
> >  VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> >  int ret;
> >
> > +/*
> > + * Page reporting is dependant on page poison to make sure we can
> > + * report a page without changing the state of the internal data.
> > + * We need to set the flag before we call virtio_init as it will
> > + * affect the config size of the vdev.
> > + */
> > +if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
> > +s->host_features |= 1 << VIRTIO_BALLOON_F_PAGE_POISON;
> > +}
> > +
>
> As discussed, this hunk would go away. With that, this patch is 

[PATCH] travis: explicitly include gnutls to ensure it is updated

2020-04-24 Thread Daniel P . Berrangé
Travis includes gnutls in the default package set, but it is
an outdated version linkng to an incompatible libffi version.
The 'update: true' stanza causes the brew toolchain to be
updated but not the installed formula. It is possible to
run 'brew upgrade' to update installed formula, but this is
very slow adding more than 5 minutes to the build time.
Listing the gnutls package explicitly causes it to be updated
without extending the build time.

Signed-off-by: Daniel P. Berrangé 
---

Note in testing this I got past the libffi.6.dylib error, but
eventually hit the Travis 50 minute timeout. So we need to
do more to minimize what we build on macOS, splitting the job
into two I guess.

 .travis.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.travis.yml b/.travis.yml
index 2fd63eceaa..afbb070082 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -287,6 +287,7 @@ jobs:
 - pixman
 - gnu-sed
 - python
+- gnutls
   update: true
   before_script:
 - brew link --overwrite python
-- 
2.24.1




Re: [PATCH v21 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-24 Thread David Hildenbrand
On 24.04.20 16:56, Alexander Duyck wrote:
> On Fri, Apr 24, 2020 at 4:23 AM David Hildenbrand  wrote:
>>
>> On 22.04.20 20:21, Alexander Duyck wrote:
>>> From: Alexander Duyck 
>>>
>>> In an upcoming patch a feature named Free Page Reporting is about to be
>>> added. In order to avoid any confusion we should drop the use of the word
>>> 'report' when referring to Free Page Hinting. So what this patch does is go
>>> through and replace all instances of 'report' with 'hint" when we are
>>> referring to free page hinting.
>>>
>>> Signed-off-by: Alexander Duyck 
>>> ---
>>>  hw/virtio/virtio-balloon.c |   74 
>>> ++--
>>>  include/hw/virtio/virtio-balloon.h |   20 +-
>>>  2 files changed, 47 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>>> index a4729f7fc930..a1d6fb52c876 100644
>>> --- a/hw/virtio/virtio-balloon.c
>>> +++ b/hw/virtio/virtio-balloon.c
>>> @@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
>>>  ret = false;
>>>  goto out;
>>>  }
>>> -if (id == dev->free_page_report_cmd_id) {
>>> -dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
>>> +if (id == dev->free_page_hint_cmd_id) {
>>> +dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
>>>  } else {
>>>  /*
>>>   * Stop the optimization only when it has started. This
>>>   * avoids a stale stop sign for the previous command.
>>>   */
>>> -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
>>> -dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>>> +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
>>> +dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
>>>  }
>>>  }
>>>  }
>>>
>>>  if (elem->in_num) {
>>> -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
>>> +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
>>>  qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
>>>elem->in_sg[0].iov_len);
>>>  }
>>> @@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
>>> *opaque)
>>>  qemu_mutex_unlock(>free_page_lock);
>>>  virtio_notify(vdev, vq);
>>>/*
>>> -   * Start to poll the vq once the reporting started. Otherwise, 
>>> continue
>>> +   * Start to poll the vq once the hinting started. Otherwise, continue
>>> * only when there are entries on the vq, which need to be given 
>>> back.
>>> */
>>>  } while (continue_to_get_hints ||
>>> - dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
>>> + dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
>>>  virtio_queue_set_notification(vq, 1);
>>>  }
>>>
>>> @@ -531,14 +531,14 @@ static void 
>>> virtio_balloon_free_page_start(VirtIOBalloon *s)
>>>  return;
>>>  }
>>>
>>> -if (s->free_page_report_cmd_id == UINT_MAX) {
>>> -s->free_page_report_cmd_id =
>>> -   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>> +if (s->free_page_hint_cmd_id == UINT_MAX) {
>>> +s->free_page_hint_cmd_id =
>>> +   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
>>>  } else {
>>> -s->free_page_report_cmd_id++;
>>> +s->free_page_hint_cmd_id++;
>>>  }
>>>
>>> -s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
>>> +s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
>>>  virtio_notify_config(vdev);
>>>  }
>>>
>>> @@ -546,18 +546,18 @@ static void 
>>> virtio_balloon_free_page_stop(VirtIOBalloon *s)
>>>  {
>>>  VirtIODevice *vdev = VIRTIO_DEVICE(s);
>>>
>>> -if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
>>> +if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
>>>  /*
>>>   * The lock also guarantees us that the
>>>   * virtio_ballloon_get_free_page_hints exits after the
>>> - * free_page_report_status is set to S_STOP.
>>> + * free_page_hint_status is set to S_STOP.
>>>   */
>>>  qemu_mutex_lock(>free_page_lock);
>>>  /*
>>>   * The guest hasn't done the reporting, so host sends a 
>>> notification
>>>   * to the guest to actively stop the reporting.
>>>   */
>>> -s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>>> +s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
>>>  qemu_mutex_unlock(>free_page_lock);
>>>  virtio_notify_config(vdev);
>>>  }
>>> @@ -567,15 +567,15 @@ static void 
>>> virtio_balloon_free_page_done(VirtIOBalloon *s)
>>>  {
>>>  VirtIODevice *vdev = VIRTIO_DEVICE(s);
>>>
>>> -s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
>>> +

[PATCH] target/riscv: fix check of guest pa top bits

2020-04-24 Thread Jose Martins
The spec states that on sv39x4 guest physical  "address bits 63:41
must all be zeros, or else a guest-page-fault exception occurs.".
However, the check performed for these top bits of the virtual address
on the second stage is the same as the one performed for virtual
addresses on the first stage except with the 2-bit extension,
effectively creating the same kind of "hole" in the guest's physical
address space. I believe the following patch fixes this issue:

Signed-off-by: Jose Martins 
---
 target/riscv/cpu_helper.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index d3ba9efb02..da879f5656 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -421,15 +421,21 @@ static int get_physical_address(CPURISCVState
*env, hwaddr *physical,
 int va_bits = PGSHIFT + levels * ptidxbits + widened;
 target_ulong mask, masked_msbs;

-if (TARGET_LONG_BITS > (va_bits - 1)) {
-mask = (1L << (TARGET_LONG_BITS - (va_bits - 1))) - 1;
+if(!first_stage){
+if ((addr >> va_bits) != 0) {
+return TRANSLATE_FAIL;
+}
 } else {
-mask = 0;
-}
-masked_msbs = (addr >> (va_bits - 1)) & mask;
+if (TARGET_LONG_BITS > (va_bits - 1)) {
+mask = (1L << (TARGET_LONG_BITS - (va_bits - 1))) - 1;
+} else {
+mask = 0;
+}
+masked_msbs = (addr >> (va_bits - 1)) & mask;

-if (masked_msbs != 0 && masked_msbs != mask) {
-return TRANSLATE_FAIL;
+if (masked_msbs != 0 && masked_msbs != mask) {
+return TRANSLATE_FAIL;
+}
 }

 int ptshift = (levels - 1) * ptidxbits;
-- 
2.17.1

Jose



Re: [PATCH RESEND v6 10/36] multi-process: build system for remote device process

2020-04-24 Thread Stefan Hajnoczi
On Wed, Apr 22, 2020 at 09:13:45PM -0700, elena.ufimts...@oracle.com wrote:
> From: Jagannathan Raman 
> 
> Modify Makefile to support the building of the remote
> device process. Implements main() function of remote
> device process.
> 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> Signed-off-by: Elena Ufimtseva 
> ---
>  MAINTAINERS |  8 ++
>  Makefile|  2 ++
>  Makefile.objs   | 27 ++
>  Makefile.target | 61 -
>  accel/Makefile.objs |  2 ++
>  backends/Makefile.objs  |  2 ++
>  block/Makefile.objs |  2 ++
>  hw/Makefile.objs|  7 +
>  hw/block/Makefile.objs  |  2 ++
>  hw/core/Makefile.objs   | 18 
>  hw/nvram/Makefile.objs  |  2 ++
>  hw/pci/Makefile.objs|  4 +++
>  hw/scsi/Makefile.objs   |  2 ++
>  migration/Makefile.objs |  2 ++
>  qom/Makefile.objs   |  3 ++
>  remote/Makefile.objs|  1 +
>  remote/remote-main.c| 23 
>  stubs/replay.c  |  4 +++
>  18 files changed, 171 insertions(+), 1 deletion(-)
>  create mode 100644 remote/Makefile.objs
>  create mode 100644 remote/remote-main.c

This approach is okay for now but will result in a lot of Makefile
duplication in the long run.

Each hw .o file should specify its dependencies so that qemu-system-*
and the remote executable can link in the needed files.  The Kconfig
system can also help with this by enabling/disabling features.

Then the Makefiles don't need to duplicate *-obj-y and remote-pci-*.

> diff --git a/Makefile.objs b/Makefile.objs
> index f29c60c59d..f6654633b4 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -21,6 +21,33 @@ block-obj-$(CONFIG_REPLICATION) += replication.o
>  
>  block-obj-m = block/
>  
> +#
> +# remote-pci-obj-y is common code used by remote devices
> +
> +remote-pci-obj-$(CONFIG_MPQEMU) += hw/
> +remote-pci-obj-$(CONFIG_MPQEMU) += qom/
> +remote-pci-obj-$(CONFIG_MPQEMU) += backends/
> +remote-pci-obj-$(CONFIG_MPQEMU) += block/
> +remote-pci-obj-$(CONFIG_MPQEMU) += migration/

In the future migration can be split into the QEMU and remote parts.
The remote executable doesn't need all the live migration code.

> +remote-pci-obj-$(CONFIG_MPQEMU) += remote/
> +remote-pci-obj-$(CONFIG_MPQEMU) += accel/

Devices do not execute guest code so they should not need accel/.  kvm
and tcg functions were stubbed out earlier in this patch series, so I'm
surprised to see thing being built into the remote executable.

> @@ -121,6 +131,20 @@ LIBS := $(libs_cpu) $(LIBS)
>  
>  obj-$(CONFIG_PLUGIN) += plugins/
>  
> +ifeq ($(TARGET_NAME)-$(CONFIG_MPQEMU)-$(CONFIG_USER_ONLY), x86_64-y-)
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/kvm-stub.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/tcg-stub.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/hax-stub.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += accel/stubs/whpx-stub.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/vl-stub.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/net-stub.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/replay.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/xen-mapcache.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/audio.o
> +remote-pci-tgt-obj-$(CONFIG_MPQEMU) += stubs/monitor.o
> +endif

Stubs don't need to be explicitly included, they should be linked in via
libqemustub.a.

> diff --git a/remote/remote-main.c b/remote/remote-main.c
> new file mode 100644
> index 00..7c0764ad01
> --- /dev/null
> +++ b/remote/remote-main.c
> @@ -0,0 +1,23 @@
> +/*
> + * Remote device initialization
> + *
> + * Copyright © 2018, 2020 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include 

This is already included by "qemu/osdep.h"

> +
> +#include "qemu/module.h"
> +
> +int main(int argc, char *argv[])
> +{
> +module_call_init(MODULE_INIT_QOM);
> +
> +return 0;
> +}
> diff --git a/stubs/replay.c b/stubs/replay.c
> index 2e3feee6a9..9b53c0cb37 100644
> --- a/stubs/replay.c
> +++ b/stubs/replay.c
> @@ -102,3 +102,7 @@ int replay_get_instructions(void)
>  void replay_account_executed_instructions(void)
>  {
>  }
> +
> +void replay_add_blocker(Error *reason)
> +{
> +}

This can be moved to the stubs patch.


signature.asc
Description: PGP signature


Re: [PATCH v21 QEMU 3/5] virtio-balloon: Replace free page hinting references to 'report' with 'hint'

2020-04-24 Thread Alexander Duyck
On Fri, Apr 24, 2020 at 4:23 AM David Hildenbrand  wrote:
>
> On 22.04.20 20:21, Alexander Duyck wrote:
> > From: Alexander Duyck 
> >
> > In an upcoming patch a feature named Free Page Reporting is about to be
> > added. In order to avoid any confusion we should drop the use of the word
> > 'report' when referring to Free Page Hinting. So what this patch does is go
> > through and replace all instances of 'report' with 'hint" when we are
> > referring to free page hinting.
> >
> > Signed-off-by: Alexander Duyck 
> > ---
> >  hw/virtio/virtio-balloon.c |   74 
> > ++--
> >  include/hw/virtio/virtio-balloon.h |   20 +-
> >  2 files changed, 47 insertions(+), 47 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a4729f7fc930..a1d6fb52c876 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -466,21 +466,21 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
> >  ret = false;
> >  goto out;
> >  }
> > -if (id == dev->free_page_report_cmd_id) {
> > -dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> > +if (id == dev->free_page_hint_cmd_id) {
> > +dev->free_page_hint_status = FREE_PAGE_HINT_S_START;
> >  } else {
> >  /*
> >   * Stop the optimization only when it has started. This
> >   * avoids a stale stop sign for the previous command.
> >   */
> > -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> > -dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> > +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
> > +dev->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
> >  }
> >  }
> >  }
> >
> >  if (elem->in_num) {
> > -if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> > +if (dev->free_page_hint_status == FREE_PAGE_HINT_S_START) {
> >  qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> >elem->in_sg[0].iov_len);
> >  }
> > @@ -506,11 +506,11 @@ static void virtio_ballloon_get_free_page_hints(void 
> > *opaque)
> >  qemu_mutex_unlock(>free_page_lock);
> >  virtio_notify(vdev, vq);
> >/*
> > -   * Start to poll the vq once the reporting started. Otherwise, 
> > continue
> > +   * Start to poll the vq once the hinting started. Otherwise, continue
> > * only when there are entries on the vq, which need to be given 
> > back.
> > */
> >  } while (continue_to_get_hints ||
> > - dev->free_page_report_status == FREE_PAGE_REPORT_S_START);
> > + dev->free_page_hint_status == FREE_PAGE_HINT_S_START);
> >  virtio_queue_set_notification(vq, 1);
> >  }
> >
> > @@ -531,14 +531,14 @@ static void 
> > virtio_balloon_free_page_start(VirtIOBalloon *s)
> >  return;
> >  }
> >
> > -if (s->free_page_report_cmd_id == UINT_MAX) {
> > -s->free_page_report_cmd_id =
> > -   VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> > +if (s->free_page_hint_cmd_id == UINT_MAX) {
> > +s->free_page_hint_cmd_id =
> > +   VIRTIO_BALLOON_FREE_PAGE_HINT_CMD_ID_MIN;
> >  } else {
> > -s->free_page_report_cmd_id++;
> > +s->free_page_hint_cmd_id++;
> >  }
> >
> > -s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> > +s->free_page_hint_status = FREE_PAGE_HINT_S_REQUESTED;
> >  virtio_notify_config(vdev);
> >  }
> >
> > @@ -546,18 +546,18 @@ static void 
> > virtio_balloon_free_page_stop(VirtIOBalloon *s)
> >  {
> >  VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >
> > -if (s->free_page_report_status != FREE_PAGE_REPORT_S_STOP) {
> > +if (s->free_page_hint_status != FREE_PAGE_HINT_S_STOP) {
> >  /*
> >   * The lock also guarantees us that the
> >   * virtio_ballloon_get_free_page_hints exits after the
> > - * free_page_report_status is set to S_STOP.
> > + * free_page_hint_status is set to S_STOP.
> >   */
> >  qemu_mutex_lock(>free_page_lock);
> >  /*
> >   * The guest hasn't done the reporting, so host sends a 
> > notification
> >   * to the guest to actively stop the reporting.
> >   */
> > -s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> > +s->free_page_hint_status = FREE_PAGE_HINT_S_STOP;
> >  qemu_mutex_unlock(>free_page_lock);
> >  virtio_notify_config(vdev);
> >  }
> > @@ -567,15 +567,15 @@ static void 
> > virtio_balloon_free_page_done(VirtIOBalloon *s)
> >  {
> >  VirtIODevice *vdev = VIRTIO_DEVICE(s);
> >
> > -s->free_page_report_status = FREE_PAGE_REPORT_S_DONE;
> > +s->free_page_hint_status = FREE_PAGE_HINT_S_DONE;
> >  

Re: [PATCH v7 09/10] iotests: Test committing to short backing file

2020-04-24 Thread Vladimir Sementsov-Ogievskiy

24.04.2020 15:54, Kevin Wolf wrote:

Signed-off-by: Kevin Wolf 



Reviewed-by: Vladimir Sementsov-Ogievskiy 


---


[..]


+
+# After this, 0 to base_size should be allocated/zeroed.


Actually, top_size_old to base_size, yes? (sorry, nitpicking, missed on 
previous review).


+#
+# In theory, leaving base_size to top_size_new unallocated would be
+# correct, but in practice, if we zero out anything, we zero out
+# everything up to top_size_new.
+iotests.qemu_img_log('resize', '-f', iotests.imgfmt,
+ '--preallocation', prealloc, top, top_size_new)
+iotests.qemu_io_log('-c', 'read -P 0 %s 64k' % off, top)
+iotests.qemu_io_log('-c', 'map', top)
+iotests.qemu_img_log('map', '--output=json', top)




--
Best regards,
Vladimir



Re: [PATCH 4/5] ramfb: add sanity checks to ramfb_create_display_surface

2020-04-24 Thread Laszlo Ersek
On 04/23/20 13:41, Gerd Hoffmann wrote:
>   Hi,
> 
>> - if "linesize" is nonzero, make sure it is a whole multiple of the
>> required word size (?)
>>
>> - if "linesize" is nonzero, make sure it is not bogus with relation to
>> "width". I'm thinking something like:
> 
> Yep, the whole linesize is a bit bogus, noticed after sending out this
> series, I have a followup patch for that (see below).
> 
> take care,
>   Gerd
> 
> From 154dcff73dc533fc95c88bd960ed65108af6c734 Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann 
> Date: Wed, 22 Apr 2020 12:07:59 +0200
> Subject: [PATCH] ramfb: fix size calculation
> 
> size calculation isn't correct with guest-supplied stride, the last
> display line isn't accounted for correctly.
> 
> For the typical case of stride > linesize (add padding) we error on the
> safe side (calculated size is larger than actual size).
> 
> With stride < linesize (scanlines overlap) the calculated size is
> smaller than the actual size though so our guest memory mapping might
> end up being too small.
> 
> While being at it also fix ramfb_create_display_surface to use hwaddr
> for the parameters.  That way all calculation are done with hwaddr type
> and we can't get funny effects from type castings.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/display/ramfb.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
> index be884c9ea837..928d74d10bc7 100644
> --- a/hw/display/ramfb.c
> +++ b/hw/display/ramfb.c
> @@ -44,10 +44,10 @@ static void ramfb_unmap_display_surface(pixman_image_t 
> *image, void *unused)
>  
>  static DisplaySurface *ramfb_create_display_surface(int width, int height,
>  pixman_format_code_t 
> format,
> -int linesize, uint64_t 
> addr)
> +hwaddr stride, hwaddr 
> addr)
>  {
>  DisplaySurface *surface;
> -hwaddr size;
> +hwaddr size, mapsize, linesize;
>  void *data;
>  
>  if (width < 16 || width > VBE_DISPI_MAX_XRES ||
> @@ -55,19 +55,20 @@ static DisplaySurface *ramfb_create_display_surface(int 
> width, int height,
>  format == 0 /* unknown format */)
>  return NULL;
>  
> -if (linesize == 0) {
> -linesize = width * PIXMAN_FORMAT_BPP(format) / 8;
> +linesize = width * PIXMAN_FORMAT_BPP(format) / 8;
> +if (stride == 0) {
> +stride = linesize;
>  }
>  
> -size = (hwaddr)linesize * height;
> -data = cpu_physical_memory_map(addr, , false);
> -if (size != (hwaddr)linesize * height) {
> -cpu_physical_memory_unmap(data, size, 0, 0);
> +mapsize = size = stride * (height - 1) + linesize;
> +data = cpu_physical_memory_map(addr, , false);
> +if (size != mapsize) {
> +cpu_physical_memory_unmap(data, mapsize, 0, 0);
>  return NULL;
>  }
>  
>  surface = qemu_create_displaysurface_from(width, height,
> -  format, linesize, data);
> +  format, stride, data);
>  pixman_image_set_destroy_function(surface->image,
>ramfb_unmap_display_surface, NULL);
>  
> 

I don't understand two things about this patch:

- "stride" can still be smaller than "linesize" (scanlines can still
overlap). Why is that not a problem?

- assuming "stride > linesize" (i.e., strictly larger), we don't seem to
map the last complete stride, and that seems to be intentional. Is that
safe with regard to qemu_create_displaysurface_from()? Ultimately the
stride is passed to pixman_image_create_bits(), and the underlying
"data" may not be large enough for that. What am I missing?

Hm... bonus question: qemu_create_displaysurface_from() still accepts
"linesize" as a signed int. (Not sure about pixman_image_create_bits().)
Should we do something specific to prevent that value from being
negative? The guest gives us a uint32_t.

Thanks
Laszlo




Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-24 Thread Eric Blake

On 4/24/20 7:54 AM, Kevin Wolf wrote:

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf 
---
  block/qcow2-cluster.c |  2 +-
  block/qcow2.c | 34 ++
  2 files changed, 35 insertions(+), 1 deletion(-)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation

2020-04-24 Thread Max Reitz
On 24.04.20 16:27, Kevin Wolf wrote:
> The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
> image is possibly preallocated and then the zero flag is added to all
> clusters. This means that a copy-on-write operation may be needed when
> writing to these clusters, despite having used preallocation, negating
> one of the major benefits of preallocation.
> 
> Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
> and if the protocol driver can ensure that the new area reads as zeros,
> we can skip setting the zero flag in the qcow2 layer.
> 
> Unfortunately, the same approach doesn't work for metadata
> preallocation, so we'll still set the zero flag there.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Max Reitz 
> ---
>  block/qcow2.c  | 22 +++---
>  tests/qemu-iotests/274.out |  4 ++--
>  2 files changed, 21 insertions(+), 5 deletions(-)

Thanks for the resend! :)

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] iotests/041: Fix NBD socket path

2020-04-24 Thread Eric Blake

On 4/24/20 8:46 AM, Max Reitz wrote:

We should put all UNIX socket files into the sock_dir, not test_dir.

Reported-by: Elena Ufimtseva 
Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/041 | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



Reviewed-by: Eric Blake 

I'm happy to queue this through my NBD tree, if you don't beat me to it 
through an iotest pull request.



diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 5d67bf14bf..46bf1f6c81 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -35,7 +35,7 @@ quorum_img3 = os.path.join(iotests.test_dir, 'quorum3.img')
  quorum_repair_img = os.path.join(iotests.test_dir, 'quorum_repair.img')
  quorum_snapshot_file = os.path.join(iotests.test_dir, 'quorum_snapshot.img')
  
-nbd_sock_path = os.path.join(iotests.test_dir, 'nbd.sock')

+nbd_sock_path = os.path.join(iotests.sock_dir, 'nbd.sock')
  
  class TestSingleDrive(iotests.QMPTestCase):

  image_len = 1 * 1024 * 1024 # MB



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-24 Thread Vladimir Sementsov-Ogievskiy

24.04.2020 15:17, Kevin Wolf wrote:

Am 24.04.2020 um 08:16 hat Vladimir Sementsov-Ogievskiy geschrieben:

23.04.2020 18:01, Kevin Wolf wrote:

If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
   block/qcow2-cluster.c |  2 +-
   block/qcow2.c | 33 +
   2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 17f1363279..4b5fc8c4a7 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1795,7 +1795,7 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t 
offset,
   /* Caller must pass aligned values, except at image end */
   assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
   assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
-   end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+   end_offset >= bs->total_sectors << BDRV_SECTOR_BITS);
   /* The zero flag is only supported by version 3 and newer */
   if (s->qcow_version < 3) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..ad621fe404 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
   bs->supported_zero_flags = header.version >= 3 ?
  BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
   /* Repair image if dirty */
   if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,38 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
   g_assert_not_reached();
   }
+if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+
+/*
+ * Use zero clusters as much as we can. qcow2_cluster_zeroize()
+ * requires a cluster-aligned start. The end may be unaligned if it is
+ * at the end of the image (which it is here).
+ */
+ret = qcow2_cluster_zeroize(bs, zero_start, offset - zero_start, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to zero out new clusters");
+goto fail;
+}
+
+/* Write explicit zeros for the unaligned head */
+if (zero_start > old_length) {
+uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);


Why not s/s->cluster_size/zero_start - old_length/? We may save a lot
of pages if cluster_size is large.


Ok.


+QEMUIOVector qiov;
+qemu_iovec_init_buf(, buf, zero_start - old_length);
+
+qemu_co_mutex_unlock(>lock);


We are in intermediate state here. Is it safe to unlock? Anything may
happen, up to another truncate..


I don't think it's worse than unlocking during normal writes, which we
have been doing for a long time. I don't see anything that would corrupt
any internal state.

Of course, doing truncate in parallel with other operations around EOF
has somewhat undefined semantics because the order could be anything.
But if you don't want to get undefined results, you just shouldn't make
such requests. It's similar to reading and writing the same area at the
same time.


If there would be two truncate operations in parallel, we may finish up with 
s->l1_vm_state_index bs->total_sectors not corresponding to other metadata. Not 
sure how much is it bad..




+ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, , 0, 
0);


Better not do it if this cluster is already ZERO.. But it may be a
later patch to improve that.


I doubt it's worth writing code to optimise a corner case inside a
corner case.


+qemu_co_mutex_lock(>lock);
+
+qemu_vfree(buf);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Failed to zero out the new 
area");
+goto fail;
+}
+}
+}
+
   if (prealloc != PREALLOC_MODE_OFF) {
   /* Flush metadata before actually changing the image size */
   ret = qcow2_write_caches(bs);


Kevin




--
Best regards,
Vladimir



[PATCH v7 10/10] qcow2: Forward ZERO_WRITE flag for full preallocation

2020-04-24 Thread Kevin Wolf
The BDRV_REQ_ZERO_WRITE is currently implemented in a way that first the
image is possibly preallocated and then the zero flag is added to all
clusters. This means that a copy-on-write operation may be needed when
writing to these clusters, despite having used preallocation, negating
one of the major benefits of preallocation.

Instead, try to forward the BDRV_REQ_ZERO_WRITE to the protocol driver,
and if the protocol driver can ensure that the new area reads as zeros,
we can skip setting the zero flag in the qcow2 layer.

Unfortunately, the same approach doesn't work for metadata
preallocation, so we'll still set the zero flag there.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/qcow2.c  | 22 +++---
 tests/qemu-iotests/274.out |  4 ++--
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 98065d7808..2ba0b17c39 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4170,9 +4170,25 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 /* Allocate the data area */
 new_file_size = allocation_start +
 nb_new_data_clusters * s->cluster_size;
-/* Image file grows, so @exact does not matter */
-ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
-   errp);
+/*
+ * Image file grows, so @exact does not matter.
+ *
+ * If we need to zero out the new area, try first whether the protocol
+ * driver can already take care of this.
+ */
+if (flags & BDRV_REQ_ZERO_WRITE) {
+ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc,
+   BDRV_REQ_ZERO_WRITE, NULL);
+if (ret >= 0) {
+flags &= ~BDRV_REQ_ZERO_WRITE;
+}
+} else {
+ret = -1;
+}
+if (ret < 0) {
+ret = bdrv_co_truncate(bs->file, new_file_size, false, prealloc, 0,
+   errp);
+}
 if (ret < 0) {
 error_prepend(errp, "Failed to resize underlying file: ");
 qcow2_free_clusters(bs, allocation_start,
diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out
index 1a796fd07c..9d6fdeb1f7 100644
--- a/tests/qemu-iotests/274.out
+++ b/tests/qemu-iotests/274.out
@@ -187,7 +187,7 @@ read 65536/65536 bytes at offset 9437184
 10 MiB (0xa0) bytes allocated at offset 5 MiB (0x50)
 
 [{ "start": 0, "length": 5242880, "depth": 1, "zero": true, "data": false},
-{ "start": 5242880, "length": 10485760, "depth": 0, "zero": true, "data": 
false, "offset": 327680}]
+{ "start": 5242880, "length": 10485760, "depth": 0, "zero": false, "data": 
true, "offset": 327680}]
 
 === preallocation=full ===
 Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=16777216 cluster_size=65536 
lazy_refcounts=off refcount_bits=16
@@ -206,7 +206,7 @@ read 65536/65536 bytes at offset 11534336
 4 MiB (0x40) bytes allocated at offset 8 MiB (0x80)
 
 [{ "start": 0, "length": 8388608, "depth": 1, "zero": true, "data": false},
-{ "start": 8388608, "length": 4194304, "depth": 0, "zero": true, "data": 
false, "offset": 327680}]
+{ "start": 8388608, "length": 4194304, "depth": 0, "zero": false, "data": 
true, "offset": 327680}]
 
 === preallocation=off ===
 Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=393216 cluster_size=65536 
lazy_refcounts=off refcount_bits=16
-- 
2.25.3




Re: [RFC PATCH 3/3] hw/net/tulip: Set descriptor error bit when lenght is incorrect

2020-04-24 Thread Eric Blake

On 4/23/20 6:16 PM, Philippe Mathieu-Daudé wrote:

When a frame lenght is incorrect, set the RDES0 'Error Summary'


length (here, and in the subject)


and 'Frame too long' bits. Then stop the receive process and
trigger an abnormal interrupt. See [4.3.5 Receive Process].



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 1/3] block: Add blk_new_with_bs() helper

2020-04-24 Thread Max Reitz
On 24.04.20 16:18, Eric Blake wrote:
> On 4/24/20 5:02 AM, Max Reitz wrote:
> 
>>> (With the Patchew warning fixed, of course (i.e., we should set ret to
>>> -EPERM or something in qcow.c))
>>
>> Er, well, maybe I should have looked into more places.  The compiler
>> only warns about that single one because it’s the only place where @ret
>> is really uninitialized, but there are many more where we need to set
>> it: crypto.c, parallels.c, qcow.c, qcow2.c (both hunks), qed.c,
>> sheepdog.c, vdi.c, vhdx.c, and vpc.c.
>>
>> (So basically everywhere but vmdk.c, blockdev.c, and blockjob.c.)
> 
> Urgh - so it's even less of a win in terms of reduced lines of code.

True, but I still consider it a win in terms of reduced complexity.
Before, we have two function calls with two variable assignments from
their return values (@blk and @ret).  Afterwards, we’ll have a single
function all with one variable assignment from its return value (@blk),
and one stand-alone variable assignment with a constant value (“ret =
-EPERM” or similar). :)

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 1/3] block: Add blk_new_with_bs() helper

2020-04-24 Thread Eric Blake

On 4/24/20 5:02 AM, Max Reitz wrote:


(With the Patchew warning fixed, of course (i.e., we should set ret to
-EPERM or something in qcow.c))


Er, well, maybe I should have looked into more places.  The compiler
only warns about that single one because it’s the only place where @ret
is really uninitialized, but there are many more where we need to set
it: crypto.c, parallels.c, qcow.c, qcow2.c (both hunks), qed.c,
sheepdog.c, vdi.c, vhdx.c, and vpc.c.

(So basically everywhere but vmdk.c, blockdev.c, and blockjob.c.)


Urgh - so it's even less of a win in terms of reduced lines of code. 
Still, I'll fix it and post v3.




And now I’m going to get another coffee...

Max



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [EXT] Re: [PATCH v9 1/9] hw/vfio/common: Remove error print on mmio region translation by viommu

2020-04-24 Thread Auger Eric
Hi Bharat,

On 4/2/20 11:01 AM, Bharat Bhushan wrote:
> Hi Eric/Alex,
> 
>> -Original Message-
>> From: Alex Williamson 
>> Sent: Thursday, March 26, 2020 11:23 PM
>> To: Auger Eric 
>> Cc: Bharat Bhushan ; peter.mayd...@linaro.org;
>> pet...@redhat.com; eric.auger@gmail.com; kevin.t...@intel.com;
>> m...@redhat.com; Tomasz Nowicki [C] ;
>> drjo...@redhat.com; linuc.dec...@gmail.com; qemu-devel@nongnu.org; qemu-
>> a...@nongnu.org; bharatb.li...@gmail.com; jean-phili...@linaro.org;
>> yang.zh...@intel.com; David Gibson 
>> Subject: [EXT] Re: [PATCH v9 1/9] hw/vfio/common: Remove error print on mmio
>> region translation by viommu
>>
>> External Email
>>
>> --
>> On Thu, 26 Mar 2020 18:35:48 +0100
>> Auger Eric  wrote:
>>
>>> Hi Alex,
>>>
>>> On 3/24/20 12:08 AM, Alex Williamson wrote:
 [Cc +dwg who originated this warning]

 On Mon, 23 Mar 2020 14:16:09 +0530
 Bharat Bhushan  wrote:

> On ARM, the MSI doorbell is translated by the virtual IOMMU.
> As such address_space_translate() returns the MSI controller MMIO
> region and we get an "iommu map to non memory area"
> message. Let's remove this latter.
>
> Signed-off-by: Eric Auger 
> Signed-off-by: Bharat Bhushan 
> ---
>  hw/vfio/common.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
> 5ca11488d6..c586edf47a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -426,8 +426,6 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb,
>> void **vaddr,
>   , , writable,
>   MEMTXATTRS_UNSPECIFIED);
>  if (!memory_region_is_ram(mr)) {
> -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> - xlat);
>  return false;
>  }
>

 I'm a bit confused here, I think we need more justification beyond
 "we hit this warning and we don't want to because it's ok in this
 one special case, therefore remove it".  I assume the special case
 is that the device MSI address is managed via the SET_IRQS ioctl and
 therefore we won't actually get DMAs to this range.
>>> Yes exactly. The guest creates a mapping between one giova and this
>>> gpa (corresponding to the MSI controller doorbell) because MSIs are
>>> mapped on ARM. But practically the physical device is programmed with
>>> an host chosen iova that maps onto the physical MSI controller's
>>> doorbell. so the device never performs DMA accesses to this range.
>>>
>>>   But I imagine the case that
 was in mind when adding this warning was general peer-to-peer
 between and assigned and emulated device.
>>> yes makes sense.
>>>
>>>   Maybe there's an argument to be made
 that such a p2p mapping might also be used in a non-vIOMMU case.  We
 skip creating those mappings and drivers continue to work, maybe
 because nobody attempts to do p2p DMA with the types of devices we
 emulate, maybe because p2p DMA is not absolutely reliable on bare
 metal and drivers test it before using it.
>>> MSI doorbells are mapped using the IOMMU_MMIO flag (dma-iommu.c
>>> iommu_dma_get_msi_page).
>>> One idea could be to pass that flag through the IOMMU Notifier
>>> mechanism into the iotlb->perm. Eventually when we get this in
>>> vfio_get_vaddr() we would not print the warning. Could that make sense?
>>
>> Yeah, if we can identify a valid case that doesn't need a warning, that's 
>> fine by me.
>> Thanks,
> 
> Let me know if I understood the proposal correctly:
> 
> virtio-iommu driver in guest will make map (VIRTIO_IOMMU_T_MAP) with 
> VIRTIO_IOMMU_MAP_F_MMIO flag for MSI mapping.
> In qemu, virtio-iommu device will set a new defined flag (say IOMMU_MMIO) in 
> iotlb->perm in memory_region_notify_iommu(). vfio_get_vaddr() will check same 
> flag and will not print the warning.>
> Is above correct?
Yes that's what I had in mind.

Thanks

Eric
> 
> Thanks
> -Bharat
> 
>>
>> Alex
> 
> 




Re: [PATCH v6 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-24 Thread Eric Blake

On 4/24/20 7:17 AM, Kevin Wolf wrote:




+ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, , 0, 
0);


Better not do it if this cluster is already ZERO.. But it may be a
later patch to improve that.


I doubt it's worth writing code to optimise a corner case inside a
corner case.


I spotted the same issue, and my suggestion was to use 
qcow2_co_pwrite_zero instead of qcow2_co_pwritev_part, but Kevin pointed 
out that my idea would probably not work the way I thought.  Then again, 
checking if the page is already zero is all the more benefit that 
qcow2_co_pwrite_zero would have given for me to have raised the 
question.  The following example illustrates why it might be worthwhile:


Suppose we have an image with 1M clusters, which is an unaligned 100.5M 
in length but started life with all clusters sparse.  We then resize it 
to another unaligned 200.5M.  The call to qcow2_co_pwritev_part will 
cause the image to be non-sparse for one cluster out of 201, whereas 
adding a check to skip the pwritev because the original unaligned tail 
cluster already read as zero will let us keep all clusters sparse.


At any rate, I argued that any such optimization would be a followup 
patch, and Kevin is right that it is a corner case optimization unlikely 
to affect many real-life images.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v7 00/10] block: Fix resize (extending) of short overlays

2020-04-24 Thread Max Reitz
On 24.04.20 14:54, Kevin Wolf wrote:
> v7:
> - Allocate smaller zero buffer [Vladimir]
> - Added missing error_setg_errno() [Max]
> - Code cleanup in the iotest, enabled mapping for 'metadata' [Vladimir]
> - Don't assign to errp twice [Eric]

I would have liked to see that change in patch 10, but the mail seems to
have disappeared somewhere. :?

Max



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 09/10] iotests: Test committing to short backing file

2020-04-24 Thread Max Reitz
On 24.04.20 14:54, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/274 | 155 +
>  tests/qemu-iotests/274.out | 268 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 424 insertions(+)
>  create mode 100755 tests/qemu-iotests/274
>  create mode 100644 tests/qemu-iotests/274.out

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 07/10] block: truncate: Don't make backing file data visible

2020-04-24 Thread Max Reitz
On 24.04.20 14:54, Kevin Wolf wrote:
> When extending the size of an image that has a backing file larger than
> its old size, make sure that the backing file data doesn't become
> visible in the guest, but the added area is properly zeroed out.
> 
> Consider the following scenario where the overlay is shorter than its
> backing file:
> 
> base.qcow2: 
> overlay.qcow2:  
> 
> When resizing (extending) overlay.qcow2, the new blocks should not stay
> unallocated and make the additional As from base.qcow2 visible like
> before this patch, but zeros should be read.
> 
> A similar case happens with the various variants of a commit job when an
> intermediate file is short (- for unallocated):
> 
> base.qcow2: A-A-
> mid.qcow2:  BB-B
> top.qcow2:  C--C--C-
> 
> After commit top.qcow2 to mid.qcow2, the following happens:
> 
> mid.qcow2:  CB-C00C0 (correct result)
> mid.qcow2:  CB-C--C- (before this fix)
> 
> Without the fix, blocks that previously read as zeros on top.qcow2
> suddenly turn into A.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/io.c | 25 +
>  1 file changed, 25 insertions(+)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v7 04/10] qcow2: Support BDRV_REQ_ZERO_WRITE for truncate

2020-04-24 Thread Max Reitz
On 24.04.20 14:54, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/qcow2-cluster.c |  2 +-
>  block/qcow2.c | 34 ++
>  2 files changed, 35 insertions(+), 1 deletion(-)

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..98065d7808 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -4214,6 +4215,39 @@ static int coroutine_fn 
> qcow2_co_truncate(BlockDriverState *bs, int64_t offset,

[...]

> +/* Write explicit zeros for the unaligned head */
> +if (zero_start > old_length) {
> +uint64_t len = zero_start - old_length;
> +uint8_t *buf = qemu_blockalign0(bs, len);

I wonder whether I should raise the question of why this should be
block-aligned when we make no effort to align the offset its written to
(and we know it isn’t aligned to qcow2 clusters at least).

I probably should not.

Reviewed-by: Max Reitz 

> +QEMUIOVector qiov;
> +qemu_iovec_init_buf(, buf, len);
> +
> +qemu_co_mutex_unlock(>lock);
> +ret = qcow2_co_pwritev_part(bs, old_length, len, , 0, 0);
> +qemu_co_mutex_lock(>lock);
> +
> +qemu_vfree(buf);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "Failed to zero out the new 
> area");
> +goto fail;
> +}
> +}



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2 00/15] qapi: Spring cleaning

2020-04-24 Thread Eric Blake

On 4/24/20 5:06 AM, no-re...@patchew.org wrote:

Patchew URL: https://patchew.org/QEMU/20200424084338.26803-1-arm...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

-...
+E..
+==
+ERROR: test_stream_pause (__main__.TestSingleDrive)
+--
+Traceback (most recent call last):
+  File "030", line 93, in test_stream_pause


Not sure how this would be related to your series, or if it is a 
spurious flakiness in the test that just chose to hit now.  iotest 30 is 
passing fine for me even with your series applied.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 15/15] qapi: Generate simpler marshalling code when no arguments

2020-04-24 Thread Eric Blake

On 4/24/20 3:43 AM, Markus Armbruster wrote:

When command FOO has no arguments, its generated qmp_marshal_FOO() is
a bit confusing.  Make it simpler:

  visit_start_struct(v, NULL, NULL, 0, );
  if (err) {
  goto out;
  }
-
-if (!err) {
-visit_check_struct(v, );
-}
+visit_check_struct(v, );
  visit_end_struct(v, NULL);
  if (err) {
  goto out;
  }

Signed-off-by: Markus Armbruster 
---
  scripts/qapi/commands.py | 40 
  1 file changed, 24 insertions(+), 16 deletions(-)



A bit more complex in the generator, but the generated code is indeed 
better.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 14/15] qapi: Disallow qmp_marshal_FOO(NULL, ...)

2020-04-24 Thread Eric Blake

On 4/24/20 3:43 AM, Markus Armbruster wrote:

For QMP commands without arguments, gen_marshal() laboriously
generates a qmp_marshal_FOO() that copes with null @args.  Turns
there's just one caller that passes null instead of an empty QDict.
Adjust that caller, and simplify gen_marshal().

Signed-off-by: Markus Armbruster 
---
  docs/devel/qapi-code-gen.txt |  2 +-
  monitor/qmp.c|  5 -
  scripts/qapi/commands.py | 26 ++
  3 files changed, 7 insertions(+), 26 deletions(-)



Fun diffstat.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




  1   2   3   >