Re: [PATCH v3 00/18] target/arm: sve load/store improvements

2020-04-21 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200422043309.18430-1-richard.hender...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH v3 00/18] target/arm: sve load/store improvements
Message-id: 20200422043309.18430-1-richard.hender...@linaro.org
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
4cc449f target/arm: Remove sve_memopidx
cecb8e8 target/arm: Reuse sve_probe_page for gather loads
9c2bbf3 target/arm: Reuse sve_probe_page for scatter stores
1f230e5 target/arm: Reuse sve_probe_page for gather first-fault loads
072e665 target/arm: Use SVEContLdSt for contiguous stores
64dc8bd target/arm: Update contiguous first-fault and no-fault loads
fa5a242 target/arm: Use SVEContLdSt for multi-register contiguous loads
b0e22ec target/arm: Handle watchpoints in sve_ld1_r
336353e target/arm: Use SVEContLdSt in sve_ld1_r
5d103ea target/arm: Adjust interface of sve_ld1_host_fn
cf5a541 target/arm: Add sve infrastructure for page lookup
67e550d target/arm: Drop manual handling of set/clear_helper_retaddr
c1428bf target/arm: Use cpu_*_data_ra for sve_ldst_tlb_fn
e625d1a accel/tcg: Add endian-specific cpu_{ld, st}* operations
c4a1a7c accel/tcg: Add probe_access_flags
7f333bf accel/tcg: Add block comment for probe_access
c43405b exec: Fix cpu_watchpoint_address_matches address length
543daa3 exec: Add block comments for watchpoint routines

=== OUTPUT BEGIN ===
1/18 Checking commit 543daa3e504d (exec: Add block comments for watchpoint 
routines)
2/18 Checking commit c43405b82eea (exec: Fix cpu_watchpoint_address_matches 
address length)
3/18 Checking commit 7f333bf1a998 (accel/tcg: Add block comment for 
probe_access)
4/18 Checking commit c4a1a7c041e7 (accel/tcg: Add probe_access_flags)
5/18 Checking commit e625d1af4070 (accel/tcg: Add endian-specific cpu_{ld, st}* 
operations)
6/18 Checking commit c1428bfc6ba8 (target/arm: Use cpu_*_data_ra for 
sve_ldst_tlb_fn)
ERROR: spaces required around that '*' (ctx:VxV)
#62: FILE: target/arm/sve_helper.c:4029:
+TLB(env, addr, (TYPEM)*(TYPEE *)(vd + H(reg_off)), ra); \
   ^

ERROR: spaces required around that '*' (ctx:WxV)
#152: FILE: target/arm/sve_helper.c:4162:
+  sve_ldst1_tlb_fn *tlb_fn)
^

total: 2 errors, 0 warnings, 455 lines checked

Patch 6/18 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/18 Checking commit 67e550dfef24 (target/arm: Drop manual handling of 
set/clear_helper_retaddr)
8/18 Checking commit cf5a541132ca (target/arm: Add sve infrastructure for page 
lookup)
WARNING: Block comments use a leading /* on a separate line
#31: FILE: target/arm/sve_helper.c:1633:
+/* Big-endian hosts need to frob the byte indices.  If the copy

total: 0 errors, 1 warnings, 281 lines checked

Patch 8/18 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
9/18 Checking commit 5d103ea6bb5f (target/arm: Adjust interface of 
sve_ld1_host_fn)
10/18 Checking commit 336353e1eeae (target/arm: Use SVEContLdSt in sve_ld1_r)
11/18 Checking commit b0e22ec2e54e (target/arm: Handle watchpoints in sve_ld1_r)
12/18 Checking commit fa5a242e25d2 (target/arm: Use SVEContLdSt for 
multi-register contiguous loads)
13/18 Checking commit 64dc8bd8c13d (target/arm: Update contiguous first-fault 
and no-fault loads)
14/18 Checking commit 072e665663b5 (target/arm: Use SVEContLdSt for contiguous 
stores)
15/18 Checking commit 1f230e521a1d (target/arm: Reuse sve_probe_page for gather 
first-fault loads)
16/18 Checking commit 9c2bbf312f6c (target/arm: Reuse sve_probe_page for 
scatter stores)
17/18 Checking commit cecb8e871485 (target/arm: Reuse sve_probe_page for gather 
loads)
18/18 Checking commit 4cc449fa07c0 (target/arm: Remove sve_memopidx)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200422043309.18430-1-richard.hender...@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3 06/10] iotests: add testfinder.py

2020-04-21 Thread Vladimir Sementsov-Ogievskiy

21.04.2020 19:56, Kevin Wolf wrote:

On 21.04.2020 at 09:35, Vladimir Sementsov-Ogievskiy wrote:

Add a Python script with new logic for searching for tests:

Current ./check behavior:
  - tests are named [0-9][0-9][0-9]
  - tests must be registered in the group file (even if a test doesn't belong
to any group, like 142)

Behavior of the new test search:
  - the group file is dropped
  - tests are searched for by file name instead of the group file, so there is
no longer any need to "register the test"; just create it with a name
ending in *-test. Old names like [0-9][0-9][0-9] are supported too, but not
recommended for new tests


I wonder if a tests/ subdirectory instead of the -test suffix would
organise things a bit better.


No objections.

I also thought about maybe a tests/ subtree, so we'd have something like

tests/jobs/
tests/formats/
...




  - groups are parsed from the '# group: ' line inside test files
  - an optional file group.local may be used to define some additional
groups for downstreams
  - the 'disabled' group is used to temporarily disable tests. So instead of
commenting out tests in the old 'group' file you can now add them to the
disabled group with the help of the 'group.local' file
  - selecting test ranges like 5-15 is no longer supported


Occasionally they were useful when something went wrong during the test
run and I only wanted to repeat the part after it happened. But it's a
rare case and we don't have a clear order any more with arbitrary test
names (which are an improvement otherwise), so we'll live with it.


Yes, I've used it for the same thing.

Actually, we still have the order, as I just sort iotests by name. I think
we could add a parameter for testfinder (maybe as a separate step, not in
this series), something like

--start-from TEST : parse all other arguments as usual, make a sorted sequence,
and then drop tests from the first one up to TEST (not inclusive). This may be
used to rerun a failed ./check command, starting from the middle of the process.
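
Roughly, as a sketch (just an illustration; the option name and the surrounding
interface are assumptions, not the final code):

#!/usr/bin/env python3
# Sketch of the proposed --start-from filtering over the sorted test list.
from typing import List, Sequence

def drop_until(tests: Sequence[str], start_from: str) -> List[str]:
    """Return the sorted test list with everything before start_from dropped.

    start_from itself is kept, so a failed ./check run can be resumed at
    the test that broke.
    """
    ordered = sorted(tests)
    if start_from not in ordered:
        raise ValueError('unknown test: ' + start_from)
    return ordered[ordered.index(start_from):]

if __name__ == '__main__':
    # e.g. resume a run that failed on test 142
    print(drop_until(['001', '142', 'block-stream-test', 'mirror-test'], '142'))
    # -> ['142', 'block-stream-test', 'mirror-test']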




Benefits:
  - no rebase conflicts in the group file when porting patches from branch to
branch
  - no conflicts in upstream when different series want to occupy the same
test number
  - meaningful names for test files
For example, with numeric names, when someone wants to add a test
about block-stream, they will most probably just create a new
test. But if a test-block-stream test already exists, they will
first look at it and maybe just add a test case to it.
And anyway, meaningful names are better.

This commit just adds the class, which is unused for now and will be used in
further patches to finally replace the test-selection logic of ./check.


Maybe mention here that the file can be executed standalone, even if
it's not used by check yet.


Still, the documentation is changed as if the new behavior were already here.
Let's live with this small inconsistency for the following few commits,
until the final change.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  docs/devel/testing.rst   |  52 +-
  tests/qemu-iotests/testfinder.py | 167 +++


A little bit of bikeshedding: As this can be executed as a standalone
tool, would a name like findtests.py be better?


Hmm. I named it after the class, considering the ability to execute it as being
there just for module testing. So for module users, it's just a module with the
class TestFinder, and it's called testfinder. But I don't have a strict opinion
on it; findtests does sound more human-friendly.




  2 files changed, 218 insertions(+), 1 deletion(-)
  create mode 100755 tests/qemu-iotests/testfinder.py

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 770a987ea4..6c9d5b126b 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -153,7 +153,7 @@ check-block
  ---
  
  ``make check-block`` runs a subset of the block layer iotests (the tests that

-are in the "auto" group in ``tests/qemu-iotests/group``).
+are in the "auto" group).
  See the "QEMU iotests" section below for more information.
  
  GCC gcov support

@@ -267,6 +267,56 @@ another application on the host may have locked the file, 
possibly leading to a
  test failure.  If using such devices are explicitly desired, consider adding
  ``locking=off`` option to disable image locking.
  
+Test case groups

+
+
+A test may belong to some groups; you may define them in a comment inside the
+test. By convention, test groups are listed in the second line of the test
+file, after the "#!/..." line, like this:
+
+.. code::
+
+  #!/usr/bin/env python3
+  # group: auto quick
+  #
+  ...
+
+An additional way of defining groups is to create a tests/qemu-iotests/group.local
+file. This should be used only downstream (this file should never appear
+upstream). This file may be used for defining some downstream test groups
+or for temporarily disabling tests, like this:
+
+.. code::
+
+  # groups for some company downstream process
+  #
+  # ci - tests to run on build
+  # down - our 

Re: [PATCH v3 04/10] iotests/check: move QEMU_VXHS_PROG to common.rc

2020-04-21 Thread Vladimir Sementsov-Ogievskiy

21.04.2020 19:03, Kevin Wolf wrote:

On 21.04.2020 at 09:35, Vladimir Sementsov-Ogievskiy wrote:

QEMU_VXHS_PROG is used only in common.rc. So, move it to common.rc,
simplifying the further conversion of check into Python a bit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 


This feels inconsistent when every other QEMU_*_PROG stays in check. Is
QEMU_VXHS_PROG really so different?



Hmm, I was just too lazy to understand the set_prog_path logic :) If you think
it's worth it, I'll try.

+set_prog_path()
+{
+    p=$(command -v $1 2> /dev/null)
+    if [ -n "$p" -a -x "$p" ]; then
+        type -p "$p"
+    else
+        return 1
+    fi
+}

Aha. It just tries to get the path to the command and checks that it is executable.

So, in Python, it probably should simply look like

p = shutil.which(command)
return p if os.access(p, os.X_OK) else None

OK, I'll add it in the next version.
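
As a self-contained sketch (the function name just mirrors the shell helper;
it's an assumption, not the final code):

#!/usr/bin/env python3
# Sketch of a Python equivalent of common.rc's set_prog_path().
import os
import shutil
from typing import Optional

def set_prog_path(command: str) -> Optional[str]:
    """Return the full path of command if it is found and executable."""
    p = shutil.which(command)
    if p is None:
        return None
    # shutil.which() already checks os.X_OK, so this extra check is only
    # belt-and-braces, mirroring the original shell test.
    return p if os.access(p, os.X_OK) else None

if __name__ == '__main__':
    print(set_prog_path('qemu-img'))   # e.g. /usr/bin/qemu-img, or None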


--
Best regards,
Vladimir



Re: tst-arm-mte bug: PSTATE.TCO is cleared on exceptions

2020-04-21 Thread Richard Henderson
On 4/20/20 3:29 AM, Szabolcs Nagy wrote:
> i'm using the branch at
> 
> https://github.com/rth7680/qemu/tree/tgt-arm-mte
> 
> to test armv8.5-a mte and hope this is ok to report bugs here.
> 
> i'm doing tests in qemu-system-aarch64 with linux userspace
> code and it seems TCO bit gets cleared after syscalls or other
> kernel entry, but PSTATE is expected to be restored, so i
> suspect it is a qemu bug.
> 
> i think the architecture saves/restores PSTATE using SPSR_ELx
> on exceptions.

Yep.  I failed to update aarch64_pstate_valid_mask for TCO.
Will fix.  Thanks,


r~

> 
> i used the linux branch
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=devel/mte-v2
> 
> attached a reproducer that segfaults in qemu but should work.
> 
> thanks.
> 




[PATCH v3 18/18] target/arm: Remove sve_memopidx

2020-04-21 Thread Richard Henderson
None of the sve helpers use TCGMemOpIdx any longer, so we can
stop passing it.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/internals.h |  5 -
 target/arm/sve_helper.c| 14 +++---
 target/arm/translate-sve.c | 17 +++--
 3 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index e633aff36e..a833e3941d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -979,11 +979,6 @@ static inline int arm_num_ctx_cmps(ARMCPU *cpu)
 }
 }
 
-/* Note make_memop_idx reserves 4 bits for mmu_idx, and MO_BSWAP is bit 3.
- * Thus a TCGMemOpIdx, without any MO_ALIGN bits, fits in 8 bits.
- */
-#define MEMOPIDX_SHIFT  8
-
 /**
  * v7m_using_psp: Return true if using process stack pointer
  * Return true if the CPU is currently using the process stack
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fffde4b6ec..f482fdd285 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4440,7 +4440,7 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
sve_ldst1_host_fn *host_fn,
sve_ldst1_tlb_fn *tlb_fn)
 {
-const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+const unsigned rd = simd_data(desc);
 const intptr_t reg_max = simd_oprsz(desc);
 intptr_t reg_off, reg_last, mem_off;
 SVEContLdSt info;
@@ -4696,7 +4696,7 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
sve_ldst1_host_fn *host_fn,
sve_ldst1_tlb_fn *tlb_fn)
 {
-const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+const unsigned rd = simd_data(desc);
 void *vd = &env->vfp.zregs[rd];
 const intptr_t reg_max = simd_oprsz(desc);
 intptr_t reg_off, mem_off, reg_last;
@@ -4921,7 +4921,7 @@ void sve_stN_r(CPUARMState *env, uint64_t *vg, 
target_ulong addr, uint32_t desc,
sve_ldst1_host_fn *host_fn,
sve_ldst1_tlb_fn *tlb_fn)
 {
-const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+const unsigned rd = simd_data(desc);
 const intptr_t reg_max = simd_oprsz(desc);
 intptr_t reg_off, reg_last, mem_off;
 SVEContLdSt info;
@@ -5127,9 +5127,9 @@ void sve_ld1_z(CPUARMState *env, void *vd, uint64_t *vg, 
void *vm,
sve_ldst1_host_fn *host_fn,
sve_ldst1_tlb_fn *tlb_fn)
 {
-const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
 const int mmu_idx = cpu_mmu_index(env, false);
 const intptr_t reg_max = simd_oprsz(desc);
+const int scale = simd_data(desc);
 ARMVectorReg scratch;
 intptr_t reg_off;
 SVEHostPage info, info2;
@@ -5272,10 +5272,10 @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t 
*vg, void *vm,
  sve_ldst1_tlb_fn *tlb_fn)
 {
 const int mmu_idx = cpu_mmu_index(env, false);
-const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
+const intptr_t reg_max = simd_oprsz(desc);
+const int scale = simd_data(desc);
 const int esize = 1 << esz;
 const int msize = 1 << msz;
-const intptr_t reg_max = simd_oprsz(desc);
 intptr_t reg_off;
 SVEHostPage info;
 target_ulong addr, in_page;
@@ -5426,9 +5426,9 @@ void sve_st1_z(CPUARMState *env, void *vd, uint64_t *vg, 
void *vm,
sve_ldst1_host_fn *host_fn,
sve_ldst1_tlb_fn *tlb_fn)
 {
-const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
 const int mmu_idx = cpu_mmu_index(env, false);
 const intptr_t reg_max = simd_oprsz(desc);
+const int scale = simd_data(desc);
 void *host[ARM_MAX_VQ * 4];
 intptr_t reg_off, i;
 SVEHostPage info, info2;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b35bad245e..7bd7de80e6 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4584,11 +4584,6 @@ static const uint8_t dtype_esz[16] = {
 3, 2, 1, 3
 };
 
-static TCGMemOpIdx sve_memopidx(DisasContext *s, int dtype)
-{
-return make_memop_idx(s->be_data | dtype_mop[dtype], get_mem_index(s));
-}
-
 static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
int dtype, gen_helper_gvec_mem *fn)
 {
@@ -4601,9 +4596,7 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, 
TCGv_i64 addr,
  * registers as pointers, so encode the regno into the data field.
  * For consistency, do this even for LD1.
  */
-desc = sve_memopidx(s, dtype);
-desc |= zt << MEMOPIDX_SHIFT;
-desc = simd_desc(vsz, vsz, desc);
+desc = simd_desc(vsz, vsz, zt);
 t_desc = tcg_const_i32(desc);
 t_pg = tcg_temp_new_ptr();
 
@@ -4835,9 +4828,7 @@ static void do_ldrq(DisasContext *s, int zt, int pg, 
TCGv_i64 addr, int msz)
 int desc, poff;
 
 /* Load the first quadword using the normal predicated load 

[PATCH v3 17/18] target/arm: Reuse sve_probe_page for gather loads

2020-04-21 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 208 +---
 1 file changed, 109 insertions(+), 99 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index f4cdeecdcb..fffde4b6ec 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5120,130 +5120,140 @@ static target_ulong off_zd_d(void *reg, intptr_t 
reg_ofs)
 return *(uint64_t *)(reg + reg_ofs);
 }
 
-static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
-   target_ulong base, uint32_t desc, uintptr_t ra,
-   zreg_off_fn *off_fn, sve_ldst1_tlb_fn *tlb_fn)
+static inline QEMU_ALWAYS_INLINE
+void sve_ld1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
+   target_ulong base, uint32_t desc, uintptr_t retaddr,
+   int esize, int msize, zreg_off_fn *off_fn,
+   sve_ldst1_host_fn *host_fn,
+   sve_ldst1_tlb_fn *tlb_fn)
 {
 const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
-intptr_t i, oprsz = simd_oprsz(desc);
-ARMVectorReg scratch = { };
+const int mmu_idx = cpu_mmu_index(env, false);
+const intptr_t reg_max = simd_oprsz(desc);
+ARMVectorReg scratch;
+intptr_t reg_off;
+SVEHostPage info, info2;
 
-for (i = 0; i < oprsz; ) {
-uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+memset(&scratch, 0, reg_max);
+reg_off = 0;
+do {
+uint64_t pg = vg[reg_off >> 6];
 do {
 if (likely(pg & 1)) {
-target_ulong off = off_fn(vm, i);
-tlb_fn(env, &scratch, i, base + (off << scale), ra);
+target_ulong addr = base + (off_fn(vm, reg_off) << scale);
+target_ulong in_page = -(addr | TARGET_PAGE_MASK);
+
+sve_probe_page(&info, false, env, addr, 0, MMU_DATA_LOAD,
+   mmu_idx, retaddr);
+
+if (likely(in_page >= msize)) {
+if (unlikely(info.flags & TLB_WATCHPOINT)) {
+cpu_check_watchpoint(env_cpu(env), addr, msize,
+ info.attrs, BP_MEM_READ, retaddr);
+}
+/* TODO: MTE check */
+host_fn(&scratch, reg_off, info.host);
+} else {
+/* Element crosses the page boundary. */
+sve_probe_page(&info2, false, env, addr + in_page, 0,
+   MMU_DATA_LOAD, mmu_idx, retaddr);
+if (unlikely((info.flags | info2.flags) & TLB_WATCHPOINT)) 
{
+cpu_check_watchpoint(env_cpu(env), addr,
+ msize, info.attrs,
+ BP_MEM_READ, retaddr);
+}
+/* TODO: MTE check */
+tlb_fn(env, &scratch, reg_off, addr, retaddr);
+}
 }
-i += 4, pg >>= 4;
-} while (i & 15);
-}
+reg_off += esize;
+pg >>= esize;
+} while (reg_off & 63);
+} while (reg_off < reg_max);
 
 /* Wait until all exceptions have been raised to write back.  */
-memcpy(vd, &scratch, oprsz);
+memcpy(vd, &scratch, reg_max);
 }
 
-static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
-   target_ulong base, uint32_t desc, uintptr_t ra,
-   zreg_off_fn *off_fn, sve_ldst1_tlb_fn *tlb_fn)
-{
-const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
-intptr_t i, oprsz = simd_oprsz(desc) / 8;
-ARMVectorReg scratch = { };
-
-for (i = 0; i < oprsz; i++) {
-uint8_t pg = *(uint8_t *)(vg + H1(i));
-if (likely(pg & 1)) {
-target_ulong off = off_fn(vm, i * 8);
-tlb_fn(env, &scratch, i * 8, base + (off << scale), ra);
-}
-}
-
-/* Wait until all exceptions have been raised to write back.  */
-memcpy(vd, &scratch, oprsz * 8);
+#define DO_LD1_ZPZ_S(MEM, OFS, MSZ) \
+void HELPER(sve_ld##MEM##_##OFS)(CPUARMState *env, void *vd, void *vg,   \
+ void *vm, target_ulong base, uint32_t desc) \
+{\
+sve_ld1_z(env, vd, vg, vm, base, desc, GETPC(), 4, 1 << MSZ, \
+  off_##OFS##_s, sve_ld1##MEM##_host, sve_ld1##MEM##_tlb);   \
 }
 
-#define DO_LD1_ZPZ_S(MEM, OFS) \
-void QEMU_FLATTEN HELPER(sve_ld##MEM##_##OFS) \
-(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc)   \
-{\
-sve_ld1_zs(env, vd, vg, vm, base, desc, GETPC(), \
-  off_##OFS##_s, sve_ld1##MEM##_tlb);\
+#define DO_LD1_ZPZ_D(MEM, OFS, MSZ) \
+void 

[PATCH v3 15/18] target/arm: Reuse sve_probe_page for gather first-fault loads

2020-04-21 Thread Richard Henderson
This avoids the need for a separate set of helpers to implement
no-fault semantics, and will enable MTE in the future.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 323 
 1 file changed, 127 insertions(+), 196 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9389c7c76d..6229ea65c0 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5250,231 +5250,162 @@ DO_LD1_ZPZ_D(dd_be, zd)
 
 /* First fault loads with a vector index.  */
 
-/* Load one element into VD+REG_OFF from (ENV,VADDR) without faulting.
- * The controlling predicate is known to be true.  Return true if the
- * load was successful.
- */
-typedef bool sve_ld1_nf_fn(CPUARMState *env, void *vd, intptr_t reg_off,
-   target_ulong vaddr, int mmu_idx);
-
-#ifdef CONFIG_SOFTMMU
-#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
-static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
-  target_ulong addr, int mmu_idx)   \
-{   \
-target_ulong next_page = -(addr | TARGET_PAGE_MASK);\
-if (likely(next_page - addr >= sizeof(TYPEM))) {\
-void *host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);  \
-if (likely(host)) { \
-TYPEM val = HOST(host); \
-*(TYPEE *)(vd + H(reg_off)) = val;  \
-return true;\
-}   \
-}   \
-return false;   \
-}
-#else
-#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
-static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
-target_ulong addr, int mmu_idx) \
-{   \
-if (likely(page_check_range(addr, sizeof(TYPEM), PAGE_READ))) { \
-TYPEM val = HOST(g2h(addr));\
-*(TYPEE *)(vd + H(reg_off)) = val;  \
-return true;\
-}   \
-return false;   \
-}
-#endif
-
-DO_LD_NF(bsu, H1_4, uint32_t, uint8_t, ldub_p)
-DO_LD_NF(bss, H1_4, uint32_t,  int8_t, ldsb_p)
-DO_LD_NF(bdu, , uint64_t, uint8_t, ldub_p)
-DO_LD_NF(bds, , uint64_t,  int8_t, ldsb_p)
-
-DO_LD_NF(hsu_le, H1_4, uint32_t, uint16_t, lduw_le_p)
-DO_LD_NF(hss_le, H1_4, uint32_t,  int16_t, ldsw_le_p)
-DO_LD_NF(hsu_be, H1_4, uint32_t, uint16_t, lduw_be_p)
-DO_LD_NF(hss_be, H1_4, uint32_t,  int16_t, ldsw_be_p)
-DO_LD_NF(hdu_le, , uint64_t, uint16_t, lduw_le_p)
-DO_LD_NF(hds_le, , uint64_t,  int16_t, ldsw_le_p)
-DO_LD_NF(hdu_be, , uint64_t, uint16_t, lduw_be_p)
-DO_LD_NF(hds_be, , uint64_t,  int16_t, ldsw_be_p)
-
-DO_LD_NF(ss_le,  H1_4, uint32_t, uint32_t, ldl_le_p)
-DO_LD_NF(ss_be,  H1_4, uint32_t, uint32_t, ldl_be_p)
-DO_LD_NF(sdu_le, , uint64_t, uint32_t, ldl_le_p)
-DO_LD_NF(sds_le, , uint64_t,  int32_t, ldl_le_p)
-DO_LD_NF(sdu_be, , uint64_t, uint32_t, ldl_be_p)
-DO_LD_NF(sds_be, , uint64_t,  int32_t, ldl_be_p)
-
-DO_LD_NF(dd_le,  , uint64_t, uint64_t, ldq_le_p)
-DO_LD_NF(dd_be,  , uint64_t, uint64_t, ldq_be_p)
-
 /*
- * Common helper for all gather first-faulting loads.
+ * Common helpers for all gather first-faulting loads.
  */
-static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
-target_ulong base, uint32_t desc, uintptr_t ra,
-zreg_off_fn *off_fn, sve_ldst1_tlb_fn *tlb_fn,
-sve_ld1_nf_fn *nonfault_fn)
+
+static inline QEMU_ALWAYS_INLINE
+void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t retaddr,
+ const int esz, const int msz, zreg_off_fn *off_fn,
+ sve_ldst1_host_fn *host_fn,
+ sve_ldst1_tlb_fn *tlb_fn)
 {
-const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
-const int mmu_idx = get_mmuidx(oi);
+const int mmu_idx = cpu_mmu_index(env, false);
 const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
-intptr_t reg_off, reg_max = simd_oprsz(desc);
-target_ulong addr;
+const int esize = 1 << esz;
+const int msize = 1 << msz;
+const 

[PATCH v3 11/18] target/arm: Handle watchpoints in sve_ld1_r

2020-04-21 Thread Richard Henderson
Handle all of the watchpoints for active elements all at once,
before we've modified the vector register.  This removes the
TLB_WATCHPOINT bit from page[].flags, which means that we can
use the normal fast path via RAM.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 72 -
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6bae342a17..7992a569b0 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4371,6 +4371,70 @@ static bool sve_cont_ldst_pages(SVEContLdSt *info, 
SVEContFault fault,
 return have_work;
 }
 
+static void sve_cont_ldst_watchpoints(SVEContLdSt *info, CPUARMState *env,
+  uint64_t *vg, target_ulong addr,
+  int esize, int msize, int wp_access,
+  uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+intptr_t mem_off, reg_off, reg_last;
+int flags0 = info->page[0].flags;
+int flags1 = info->page[1].flags;
+
+if (likely(!((flags0 | flags1) & TLB_WATCHPOINT))) {
+return;
+}
+
+/* Indicate that watchpoints are handled. */
+info->page[0].flags = flags0 & ~TLB_WATCHPOINT;
+info->page[1].flags = flags1 & ~TLB_WATCHPOINT;
+
+if (flags0 & TLB_WATCHPOINT) {
+mem_off = info->mem_off_first[0];
+reg_off = info->reg_off_first[0];
+reg_last = info->reg_off_last[0];
+
+while (reg_off <= reg_last) {
+uint64_t pg = vg[reg_off >> 6];
+do {
+if ((pg >> (reg_off & 63)) & 1) {
+cpu_check_watchpoint(env_cpu(env), addr + mem_off,
+ msize, info->page[0].attrs,
+ wp_access, retaddr);
+}
+reg_off += esize;
+mem_off += msize;
+} while (reg_off <= reg_last && (reg_off & 63));
+}
+}
+
+mem_off = info->mem_off_split;
+if (mem_off >= 0) {
+cpu_check_watchpoint(env_cpu(env), addr + mem_off, msize,
+ info->page[0].attrs, wp_access, retaddr);
+}
+
+mem_off = info->mem_off_first[1];
+if ((flags1 & TLB_WATCHPOINT) && mem_off >= 0) {
+reg_off = info->reg_off_first[1];
+reg_last = info->reg_off_last[1];
+
+do {
+uint64_t pg = vg[reg_off >> 6];
+do {
+if ((pg >> (reg_off & 63)) & 1) {
+cpu_check_watchpoint(env_cpu(env), addr + mem_off,
+ msize, info->page[1].attrs,
+ wp_access, retaddr);
+}
+reg_off += esize;
+mem_off += msize;
+} while (reg_off & 63);
+} while (reg_off <= reg_last);
+}
+#endif
+}
+
 /*
  * The result of tlb_vaddr_to_host for user-only is just g2h(x),
  * which is always non-null.  Elide the useless test.
@@ -4412,13 +4476,19 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 /* Probe the page(s).  Exit with exception for any invalid page. */
 sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, retaddr);
 
+/* Handle watchpoints for all active elements. */
+sve_cont_ldst_watchpoints(&info, env, vg, addr, 1 << esz, 1 << msz,
+  BP_MEM_READ, retaddr);
+
+/* TODO: MTE check. */
+
 flags = info.page[0].flags | info.page[1].flags;
 if (unlikely(flags != 0)) {
 #ifdef CONFIG_USER_ONLY
 g_assert_not_reached();
 #else
 /*
- * At least one page includes MMIO (or watchpoints).
+ * At least one page includes MMIO.
  * Any bus operation can fail with cpu_transaction_failed,
  * which for ARM will raise SyncExternal.  Perform the load
  * into scratch memory to preserve register state until the end.
-- 
2.20.1




[PATCH v3 12/18] target/arm: Use SVEContLdSt for multi-register contiguous loads

2020-04-21 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 223 ++--
 1 file changed, 79 insertions(+), 144 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 7992a569b0..9365e32646 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4449,27 +4449,28 @@ static inline bool test_host_page(void *host)
 }
 
 /*
- * Common helper for all contiguous one-register predicated loads.
+ * Common helper for all contiguous 1,2,3,4-register predicated stores.
  */
 static inline QEMU_ALWAYS_INLINE
-void sve_ld1_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
+void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
uint32_t desc, const uintptr_t retaddr,
-   const int esz, const int msz,
+   const int esz, const int msz, const int N,
sve_ldst1_host_fn *host_fn,
sve_ldst1_tlb_fn *tlb_fn)
 {
 const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
-void *vd = &env->vfp.zregs[rd];
 const intptr_t reg_max = simd_oprsz(desc);
 intptr_t reg_off, reg_last, mem_off;
 SVEContLdSt info;
 void *host;
-int flags;
+int flags, i;
 
 /* Find the active elements.  */
-if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, 1 << msz)) {
+if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, N << msz)) {
 /* The entire predicate was false; no load occurs.  */
-memset(vd, 0, reg_max);
+for (i = 0; i < N; ++i) {
+memset(&env->vfp.zregs[(rd + i) & 31], 0, reg_max);
+}
 return;
 }
 
@@ -4477,7 +4478,7 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, retaddr);
 
 /* Handle watchpoints for all active elements. */
-sve_cont_ldst_watchpoints(&info, env, vg, addr, 1 << esz, 1 << msz,
+sve_cont_ldst_watchpoints(&info, env, vg, addr, 1 << esz, N << msz,
   BP_MEM_READ, retaddr);
 
 /* TODO: MTE check. */
@@ -4493,9 +4494,8 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
  * which for ARM will raise SyncExternal.  Perform the load
  * into scratch memory to preserve register state until the end.
  */
-ARMVectorReg scratch;
+ARMVectorReg scratch[4] = { };
 
-memset(&scratch, 0, reg_max);
 mem_off = info.mem_off_first[0];
 reg_off = info.reg_off_first[0];
 reg_last = info.reg_off_last[1];
@@ -4510,21 +4510,29 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 uint64_t pg = vg[reg_off >> 6];
 do {
 if ((pg >> (reg_off & 63)) & 1) {
-tlb_fn(env, &scratch, reg_off, addr + mem_off, retaddr);
+for (i = 0; i < N; ++i) {
+tlb_fn(env, &scratch[i], reg_off,
+   addr + mem_off + (i << msz), retaddr);
+}
 }
 reg_off += 1 << esz;
-mem_off += 1 << msz;
+mem_off += N << msz;
 } while (reg_off & 63);
 } while (reg_off <= reg_last);
 
-memcpy(vd, &scratch, reg_max);
+for (i = 0; i < N; ++i) {
+memcpy(&env->vfp.zregs[(rd + i) & 31], &scratch[i], reg_max);
+}
 return;
 #endif
 }
 
 /* The entire operation is in RAM, on valid pages. */
 
-memset(vd, 0, reg_max);
+for (i = 0; i < N; ++i) {
+memset(&env->vfp.zregs[(rd + i) & 31], 0, reg_max);
+}
+
 mem_off = info.mem_off_first[0];
 reg_off = info.reg_off_first[0];
 reg_last = info.reg_off_last[0];
@@ -4534,10 +4542,13 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 uint64_t pg = vg[reg_off >> 6];
 do {
 if ((pg >> (reg_off & 63)) & 1) {
-host_fn(vd, reg_off, host + mem_off);
+for (i = 0; i < N; ++i) {
+host_fn(&env->vfp.zregs[(rd + i) & 31], reg_off,
+host + mem_off + (i << msz));
+}
 }
 reg_off += 1 << esz;
-mem_off += 1 << msz;
+mem_off += N << msz;
 } while (reg_off <= reg_last && (reg_off & 63));
 }
 
@@ -4547,7 +4558,11 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
  */
 mem_off = info.mem_off_split;
 if (unlikely(mem_off >= 0)) {
-tlb_fn(env, vd, info.reg_off_split, addr + mem_off, retaddr);
+reg_off = info.reg_off_split;
+for (i = 0; i < N; ++i) {
+tlb_fn(env, &env->vfp.zregs[(rd + i) & 31], reg_off,
+   addr + mem_off + (i << msz), retaddr);
+}
 }
 
 mem_off = info.mem_off_first[1];
@@ -4560,10 +4575,13 @@ void sve_ld1_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,

[PATCH v3 10/18] target/arm: Use SVEContLdSt in sve_ld1_r

2020-04-21 Thread Richard Henderson
First use of the new helper functions, so we can remove the
unused markup.  No longer need a scratch for user-only, as
we completely probe the page set before reading; system mode
still requires a scratch for MMIO.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 188 +---
 1 file changed, 97 insertions(+), 91 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index d007137735..6bae342a17 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4221,9 +4221,9 @@ typedef struct {
  * final element on each page.  Identify any single element that spans
  * the page boundary.  Return true if there are any active elements.
  */
-static bool __attribute__((unused))
-sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr, uint64_t *vg,
-   intptr_t reg_max, int esz, int msize)
+static bool sve_cont_ldst_elements(SVEContLdSt *info, target_ulong addr,
+   uint64_t *vg, intptr_t reg_max,
+   int esz, int msize)
 {
 const int esize = 1 << esz;
 const uint64_t pg_mask = pred_esz_masks[esz];
@@ -4313,10 +4313,9 @@ sve_cont_ldst_elements(SVEContLdSt *info, target_ulong 
addr, uint64_t *vg,
  * Control the generation of page faults with @fault.  Return false if
  * there is no work to do, which can only happen with @fault == FAULT_NO.
  */
-static bool __attribute__((unused))
-sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault, CPUARMState *env,
-target_ulong addr, MMUAccessType access_type,
-uintptr_t retaddr)
+static bool sve_cont_ldst_pages(SVEContLdSt *info, SVEContFault fault,
+CPUARMState *env, target_ulong addr,
+MMUAccessType access_type, uintptr_t retaddr)
 {
 int mmu_idx = cpu_mmu_index(env, false);
 int mem_off = info->mem_off_first[0];
@@ -4388,109 +4387,116 @@ static inline bool test_host_page(void *host)
 /*
  * Common helper for all contiguous one-register predicated loads.
  */
-static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
-  uint32_t desc, const uintptr_t retaddr,
-  const int esz, const int msz,
-  sve_ldst1_host_fn *host_fn,
-  sve_ldst1_tlb_fn *tlb_fn)
+static inline QEMU_ALWAYS_INLINE
+void sve_ld1_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
+   uint32_t desc, const uintptr_t retaddr,
+   const int esz, const int msz,
+   sve_ldst1_host_fn *host_fn,
+   sve_ldst1_tlb_fn *tlb_fn)
 {
-const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
-const int mmu_idx = get_mmuidx(oi);
 const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
 void *vd = &env->vfp.zregs[rd];
-const int diffsz = esz - msz;
 const intptr_t reg_max = simd_oprsz(desc);
-const intptr_t mem_max = reg_max >> diffsz;
-ARMVectorReg scratch;
+intptr_t reg_off, reg_last, mem_off;
+SVEContLdSt info;
 void *host;
-intptr_t split, reg_off, mem_off;
+int flags;
 
-/* Find the first active element.  */
-reg_off = find_next_active(vg, 0, reg_max, esz);
-if (unlikely(reg_off == reg_max)) {
+/* Find the active elements.  */
+if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, 1 << msz)) {
 /* The entire predicate was false; no load occurs.  */
 memset(vd, 0, reg_max);
 return;
 }
-mem_off = reg_off >> diffsz;
 
-/*
- * If the (remaining) load is entirely within a single page, then:
- * For softmmu, and the tlb hits, then no faults will occur;
- * For user-only, either the first load will fault or none will.
- * We can thus perform the load directly to the destination and
- * Vd will be unmodified on any exception path.
- */
-split = max_for_page(addr, mem_off, mem_max);
-if (likely(split == mem_max)) {
-host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
-if (test_host_page(host)) {
-intptr_t i = reg_off;
-host -= mem_off;
-do {
-host_fn(vd, i, host + (i >> diffsz));
-i = find_next_active(vg, i + (1 << esz), reg_max, esz);
-} while (i < reg_max);
-/* After having taken any fault, zero leading inactive elements. */
-swap_memzero(vd, reg_off);
-return;
-}
-}
+/* Probe the page(s).  Exit with exception for any invalid page. */
+sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_LOAD, retaddr);
 
-/*
- * Perform the predicated read into a temporary, thus ensuring
- * if the load of the last element faults, Vd is not modified.
- */
+flags = info.page[0].flags | info.page[1].flags;
+if 

Re: [PATCH 03/15] KVM: MIPS: Fix VPN2_MASK definition for variable cpu_vmbits

2020-04-21 Thread chen huacai
Hi, Sasha,

On Wed, Apr 22, 2020 at 9:42 AM Sasha Levin  wrote:
>
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.6.5, v5.5.18, v5.4.33, v4.19.116, 
> v4.14.176, v4.9.219, v4.4.219.
>
> v5.6.5: Build OK!
> v5.5.18: Build OK!
> v5.4.33: Build OK!
> v4.19.116: Build OK!
> v4.14.176: Build OK!
> v4.9.219: Build OK!
> v4.4.219: Failed to apply! Possible dependencies:
> 029499b47738 ("KVM: x86: MMU: Make mmu_set_spte() return emulate value")
> 19d194c62b25 ("MIPS: KVM: Simplify TLB_* macros")
> 403015b323a2 ("MIPS: KVM: Move non-TLB handling code out of tlb.c")
> 7ee0e5b29d27 ("KVM: x86: MMU: Remove unused parameter of __direct_map()")
> 9a99c4fd6586 ("KVM: MIPS: Define KVM_ENTRYHI_ASID to 
> cpu_asid_mask(&boot_cpu_data)")
> 9fbfb06a4065 ("MIPS: KVM: Arrayify struct kvm_mips_tlb::tlb_lo*")
> ba049e93aef7 ("kvm: rename pfn_t to kvm_pfn_t")
> bdb7ed8608f8 ("MIPS: KVM: Convert headers to kernel sized types")
> ca64c2beecd4 ("MIPS: KVM: Abstract guest ASID mask")
> caa1faa7aba6 ("MIPS: KVM: Trivial whitespace and style fixes")
> e6207bbea16c ("MIPS: KVM: Use MIPS_ENTRYLO_* defs from mipsregs.h")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
Please ignore this patch for the linux-4.4 branch and below.

>
> --
> Thanks
> Sasha
>


-- 
Huacai Chen



[PATCH v3 09/18] target/arm: Adjust interface of sve_ld1_host_fn

2020-04-21 Thread Richard Henderson
The current interface includes a loop; change it to load a
single element.  We will then be able to use the function
for ld{2,3,4} where individual vector elements are not adjacent.

Replace each call with the simplest possible loop over active
elements.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 124 
 1 file changed, 63 insertions(+), 61 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 2f053a9152..d007137735 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3972,20 +3972,10 @@ void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void 
*vg, uint32_t desc)
  */
 
 /*
- * Load elements into @vd, controlled by @vg, from @host + @mem_ofs.
- * Memory is valid through @host + @mem_max.  The register element
- * indices are inferred from @mem_ofs, as modified by the types for
- * which the helper is built.  Return the @mem_ofs of the first element
- * not loaded (which is @mem_max if they are all loaded).
- *
- * For softmmu, we have fully validated the guest page.  For user-only,
- * we cannot fully validate without taking the mmap lock, but since we
- * know the access is within one host page, if any access is valid they
- * all must be valid.  However, when @vg is all false, it may be that
- * no access is valid.
+ * Load one element into @vd + @reg_off from @host.
+ * The controlling predicate is known to be true.
  */
-typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
- intptr_t mem_ofs, intptr_t mem_max);
+typedef void sve_ldst1_host_fn(void *vd, intptr_t reg_off, void *host);
 
 /*
  * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
@@ -3999,20 +3989,10 @@ typedef void sve_ldst1_tlb_fn(CPUARMState *env, void 
*vd, intptr_t reg_off,
  */
 
 #define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
-static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host,   \
-  intptr_t mem_off, const intptr_t mem_max) \
-{   \
-intptr_t reg_off = mem_off * (sizeof(TYPEE) / sizeof(TYPEM));   \
-uint64_t *pg = vg;  \
-while (mem_off + sizeof(TYPEM) <= mem_max) {\
-TYPEM val = 0;  \
-if (likely((pg[reg_off >> 6] >> (reg_off & 63)) & 1)) { \
-val = HOST(host + mem_off); \
-}   \
-*(TYPEE *)(vd + H(reg_off)) = val;  \
-mem_off += sizeof(TYPEM), reg_off += sizeof(TYPEE); \
-}   \
-return mem_off; \
+static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
+{  \
+TYPEM val = HOST(host);\
+*(TYPEE *)(vd + H(reg_off)) = val; \
 }
 
 #define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB) \
@@ -4411,7 +4391,7 @@ static inline bool test_host_page(void *host)
 static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
   uint32_t desc, const uintptr_t retaddr,
   const int esz, const int msz,
-  sve_ld1_host_fn *host_fn,
+  sve_ldst1_host_fn *host_fn,
   sve_ldst1_tlb_fn *tlb_fn)
 {
 const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
@@ -4445,8 +4425,12 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 if (likely(split == mem_max)) {
 host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
 if (test_host_page(host)) {
-mem_off = host_fn(vd, vg, host - mem_off, mem_off, mem_max);
-tcg_debug_assert(mem_off == mem_max);
+intptr_t i = reg_off;
+host -= mem_off;
+do {
+host_fn(vd, i, host + (i >> diffsz));
+i = find_next_active(vg, i + (1 << esz), reg_max, esz);
+} while (i < reg_max);
 /* After having taken any fault, zero leading inactive elements. */
 swap_memzero(vd, reg_off);
 return;
@@ -4459,7 +4443,12 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
  */
 #ifdef CONFIG_USER_ONLY
-swap_memzero(&scratch, reg_off);
-host_fn(&scratch, vg, g2h(addr), mem_off, mem_max);
+host = g2h(addr);
+do {
+host_fn(&scratch, reg_off, host + (reg_off >> diffsz));
+reg_off += 1 << esz;
+reg_off = 

[PATCH v3 05/18] accel/tcg: Add endian-specific cpu_{ld, st}* operations

2020-04-21 Thread Richard Henderson
We currently have target-endian versions of these operations,
but no easy way to force a specific endianness.  This can be
helpful if the target has endian-specific operations, or a mode
that swaps endianness.

Signed-off-by: Richard Henderson 
---
 docs/devel/loads-stores.rst |  39 +++--
 include/exec/cpu_ldst.h | 277 +++-
 accel/tcg/cputlb.c  | 236 ++
 accel/tcg/user-exec.c   | 211 ++-
 4 files changed, 587 insertions(+), 176 deletions(-)

diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
index 0d99eb24c1..9a944ef1af 100644
--- a/docs/devel/loads-stores.rst
+++ b/docs/devel/loads-stores.rst
@@ -97,9 +97,9 @@ function, which is a return address into the generated code.
 
 Function names follow the pattern:
 
-load: ``cpu_ld{sign}{size}_mmuidx_ra(env, ptr, mmuidx, retaddr)``
+load: ``cpu_ld{sign}{size}{end}_mmuidx_ra(env, ptr, mmuidx, retaddr)``
 
-store: ``cpu_st{size}_mmuidx_ra(env, ptr, val, mmuidx, retaddr)``
+store: ``cpu_st{size}{end}_mmuidx_ra(env, ptr, val, mmuidx, retaddr)``
 
 ``sign``
  - (empty) : for 32 or 64 bit sizes
@@ -112,9 +112,14 @@ store: ``cpu_st{size}_mmuidx_ra(env, ptr, val, mmuidx, 
retaddr)``
  - ``l`` : 32 bits
  - ``q`` : 64 bits
 
+``end``
+ - (empty) : for target endian, or 8 bit sizes
+ - ``_be`` : big endian
+ - ``_le`` : little endian
+
 Regexes for git grep:
- - ``\``
- - ``\``
+ - ``\``
+ - ``\``
 
 ``cpu_{ld,st}*_data_ra``
 
@@ -129,9 +134,9 @@ be performed with a context other than the default.
 
 Function names follow the pattern:
 
-load: ``cpu_ld{sign}{size}_data_ra(env, ptr, ra)``
+load: ``cpu_ld{sign}{size}{end}_data_ra(env, ptr, ra)``
 
-store: ``cpu_st{size}_data_ra(env, ptr, val, ra)``
+store: ``cpu_st{size}{end}_data_ra(env, ptr, val, ra)``
 
 ``sign``
  - (empty) : for 32 or 64 bit sizes
@@ -144,9 +149,14 @@ store: ``cpu_st{size}_data_ra(env, ptr, val, ra)``
  - ``l`` : 32 bits
  - ``q`` : 64 bits
 
+``end``
+ - (empty) : for target endian, or 8 bit sizes
+ - ``_be`` : big endian
+ - ``_le`` : little endian
+
 Regexes for git grep:
- - ``\``
- - ``\``
+ - ``\``
+ - ``\``
 
 ``cpu_{ld,st}*_data``
 ~
@@ -163,9 +173,9 @@ the CPU state anyway.
 
 Function names follow the pattern:
 
-load: ``cpu_ld{sign}{size}_data(env, ptr)``
+load: ``cpu_ld{sign}{size}{end}_data(env, ptr)``
 
-store: ``cpu_st{size}_data(env, ptr, val)``
+store: ``cpu_st{size}{end}_data(env, ptr, val)``
 
 ``sign``
  - (empty) : for 32 or 64 bit sizes
@@ -178,9 +188,14 @@ store: ``cpu_st{size}_data(env, ptr, val)``
  - ``l`` : 32 bits
  - ``q`` : 64 bits
 
+``end``
+ - (empty) : for target endian, or 8 bit sizes
+ - ``_be`` : big endian
+ - ``_le`` : little endian
+
 Regexes for git grep
- - ``\``
- - ``\``
+ - ``\``
+ - ``\``
 
 ``cpu_ld*_code``
 
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 53de19753a..1ba515bfcc 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -26,12 +26,18 @@
  * The syntax for the accessors is:
  *
  * load:  cpu_ld{sign}{size}_{mmusuffix}(env, ptr)
+ *cpu_ld{sign}{size}{end}_{mmusuffix}(env, ptr)
  *cpu_ld{sign}{size}_{mmusuffix}_ra(env, ptr, retaddr)
+ *cpu_ld{sign}{size}{end}_{mmusuffix}_ra(env, ptr, retaddr)
  *cpu_ld{sign}{size}_mmuidx_ra(env, ptr, mmu_idx, retaddr)
+ *cpu_ld{sign}{size}{end}_mmuidx_ra(env, ptr, mmu_idx, retaddr)
  *
  * store: cpu_st{size}_{mmusuffix}(env, ptr, val)
+ *cpu_st{size}{end}_{mmusuffix}(env, ptr, val)
  *cpu_st{size}_{mmusuffix}_ra(env, ptr, val, retaddr)
+ *cpu_st{size}{end}_{mmusuffix}_ra(env, ptr, val, retaddr)
  *cpu_st{size}_mmuidx_ra(env, ptr, val, mmu_idx, retaddr)
+ *cpu_st{size}{end}_mmuidx_ra(env, ptr, val, mmu_idx, retaddr)
  *
  * sign is:
  * (empty): for 32 and 64 bit sizes
@@ -44,6 +50,11 @@
  *   l: 32 bits
  *   q: 64 bits
  *
+ * end is:
+ * (empty): for target native endian, or for 8 bit access
+ * _be: for forced big endian
+ * _le: for forced little endian
+ *
  * mmusuffix is one of the generic suffixes "data" or "code", or "mmuidx".
  * The "mmuidx" suffix carries an extra mmu_idx argument that specifies
  * the index to use; the "data" and "code" suffixes take the index from
@@ -95,32 +106,57 @@ typedef target_ulong abi_ptr;
 #endif
 
 uint32_t cpu_ldub_data(CPUArchState *env, abi_ptr ptr);
-uint32_t cpu_lduw_data(CPUArchState *env, abi_ptr ptr);
-uint32_t cpu_ldl_data(CPUArchState *env, abi_ptr ptr);
-uint64_t cpu_ldq_data(CPUArchState *env, abi_ptr ptr);
 int cpu_ldsb_data(CPUArchState *env, abi_ptr ptr);
-int cpu_ldsw_data(CPUArchState *env, abi_ptr ptr);
 
-uint32_t cpu_ldub_data_ra(CPUArchState *env, abi_ptr ptr, uintptr_t retaddr);
-uint32_t cpu_lduw_data_ra(CPUArchState *env, abi_ptr ptr, uintptr_t retaddr);
-uint32_t cpu_ldl_data_ra(CPUArchState *env, abi_ptr ptr, uintptr_t retaddr);
-uint64_t 

[PATCH v3 14/18] target/arm: Use SVEContLdSt for contiguous stores

2020-04-21 Thread Richard Henderson
Follow the model set up for contiguous loads.  This handles
watchpoints correctly for contiguous stores, recognizing the
exception before any changes to memory.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 285 ++--
 1 file changed, 159 insertions(+), 126 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 3473b53928..9389c7c76d 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3995,6 +3995,10 @@ static void sve_##NAME##_host(void *vd, intptr_t 
reg_off, void *host)  \
 *(TYPEE *)(vd + H(reg_off)) = val; \
 }
 
+#define DO_ST_HOST(NAME, H, TYPEE, TYPEM, HOST) \
+static void sve_##NAME##_host(void *vd, intptr_t reg_off, void *host)  \
+{ HOST(host, (TYPEM)*(TYPEE *)(vd + H(reg_off))); }
+
 #define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB) \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
  target_ulong addr, uintptr_t ra)   \
@@ -4022,6 +4026,7 @@ DO_LD_PRIM_1(ld1bdu, , uint64_t, uint8_t)
 DO_LD_PRIM_1(ld1bds, , uint64_t,  int8_t)
 
 #define DO_ST_PRIM_1(NAME, H, TE, TM)   \
+DO_ST_HOST(st1##NAME, H, TE, TM, stb_p) \
 DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
 
 DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
@@ -4036,6 +4041,8 @@ DO_ST_PRIM_1(bd, , uint64_t, uint8_t)
 DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
 
 #define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
+DO_ST_HOST(st1##NAME##_be, H, TE, TM, ST##_be_p)\
+DO_ST_HOST(st1##NAME##_le, H, TE, TM, ST##_le_p)\
 DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
 DO_ST_TLB(st1##NAME##_le, H, TE, TM, cpu_##ST##_le_data_ra)
 
@@ -4904,151 +4911,177 @@ DO_LDFF1_LDNF1_2(dd,  MO_64, MO_64)
 #undef DO_LDFF1_LDNF1_2
 
 /*
- * Common helpers for all contiguous 1,2,3,4-register predicated stores.
+ * Common helper for all contiguous 1,2,3,4-register predicated stores.
  */
-static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
-  uint32_t desc, const uintptr_t ra,
-  const int esize, const int msize,
-  sve_ldst1_tlb_fn *tlb_fn)
+
+static inline QEMU_ALWAYS_INLINE
+void sve_stN_r(CPUARMState *env, uint64_t *vg, target_ulong addr, uint32_t 
desc,
+   const uintptr_t retaddr, const int esz,
+   const int msz, const int N,
+   sve_ldst1_host_fn *host_fn,
+   sve_ldst1_tlb_fn *tlb_fn)
 {
 const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
-intptr_t i, oprsz = simd_oprsz(desc);
-void *vd = &env->vfp.zregs[rd];
+const intptr_t reg_max = simd_oprsz(desc);
+intptr_t reg_off, reg_last, mem_off;
+SVEContLdSt info;
+void *host;
+int i, flags;
 
-for (i = 0; i < oprsz; ) {
-uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
-do {
-if (pg & 1) {
-tlb_fn(env, vd, i, addr, ra);
+/* Find the active elements.  */
+if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, N << msz)) {
+/* The entire predicate was false; no store occurs.  */
+return;
+}
+
+/* Probe the page(s).  Exit with exception for any invalid page. */
+sve_cont_ldst_pages(&info, FAULT_ALL, env, addr, MMU_DATA_STORE, retaddr);
+
+/* Handle watchpoints for all active elements. */
+sve_cont_ldst_watchpoints(&info, env, vg, addr, 1 << esz, N << msz,
+  BP_MEM_WRITE, retaddr);
+
+/* TODO: MTE check. */
+
+flags = info.page[0].flags | info.page[1].flags;
+if (unlikely(flags != 0)) {
+#ifdef CONFIG_USER_ONLY
+g_assert_not_reached();
+#else
+/*
+ * At least one page includes MMIO.
+ * Any bus operation can fail with cpu_transaction_failed,
+ * which for ARM will raise SyncExternal.  We cannot avoid
+ * this fault and will leave with the store incomplete.
+ */
+mem_off = info.mem_off_first[0];
+reg_off = info.reg_off_first[0];
+reg_last = info.reg_off_last[1];
+if (reg_last < 0) {
+reg_last = info.reg_off_split;
+if (reg_last < 0) {
+reg_last = info.reg_off_last[0];
 }
-i += esize, pg >>= esize;
-addr += msize;
-} while (i & 15);
+}
+
+do {
+uint64_t pg = vg[reg_off >> 6];
+do {
+if ((pg >> (reg_off & 63)) & 1) {
+for (i = 0; i < N; ++i) {
+tlb_fn(env, &env->vfp.zregs[(rd + i) & 31], reg_off,
+   addr + mem_off + (i << msz), retaddr);
+}
+}
+reg_off += 1 << esz;
+mem_off += N << msz;
+} while (reg_off & 63);
+} while (reg_off <= 

[PATCH v3 08/18] target/arm: Add sve infrastructure for page lookup

2020-04-21 Thread Richard Henderson
For contiguous predicated memory operations, we want to
minimize the number of tlb lookups performed.  We have
open-coded this for sve_ld1_r, but for correctness with
MTE we will need this for all of the memory operations.

Create a structure that holds the bounds of active elements,
and metadata for two pages.  Add routines to find those
active elements, lookup the pages, and run watchpoints
for those pages.

Temporarily mark the functions unused to avoid Werror.

Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 263 +++-
 1 file changed, 261 insertions(+), 2 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index aad2c8c237..2f053a9152 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1630,7 +1630,7 @@ void HELPER(sve_cpy_z_d)(void *vd, void *vg, uint64_t 
val, uint32_t desc)
 }
 }
 
-/* Big-endian hosts need to frob the byte indicies.  If the copy
+/* Big-endian hosts need to frob the byte indices.  If the copy
  * happens to be 8-byte aligned, then no frobbing necessary.
  */
 static void swap_memmove(void *vd, void *vs, size_t n)
@@ -3974,7 +3974,7 @@ void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void 
*vg, uint32_t desc)
 /*
  * Load elements into @vd, controlled by @vg, from @host + @mem_ofs.
  * Memory is valid through @host + @mem_max.  The register element
- * indicies are inferred from @mem_ofs, as modified by the types for
+ * indices are inferred from @mem_ofs, as modified by the types for
  * which the helper is built.  Return the @mem_ofs of the first element
  * not loaded (which is @mem_max if they are all loaded).
  *
@@ -4133,6 +4133,265 @@ static intptr_t max_for_page(target_ulong base, 
intptr_t mem_off,
 return MIN(split, mem_max - mem_off) + mem_off;
 }
 
+/*
+ * Resolve the guest virtual address to info->host and info->flags.
+ * If @nofault, return false if the page is invalid, otherwise
+ * exit via page fault exception.
+ */
+
+typedef struct {
+void *host;
+int flags;
+MemTxAttrs attrs;
+} SVEHostPage;
+
+static bool sve_probe_page(SVEHostPage *info, bool nofault,
+   CPUARMState *env, target_ulong addr,
+   int mem_off, MMUAccessType access_type,
+   int mmu_idx, uintptr_t retaddr)
+{
+int flags;
+
+addr += mem_off;
+flags = probe_access_flags(env, addr, access_type, mmu_idx, nofault,
+   &info->host, retaddr);
+info->flags = flags;
+
+if (flags & TLB_INVALID_MASK) {
+g_assert(nofault);
+return false;
+}
+
+/* Ensure that info->host[] is relative to addr, not addr + mem_off. */
+info->host -= mem_off;
+
+#ifdef CONFIG_USER_ONLY
+memset(&info->attrs, 0, sizeof(info->attrs));
+#else
+/*
+ * Find the iotlbentry for addr and return the transaction attributes.
+ * This *must* be present in the TLB because we just found the mapping.
+ */
+{
+uintptr_t index = tlb_index(env, mmu_idx, addr);
+
+# ifdef CONFIG_DEBUG_TCG
+CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+target_ulong comparator = (access_type == MMU_DATA_LOAD
+   ? entry->addr_read
+   : tlb_addr_write(entry));
+g_assert(tlb_hit(comparator, addr));
+# endif
+
+CPUIOTLBEntry *iotlbentry = &env_tlb(env)->d[mmu_idx].iotlb[index];
+info->attrs = iotlbentry->attrs;
+}
+#endif
+
+return true;
+}
+
+
+/*
+ * Analyse contiguous data, protected by a governing predicate.
+ */
+
+typedef enum {
+FAULT_NO,
+FAULT_FIRST,
+FAULT_ALL,
+} SVEContFault;
+
+typedef struct {
+/*
+ * First and last element wholly contained within the two pages.
+ * mem_off_first[0] and reg_off_first[0] are always set >= 0.
+ * reg_off_last[0] may be < 0 if the first element crosses pages.
+ * All of mem_off_first[1], reg_off_first[1] and reg_off_last[1]
+ * are set >= 0 only if there are complete elements on a second page.
+ *
+ * The reg_off_* offsets are relative to the internal vector register.
+ * The mem_off_first offset is relative to the memory address; the
+ * two offsets are different when a load operation extends, a store
+ * operation truncates, or for multi-register operations.
+ */
+int16_t mem_off_first[2];
+int16_t reg_off_first[2];
+int16_t reg_off_last[2];
+
+/*
+ * One element that is misaligned and spans both pages,
+ * or -1 if there is no such active element.
+ */
+int16_t mem_off_split;
+int16_t reg_off_split;
+
+/*
+ * The byte offset at which the entire operation crosses a page boundary.
+ * Set >= 0 if and only if the entire operation spans two pages.
+ */
+int16_t page_split;
+
+/* TLB data for the two pages. */
+SVEHostPage page[2];
+} SVEContLdSt;
+
+/*
+ * Find first active element on each page, and a loose bound 

[PATCH v3 16/18] target/arm: Reuse sve_probe_page for scatter stores

2020-04-21 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 182 
 1 file changed, 111 insertions(+), 71 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6229ea65c0..f4cdeecdcb 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5409,94 +5409,134 @@ DO_LDFF1_ZPZ_D(dd_be, zd, MO_64)
 
 /* Stores with a vector index.  */
 
-static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
-   target_ulong base, uint32_t desc, uintptr_t ra,
-   zreg_off_fn *off_fn, sve_ldst1_tlb_fn *tlb_fn)
+static inline QEMU_ALWAYS_INLINE
+void sve_st1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
+   target_ulong base, uint32_t desc, uintptr_t retaddr,
+   int esize, int msize, zreg_off_fn *off_fn,
+   sve_ldst1_host_fn *host_fn,
+   sve_ldst1_tlb_fn *tlb_fn)
 {
 const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
-intptr_t i, oprsz = simd_oprsz(desc);
+const int mmu_idx = cpu_mmu_index(env, false);
+const intptr_t reg_max = simd_oprsz(desc);
+void *host[ARM_MAX_VQ * 4];
+intptr_t reg_off, i;
+SVEHostPage info, info2;
 
-for (i = 0; i < oprsz; ) {
-uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+/*
+ * Probe all of the elements for host addresses and flags.
+ */
+i = reg_off = 0;
+do {
+uint64_t pg = vg[reg_off >> 6];
 do {
-if (likely(pg & 1)) {
-target_ulong off = off_fn(vm, i);
-tlb_fn(env, vd, i, base + (off << scale), ra);
+target_ulong addr = base + (off_fn(vm, reg_off) << scale);
+target_ulong in_page = -(addr | TARGET_PAGE_MASK);
+
+host[i] = NULL;
+if (likely((pg >> (reg_off & 63)) & 1)) {
+if (likely(in_page >= msize)) {
+sve_probe_page(&info, false, env, addr, 0, MMU_DATA_STORE,
+   mmu_idx, retaddr);
+host[i] = info.host;
+} else {
+/*
+ * Element crosses the page boundary.
+ * Probe both pages, but do not record the host address,
+ * so that we use the slow path.
+ */
+sve_probe_page(&info, false, env, addr, 0,
+   MMU_DATA_STORE, mmu_idx, retaddr);
+sve_probe_page(&info2, false, env, addr + in_page, 0,
+   MMU_DATA_STORE, mmu_idx, retaddr);
+info.flags |= info2.flags;
+}
+
+if (unlikely(info.flags & TLB_WATCHPOINT)) {
+cpu_check_watchpoint(env_cpu(env), addr, msize,
+ info.attrs, BP_MEM_WRITE, retaddr);
+}
+/* TODO: MTE check. */
 }
-i += 4, pg >>= 4;
-} while (i & 15);
-}
-}
+i += 1;
+reg_off += esize;
+} while (reg_off & 63);
+} while (reg_off < reg_max);
 
-static void sve_st1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
-   target_ulong base, uint32_t desc, uintptr_t ra,
-   zreg_off_fn *off_fn, sve_ldst1_tlb_fn *tlb_fn)
-{
-const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
-intptr_t i, oprsz = simd_oprsz(desc) / 8;
-
-for (i = 0; i < oprsz; i++) {
-uint8_t pg = *(uint8_t *)(vg + H1(i));
-if (likely(pg & 1)) {
-target_ulong off = off_fn(vm, i * 8);
-tlb_fn(env, vd, i * 8, base + (off << scale), ra);
+/*
+ * Now that we have recognized all exceptions except SyncExternal
+ * (from TLB_MMIO), which we cannot avoid, perform all of the stores.
+ *
+ * Note for the common case of an element in RAM, not crossing a page
+ * boundary, we have stored the host address in host[].  This doubles
+ * as a first-level check against the predicate, since only enabled
+ * elements have non-null host addresses.
+ */
+i = reg_off = 0;
+do {
+void *h = host[i];
+if (likely(h != NULL)) {
+host_fn(vd, reg_off, h);
+} else if ((vg[reg_off >> 6] >> (reg_off & 63)) & 1) {
+target_ulong addr = base + (off_fn(vm, reg_off) << scale);
+tlb_fn(env, vd, reg_off, addr, retaddr);
 }
-}
+i += 1;
+reg_off += esize;
+} while (reg_off < reg_max);
 }
 
-#define DO_ST1_ZPZ_S(MEM, OFS) \
-void QEMU_FLATTEN HELPER(sve_st##MEM##_##OFS) \
-(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc)   \
-{\
-sve_st1_zs(env, vd, vg, vm, base, desc, GETPC(), \
-  

[PATCH v3 06/18] target/arm: Use cpu_*_data_ra for sve_ldst_tlb_fn

2020-04-21 Thread Richard Henderson
Use the "normal" memory access functions, rather than the
softmmu internal helper functions directly.

Since fb901c905dc3, cpu_mmu_index is now a simple extract
from env->hflags and not a large computation.  Which means
that it's now more work to pass around this value than it
is to recompute it.

This only adjusts the primitives, and does not clean up
all of the uses within sve_helper.c.

Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 221 
 1 file changed, 86 insertions(+), 135 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fdfa652094..655bc9476f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3991,9 +3991,8 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void 
*host,
  * Load one element into @vd + @reg_off from (@env, @vaddr, @ra).
  * The controlling predicate is known to be true.
  */
-typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
-target_ulong vaddr, TCGMemOpIdx oi, uintptr_t ra);
-typedef sve_ld1_tlb_fn sve_st1_tlb_fn;
+typedef void sve_ldst1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+  target_ulong vaddr, uintptr_t retaddr);
 
 /*
  * Generate the above primitives.
@@ -4016,27 +4015,23 @@ static intptr_t sve_##NAME##_host(void *vd, void *vg, 
void *host,   \
 return mem_off; \
 }
 
-#ifdef CONFIG_SOFTMMU
-#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, TLB) \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
- target_ulong addr, TCGMemOpIdx oi, uintptr_t ra)  
\
+ target_ulong addr, uintptr_t ra)   \
 {   \
-TYPEM val = TLB(env, addr, oi, ra); \
-*(TYPEE *)(vd + H(reg_off)) = val;  \
+*(TYPEE *)(vd + H(reg_off)) = (TYPEM)TLB(env, addr, ra);\
 }
-#else
-#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB)  \
+
+#define DO_ST_TLB(NAME, H, TYPEE, TYPEM, TLB) \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
- target_ulong addr, TCGMemOpIdx oi, uintptr_t ra)  
\
+ target_ulong addr, uintptr_t ra)   \
 {   \
-TYPEM val = HOST(g2h(addr));\
-*(TYPEE *)(vd + H(reg_off)) = val;  \
+TLB(env, addr, (TYPEM)*(TYPEE *)(vd + H(reg_off)), ra); \
 }
-#endif
 
 #define DO_LD_PRIM_1(NAME, H, TE, TM)   \
 DO_LD_HOST(NAME, H, TE, TM, ldub_p) \
-DO_LD_TLB(NAME, H, TE, TM, ldub_p, 0, helper_ret_ldub_mmu)
+DO_LD_TLB(NAME, H, TE, TM, cpu_ldub_data_ra)
 
 DO_LD_PRIM_1(ld1bb,  H1,   uint8_t,  uint8_t)
 DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
@@ -4046,39 +4041,51 @@ DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
 DO_LD_PRIM_1(ld1bdu, , uint64_t, uint8_t)
 DO_LD_PRIM_1(ld1bds, , uint64_t,  int8_t)
 
-#define DO_LD_PRIM_2(NAME, end, MOEND, H, TE, TM, PH, PT)  \
-DO_LD_HOST(NAME##_##end, H, TE, TM, PH##_##end##_p)\
-DO_LD_TLB(NAME##_##end, H, TE, TM, PH##_##end##_p, \
-  MOEND, helper_##end##_##PT##_mmu)
+#define DO_ST_PRIM_1(NAME, H, TE, TM)   \
+DO_ST_TLB(st1##NAME, H, TE, TM, cpu_stb_data_ra)
 
-DO_LD_PRIM_2(ld1hh,  le, MO_LE, H1_2, uint16_t, uint16_t, lduw, lduw)
-DO_LD_PRIM_2(ld1hsu, le, MO_LE, H1_4, uint32_t, uint16_t, lduw, lduw)
-DO_LD_PRIM_2(ld1hss, le, MO_LE, H1_4, uint32_t,  int16_t, lduw, lduw)
-DO_LD_PRIM_2(ld1hdu, le, MO_LE, , uint64_t, uint16_t, lduw, lduw)
-DO_LD_PRIM_2(ld1hds, le, MO_LE, , uint64_t,  int16_t, lduw, lduw)
+DO_ST_PRIM_1(bb,   H1,  uint8_t, uint8_t)
+DO_ST_PRIM_1(bh, H1_2, uint16_t, uint8_t)
+DO_ST_PRIM_1(bs, H1_4, uint32_t, uint8_t)
+DO_ST_PRIM_1(bd, , uint64_t, uint8_t)
 
-DO_LD_PRIM_2(ld1ss,  le, MO_LE, H1_4, uint32_t, uint32_t, ldl, ldul)
-DO_LD_PRIM_2(ld1sdu, le, MO_LE, , uint64_t, uint32_t, ldl, ldul)
-DO_LD_PRIM_2(ld1sds, le, MO_LE, , uint64_t,  int32_t, ldl, ldul)
+#define DO_LD_PRIM_2(NAME, H, TE, TM, LD) \
+DO_LD_HOST(ld1##NAME##_be, H, TE, TM, LD##_be_p)\
+DO_LD_HOST(ld1##NAME##_le, H, TE, TM, LD##_le_p)\
+DO_LD_TLB(ld1##NAME##_be, H, TE, TM, cpu_##LD##_be_data_ra) \
+DO_LD_TLB(ld1##NAME##_le, H, TE, TM, cpu_##LD##_le_data_ra)
 
-DO_LD_PRIM_2(ld1dd,  le, MO_LE, , uint64_t, uint64_t, ldq, ldq)
+#define DO_ST_PRIM_2(NAME, H, TE, TM, ST) \
+DO_ST_TLB(st1##NAME##_be, H, TE, TM, cpu_##ST##_be_data_ra) \
+DO_ST_TLB(st1##NAME##_le, H, TE, 

[PATCH v3 13/18] target/arm: Update contiguous first-fault and no-fault loads

2020-04-21 Thread Richard Henderson
With sve_cont_ldst_pages, the differences between first-fault and no-fault
are minimal, so unify the routines.  With cpu_probe_watchpoint, we are able
to make progress through pages with TLB_WATCHPOINT set when the watchpoint
does not actually fire.

Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 342 +++-
 1 file changed, 158 insertions(+), 184 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9365e32646..3473b53928 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4101,18 +4101,6 @@ static intptr_t find_next_active(uint64_t *vg, intptr_t 
reg_off,
 return reg_off;
 }
 
-/*
- * Return the maximum offset <= @mem_max which is still within the page
- * referenced by @base + @mem_off.
- */
-static intptr_t max_for_page(target_ulong base, intptr_t mem_off,
- intptr_t mem_max)
-{
-target_ulong addr = base + mem_off;
-intptr_t split = -(intptr_t)(addr | TARGET_PAGE_MASK);
-return MIN(split, mem_max - mem_off) + mem_off;
-}
-
 /*
  * Resolve the guest virtual address to info->host and info->flags.
  * If @nofault, return false if the page is invalid, otherwise
@@ -4435,19 +4423,6 @@ static void sve_cont_ldst_watchpoints(SVEContLdSt *info, 
CPUARMState *env,
 #endif
 }
 
-/*
- * The result of tlb_vaddr_to_host for user-only is just g2h(x),
- * which is always non-null.  Elide the useless test.
- */
-static inline bool test_host_page(void *host)
-{
-#ifdef CONFIG_USER_ONLY
-return true;
-#else
-return likely(host != NULL);
-#endif
-}
-
 /*
  * Common helper for all contiguous 1,2,3,4-register predicated stores.
  */
@@ -4705,167 +4680,163 @@ static void record_fault(CPUARMState *env, uintptr_t 
i, uintptr_t oprsz)
 }
 
 /*
- * Common helper for all contiguous first-fault loads.
+ * Common helper for all contiguous no-fault and first-fault loads.
  */
-static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
-uint32_t desc, const uintptr_t retaddr,
-const int esz, const int msz,
-sve_ldst1_host_fn *host_fn,
-sve_ldst1_tlb_fn *tlb_fn)
+static inline QEMU_ALWAYS_INLINE
+void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
+   uint32_t desc, const uintptr_t retaddr,
+   const int esz, const int msz, const SVEContFault fault,
+   sve_ldst1_host_fn *host_fn,
+   sve_ldst1_tlb_fn *tlb_fn)
 {
-const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
-const int mmu_idx = get_mmuidx(oi);
 const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
 void *vd = &env->vfp.zregs[rd];
-const int diffsz = esz - msz;
 const intptr_t reg_max = simd_oprsz(desc);
-const intptr_t mem_max = reg_max >> diffsz;
-intptr_t split, reg_off, mem_off, i;
+intptr_t reg_off, mem_off, reg_last;
+SVEContLdSt info;
+int flags;
 void *host;
 
-/* Skip to the first active element.  */
-reg_off = find_next_active(vg, 0, reg_max, esz);
-if (unlikely(reg_off == reg_max)) {
+/* Find the active elements.  */
+if (!sve_cont_ldst_elements(&info, addr, vg, reg_max, esz, 1 << msz)) {
 /* The entire predicate was false; no load occurs.  */
 memset(vd, 0, reg_max);
 return;
 }
-mem_off = reg_off >> diffsz;
+reg_off = info.reg_off_first[0];
 
-/*
- * If the (remaining) load is entirely within a single page, then:
- * For softmmu, and the tlb hits, then no faults will occur;
- * For user-only, either the first load will fault or none will.
- * We can thus perform the load directly to the destination and
- * Vd will be unmodified on any exception path.
- */
-split = max_for_page(addr, mem_off, mem_max);
-if (likely(split == mem_max)) {
-host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
-if (test_host_page(host)) {
-i = reg_off;
-host -= mem_off;
-do {
-host_fn(vd, i, host + (i >> diffsz));
-i = find_next_active(vg, i + (1 << esz), reg_max, esz);
-} while (i < reg_max);
-/* After any fault, zero any leading inactive elements.  */
+/* Probe the page(s). */
+if (!sve_cont_ldst_pages(&info, fault, env, addr, MMU_DATA_LOAD, retaddr)) 
{
+/* Fault on first element. */
+tcg_debug_assert(fault == FAULT_NO);
+memset(vd, 0, reg_max);
+goto do_fault;
+}
+
+mem_off = info.mem_off_first[0];
+flags = info.page[0].flags;
+
+if (fault == FAULT_FIRST) {
+/*
+ * Special handling of the first active element,
+ * if it crosses a page boundary or is MMIO.
+ */
+bool is_split = mem_off == info.mem_off_split;
+/* TODO: MTE check. */
+if 

[PATCH v3 04/18] accel/tcg: Add probe_access_flags

2020-04-21 Thread Richard Henderson
This new interface will allow targets to probe for a page
and then handle watchpoints themselves.  This will be most
useful for vector predicated memory operations, where one
page lookup can be used for many operations, and one test
can avoid many watchpoint checks.
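
As an illustration only (the helper name and the control flow below are
invented for this sketch; only the probe_access_flags() signature and the
TLB_* flag semantics are taken from this patch), a target-side helper might
use the new interface roughly like this:

static void *probe_for_store(CPUArchState *env, target_ulong addr,
                             int mmu_idx, uintptr_t retaddr)
{
    void *host;
    int flags = probe_access_flags(env, addr, MMU_DATA_STORE, mmu_idx,
                                   true, &host, retaddr);

    if (flags & TLB_INVALID_MASK) {
        return NULL;   /* page not valid; the nonfault probe did not trap */
    }
    if (flags & (TLB_MMIO | TLB_WATCHPOINT)) {
        return NULL;   /* needs per-access handling; no direct host pointer */
    }
    return host;       /* plain RAM: safe to access through the host pointer */
}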

Signed-off-by: Richard Henderson 
---
v2: Fix return of host pointer in softmmu probe_access_flags.
---
 include/exec/cpu-all.h  |  13 ++-
 include/exec/exec-all.h |  22 +
 accel/tcg/cputlb.c  | 177 
 accel/tcg/user-exec.c   |  36 +---
 4 files changed, 149 insertions(+), 99 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 49384bb66a..43ddcf024c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -328,7 +328,18 @@ CPUArchState *cpu_copy(CPUArchState *env);
  | CPU_INTERRUPT_TGT_EXT_3   \
  | CPU_INTERRUPT_TGT_EXT_4)
 
-#if !defined(CONFIG_USER_ONLY)
+#ifdef CONFIG_USER_ONLY
+
+/*
+ * Allow some level of source compatibility with softmmu.  We do not
+ * support any of the more exotic features, so only invalid pages may
+ * be signaled by probe_access_flags().
+ */
+#define TLB_INVALID_MASK    (1 << (TARGET_PAGE_BITS_MIN - 1))
+#define TLB_MMIO            0
+#define TLB_WATCHPOINT      0
+
+#else
 
 /*
  * Flags stored in the low bits of the TLB virtual address.
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index d656a1f05c..8792bea07a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -362,6 +362,28 @@ static inline void *probe_read(CPUArchState *env, 
target_ulong addr, int size,
 return probe_access(env, addr, size, MMU_DATA_LOAD, mmu_idx, retaddr);
 }
 
+/**
+ * probe_access_flags:
+ * @env: CPUArchState
+ * @addr: guest virtual address to look up
+ * @access_type: read, write or execute permission
+ * @mmu_idx: MMU index to use for lookup
+ * @nonfault: suppress the fault
+ * @phost: return value for host address
+ * @retaddr: return address for unwinding
+ *
+ * Similar to probe_access, loosely returning the TLB_FLAGS_MASK for
+ * the page, and storing the host address for RAM in @phost.
+ *
+ * If @nonfault is set, do not raise an exception but return TLB_INVALID_MASK.
+ * Do not handle watchpoints, but include TLB_WATCHPOINT in the returned flags.
+ * Do handle clean pages, so exclude TLB_NOTDIRTY from the returned flags.
+ * For simplicity, all "mmio-like" flags are folded to TLB_MMIO.
+ */
+int probe_access_flags(CPUArchState *env, target_ulong addr,
+   MMUAccessType access_type, int mmu_idx,
+   bool nonfault, void **phost, uintptr_t retaddr);
+
 #define CODE_GEN_ALIGN   16 /* must be >= of the size of a icache line 
*/
 
 /* Estimated block size for TB allocation.  */
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e3b5750c3b..bbe265ce28 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1231,86 +1231,16 @@ static void notdirty_write(CPUState *cpu, vaddr 
mem_vaddr, unsigned size,
 }
 }
 
-/*
- * Probe for whether the specified guest access is permitted. If it is not
- * permitted then an exception will be taken in the same way as if this
- * were a real access (and we will not return).
- * If the size is 0 or the page requires I/O access, returns NULL; otherwise,
- * returns the address of the host page similar to tlb_vaddr_to_host().
- */
-void *probe_access(CPUArchState *env, target_ulong addr, int size,
-   MMUAccessType access_type, int mmu_idx, uintptr_t retaddr)
+static int probe_access_internal(CPUArchState *env, target_ulong addr,
+ int fault_size, MMUAccessType access_type,
+ int mmu_idx, bool nonfault,
+ void **phost, uintptr_t retaddr)
 {
 uintptr_t index = tlb_index(env, mmu_idx, addr);
 CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-target_ulong tlb_addr;
-size_t elt_ofs;
-int wp_access;
-
-g_assert(-(addr | TARGET_PAGE_MASK) >= size);
-
-switch (access_type) {
-case MMU_DATA_LOAD:
-elt_ofs = offsetof(CPUTLBEntry, addr_read);
-wp_access = BP_MEM_READ;
-break;
-case MMU_DATA_STORE:
-elt_ofs = offsetof(CPUTLBEntry, addr_write);
-wp_access = BP_MEM_WRITE;
-break;
-case MMU_INST_FETCH:
-elt_ofs = offsetof(CPUTLBEntry, addr_code);
-wp_access = BP_MEM_READ;
-break;
-default:
-g_assert_not_reached();
-}
-tlb_addr = tlb_read_ofs(entry, elt_ofs);
-
-if (unlikely(!tlb_hit(tlb_addr, addr))) {
-if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs,
-addr & TARGET_PAGE_MASK)) {
-tlb_fill(env_cpu(env), addr, size, access_type, mmu_idx, retaddr);
-/* TLB resize via tlb_fill may have moved the entry. */
-index = tlb_index(env, mmu_idx, addr);
-entry = tlb_entry(env, mmu_idx, addr);
-

[PATCH v3 00/18] target/arm: sve load/store improvements

2020-04-21 Thread Richard Henderson
Because there was a separate v2 of one of the patches,
avoid confusion and call the whole thing v3.

The goal here is to support MTE, but there's some cleanup to do.

Technically, we have sufficient interfaces in cputlb.c now, but it
requires multiple tlb lookups on different interfaces to do so.

Adding probe_access_flags() allows probing the tlb and getting out
some of the flags buried in the tlb comparator, such as TLB_MMIO
and TLB_WATCHPOINT.  In addition, we get no-fault semantics,
which we don't have via probe_access().

Looking forward to MTE, we can examine the Tagged bit on a per-page
basis and avoid dozens of mte_check calls that must be Unchecked.
That comes later, in a new version of the MTE patch set, but I do
add comments for where the checks should be added.

Version 3 drops cpu_probe_watchpoint, because while adding new
documentation I found we already had cpu_watchpoint_address_matches
which could do the job.


r~


Richard Henderson (18):
  exec: Add block comments for watchpoint routines
  exec: Fix cpu_watchpoint_address_matches address length
  accel/tcg: Add block comment for probe_access
  accel/tcg: Add probe_access_flags
  accel/tcg: Add endian-specific cpu_{ld,st}* operations
  target/arm: Use cpu_*_data_ra for sve_ldst_tlb_fn
  target/arm: Drop manual handling of set/clear_helper_retaddr
  target/arm: Add sve infrastructure for page lookup
  target/arm: Adjust interface of sve_ld1_host_fn
  target/arm: Use SVEContLdSt in sve_ld1_r
  target/arm: Handle watchpoints in sve_ld1_r
  target/arm: Use SVEContLdSt for multi-register contiguous loads
  target/arm: Update contiguous first-fault and no-fault loads
  target/arm: Use SVEContLdSt for contiguous stores
  target/arm: Reuse sve_probe_page for gather first-fault loads
  target/arm: Reuse sve_probe_page for scatter stores
  target/arm: Reuse sve_probe_page for gather loads
  target/arm: Remove sve_memopidx

 docs/devel/loads-stores.rst |   39 +-
 include/exec/cpu-all.h  |   13 +-
 include/exec/cpu_ldst.h |  277 -
 include/exec/exec-all.h |   39 +
 include/hw/core/cpu.h   |   23 +
 target/arm/internals.h  |5 -
 accel/tcg/cputlb.c  |  413 ---
 accel/tcg/user-exec.c   |  247 +++-
 exec.c  |2 +-
 target/arm/sve_helper.c | 2237 +++
 target/arm/translate-sve.c  |   17 +-
 11 files changed, 1985 insertions(+), 1327 deletions(-)

-- 
2.20.1




[PATCH v3 03/18] accel/tcg: Add block comment for probe_access

2020-04-21 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 350c4b451b..d656a1f05c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -330,6 +330,23 @@ static inline void 
tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
 {
 }
 #endif
+/**
+ * probe_access:
+ * @env: CPUArchState
+ * @addr: guest virtual address to look up
+ * @size: size of the access
+ * @access_type: read, write or execute permission
+ * @mmu_idx: MMU index to use for lookup
+ * @retaddr: return address for unwinding
+ *
+ * Look up the guest virtual address @addr.  Raise an exception if the
+ * page does not satisfy @access_type.  Raise an exception if the
+ * access (@addr, @size) hits a watchpoint.  For writes, mark a clean
+ * page as dirty.
+ *
+ * Finally, return the host address for a page that is backed by RAM,
+ * or NULL if the page requires I/O.
+ */
 void *probe_access(CPUArchState *env, target_ulong addr, int size,
MMUAccessType access_type, int mmu_idx, uintptr_t retaddr);
 
-- 
2.20.1




[PATCH v3 01/18] exec: Add block comments for watchpoint routines

2020-04-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/hw/core/cpu.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 5bf94d28cf..07f7698155 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -1100,8 +1100,31 @@ int cpu_watchpoint_remove(CPUState *cpu, vaddr addr,
   vaddr len, int flags);
 void cpu_watchpoint_remove_by_ref(CPUState *cpu, CPUWatchpoint *watchpoint);
 void cpu_watchpoint_remove_all(CPUState *cpu, int mask);
+
+/**
+ * cpu_check_watchpoint:
+ * @cpu: cpu context
+ * @addr: guest virtual address
+ * @len: access length
+ * @attrs: memory access attributes
+ * @flags: watchpoint access type
+ * @ra: unwind return address
+ *
+ * Check for a watchpoint hit in [addr, addr+len) of the type
+ * specified by @flags.  Exit via exception with a hit.
+ */
 void cpu_check_watchpoint(CPUState *cpu, vaddr addr, vaddr len,
   MemTxAttrs attrs, int flags, uintptr_t ra);
+
+/**
+ * cpu_watchpoint_address_matches:
+ * @cpu: cpu context
+ * @addr: guest virtual address
+ * @len: access length
+ *
+ * Return the watchpoint flags that apply to [addr, addr+len).
+ * If no watchpoint is registered for the range, the result is 0.
+ */
 int cpu_watchpoint_address_matches(CPUState *cpu, vaddr addr, vaddr len);
 #endif
 
-- 
2.20.1




[PATCH v3 02/18] exec: Fix cpu_watchpoint_address_matches address length

2020-04-21 Thread Richard Henderson
The only caller of cpu_watchpoint_address_matches passes
TARGET_PAGE_SIZE, so the bug is not currently visible.

Signed-off-by: Richard Henderson 
---
 exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 2874bb5088..5162f0d12f 100644
--- a/exec.c
+++ b/exec.c
@@ -1127,7 +1127,7 @@ int cpu_watchpoint_address_matches(CPUState *cpu, vaddr 
addr, vaddr len)
 int ret = 0;
 
 QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
-if (watchpoint_address_matches(wp, addr, TARGET_PAGE_SIZE)) {
+if (watchpoint_address_matches(wp, addr, len)) {
 ret |= wp->flags;
 }
 }
-- 
2.20.1




[PATCH v3 07/18] target/arm: Drop manual handling of set/clear_helper_retaddr

2020-04-21 Thread Richard Henderson
Since we converted back to cpu_*_data_ra, we do not need to
do this ourselves.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve_helper.c | 38 --
 1 file changed, 38 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 655bc9476f..aad2c8c237 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4133,12 +4133,6 @@ static intptr_t max_for_page(target_ulong base, intptr_t 
mem_off,
 return MIN(split, mem_max - mem_off) + mem_off;
 }
 
-#ifndef CONFIG_USER_ONLY
-/* These are normally defined only for CONFIG_USER_ONLY in <exec/cpu_ldst.h> */
-static inline void set_helper_retaddr(uintptr_t ra) { }
-static inline void clear_helper_retaddr(void) { }
-#endif
-
 /*
  * The result of tlb_vaddr_to_host for user-only is just g2h(x),
  * which is always non-null.  Elide the useless test.
@@ -4180,7 +4174,6 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 return;
 }
 mem_off = reg_off >> diffsz;
-set_helper_retaddr(retaddr);
 
 /*
  * If the (remaining) load is entirely within a single page, then:
@@ -4195,7 +4188,6 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 if (test_host_page(host)) {
 mem_off = host_fn(vd, vg, host - mem_off, mem_off, mem_max);
 tcg_debug_assert(mem_off == mem_max);
-clear_helper_retaddr();
 /* After having taken any fault, zero leading inactive elements. */
 swap_memzero(vd, reg_off);
 return;
@@ -4246,7 +4238,6 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 }
 #endif
 
-clear_helper_retaddr();
 memcpy(vd, &scratch, reg_max);
 }
 
@@ -4306,7 +4297,6 @@ static void sve_ld2_r(CPUARMState *env, void *vg, 
target_ulong addr,
 intptr_t i, oprsz = simd_oprsz(desc);
 ARMVectorReg scratch[2] = { };
 
-set_helper_retaddr(ra);
 for (i = 0; i < oprsz; ) {
 uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
 do {
@@ -4318,7 +4308,6 @@ static void sve_ld2_r(CPUARMState *env, void *vg, 
target_ulong addr,
 addr += 2 * size;
 } while (i & 15);
 }
-clear_helper_retaddr();
 
 /* Wait until all exceptions have been raised to write back.  */
 memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
@@ -4333,7 +4322,6 @@ static void sve_ld3_r(CPUARMState *env, void *vg, 
target_ulong addr,
 intptr_t i, oprsz = simd_oprsz(desc);
 ARMVectorReg scratch[3] = { };
 
-set_helper_retaddr(ra);
 for (i = 0; i < oprsz; ) {
 uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
 do {
@@ -4346,7 +4334,6 @@ static void sve_ld3_r(CPUARMState *env, void *vg, 
target_ulong addr,
 addr += 3 * size;
 } while (i & 15);
 }
-clear_helper_retaddr();
 
 /* Wait until all exceptions have been raised to write back.  */
 memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
@@ -4362,7 +4349,6 @@ static void sve_ld4_r(CPUARMState *env, void *vg, 
target_ulong addr,
 intptr_t i, oprsz = simd_oprsz(desc);
 ARMVectorReg scratch[4] = { };
 
-set_helper_retaddr(ra);
 for (i = 0; i < oprsz; ) {
 uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
 do {
@@ -4376,7 +4362,6 @@ static void sve_ld4_r(CPUARMState *env, void *vg, 
target_ulong addr,
 addr += 4 * size;
 } while (i & 15);
 }
-clear_helper_retaddr();
 
 /* Wait until all exceptions have been raised to write back.  */
 memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
@@ -4483,7 +4468,6 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 return;
 }
 mem_off = reg_off >> diffsz;
-set_helper_retaddr(retaddr);
 
 /*
  * If the (remaining) load is entirely within a single page, then:
@@ -4498,7 +4482,6 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 if (test_host_page(host)) {
 mem_off = host_fn(vd, vg, host - mem_off, mem_off, mem_max);
 tcg_debug_assert(mem_off == mem_max);
-clear_helper_retaddr();
 /* After any fault, zero any leading inactive elements.  */
 swap_memzero(vd, reg_off);
 return;
@@ -4541,7 +4524,6 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const 
target_ulong addr,
 }
 #endif
 
-clear_helper_retaddr();
 record_fault(env, reg_off, reg_max);
 }
 
@@ -4687,7 +4669,6 @@ static void sve_st1_r(CPUARMState *env, void *vg, 
target_ulong addr,
 intptr_t i, oprsz = simd_oprsz(desc);
 void *vd = &env->vfp.zregs[rd];
 
-set_helper_retaddr(ra);
 for (i = 0; i < oprsz; ) {
 uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
 do {
@@ -4698,7 +4679,6 @@ static void sve_st1_r(CPUARMState *env, void *vg, 
target_ulong addr,
 addr += msize;
 } while (i & 15);
 }
-clear_helper_retaddr();
 }
 
 static void 

Re: [PATCH 02/15] KVM: MIPS: Define KVM_ENTRYHI_ASID to cpu_asid_mask(&boot_cpu_data)

2020-04-21 Thread chen huacai
Hi, Sasha,

On Wed, Apr 22, 2020 at 9:40 AM Sasha Levin  wrote:
>
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.6.5, v5.5.18, v5.4.33, v4.19.116, 
> v4.14.176, v4.9.219, v4.4.219.
>
> v5.6.5: Build OK!
> v5.5.18: Build OK!
> v5.4.33: Build OK!
> v4.19.116: Build OK!
> v4.14.176: Build OK!
> v4.9.219: Build OK!
> v4.4.219: Failed to apply! Possible dependencies:
> 029499b47738 ("KVM: x86: MMU: Make mmu_set_spte() return emulate value")
> 19d194c62b25 ("MIPS: KVM: Simplify TLB_* macros")
> 403015b323a2 ("MIPS: KVM: Move non-TLB handling code out of tlb.c")
> 7ee0e5b29d27 ("KVM: x86: MMU: Remove unused parameter of __direct_map()")
> 9fbfb06a4065 ("MIPS: KVM: Arrayify struct kvm_mips_tlb::tlb_lo*")
> ba049e93aef7 ("kvm: rename pfn_t to kvm_pfn_t")
> bdb7ed8608f8 ("MIPS: KVM: Convert headers to kernel sized types")
> ca64c2beecd4 ("MIPS: KVM: Abstract guest ASID mask")
> caa1faa7aba6 ("MIPS: KVM: Trivial whitespace and style fixes")
> e6207bbea16c ("MIPS: KVM: Use MIPS_ENTRYLO_* defs from mipsregs.h")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
Please ignore this patch for the linux-4.4 branch and older branches.
>
> --
> Thanks
> Sasha
>
Thanks,
Huacai


-- 
Huacai Chen



Re: [PATCH 6/6] target/arm: Restrict TCG cpus to TCG accel

2020-04-21 Thread Richard Henderson
On 4/21/20 6:19 AM, Philippe Mathieu-Daudé wrote:
> A KVM-only build won't be able to run TCG cpus.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> This patch review is funnier using:
> 'git-diff --color-moved=dimmed-zebra'
> ---
>  target/arm/cpu.c | 634 -
>  target/arm/cpu_tcg.c | 663 +++
>  target/arm/Makefile.objs |   1 +
>  3 files changed, 664 insertions(+), 634 deletions(-)
>  create mode 100644 target/arm/cpu_tcg.c

Reviewed-by: Richard Henderson 

r~



Re: [PATCH 5/6] target/arm/cpu: Update coding style to make checkpatch.pl happy

2020-04-21 Thread Richard Henderson
On 4/21/20 6:19 AM, Philippe Mathieu-Daudé wrote:
> We will move this code in the next commit. Clean it up
> first to avoid checkpatch.pl errors.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/arm/cpu.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson 

r~



Re: [PATCH 4/6] target/arm/cpu: Use ARRAY_SIZE() to iterate over ARMCPUInfo[]

2020-04-21 Thread Richard Henderson
On 4/21/20 6:19 AM, Philippe Mathieu-Daudé wrote:
> Suggested-by: Richard Henderson 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/arm/cpu.c   | 8 +++-
>  target/arm/cpu64.c | 8 +++-
>  2 files changed, 6 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson 

r~



Re: [PATCH 1/6] target/arm: Restric the Address Translate write operation to TCG accel

2020-04-21 Thread Richard Henderson
On 4/21/20 6:19 AM, Philippe Mathieu-Daudé wrote:
> Under KVM these registers are written by the hardware.
> Restrict the writefn handlers to TCG to avoid when building
> without TCG:
> 
>   LINKaarch64-softmmu/qemu-system-aarch64
> target/arm/helper.o: In function `do_ats_write':
> target/arm/helper.c:3524: undefined reference to `raise_exception'
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Better explanation:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg689388.html
> ---
>  target/arm/helper.c | 17 +
>  1 file changed, 17 insertions(+)

Reviewed-by: Richard Henderson 

r~



Re: [PATCH] linux-user/riscv: fix up struct target_ucontext definition

2020-04-21 Thread Richard Henderson
On 4/21/20 7:34 PM, LIU Zhiwei wrote:
> Ping.
> 
> While porting RISU, I found this bug: I can't get the correct registers from
> the struct ucontext_t parameter in the signal handler.

The RISC-V Linux ABI will need to be extended to handle RVV state.

There is room in your sigcontext structure:

> struct __riscv_q_ext_state {
> __u64 f[64] __attribute__((aligned(16)));
> __u32 fcsr;
> /*
>  * Reserved for expansion of sigcontext structure.  Currently zeroed
>  * upon signal, and must be zero upon sigreturn.
>  */
> __u32 reserved[3];
> };

in uc->uc_mcontext.sc_fpregs.q.

That reserved field is going to have to be used in some way.

My suggestion is to use some sort of extendable record list, akin to AArch64:

struct _aarch64_ctx {
__u32 magic;
__u32 size;
};

One of the 3 zeros could be the total size of the extensions, so that it's easy
to validate the size or memcpy the lot without parsing each individual record.
The other two zeros could be the first header of the next record, which in
this case also allows the payload of that first record to be aligned mod 16;
that could come in handy.
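
Purely as an illustration of that shape (the struct name and the use of the
reserved words below are invented here, not an agreed ABI):

/* Hypothetical RISC-V analogue of struct _aarch64_ctx. */
struct __riscv_extra_ctx {
    __u32 magic;   /* identifies the extension record; 0 terminates the list */
    __u32 size;    /* size of this record, including this header */
};

/*
 * Hypothetical use of the three reserved words in __riscv_q_ext_state:
 *   reserved[0] - total size of all extension records
 *   reserved[1] - magic of the first record
 *   reserved[2] - size of the first record
 * The payload of that first record then starts 16-byte aligned.
 */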

Talk to the risc-v kernel engineers and come up with a plan that includes room
for the next architecture extension as well.  They may have already done so,
but I'm not monitoring the correct mailing list to know.


r~



Re: [PATCH] target/arm: Implement SVE2 scatter store insns

2020-04-21 Thread Richard Henderson
On 4/20/20 1:42 PM, Stephen Long wrote:
> +static bool trans_ST1_zprz_sve2(DisasContext *s, arg_ST1_zprz_sve2 *a)
> +{
> +gen_helper_gvec_mem_scatter *fn;
> +bool be = s->be_data == MO_BE;
> +bool mte = s->mte_active[0];
> +
> +if (!dc_isar_feature(aa64_sve2, s) || a->esz < a->msz
> +|| (a->msz == 0 && a->scale)) {
> +return false;
> +}
> +if (!sve_access_check(s)) {
> +return true;
> +}
> +switch (a->esz) {
> +case MO_32:
> +fn = scatter_store_fn32[mte][be][a->xs][a->msz];
> +break;
> +case MO_64:
> +fn = scatter_store_fn64[mte][be][a->xs][a->msz];
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
> +   cpu_reg_sp(s, a->rn), a->msz, true, fn);
> +return true;
> +}

I was thinking of something more along the lines of

static bool STNT1_zprz(DisasContext *s, arg_ST1_zprz *a)
{
if (!dc_isar_feature(aa64_sve2, s)) {
return false;
}
return trans_ST1_zprz(s, a);
}

The fields should be identical, and so decodetree should pick the same type for
'a', underneath all of the typedefs.

If decodetree cannot find a common argument set for the two insns, we might
need to help it along, like we do with e.g. _esz.  I don't know without
trying if that will be required.


r~



Re: [PATCH] target/arm: Implement SVE2 FMMLA

2020-04-21 Thread Richard Henderson
On 4/20/20 8:10 AM, Stephen Long wrote:
> +#define DO_FP_MATRIX_MUL(NAME, TYPE, H) \
> +void HELPER(NAME)(void *vd, void *va, void *vn, void *vm,   \
> + void *status, uint32_t desc)   \
> +{   \
> +intptr_t s, i, j;   \
> +intptr_t opr_sz = simd_oprsz(desc) / (sizeof(TYPE) >> 2);   \
> +\
> +for (s = 0; s < opr_sz; ++s) {  \
> +TYPE *n = vn + s * (sizeof(TYPE) >> 2); \
> +TYPE *m = vm + s * (sizeof(TYPE) >> 2); \
> +TYPE *a = va + s * (sizeof(TYPE) >> 2); \
> +TYPE *d = vd + s * (sizeof(TYPE) >> 2); \
> +\
> +for (i = 0; i < 1; ++i) {   \
> +for (j = 0; j < 1; ++j) {   \
> +TYPE addend = a[H(2*i + j)];\
> +\
> +TYPE nn0 = n[H(2*i)];   \
> +TYPE mm0 = m[H(2*j)];   \
> +TYPE prod0 = TYPE##_mul(nn0, mm0, status);  \
> +\
> +TYPE nn1 = n[H4(2*i + 1)];  \
> +TYPE mm1 = m[H4(2*j + 1)];  \
> +TYPE prod1 = TYPE##_mul(nn1, mm1, status);  \
> +\
> +TYPE sum = TYPE##_add(prod0, prod1, status);\
> +d[H(2*i + j)] = TYPE##_add(sum, addend, status);\
> +}   \
> +}   \

This has a read-after-write problem, when D overlaps any of the inputs.  You
need to read all of the inputs before writing anything.

It might be easiest to just unroll these two inner loops:

TYPE n00 = n[0], n01 = n[1], n10 = n[2], n11 = n[3];
TYPE m00 = m[0], m01 = m[1], m10 = m[2], m11 = m[3];
TYPE p0, p1;

// i = 0, j = 0
p0 = mul(n00, m00, status);
p1 = mul(n01, m01, status);
a[0] = add(a[0], add(p0, p1, status), status);

// i = 0, j = 1
p0 = mul(n00, m10, status);
p1 = mul(n01, m11, status);
a[1] = add(a[1], add(p0, p1, status), status);
...



r~



Re: [PATCH for-5.1 00/31] target/arm: SVE2, part 1

2020-04-21 Thread LIU Zhiwei

Hi Richard,

I find BF16 is included in the ISA.  Will you extend  the softfpu in 
this patch set?


Zhiwei

On 2020/3/27 7:08, Richard Henderson wrote:

Posting this for early review.  It's based on some other patch
sets that I have posted recently that also touch SVE, listed
below.  But it might just be easier to clone the devel tree [2].
While the branch itself will rebase frequently for development,
I've also created a tag, post-sve2-20200326, for this posting.

This is mostly untested, as the most recently released Foundation
Model does not support SVE2.  Some of the new instructions overlap
with old fashioned NEON, and I can verify that those have not
broken, and show that SVE2 will use the same code path.  But the
predicated insns and bottom/top interleaved insns are not yet
RISU testable, as I have nothing to compare against.

The patches are in general arranged so that one complete group
of insns are added at once.  The groups within the manual [1]
have so far been small-ish.


r~

---

[1] ISA manual: 
https://static.docs.arm.com/ddi0602/d/ISA_A64_xml_futureA-2019-12_OPT.pdf

[2] Devel tree: https://github.com/rth7680/qemu/tree/tgt-arm-sve-2

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=163610
("target/arm: sve load/store improvements")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164500
("target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164048
("target/arm: Implement ARMv8.5-MemTag, system mode")

Richard Henderson (31):
   target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2
   target/arm: Implement SVE2 Integer Multiply - Unpredicated
   target/arm: Implement SVE2 integer pairwise add and accumulate long
   target/arm: Remove fp_status from helper_{recpe,rsqrte}_u32
   target/arm: Implement SVE2 integer unary operations (predicated)
   target/arm: Split out saturating/rounding shifts from neon
   target/arm: Implement SVE2 saturating/rounding bitwise shift left
 (predicated)
   target/arm: Implement SVE2 integer halving add/subtract (predicated)
   target/arm: Implement SVE2 integer pairwise arithmetic
   target/arm: Implement SVE2 saturating add/subtract (predicated)
   target/arm: Implement SVE2 integer add/subtract long
   target/arm: Implement SVE2 integer add/subtract interleaved long
   target/arm: Implement SVE2 integer add/subtract wide
   target/arm: Implement SVE2 integer multiply long
   target/arm: Implement PMULLB and PMULLT
   target/arm: Tidy SVE tszimm shift formats
   target/arm: Implement SVE2 bitwise shift left long
   target/arm: Implement SVE2 bitwise exclusive-or interleaved
   target/arm: Implement SVE2 bitwise permute
   target/arm: Implement SVE2 complex integer add
   target/arm: Implement SVE2 integer absolute difference and accumulate
 long
   target/arm: Implement SVE2 integer add/subtract long with carry
   target/arm: Create arm_gen_gvec_[us]sra
   target/arm: Create arm_gen_gvec_{u,s}{rshr,rsra}
   target/arm: Implement SVE2 bitwise shift right and accumulate
   target/arm: Create arm_gen_gvec_{sri,sli}
   target/arm: Tidy handle_vec_simd_shri
   target/arm: Implement SVE2 bitwise shift and insert
   target/arm: Vectorize SABD/UABD
   target/arm: Vectorize SABA/UABA
   target/arm: Implement SVE2 integer absolute difference and accumulate

  target/arm/cpu.h   |  31 ++
  target/arm/helper-sve.h| 345 +
  target/arm/helper.h|  81 +++-
  target/arm/translate-a64.h |   9 +
  target/arm/translate.h |  24 +-
  target/arm/vec_internal.h  | 161 
  target/arm/sve.decode  | 217 ++-
  target/arm/helper.c|   3 +-
  target/arm/kvm64.c |   2 +
  target/arm/neon_helper.c   | 515 -
  target/arm/sve_helper.c| 757 ++---
  target/arm/translate-a64.c | 557 +++
  target/arm/translate-sve.c | 557 +++
  target/arm/translate.c | 626 ++
  target/arm/vec_helper.c| 411 
  target/arm/vfp_helper.c|   4 +-
  16 files changed, 3532 insertions(+), 768 deletions(-)
  create mode 100644 target/arm/vec_internal.h







Re: [PATCH for-5.1 00/31] target/arm: SVE2, part 1

2020-04-21 Thread Richard Henderson
On 4/21/20 7:51 PM, LIU Zhiwei wrote:
> I find BF16 is included in the ISA.  Will you extend  the softfpu in this 
> patch
> set?

I will do that eventually, but probably not part of the first full SVE2 patch 
set.

There are several optional extensions to SVE2, of which BF16 is one.  But BF16
also requires changes to the normal FPU as well, and Arm requires SVE and FPU
be in sync.


r~



Re: [PATCH v1 1/2] migration/xbzrle: replace transferred xbzrle bytes with encoded bytes

2020-04-21 Thread Wei Wang

On 04/22/2020 03:21 AM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

Like compressed_size which indicates how many bytes are compressed, we
need encoded_size to understand how many bytes are encoded with xbzrle
during migration.

Replace the old xbzrle_counter.bytes, instead of adding a new counter,
because we don't find a usage of xbzrle_counter.bytes currently, which
includes 3 more bytes of the migration transfer protocol header (in
addition to the encoding header). The encoded_size will further be used
to calculate the encoding rate.

Signed-off-by: Yi Sun 
Signed-off-by: Wei Wang 

Can you explain why these 3 bytes matter?  Certainly the 2 bytes of the
encoded_len are an overhead that's a cost of using XBZRLE; so if you're
trying to figure out whether xbzrle is worth it, then you should include
those 2 bytes in the cost.
That other byte, which holds ENCODING_FLAG_XBZRLE, also seems to be pure
overhead of XBZRLE; so your cost of using XBZRLE really does include
those 3 bytes.

So to me it makes sense to include the 3 bytes as it currently does.

Dave


Thanks Dave for sharing your thoughts.

We hope to do a fair comparison of the compression rate and the xbzrle
encoding rate.  The current compression_rate doesn't include the migration
flag overhead (please see update_compress_thread_counts()).  So for the
xbzrle encoding rate, we wanted it not to include the migration protocol
flags either (but the 2-byte xbzrle encoding overhead is kept there, just
as the compression rate includes the compression header overhead).

Or would you think it is necessary to add the migration flag (8 bytes) for
compression when calculating the compression rate?
when calculating the compression rate?

Best,
Wei



Re: [PATCH] linux-user/riscv: fix up struct target_ucontext definition

2020-04-21 Thread LIU Zhiwei

Ping.

While porting RISU, I found this bug: I can't get the correct registers from
the struct ucontext_t parameter in the signal handler.

If you want to reproduce it, just register a signal handler for SIGILL
and execute an illegal instruction, such as:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <inttypes.h>
#include <ucontext.h>

void sigill(int sig, siginfo_t *si, void *uc)
{
printf("Illegal pc: %016" PRIx64 "\n",
   ((ucontext_t *)uc)->uc_mcontext.__gregs[0]);
}

static void set_sigill_handler(void (*fn) (int, siginfo_t *, void *))
{
struct sigaction sa;
memset(&sa, 0, sizeof(struct sigaction));

sa.sa_sigaction = fn;
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
if (sigaction(SIGILL, &sa, 0) != 0) {
perror("sigaction");
exit(1);
}
}

int main()
{
set_sigill_handler(sigill);
asm(".dword 0x006b");
return 0;
}

Zhiwei

On 2020/4/12 10:08, LIU Zhiwei wrote:

As struct target_ucontext will be transferred to the signal handler, it
must keep pace with struct ucontext_t as defined in the Linux kernel.

Signed-off-by: LIU Zhiwei 
---
  linux-user/riscv/signal.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
index 83ecc6f799..67a95dbc7b 100644
--- a/linux-user/riscv/signal.c
+++ b/linux-user/riscv/signal.c
@@ -40,8 +40,9 @@ struct target_ucontext {
  unsigned long uc_flags;
  struct target_ucontext *uc_link;
  target_stack_t uc_stack;
-struct target_sigcontext uc_mcontext;
  target_sigset_t uc_sigmask;
+uint8_t   __unused[1024 / 8 - sizeof(target_sigset_t)];
+struct target_sigcontext uc_mcontext QEMU_ALIGNED(16);
  };
  
  struct target_rt_sigframe {




Re: [PATCH 03/15] KVM: MIPS: Fix VPN2_MASK definition for variable cpu_vmbits

2020-04-21 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.6.5, v5.5.18, v5.4.33, v4.19.116, 
v4.14.176, v4.9.219, v4.4.219.

v5.6.5: Build OK!
v5.5.18: Build OK!
v5.4.33: Build OK!
v4.19.116: Build OK!
v4.14.176: Build OK!
v4.9.219: Build OK!
v4.4.219: Failed to apply! Possible dependencies:
029499b47738 ("KVM: x86: MMU: Make mmu_set_spte() return emulate value")
19d194c62b25 ("MIPS: KVM: Simplify TLB_* macros")
403015b323a2 ("MIPS: KVM: Move non-TLB handling code out of tlb.c")
7ee0e5b29d27 ("KVM: x86: MMU: Remove unused parameter of __direct_map()")
9a99c4fd6586 ("KVM: MIPS: Define KVM_ENTRYHI_ASID to 
cpu_asid_mask(_cpu_data)")
9fbfb06a4065 ("MIPS: KVM: Arrayify struct kvm_mips_tlb::tlb_lo*")
ba049e93aef7 ("kvm: rename pfn_t to kvm_pfn_t")
bdb7ed8608f8 ("MIPS: KVM: Convert headers to kernel sized types")
ca64c2beecd4 ("MIPS: KVM: Abstract guest ASID mask")
caa1faa7aba6 ("MIPS: KVM: Trivial whitespace and style fixes")
e6207bbea16c ("MIPS: KVM: Use MIPS_ENTRYLO_* defs from mipsregs.h")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha



Re: [PATCH] roms: opensbi: Upgrade from v0.6 to v0.7

2020-04-21 Thread Bin Meng
Hi Alistair,

On Tue, Apr 21, 2020 at 9:34 AM Bin Meng  wrote:
>
> Hi Alistair,
>
> On Tue, Apr 21, 2020 at 2:41 AM Alistair Francis  wrote:
> >
> > On Mon, Apr 20, 2020 at 6:25 AM Bin Meng  wrote:
> > >
> > > Upgrade OpenSBI from v0.6 to v0.7 and the pre-built bios images.
> > >
> > > The v0.7 release includes the following commits:
> > >
> > > f64f4b9 lib: Add a new platform feature to bringup secondary harts
> > > b677a9b lib: Implement hart hotplug
> > > 5b48240 lib: Add possible hart status values
> > > e3f69fc lib: Implement Hart State Management (HSM) SBI extension
> > > 6704216 lib: Check MSIP bit after returning from WFI
> > > 82ae8e8 makefile: Do setup of the install target more flexible
> > > e1a5b73 platform: sifive: fu540: allow sv32 as an mmu-type
> > > 8c83fb2 lib: Fix return type of sbi_hsm_hart_started()
> > > 00d332b include: Move bits related defines and macros to sbi_bitops.h
> > > a148996 include: sbi_bitops: More useful bit operations
> > > 4a603eb platform: kendryte/k210: Set per-HART stack size to 8KB
> > > 678c3c3 include: sbi_scratch: Set per-HART scratch size to 4KB
> > > 2abc55b lib: Sort build objects in alphabetical order
> > > 6e87507 platform: ae350: Sort build objects in alphabetical order
> > > 650c0e5 lib: sbi: Fix coding style issues
> > > 078686d lib: serial: Fix coding style issues
> > > 3226bd9 lib: Simple bitmap library
> > > c741abc include: Simple hartmask library
> > > d6d7e18 lib: sbi_init: Don't allow HARTID greater than 
> > > SBI_HARTMASK_MAX_BITS
> > > a4a6a81 lib: Introduce SBI_TLB_INFO_INIT() helper macro
> > > d963164 lib: sbi_tlb: Use sbi_hartmask in sbi_tlb_info
> > > 71d2b83 lib: Move all coldboot wait APIs to sbi_init.c
> > > 2b945fc lib: sbi_init: Use hartmask for coldboot wait
> > > 44ce5b9 include: Remove disabled_hart_mask from sbi_platform
> > > 2db381f lib: Introduce sbi_hsm_hart_started_mask() API
> > > 61f7768 lib: sbi_ecall_legacy: Use sbi_hsm_hart_started_mask() API
> > > 466fecb lib: sbi_system: Use sbi_hsm_hart_started_mask() API
> > > 9aad831 lib: sbi_ipi: Use sbi_hsm_hart_started_mask() API
> > > eede1aa lib: sbi_hart: Remove HART available mask and related APIs
> > > 757bb44 docs: Remove out-of-date documentation
> > > 86d37bb lib: sbi: Fix misaligned trap handling
> > > ffdc858 platform: ariane-fpga: Change license for ariane-fpga from 
> > > GPL-2.0 to BSD-2
> > > 4b2f594 sbi: Add definitions for true/false
> > > 0cfe49a libfdt: Add INT32_MAX and UINT32_MAX in libfdt_env.h
> > > baac7e0 libfdt: Upgrade to v1.5.1 release
> > > f92147c include: Make sbi_hart_id_to_scratch() as macro
> > > eeae3d9 firmware: fw_base: Optimize _hartid_to_scratch() implementation
> > > 16e7071 lib: sbi_hsm: Optimize sbi_hsm_hart_get_state() implementation
> > > 823345e include: Make sbi_current_hartid() as macro in riscv_asm.h
> > > 9aabba2 Makefile: Fix distclean make target
> > > 9275ed3 platform: ariane-fpga: Set per-HART stack size to 8KB
> > > 2343efd platform: Set per-HART stack size to 8KB in the template platform 
> > > codes
> > > 72a0628 platform: Use one unified per-HART stack size macro for all 
> > > platforms
> > > 327ba36 scripts: Cover sifive/fu540 in the 32-bit build
> > > 5fbcd62 lib: sbi: Update pmp_get() to return decoded size directly
> > > dce8846 libfdt: Compile fdt_addresses.c
> > > fcb1ded lib: utils: Add a fdt_reserved_memory_fixup() helper
> > > 666be6d platform: Clean up include header files
> > > 6af5576 lib: utils: Move PLIC DT fix up codes to fdt_helper.c
> > > e846ce1 platform: andes/ae350: Fix up DT for reserved memory
> > > 8135520 platform: ariane-fpga: Fix up DT for reserved memory
> > > c9a5268 platform: qemu/virt: Fix up DT for reserved memory
> > > 6f9bb83 platform: sifive/fu540: Fix up DT for reserved memory
> > > 1071f05 platform: sifive/fu540: Remove "stdout-path" fix-up
> > > dd9439f lib: utils: Add a fdt_cpu_fixup() helper
> > > 3f1c847 platform: sifive/fu540: Replace cpu0 node fix-up with the new 
> > > helper
> > > db6a2b5 lib: utils: Add a general device tree fix-up helper
> > > 3f8d754 platform: Update to call general DT fix-up helper
> > > 87a7ef7 lib: sbi_scratch: Introduce HART id to scratch table
> > > e23d3ba include: Simplify HART id to scratch macro
> > > 19bd531 lib: sbi_hsm: Simplify hart_get_state() and hart_started() APIs
> > > 3ebfe0e lib: sbi_tlb: Simplify sbi_tlb_entry_process() function
> > > 209134d lib: Handle failure of sbi_hartid_to_scratch() API
> > > bd6ef02 include: sbi_platform: Improve sbi_platform_hart_disabled() API
> > > c9f60fc lib: sbi_scratch: Don't set hartid_to_scratch table for disabled 
> > > HART
> > > 680b098 lib: sbi_hsm: Don't use sbi_platform_hart_count() API
> > > db187d6 lib: sbi_hsm: Remove scratch parameter from hart_started_mask() 
> > > API
> > > 814f38d lib: sbi_hsm: Don't use sbi_platform_hart_disabled() API
> > > 75eec9d lib: Don't use sbi_platform_hart_count() API
> > > c51f02c include: sbi_platform: Introduce HART index to HART id table
> > > 315a877 platform: 

Re: [PATCH 02/15] KVM: MIPS: Define KVM_ENTRYHI_ASID to cpu_asid_mask(&boot_cpu_data)

2020-04-21 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.6.5, v5.5.18, v5.4.33, v4.19.116, 
v4.14.176, v4.9.219, v4.4.219.

v5.6.5: Build OK!
v5.5.18: Build OK!
v5.4.33: Build OK!
v4.19.116: Build OK!
v4.14.176: Build OK!
v4.9.219: Build OK!
v4.4.219: Failed to apply! Possible dependencies:
029499b47738 ("KVM: x86: MMU: Make mmu_set_spte() return emulate value")
19d194c62b25 ("MIPS: KVM: Simplify TLB_* macros")
403015b323a2 ("MIPS: KVM: Move non-TLB handling code out of tlb.c")
7ee0e5b29d27 ("KVM: x86: MMU: Remove unused parameter of __direct_map()")
9fbfb06a4065 ("MIPS: KVM: Arrayify struct kvm_mips_tlb::tlb_lo*")
ba049e93aef7 ("kvm: rename pfn_t to kvm_pfn_t")
bdb7ed8608f8 ("MIPS: KVM: Convert headers to kernel sized types")
ca64c2beecd4 ("MIPS: KVM: Abstract guest ASID mask")
caa1faa7aba6 ("MIPS: KVM: Trivial whitespace and style fixes")
e6207bbea16c ("MIPS: KVM: Use MIPS_ENTRYLO_* defs from mipsregs.h")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha



[PATCH v2 35/36] target/ppc: Use tcg_gen_gvec_rotlv

2020-04-21 Thread Richard Henderson
Cc: David Gibson 
Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h |  4 
 target/ppc/int_helper.c | 17 -
 target/ppc/translate/vmx-impl.inc.c |  8 
 3 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index a95c010391..b0114fc915 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -213,10 +213,6 @@ DEF_HELPER_3(vsubuqm, void, avr, avr, avr)
 DEF_HELPER_4(vsubecuq, void, avr, avr, avr, avr)
 DEF_HELPER_4(vsubeuqm, void, avr, avr, avr, avr)
 DEF_HELPER_3(vsubcuq, void, avr, avr, avr)
-DEF_HELPER_3(vrlb, void, avr, avr, avr)
-DEF_HELPER_3(vrlh, void, avr, avr, avr)
-DEF_HELPER_3(vrlw, void, avr, avr, avr)
-DEF_HELPER_3(vrld, void, avr, avr, avr)
 DEF_HELPER_4(vsldoi, void, avr, avr, avr, i32)
 DEF_HELPER_3(vextractub, void, avr, avr, i32)
 DEF_HELPER_3(vextractuh, void, avr, avr, i32)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 6d238b989d..ee308da2ca 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1347,23 +1347,6 @@ VRFI(p, float_round_up)
 VRFI(z, float_round_to_zero)
 #undef VRFI
 
-#define VROTATE(suffix, element, mask)  \
-void helper_vrl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
-{   \
-int i;  \
-\
-for (i = 0; i < ARRAY_SIZE(r->element); i++) {  \
-unsigned int shift = b->element[i] & mask;  \
-r->element[i] = (a->element[i] << shift) |  \
-(a->element[i] >> (sizeof(a->element[0]) * 8 - shift)); \
-}   \
-}
-VROTATE(b, u8, 0x7)
-VROTATE(h, u16, 0xF)
-VROTATE(w, u32, 0x1F)
-VROTATE(d, u64, 0x3F)
-#undef VROTATE
-
 void helper_vrsqrtefp(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *b)
 {
 int i;
diff --git a/target/ppc/translate/vmx-impl.inc.c 
b/target/ppc/translate/vmx-impl.inc.c
index 403ed3a01c..de2fd136ff 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -900,13 +900,13 @@ GEN_VXFORM3(vsubeuqm, 31, 0);
 GEN_VXFORM3(vsubecuq, 31, 0);
 GEN_VXFORM_DUAL(vsubeuqm, PPC_NONE, PPC2_ALTIVEC_207, \
 vsubecuq, PPC_NONE, PPC2_ALTIVEC_207)
-GEN_VXFORM(vrlb, 2, 0);
-GEN_VXFORM(vrlh, 2, 1);
-GEN_VXFORM(vrlw, 2, 2);
+GEN_VXFORM_V(vrlb, MO_8, tcg_gen_gvec_rotlv, 2, 0);
+GEN_VXFORM_V(vrlh, MO_16, tcg_gen_gvec_rotlv, 2, 1);
+GEN_VXFORM_V(vrlw, MO_32, tcg_gen_gvec_rotlv, 2, 2);
 GEN_VXFORM(vrlwmi, 2, 2);
 GEN_VXFORM_DUAL(vrlw, PPC_ALTIVEC, PPC_NONE, \
 vrlwmi, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM(vrld, 2, 3);
+GEN_VXFORM_V(vrld, MO_64, tcg_gen_gvec_rotlv, 2, 3);
 GEN_VXFORM(vrldmi, 2, 3);
 GEN_VXFORM_DUAL(vrld, PPC_NONE, PPC2_ALTIVEC_207, \
 vrldmi, PPC_NONE, PPC2_ISA300)
-- 
2.20.1




[PATCH v2 33/36] tcg/aarch64: Implement INDEX_op_rotli_vec

2020-04-21 Thread Richard Henderson
We can implement this in two instructions, using SLI.
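
For reference, a scalar model of that two-instruction sequence (unsigned
shift right, then SLI merging the shifted-left value over it); this is only
an illustration of the algebra, not code from the patch:

#include <stdint.h>

/* rotl(x, n) on an 8-bit lane, 0 < n < 8, as USHR + SLI would compute it. */
static inline uint8_t rotl8_via_sli(uint8_t x, unsigned n)
{
    uint8_t t = x >> (8 - n);     /* USHR: the high bits move to the bottom */
    /* SLI inserts (x << n) while keeping the low n bits of the destination. */
    return (uint8_t)((x << n) | (t & ((1u << n) - 1)));
}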

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.opc.h |  1 +
 tcg/aarch64/tcg-target.inc.c | 20 +++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.opc.h b/tcg/aarch64/tcg-target.opc.h
index 26bfd9c460..bce30accd9 100644
--- a/tcg/aarch64/tcg-target.opc.h
+++ b/tcg/aarch64/tcg-target.opc.h
@@ -12,3 +12,4 @@
  */
 
 DEF(aa64_sshl_vec, 1, 2, 0, IMPLVEC)
+DEF(aa64_sli_vec, 1, 2, 1, IMPLVEC)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 3b5a5d78c7..4bc9b30254 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -557,6 +557,7 @@ typedef enum {
 I3614_SSHR  = 0x0f000400,
 I3614_SSRA  = 0x0f001400,
 I3614_SHL   = 0x0f005400,
+I3614_SLI   = 0x2f005400,
 I3614_USHR  = 0x2f000400,
 I3614_USRA  = 0x2f001400,
 
@@ -2402,6 +2403,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_sari_vec:
 tcg_out_insn(s, 3614, SSHR, is_q, a0, a1, (16 << vece) - a2);
 break;
+case INDEX_op_aa64_sli_vec:
+tcg_out_insn(s, 3614, SLI, is_q, a0, a2, args[3] + (8 << vece));
+break;
 case INDEX_op_shlv_vec:
 tcg_out_insn(s, 3616, USHL, is_q, vece, a0, a1, a2);
 break;
@@ -2488,6 +2492,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 case INDEX_op_shlv_vec:
 case INDEX_op_bitsel_vec:
 return 1;
+case INDEX_op_rotli_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
 return -1;
@@ -2508,13 +2513,23 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece,
 {
 va_list va;
 TCGv_vec v0, v1, v2, t1;
+TCGArg a2;
 
 va_start(va, a0);
 v0 = temp_tcgv_vec(arg_temp(a0));
 v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
-v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+a2 = va_arg(va, TCGArg);
+v2 = temp_tcgv_vec(arg_temp(a2));
 
 switch (opc) {
+case INDEX_op_rotli_vec:
+t1 = tcg_temp_new_vec(type);
+tcg_gen_shri_vec(vece, t1, v1, -a2 & ((8 << vece) - 1));
+vec_gen_4(INDEX_op_aa64_sli_vec, type, vece,
+  tcgv_vec_arg(v0), tcgv_vec_arg(t1), tcgv_vec_arg(v1), a2);
+tcg_temp_free_vec(t1);
+break;
+
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
 /* Right shifts are negative left shifts for AArch64.  */
@@ -2547,6 +2562,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
 static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };
 static const TCGTargetOpDef w_w_w = { .args_ct_str = { "w", "w", "w" } };
+static const TCGTargetOpDef w_0_w = { .args_ct_str = { "w", "0", "w" } };
 static const TCGTargetOpDef w_w_wO = { .args_ct_str = { "w", "w", "wO" } };
 static const TCGTargetOpDef w_w_wN = { .args_ct_str = { "w", "w", "wN" } };
 static const TCGTargetOpDef w_w_wZ = { .args_ct_str = { "w", "w", "wZ" } };
@@ -2741,6 +2757,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 return &w_w_wZ;
 case INDEX_op_bitsel_vec:
 return &w_w_w_w;
+case INDEX_op_aa64_sli_vec:
+return &w_0_w;
 
 default:
 return NULL;
-- 
2.20.1




[PATCH v2 28/36] tcg: Implement gvec support for rotate by immediate

2020-04-21 Thread Richard Henderson
No host backend support yet, but the interfaces for rotli
are in place.  Canonicalize immediate rotate to the left,
based on a survey of architectures, but provide both left
and right shift interfaces to the translators.
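
For reference, a right rotate by immediate can be expressed through the
canonical left rotate; a minimal scalar sketch (illustrative only, not the
tcg-op-gvec implementation):

#include <stdint.h>

/* rotr(x, n) == rotl(x, (32 - n) % 32) for a 32-bit element. */
static inline uint32_t rotr32_as_rotl(uint32_t x, unsigned n)
{
    unsigned left = (32 - (n & 31)) & 31;   /* equivalent left-rotate count */
    return (x << left) | (x >> ((32 - left) & 31));
}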

Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h  |  5 +++
 include/tcg/tcg-op-gvec.h|  6 
 include/tcg/tcg-op.h |  2 ++
 include/tcg/tcg-opc.h|  1 +
 include/tcg/tcg.h|  1 +
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  1 +
 tcg/ppc/tcg-target.h |  1 +
 accel/tcg/tcg-runtime-gvec.c | 48 +
 tcg/tcg-op-gvec.c| 68 
 tcg/tcg-op-vec.c | 12 +++
 tcg/tcg.c|  2 ++
 tcg/README   |  3 +-
 13 files changed, 150 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 4fa61b49b4..cf10c8361e 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -259,6 +259,11 @@ DEF_HELPER_FLAGS_3(gvec_sar16i, TCG_CALL_NO_RWG, void, 
ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(gvec_sar64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_3(gvec_rotl8i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rotl16i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rotl32i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(gvec_rotl64i, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_shl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_shl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_shl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index cea6497341..1afc3ebf03 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -334,6 +334,10 @@ void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, 
uint32_t aofs,
int64_t shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_sari(unsigned vece, uint32_t dofs, uint32_t aofs,
int64_t shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotli(unsigned vece, uint32_t dofs, uint32_t aofs,
+int64_t shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotri(unsigned vece, uint32_t dofs, uint32_t aofs,
+int64_t shift, uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_shls(unsigned vece, uint32_t dofs, uint32_t aofs,
TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
@@ -388,5 +392,7 @@ void tcg_gen_vec_shr8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_shr16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
 void tcg_gen_vec_sar16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t);
+void tcg_gen_vec_rotl8i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
+void tcg_gen_vec_rotl16i_i64(TCGv_i64 d, TCGv_i64 a, int64_t c);
 
 #endif
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index b07bf7b524..c624e371d5 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -986,6 +986,8 @@ void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec 
a, TCGv_vec b);
 void tcg_gen_shli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_shri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 void tcg_gen_sari_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
+void tcg_gen_rotli_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
+void tcg_gen_rotri_vec(unsigned vece, TCGv_vec r, TCGv_vec a, int64_t i);
 
 void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 4a9cbf5426..c46c096c3e 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -245,6 +245,7 @@ DEF(not_vec, 1, 1, 0, IMPLVEC | 
IMPL(TCG_TARGET_HAS_not_vec))
 DEF(shli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec))
 DEF(shri_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec))
 DEF(sari_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_shi_vec))
+DEF(rotli_vec, 1, 1, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_roti_vec))
 
 DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index f72530dfda..d2034d9334 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -182,6 +182,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_not_vec  0
 #define TCG_TARGET_HAS_andc_vec 0
 #define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  0
diff --git a/tcg/aarch64/tcg-target.h 

[PATCH v2 36/36] target/s390x: Use tcg_gen_gvec_rotl{i,s,v}

2020-04-21 Thread Richard Henderson
Merge VERLL and VERLLV into op_vesv and op_ves, alongside
all of the other vector shift operations.
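
For reference, the per-element behaviour being delegated to the generic
rotl expansion is spelled out by the helpers deleted below: the count is
reduced modulo the element width before rotating.  A plain-C sketch of
the 32-bit case (helper name is mine, not from the patch):

    #include <stdint.h>

    static uint32_t verll_element32(uint32_t a, uint64_t count)
    {
        unsigned n = count & 31;       /* count taken mod element size */

        return n ? (a << n) | (a >> (32 - n)) : a;
    }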

Cc: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h   |  4 --
 target/s390x/translate_vx.inc.c | 66 +
 target/s390x/vec_int_helper.c   | 31 
 target/s390x/insn-data.def  |  4 +-
 4 files changed, 11 insertions(+), 94 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index b5813c2ac2..b7887b552b 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -198,10 +198,6 @@ DEF_HELPER_FLAGS_4(gvec_vmlo16, TCG_CALL_NO_RWG, void, 
ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vmlo32, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct8, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
 DEF_HELPER_FLAGS_3(gvec_vpopct16, TCG_CALL_NO_RWG, void, ptr, cptr, i32)
-DEF_HELPER_FLAGS_4(gvec_verllv8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
-DEF_HELPER_FLAGS_4(gvec_verllv16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
-DEF_HELPER_FLAGS_4(gvec_verll8, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
-DEF_HELPER_FLAGS_4(gvec_verll16, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
 DEF_HELPER_FLAGS_4(gvec_verim8, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_verim16, TCG_CALL_NO_RWG, void, ptr, cptr, cptr, i32)
 DEF_HELPER_FLAGS_4(gvec_vsl, TCG_CALL_NO_RWG, void, ptr, cptr, i64, i32)
diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 12347f8a03..eb767f5288 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -1825,63 +1825,6 @@ static DisasJumpType op_vpopct(DisasContext *s, DisasOps 
*o)
 return DISAS_NEXT;
 }
 
-static void gen_rll_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
-{
-TCGv_i32 t0 = tcg_temp_new_i32();
-
-tcg_gen_andi_i32(t0, b, 31);
-tcg_gen_rotl_i32(d, a, t0);
-tcg_temp_free_i32(t0);
-}
-
-static void gen_rll_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
-{
-TCGv_i64 t0 = tcg_temp_new_i64();
-
-tcg_gen_andi_i64(t0, b, 63);
-tcg_gen_rotl_i64(d, a, t0);
-tcg_temp_free_i64(t0);
-}
-
-static DisasJumpType op_verllv(DisasContext *s, DisasOps *o)
-{
-const uint8_t es = get_field(s, m4);
-static const GVecGen3 g[4] = {
-{ .fno = gen_helper_gvec_verllv8, },
-{ .fno = gen_helper_gvec_verllv16, },
-{ .fni4 = gen_rll_i32, },
-{ .fni8 = gen_rll_i64, },
-};
-
-if (es > ES_64) {
-gen_program_exception(s, PGM_SPECIFICATION);
-return DISAS_NORETURN;
-}
-
-gen_gvec_3(get_field(s, v1), get_field(s, v2),
-   get_field(s, v3), &g[es]);
-return DISAS_NEXT;
-}
-
-static DisasJumpType op_verll(DisasContext *s, DisasOps *o)
-{
-const uint8_t es = get_field(s, m4);
-static const GVecGen2s g[4] = {
-{ .fno = gen_helper_gvec_verll8, },
-{ .fno = gen_helper_gvec_verll16, },
-{ .fni4 = gen_rll_i32, },
-{ .fni8 = gen_rll_i64, },
-};
-
-if (es > ES_64) {
-gen_program_exception(s, PGM_SPECIFICATION);
-return DISAS_NORETURN;
-}
-gen_gvec_2s(get_field(s, v1), get_field(s, v3), o->addr1,
-   &g[es]);
-return DISAS_NEXT;
-}
-
 static void gen_rim_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b, int32_t c)
 {
 TCGv_i32 t = tcg_temp_new_i32();
@@ -1946,6 +1889,9 @@ static DisasJumpType op_vesv(DisasContext *s, DisasOps *o)
 case 0x70:
 gen_gvec_fn_3(shlv, es, v1, v2, v3);
 break;
+case 0x73:
+gen_gvec_fn_3(rotlv, es, v1, v2, v3);
+break;
 case 0x7a:
 gen_gvec_fn_3(sarv, es, v1, v2, v3);
 break;
@@ -1977,6 +1923,9 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
 case 0x30:
 gen_gvec_fn_2i(shli, es, v1, v3, d2);
 break;
+case 0x33:
+gen_gvec_fn_2i(rotli, es, v1, v3, d2);
+break;
 case 0x3a:
 gen_gvec_fn_2i(sari, es, v1, v3, d2);
 break;
@@ -1994,6 +1943,9 @@ static DisasJumpType op_ves(DisasContext *s, DisasOps *o)
 case 0x30:
 gen_gvec_fn_2s(shls, es, v1, v3, shift);
 break;
+case 0x33:
+gen_gvec_fn_2s(rotls, es, v1, v3, shift);
+break;
 case 0x3a:
 gen_gvec_fn_2s(sars, es, v1, v3, shift);
 break;
diff --git a/target/s390x/vec_int_helper.c b/target/s390x/vec_int_helper.c
index 0d6bc13dd6..5561b3ed90 100644
--- a/target/s390x/vec_int_helper.c
+++ b/target/s390x/vec_int_helper.c
@@ -515,37 +515,6 @@ void HELPER(gvec_vpopct##BITS)(void *v1, const void *v2, 
uint32_t desc)\
 DEF_VPOPCT(8)
 DEF_VPOPCT(16)
 
-#define DEF_VERLLV(BITS)   
\
-void HELPER(gvec_verllv##BITS)(void *v1, const void *v2, const void *v3,   
\
-   uint32_t desc)  
\
-{  

[PATCH v2 26/36] tcg: Add load_dest parameter to GVecGen2

2020-04-21 Thread Richard Henderson
We have this same parameter for GVecGen2i, GVecGen3,
and GVecGen3i.  This will make some SVE2 insns easier
to parameterize.
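
A rough usage sketch, with a made-up accumulating op d = d + abs(a) that
is not part of this series, of how load_dest would be used once this
lands; such an op would then be expanded with
tcg_gen_gvec_2(dofs, aofs, oprsz, maxsz, &acc_abs_op):

    static void gen_acc_abs_i64(TCGv_i64 d, TCGv_i64 a)
    {
        TCGv_i64 t = tcg_temp_new_i64();

        tcg_gen_abs_i64(t, a);
        tcg_gen_add_i64(d, d, t);   /* d is read as well as written */
        tcg_temp_free_i64(t);
    }

    static const GVecGen2 acc_abs_op = {
        .fni8 = gen_acc_abs_i64,
        .load_dest = true,          /* expander loads d before calling fni8 */
    };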

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op-gvec.h |  2 ++
 tcg/tcg-op-gvec.c | 45 ---
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index d89f91f40e..cea6497341 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -109,6 +109,8 @@ typedef struct {
 uint8_t vece;
 /* Prefer i64 to v64.  */
 bool prefer_i64;
+/* Load dest as a 2nd source operand.  */
+bool load_dest;
 } GVecGen2;
 
 typedef struct {
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 43cac1a0bf..049a55e700 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -663,17 +663,22 @@ static void expand_clr(uint32_t dofs, uint32_t maxsz)
 
 /* Expand OPSZ bytes worth of two-operand operations using i32 elements.  */
 static void expand_2_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
- void (*fni)(TCGv_i32, TCGv_i32))
+ bool load_dest, void (*fni)(TCGv_i32, TCGv_i32))
 {
 TCGv_i32 t0 = tcg_temp_new_i32();
+TCGv_i32 t1 = tcg_temp_new_i32();
 uint32_t i;
 
 for (i = 0; i < oprsz; i += 4) {
 tcg_gen_ld_i32(t0, cpu_env, aofs + i);
-fni(t0, t0);
-tcg_gen_st_i32(t0, cpu_env, dofs + i);
+if (load_dest) {
+tcg_gen_ld_i32(t1, cpu_env, dofs + i);
+}
+fni(t1, t0);
+tcg_gen_st_i32(t1, cpu_env, dofs + i);
 }
 tcg_temp_free_i32(t0);
+tcg_temp_free_i32(t1);
 }
 
 static void expand_2i_i32(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
@@ -793,17 +798,22 @@ static void expand_4_i32(uint32_t dofs, uint32_t aofs, 
uint32_t bofs,
 
 /* Expand OPSZ bytes worth of two-operand operations using i64 elements.  */
 static void expand_2_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
- void (*fni)(TCGv_i64, TCGv_i64))
+ bool load_dest, void (*fni)(TCGv_i64, TCGv_i64))
 {
 TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
 uint32_t i;
 
 for (i = 0; i < oprsz; i += 8) {
 tcg_gen_ld_i64(t0, cpu_env, aofs + i);
-fni(t0, t0);
-tcg_gen_st_i64(t0, cpu_env, dofs + i);
+if (load_dest) {
+tcg_gen_ld_i64(t1, cpu_env, dofs + i);
+}
+fni(t1, t0);
+tcg_gen_st_i64(t1, cpu_env, dofs + i);
 }
 tcg_temp_free_i64(t0);
+tcg_temp_free_i64(t1);
 }
 
 static void expand_2i_i64(uint32_t dofs, uint32_t aofs, uint32_t oprsz,
@@ -924,17 +934,23 @@ static void expand_4_i64(uint32_t dofs, uint32_t aofs, 
uint32_t bofs,
 /* Expand OPSZ bytes worth of two-operand operations using host vectors.  */
 static void expand_2_vec(unsigned vece, uint32_t dofs, uint32_t aofs,
  uint32_t oprsz, uint32_t tysz, TCGType type,
+ bool load_dest,
  void (*fni)(unsigned, TCGv_vec, TCGv_vec))
 {
 TCGv_vec t0 = tcg_temp_new_vec(type);
+TCGv_vec t1 = tcg_temp_new_vec(type);
 uint32_t i;
 
 for (i = 0; i < oprsz; i += tysz) {
 tcg_gen_ld_vec(t0, cpu_env, aofs + i);
-fni(vece, t0, t0);
-tcg_gen_st_vec(t0, cpu_env, dofs + i);
+if (load_dest) {
+tcg_gen_ld_vec(t1, cpu_env, dofs + i);
+}
+fni(vece, t1, t0);
+tcg_gen_st_vec(t1, cpu_env, dofs + i);
 }
 tcg_temp_free_vec(t0);
+tcg_temp_free_vec(t1);
 }
 
 /* Expand OPSZ bytes worth of two-vector operands and an immediate operand
@@ -1088,7 +1104,8 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
  * that e.g. size == 80 would be expanded with 2x32 + 1x16.
  */
 some = QEMU_ALIGN_DOWN(oprsz, 32);
-expand_2_vec(g->vece, dofs, aofs, some, 32, TCG_TYPE_V256, g->fniv);
+expand_2_vec(g->vece, dofs, aofs, some, 32, TCG_TYPE_V256,
+ g->load_dest, g->fniv);
 if (some == oprsz) {
 break;
 }
@@ -1098,17 +1115,19 @@ void tcg_gen_gvec_2(uint32_t dofs, uint32_t aofs,
 maxsz -= some;
 /* fallthru */
 case TCG_TYPE_V128:
-expand_2_vec(g->vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128, g->fniv);
+expand_2_vec(g->vece, dofs, aofs, oprsz, 16, TCG_TYPE_V128,
+ g->load_dest, g->fniv);
 break;
 case TCG_TYPE_V64:
-expand_2_vec(g->vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64, g->fniv);
+expand_2_vec(g->vece, dofs, aofs, oprsz, 8, TCG_TYPE_V64,
+ g->load_dest, g->fniv);
 break;
 
 case 0:
 if (g->fni8 && check_size_impl(oprsz, 8)) {
-expand_2_i64(dofs, aofs, oprsz, g->fni8);
+expand_2_i64(dofs, aofs, oprsz, g->load_dest, g->fni8);
 } else if (g->fni4 && check_size_impl(oprsz, 4)) {
-

[PATCH v2 30/36] tcg: Remove expansion to shift by vector from do_shifts

2020-04-21 Thread Richard Henderson
We do not reflect this expansion in tcg_can_emit_vecop_list,
so it is unused and unusable.  However, we actually perform
the same expansion in do_gvec_shifts, so it is also unneeded.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-vec.c | 35 +++
 1 file changed, 11 insertions(+), 24 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 4af92d6b0a..52c1b66283 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -781,7 +781,7 @@ void tcg_gen_rotrv_vec(unsigned vece, TCGv_vec r, TCGv_vec 
a, TCGv_vec b)
 }
 
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
-  TCGv_i32 s, TCGOpcode opc_s, TCGOpcode opc_v)
+  TCGv_i32 s, TCGOpcode opc)
 {
 TCGTemp *rt = tcgv_vec_temp(r);
 TCGTemp *at = tcgv_vec_temp(a);
@@ -790,48 +790,35 @@ static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec 
a,
 TCGArg ai = temp_arg(at);
 TCGArg si = temp_arg(st);
 TCGType type = rt->base_type;
-const TCGOpcode *hold_list;
 int can;
 
 tcg_debug_assert(at->base_type >= type);
-tcg_assert_listed_vecop(opc_s);
-hold_list = tcg_swap_vecop_list(NULL);
-
-can = tcg_can_emit_vec_op(opc_s, type, vece);
+tcg_assert_listed_vecop(opc);
+can = tcg_can_emit_vec_op(opc, type, vece);
 if (can > 0) {
-vec_gen_3(opc_s, type, vece, ri, ai, si);
+vec_gen_3(opc, type, vece, ri, ai, si);
 } else if (can < 0) {
-tcg_expand_vec_op(opc_s, type, vece, ri, ai, si);
+const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
+tcg_expand_vec_op(opc, type, vece, ri, ai, si);
+tcg_swap_vecop_list(hold_list);
 } else {
-TCGv_vec vec_s = tcg_temp_new_vec(type);
-
-if (vece == MO_64) {
-TCGv_i64 s64 = tcg_temp_new_i64();
-tcg_gen_extu_i32_i64(s64, s);
-tcg_gen_dup_i64_vec(MO_64, vec_s, s64);
-tcg_temp_free_i64(s64);
-} else {
-tcg_gen_dup_i32_vec(vece, vec_s, s);
-}
-do_op3_nofail(vece, r, a, vec_s, opc_v);
-tcg_temp_free_vec(vec_s);
+g_assert_not_reached();
 }
-tcg_swap_vecop_list(hold_list);
 }
 
 void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
-do_shifts(vece, r, a, b, INDEX_op_shls_vec, INDEX_op_shlv_vec);
+do_shifts(vece, r, a, b, INDEX_op_shls_vec);
 }
 
 void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
-do_shifts(vece, r, a, b, INDEX_op_shrs_vec, INDEX_op_shrv_vec);
+do_shifts(vece, r, a, b, INDEX_op_shrs_vec);
 }
 
 void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 b)
 {
-do_shifts(vece, r, a, b, INDEX_op_sars_vec, INDEX_op_sarv_vec);
+do_shifts(vece, r, a, b, INDEX_op_sars_vec);
 }
 
 void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
-- 
2.20.1




[PATCH v2 25/36] tcg: Remove tcg_gen_dup{8,16,32,64}i_vec

2020-04-21 Thread Richard Henderson
These interfaces have been replaced by tcg_gen_dupi_vec
and tcg_constant_vec.
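
For out-of-tree users the conversion is mechanical; a sketch, where t,
zero and type stand in for whatever the caller already has:

    /* Before (interface removed by this patch): */
    tcg_gen_dup16i_vec(t, 0);

    /* After -- spell the element size explicitly ... */
    tcg_gen_dupi_vec(MO_16, t, 0);

    /* ... or, when a read-only value suffices, use a constant temp. */
    TCGv_vec zero = tcg_constant_vec(type, MO_16, 0);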

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |  4 
 tcg/tcg-op-vec.c | 20 
 2 files changed, 24 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 11ed9192f7..a39eb13ff0 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -959,10 +959,6 @@ void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
 void tcg_gen_dup_i32_vec(unsigned vece, TCGv_vec, TCGv_i32);
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec, TCGv_i64);
 void tcg_gen_dup_mem_vec(unsigned vece, TCGv_vec, TCGv_ptr, tcg_target_long);
-void tcg_gen_dup8i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup16i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup32i_vec(TCGv_vec, uint32_t);
-void tcg_gen_dup64i_vec(TCGv_vec, uint64_t);
 void tcg_gen_dupi_vec(unsigned vece, TCGv_vec, uint64_t);
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b);
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 6343046e18..a9c16d85c5 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -284,26 +284,6 @@ void tcg_gen_dupi_vec(unsigned vece, TCGv_vec dest, 
uint64_t val)
 tcg_gen_mov_vec(dest, tcg_constant_vec(type, vece, val));
 }
 
-void tcg_gen_dup64i_vec(TCGv_vec dest, uint64_t val)
-{
-tcg_gen_dupi_vec(MO_64, dest, val);
-}
-
-void tcg_gen_dup32i_vec(TCGv_vec dest, uint32_t val)
-{
-tcg_gen_dupi_vec(MO_32, dest, val);
-}
-
-void tcg_gen_dup16i_vec(TCGv_vec dest, uint32_t val)
-{
-tcg_gen_dupi_vec(MO_16, dest, val);
-}
-
-void tcg_gen_dup8i_vec(TCGv_vec dest, uint32_t val)
-{
-tcg_gen_dupi_vec(MO_8, dest, val);
-}
-
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
 {
 TCGArg ri = tcgv_vec_arg(r);
-- 
2.20.1




[PATCH v2 24/36] tcg/i386: Use tcg_constant_vec with tcg vec expanders

2020-04-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 9cb627d6eb..deace219d2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3452,7 +3452,7 @@ static void expand_vec_sari(TCGType type, unsigned vece,
 static void expand_vec_mul(TCGType type, unsigned vece,
TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
 {
-TCGv_vec t1, t2, t3, t4;
+TCGv_vec t1, t2, t3, t4, zero;
 
 tcg_debug_assert(vece == MO_8);
 
@@ -3470,11 +3470,11 @@ static void expand_vec_mul(TCGType type, unsigned vece,
 case TCG_TYPE_V64:
 t1 = tcg_temp_new_vec(TCG_TYPE_V128);
 t2 = tcg_temp_new_vec(TCG_TYPE_V128);
-tcg_gen_dup16i_vec(t2, 0);
+zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
 vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
-  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t2));
+  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
 vec_gen_3(INDEX_op_x86_punpckl_vec, TCG_TYPE_V128, MO_8,
-  tcgv_vec_arg(t2), tcgv_vec_arg(t2), tcgv_vec_arg(v2));
+  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
 tcg_gen_mul_vec(MO_16, t1, t1, t2);
 tcg_gen_shri_vec(MO_16, t1, t1, 8);
 vec_gen_3(INDEX_op_x86_packus_vec, TCG_TYPE_V128, MO_8,
@@ -3489,15 +3489,15 @@ static void expand_vec_mul(TCGType type, unsigned vece,
 t2 = tcg_temp_new_vec(type);
 t3 = tcg_temp_new_vec(type);
 t4 = tcg_temp_new_vec(type);
-tcg_gen_dup16i_vec(t4, 0);
+zero = tcg_constant_vec(TCG_TYPE_V128, MO_8, 0);
 vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
+  tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
 vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
-  tcgv_vec_arg(t2), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
+  tcgv_vec_arg(t2), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
 vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(t4));
+  tcgv_vec_arg(t3), tcgv_vec_arg(v1), tcgv_vec_arg(zero));
 vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
-  tcgv_vec_arg(t4), tcgv_vec_arg(t4), tcgv_vec_arg(v2));
+  tcgv_vec_arg(t4), tcgv_vec_arg(zero), tcgv_vec_arg(v2));
 tcg_gen_mul_vec(MO_16, t1, t1, t2);
 tcg_gen_mul_vec(MO_16, t3, t3, t4);
 tcg_gen_shri_vec(MO_16, t1, t1, 8);
@@ -3525,7 +3525,7 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned 
vece, TCGv_vec v0,
 NEED_UMIN = 8,
 NEED_UMAX = 16,
 };
-TCGv_vec t1, t2;
+TCGv_vec t1, t2, t3;
 uint8_t fixup;
 
 switch (cond) {
@@ -3596,9 +3596,9 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned 
vece, TCGv_vec v0,
 } else if (fixup & NEED_BIAS) {
 t1 = tcg_temp_new_vec(type);
 t2 = tcg_temp_new_vec(type);
-tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1));
-tcg_gen_sub_vec(vece, t1, v1, t2);
-tcg_gen_sub_vec(vece, t2, v2, t2);
+t3 = tcg_constant_vec(type, vece, 1ull << ((8 << vece) - 1));
+tcg_gen_sub_vec(vece, t1, v1, t3);
+tcg_gen_sub_vec(vece, t2, v2, t3);
 v1 = t1;
 v2 = t2;
 cond = tcg_signed_cond(cond);
-- 
2.20.1




[PATCH v2 32/36] tcg/i386: Implement INDEX_op_rotl[is]_vec

2020-04-21 Thread Richard Henderson
We must continue the special casing of 8-bit elements; the
other element sizes are trivially implemented with shifts.
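
For the non-8-bit element sizes the expansion is the usual two-shift
form; a one-lane sketch (helper name and constants are illustrative):

    #include <stdint.h>

    /* assumes 0 < imm < 16; e.g. rotl16(0x8001, 3) == 0x000c */
    static uint16_t rotl16(uint16_t v, unsigned imm)
    {
        return (uint16_t)((v << imm) | (v >> (16 - imm)));
    }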

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 85 +++
 1 file changed, 69 insertions(+), 16 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index deace219d2..6039ae4fc6 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3255,6 +3255,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_shls_vec:
 case INDEX_op_shrs_vec:
 case INDEX_op_sars_vec:
+case INDEX_op_rotls_vec:
 case INDEX_op_cmp_vec:
 case INDEX_op_x86_shufps_vec:
 case INDEX_op_x86_blend_vec:
@@ -3293,6 +3294,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 case INDEX_op_xor_vec:
 case INDEX_op_andc_vec:
 return 1;
+case INDEX_op_rotli_vec:
 case INDEX_op_cmp_vec:
 case INDEX_op_cmpsel_vec:
 return -1;
@@ -3316,6 +3318,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 
 case INDEX_op_shls_vec:
 case INDEX_op_shrs_vec:
+case INDEX_op_rotls_vec:
 return vece >= MO_16;
 case INDEX_op_sars_vec:
 return vece >= MO_16 && vece <= MO_32;
@@ -3353,7 +3356,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 }
 }
 
-static void expand_vec_shi(TCGType type, unsigned vece, bool shr,
+static void expand_vec_shi(TCGType type, unsigned vece, TCGOpcode opc,
TCGv_vec v0, TCGv_vec v1, TCGArg imm)
 {
 TCGv_vec t1, t2;
@@ -3363,26 +3366,31 @@ static void expand_vec_shi(TCGType type, unsigned vece, 
bool shr,
 t1 = tcg_temp_new_vec(type);
 t2 = tcg_temp_new_vec(type);
 
-/* Unpack to W, shift, and repack.  Tricky bits:
-   (1) Use punpck*bw x,x to produce DDCCBBAA,
-   i.e. duplicate in other half of the 16-bit lane.
-   (2) For right-shift, add 8 so that the high half of
-   the lane becomes zero.  For left-shift, we must
-   shift up and down again.
-   (3) Step 2 leaves high half zero such that PACKUSWB
-   (pack with unsigned saturation) does not modify
-   the quantity.  */
+/*
+ * Unpack to W, shift, and repack.  Tricky bits:
+ * (1) Use punpck*bw x,x to produce DDCCBBAA,
+ * i.e. duplicate in other half of the 16-bit lane.
+ * (2) For right-shift, add 8 so that the high half of the lane
+ * becomes zero.  For left-shift, and left-rotate, we must
+ * shift up and down again.
+ * (3) Step 2 leaves high half zero such that PACKUSWB
+ * (pack with unsigned saturation) does not modify
+ * the quantity.
+ */
 vec_gen_3(INDEX_op_x86_punpckl_vec, type, MO_8,
   tcgv_vec_arg(t1), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
 vec_gen_3(INDEX_op_x86_punpckh_vec, type, MO_8,
   tcgv_vec_arg(t2), tcgv_vec_arg(v1), tcgv_vec_arg(v1));
 
-if (shr) {
-tcg_gen_shri_vec(MO_16, t1, t1, imm + 8);
-tcg_gen_shri_vec(MO_16, t2, t2, imm + 8);
+if (opc != INDEX_op_rotli_vec) {
+imm += 8;
+}
+if (opc == INDEX_op_shri_vec) {
+tcg_gen_shri_vec(MO_16, t1, t1, imm);
+tcg_gen_shri_vec(MO_16, t2, t2, imm);
 } else {
-tcg_gen_shli_vec(MO_16, t1, t1, imm + 8);
-tcg_gen_shli_vec(MO_16, t2, t2, imm + 8);
+tcg_gen_shli_vec(MO_16, t1, t1, imm);
+tcg_gen_shli_vec(MO_16, t2, t2, imm);
 tcg_gen_shri_vec(MO_16, t1, t1, 8);
 tcg_gen_shri_vec(MO_16, t2, t2, 8);
 }
@@ -3449,6 +3457,43 @@ static void expand_vec_sari(TCGType type, unsigned vece,
 }
 }
 
+static void expand_vec_rotli(TCGType type, unsigned vece,
+ TCGv_vec v0, TCGv_vec v1, TCGArg imm)
+{
+TCGv_vec t;
+
+if (vece == MO_8) {
+expand_vec_shi(type, vece, INDEX_op_rotli_vec, v0, v1, imm);
+return;
+}
+
+t = tcg_temp_new_vec(type);
+tcg_gen_shli_vec(vece, t, v1, imm);
+tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - imm);
+tcg_gen_or_vec(vece, v0, v0, t);
+tcg_temp_free_vec(t);
+}
+
+static void expand_vec_rotls(TCGType type, unsigned vece,
+ TCGv_vec v0, TCGv_vec v1, TCGv_i32 lsh)
+{
+TCGv_i32 rsh;
+TCGv_vec t;
+
+tcg_debug_assert(vece != MO_8);
+
+t = tcg_temp_new_vec(type);
+rsh = tcg_temp_new_i32();
+
+tcg_gen_neg_i32(rsh, lsh);
+tcg_gen_andi_i32(rsh, rsh, (8 << vece) - 1);
+tcg_gen_shls_vec(vece, t, v1, lsh);
+tcg_gen_shrs_vec(vece, v0, v1, rsh);
+tcg_gen_or_vec(vece, v0, v0, t);
+tcg_temp_free_vec(t);
+tcg_temp_free_i32(rsh);
+}
+
 static void expand_vec_mul(TCGType type, unsigned vece,
TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
 {
@@ -3658,13 +3703,21 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece,
 switch (opc) {
 

[PATCH v2 20/36] tcg: Remove movi and dupi opcodes

2020-04-21 Thread Richard Henderson
These are now completely covered by mov from a
TEMP_CONST temporary.

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-opc.h|  3 ---
 tcg/aarch64/tcg-target.inc.c |  3 ---
 tcg/arm/tcg-target.inc.c |  1 -
 tcg/i386/tcg-target.inc.c|  3 ---
 tcg/mips/tcg-target.inc.c|  2 --
 tcg/optimize.c   |  4 
 tcg/ppc/tcg-target.inc.c |  3 ---
 tcg/riscv/tcg-target.inc.c   |  2 --
 tcg/s390/tcg-target.inc.c|  2 --
 tcg/sparc/tcg-target.inc.c   |  2 --
 tcg/tcg-op-vec.c |  1 -
 tcg/tcg.c| 18 +-
 tcg/tci/tcg-target.inc.c |  2 --
 13 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 7dee9b38f7..4a9cbf5426 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -45,7 +45,6 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
-DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
@@ -110,7 +109,6 @@ DEF(ctz_i32, 1, 2, 0, IMPL(TCG_TARGET_HAS_ctz_i32))
 DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
-DEF(movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
@@ -215,7 +213,6 @@ DEF(qemu_st_i64, 0, TLADDR_ARGS + DATA64_ARGS, 1,
 #define IMPLVEC  TCG_OPF_VECTOR | IMPL(TCG_TARGET_MAYBE_vec)
 
 DEF(mov_vec, 1, 1, 0, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
-DEF(dupi_vec, 1, 0, 1, TCG_OPF_VECTOR | TCG_OPF_NOT_PRESENT)
 
 DEF(dup_vec, 1, 1, 0, IMPLVEC)
 DEF(dup2_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_REG_BITS == 32))
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 843fd0ca69..7918aeb9d5 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -2261,8 +2261,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_mov_i64:
-case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-case INDEX_op_movi_i64:
 case INDEX_op_call: /* Always emitted via tcg_out_call.  */
 default:
 g_assert_not_reached();
@@ -2467,7 +2465,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
 case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
 default:
 g_assert_not_reached();
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 6aa7757aac..b967499fa4 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2068,7 +2068,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 break;
 
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
-case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
 case INDEX_op_call: /* Always emitted via tcg_out_call.  */
 default:
 tcg_abort();
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index ec083bddcf..320a4bddd1 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2678,8 +2678,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 break;
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_mov_i64:
-case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-case INDEX_op_movi_i64:
 case INDEX_op_call: /* Always emitted via tcg_out_call.  */
 default:
 tcg_abort();
@@ -2965,7 +2963,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
-case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
 case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
 default:
 g_assert_not_reached();
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 4d32ebc1df..09dc5a94fa 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2155,8 +2155,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 break;
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_mov_i64:
-case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
-case INDEX_op_movi_i64:
 case INDEX_op_call: /* Always emitted via tcg_out_call.  */
 default:
 tcg_abort();
diff --git a/tcg/optimize.c b/tcg/optimize.c
index dd5187be31..9a2c945dbe 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1099,10 +1099,6 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(mov):
 tcg_opt_gen_mov(s, op, op->args[0], 

[PATCH v2 29/36] tcg: Implement gvec support for rotate by vector

2020-04-21 Thread Richard Henderson
No host backend support yet, but the interfaces for rotlv
and rotrv are in place.
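
The two directions are interchangeable per element, which is why both
can be offered cheaply at the gvec level; a sketch of the identity the
expansion can rely on (helper name is mine, not from the patch):

    #include <stdint.h>

    /* rotr by c equals rotl by (bits - c) mod bits; 32-bit lane shown. */
    static uint32_t rotr32_via_rotl(uint32_t a, uint32_t c)
    {
        unsigned n = (32 - (c & 31)) & 31;

        return n ? (a << n) | (a >> (32 - n)) : a;
    }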

Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h  |  10 +++
 include/tcg/tcg-op-gvec.h|   4 ++
 include/tcg/tcg-op.h |   2 +
 include/tcg/tcg-opc.h|   2 +
 include/tcg/tcg.h|   1 +
 tcg/aarch64/tcg-target.h |   1 +
 tcg/i386/tcg-target.h|   1 +
 tcg/ppc/tcg-target.h |   1 +
 accel/tcg/tcg-runtime-gvec.c |  96 +++
 tcg/tcg-op-gvec.c| 122 +++
 tcg/tcg-op-vec.c |  83 
 tcg/tcg.c|   3 +
 tcg/README   |   4 +-
 13 files changed, 329 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index cf10c8361e..4eda24e63a 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -279,6 +279,16 @@ DEF_HELPER_FLAGS_4(gvec_sar16v, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_sar32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_sar64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_rotl8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotl16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotl32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotl64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_rotr8v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotr16v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotr32v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_rotr64v, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(gvec_eq8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_eq32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 1afc3ebf03..2d768f1160 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -356,6 +356,10 @@ void tcg_gen_gvec_shrv(unsigned vece, uint32_t dofs, 
uint32_t aofs,
uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_sarv(unsigned vece, uint32_t dofs, uint32_t aofs,
uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotlv(unsigned vece, uint32_t dofs, uint32_t aofs,
+uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotrv(unsigned vece, uint32_t dofs, uint32_t aofs,
+uint32_t bofs, uint32_t oprsz, uint32_t maxsz);
 
 void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
   uint32_t aofs, uint32_t bofs,
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index c624e371d5..0468009713 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -996,6 +996,8 @@ void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec 
a, TCGv_i32 s);
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_rotlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
+void tcg_gen_rotrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
  TCGv_vec a, TCGv_vec b);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index c46c096c3e..d80335ba0d 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -254,6 +254,8 @@ DEF(sars_vec, 1, 2, 0, IMPLVEC | 
IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
+DEF(rotlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_rotv_vec))
+DEF(rotrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_rotv_vec))
 
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
 
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index d2034d9334..6bb2e3fe3c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -183,6 +183,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_andc_vec 0
 #define TCG_TARGET_HAS_orc_vec  0
 #define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  0
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 225a597f84..a5477bbc07 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -134,6 +134,7 @@ typedef enum {
 #define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_abs_vec  1
 #define 

[PATCH v2 19/36] tcg/tci: Add special tci_movi_{i32,i64} opcodes

2020-04-21 Thread Richard Henderson
The normal movi opcodes are going away.  We need something
for TCI to use internally.

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-opc.h| 8 
 tcg/tci.c| 4 ++--
 tcg/tci/tcg-target.inc.c | 4 ++--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 9288a04946..7dee9b38f7 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -268,6 +268,14 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 #include "tcg-target.opc.h"
 #endif
 
+#ifdef TCG_TARGET_INTERPRETER
+/* These opcodes are only for use between the tci generator and interpreter. */
+DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+#if TCG_TARGET_REG_BITS == 64
+DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
+#endif
+#endif
+
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
 #undef IMPL
diff --git a/tcg/tci.c b/tcg/tci.c
index 46fe9ce63f..a6c1aaf5af 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -576,7 +576,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
 t1 = tci_read_r32(regs, &tb_ptr);
 tci_write_reg32(regs, t0, t1);
 break;
-case INDEX_op_movi_i32:
+case INDEX_op_tci_movi_i32:
 t0 = *tb_ptr++;
 t1 = tci_read_i32(&tb_ptr);
 tci_write_reg32(regs, t0, t1);
@@ -847,7 +847,7 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
 t1 = tci_read_r64(regs, &tb_ptr);
 tci_write_reg64(regs, t0, t1);
 break;
-case INDEX_op_movi_i64:
+case INDEX_op_tci_movi_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_i64(&tb_ptr);
 tci_write_reg64(regs, t0, t1);
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index 992d50cb1e..1f1639df0d 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -530,13 +530,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 uint8_t *old_code_ptr = s->code_ptr;
 uint32_t arg32 = arg;
 if (type == TCG_TYPE_I32 || arg == arg32) {
-tcg_out_op_t(s, INDEX_op_movi_i32);
+tcg_out_op_t(s, INDEX_op_tci_movi_i32);
 tcg_out_r(s, t0);
 tcg_out32(s, arg32);
 } else {
 tcg_debug_assert(type == TCG_TYPE_I64);
 #if TCG_TARGET_REG_BITS == 64
-tcg_out_op_t(s, INDEX_op_movi_i64);
+tcg_out_op_t(s, INDEX_op_tci_movi_i64);
 tcg_out_r(s, t0);
 tcg_out64(s, arg);
 #else
-- 
2.20.1




[PATCH v2 10/36] tcg: Add temp_readonly

2020-04-21 Thread Richard Henderson
In most, but not all, places that we check for TEMP_FIXED,
we are really testing that we do not modify the temporary.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h |  5 +
 tcg/tcg.c | 21 ++---
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 3534dce77f..27e1b509a6 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -678,6 +678,11 @@ struct TCGContext {
 target_ulong gen_insn_data[TCG_MAX_INSNS][TARGET_INSN_START_WORDS];
 };
 
+static inline bool temp_readonly(TCGTemp *ts)
+{
+return ts->kind == TEMP_FIXED;
+}
+
 extern TCGContext tcg_init_ctx;
 extern __thread TCGContext *tcg_ctx;
 extern TCGv_env cpu_env;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index eaf81397a3..92b3767097 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3132,7 +3132,7 @@ static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, 
TCGRegSet, TCGRegSet);
mark it free; otherwise mark it dead.  */
 static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
 {
-if (ts->kind == TEMP_FIXED) {
+if (temp_readonly(ts)) {
 return;
 }
 if (ts->val_type == TEMP_VAL_REG) {
@@ -3156,7 +3156,7 @@ static inline void temp_dead(TCGContext *s, TCGTemp *ts)
 static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs,
   TCGRegSet preferred_regs, int free_or_dead)
 {
-if (ts->kind == TEMP_FIXED) {
+if (temp_readonly(ts)) {
 return;
 }
 if (!ts->mem_coherent) {
@@ -3314,8 +3314,7 @@ static void temp_save(TCGContext *s, TCGTemp *ts, 
TCGRegSet allocated_regs)
 {
 /* The liveness analysis already ensures that globals are back
in memory. Keep an tcg_debug_assert for safety. */
-tcg_debug_assert(ts->val_type == TEMP_VAL_MEM
- || ts->kind == TEMP_FIXED);
+tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || temp_readonly(ts));
 }
 
 /* save globals to their canonical location and assume they can be
@@ -3373,7 +3372,7 @@ static void tcg_reg_alloc_do_movi(TCGContext *s, TCGTemp 
*ots,
   TCGRegSet preferred_regs)
 {
 /* ENV should not be modified.  */
-tcg_debug_assert(ots->kind != TEMP_FIXED);
+tcg_debug_assert(!temp_readonly(ots));
 
 /* The movi is not explicitly generated here.  */
 if (ots->val_type == TEMP_VAL_REG) {
@@ -3413,7 +3412,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp 
*op)
 ts = arg_temp(op->args[1]);
 
 /* ENV should not be modified.  */
-tcg_debug_assert(ots->kind != TEMP_FIXED);
+tcg_debug_assert(!temp_readonly(ots));
 
 /* Note that otype != itype for no-op truncation.  */
 otype = ots->type;
@@ -3474,7 +3473,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOp 
*op)
  * Store the source register into the destination slot
  * and leave the destination temp as TEMP_VAL_MEM.
  */
-assert(ots->kind != TEMP_FIXED);
+assert(!temp_readonly(ots));
 if (!ts->mem_allocated) {
 temp_allocate_frame(s, ots);
 }
@@ -3511,7 +3510,7 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp 
*op)
 its = arg_temp(op->args[1]);
 
 /* ENV should not be modified.  */
-tcg_debug_assert(ots->kind != TEMP_FIXED);
+tcg_debug_assert(!temp_readonly(ots));
 
 itype = its->type;
 vece = TCGOP_VECE(op);
@@ -3742,7 +3741,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 ts = arg_temp(arg);
 
 /* ENV should not be modified.  */
-tcg_debug_assert(ts->kind != TEMP_FIXED);
+tcg_debug_assert(!temp_readonly(ts));
 
 if ((arg_ct->ct & TCG_CT_ALIAS)
 && !const_args[arg_ct->alias_index]) {
@@ -3784,7 +3783,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 ts = arg_temp(op->args[i]);
 
 /* ENV should not be modified.  */
-tcg_debug_assert(ts->kind != TEMP_FIXED);
+tcg_debug_assert(!temp_readonly(ts));
 
 if (NEED_SYNC_ARG(i)) {
 temp_sync(s, ts, o_allocated_regs, 0, IS_DEAD_ARG(i));
@@ -3916,7 +3915,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 ts = arg_temp(arg);
 
 /* ENV should not be modified.  */
-tcg_debug_assert(ts->kind != TEMP_FIXED);
+tcg_debug_assert(!temp_readonly(ts));
 
 reg = tcg_target_call_oarg_regs[i];
 tcg_debug_assert(s->reg_to_temp[reg] == NULL);
-- 
2.20.1




[PATCH v2 23/36] tcg: Add tcg_reg_alloc_dup2

2020-04-21 Thread Richard Henderson
There are several ways we can expand a vector dup of a 64-bit
element on a 32-bit host.
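
Roughly, the allocator below tries them in decreasing order of quality
(summary mine, not patch text):

    /*
     * 1. both input halves constant    -> tcg_out_dupi_vec of the i64 value
     * 2. halves form one i64 in memory -> tcg_out_dupm_vec from that slot
     * 3. otherwise                     -> generic expansion of dup2_vec
     */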

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 88 +++
 1 file changed, 88 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index fc1c97d586..d712d19842 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3870,6 +3870,91 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 }
 }
 
+static void tcg_reg_alloc_dup2(TCGContext *s, const TCGOp *op)
+{
+const TCGLifeData arg_life = op->life;
+TCGTemp *ots, *itsl, *itsh;
+TCGType vtype = TCGOP_VECL(op) + TCG_TYPE_V64;
+
+/* This opcode is only valid for 32-bit hosts, for 64-bit elements. */
+tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+tcg_debug_assert(TCGOP_VECE(op) == MO_64);
+
+ots = arg_temp(op->args[0]);
+itsl = arg_temp(op->args[1]);
+itsh = arg_temp(op->args[2]);
+
+/* ENV should not be modified.  */
+tcg_debug_assert(!temp_readonly(ots));
+
+/* Allocate the output register now.  */
+if (ots->val_type != TEMP_VAL_REG) {
+TCGRegSet allocated_regs = s->reserved_regs;
+TCGRegSet dup_out_regs =
+tcg_op_defs[INDEX_op_dup_vec].args_ct[0].u.regs;
+
+/* Make sure to not spill the input registers. */
+if (!IS_DEAD_ARG(1) && itsl->val_type == TEMP_VAL_REG) {
+tcg_regset_set_reg(allocated_regs, itsl->reg);
+}
+if (!IS_DEAD_ARG(2) && itsh->val_type == TEMP_VAL_REG) {
+tcg_regset_set_reg(allocated_regs, itsh->reg);
+}
+
+ots->reg = tcg_reg_alloc(s, dup_out_regs, allocated_regs,
+ op->output_pref[0], ots->indirect_base);
+ots->val_type = TEMP_VAL_REG;
+ots->mem_coherent = 0;
+s->reg_to_temp[ots->reg] = ots;
+}
+
+/* Promote dup2 of immediates to dupi_vec. */
+if (itsl->val_type == TEMP_VAL_CONST &&
+itsh->val_type == TEMP_VAL_CONST) {
+tcg_out_dupi_vec(s, vtype, ots->reg,
+ (uint32_t)itsl->val | ((uint64_t)itsh->val << 32));
+goto done;
+}
+
+/* If the two inputs form one 64-bit value, try dupm_vec. */
+if (itsl + 1 == itsh &&
+itsl->base_type == TCG_TYPE_I64 &&
+itsh->base_type == TCG_TYPE_I64) {
+if (!itsl->mem_coherent) {
+temp_sync(s, itsl, s->reserved_regs, 0, 0);
+}
+if (!itsh->mem_coherent) {
+temp_sync(s, itsh, s->reserved_regs, 0, 0);
+}
+#ifdef HOST_WORDS_BIGENDIAN
+TCGTemp *its = itsh;
+#else
+TCGTemp *its = itsl;
+#endif
+if (tcg_out_dupm_vec(s, vtype, MO_64, ots->reg,
+ its->mem_base->reg, its->mem_offset)) {
+goto done;
+}
+}
+
+/* Fall back to generic expansion. */
+tcg_reg_alloc_op(s, op);
+return;
+
+ done:
+if (IS_DEAD_ARG(1)) {
+temp_dead(s, itsl);
+}
+if (IS_DEAD_ARG(2)) {
+temp_dead(s, itsh);
+}
+if (NEED_SYNC_ARG(0)) {
+temp_sync(s, ots, s->reserved_regs, 0, IS_DEAD_ARG(0));
+} else if (IS_DEAD_ARG(0)) {
+temp_dead(s, ots);
+}
+}
+
 #ifdef TCG_TARGET_STACK_GROWSUP
 #define STACK_DIR(x) (-(x))
 #else
@@ -4261,6 +4346,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
 case INDEX_op_dup_vec:
 tcg_reg_alloc_dup(s, op);
 break;
+case INDEX_op_dup2_vec:
+tcg_reg_alloc_dup2(s, op);
+break;
 case INDEX_op_insn_start:
 if (num_insns >= 0) {
 size_t off = tcg_current_code_size(s);
-- 
2.20.1




[PATCH v2 18/36] tcg/optimize: Use tcg_constant_internal with constant folding

2020-04-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 106 ++---
 1 file changed, 48 insertions(+), 58 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d36d7e1d7f..dd5187be31 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -178,37 +178,6 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
 return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg val)
-{
-const TCGOpDef *def;
-TCGOpcode new_op;
-tcg_target_ulong mask;
-TempOptInfo *di = arg_info(dst);
-
-def = &tcg_op_defs[op->opc];
-if (def->flags & TCG_OPF_VECTOR) {
-new_op = INDEX_op_dupi_vec;
-} else if (def->flags & TCG_OPF_64BIT) {
-new_op = INDEX_op_movi_i64;
-} else {
-new_op = INDEX_op_movi_i32;
-}
-op->opc = new_op;
-/* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
-op->args[0] = dst;
-op->args[1] = val;
-
-reset_temp(dst);
-di->is_const = true;
-di->val = val;
-mask = val;
-if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_movi_i32) {
-/* High bits of the destination are now garbage.  */
-mask |= ~0xull;
-}
-di->mask = mask;
-}
-
 static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 {
 TCGTemp *dst_ts = arg_temp(dst);
@@ -259,6 +228,27 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, 
TCGArg dst, TCGArg src)
 }
 }
 
+static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
+ TCGOp *op, TCGArg dst, TCGArg val)
+{
+const TCGOpDef *def = &tcg_op_defs[op->opc];
+TCGType type;
+TCGTemp *tv;
+
+if (def->flags & TCG_OPF_VECTOR) {
+type = TCGOP_VECL(op) + TCG_TYPE_V64;
+} else if (def->flags & TCG_OPF_64BIT) {
+type = TCG_TYPE_I64;
+} else {
+type = TCG_TYPE_I32;
+}
+
+/* Convert movi to mov with constant temp. */
+tv = tcg_constant_internal(type, val);
+init_ts_info(temps_used, tv);
+tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
+}
+
 static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg x, TCGArg y)
 {
 uint64_t l64, h64;
@@ -621,7 +611,7 @@ void tcg_optimize(TCGContext *s)
 nb_temps = s->nb_temps;
 nb_globals = s->nb_globals;
 
-bitmap_zero(temps_used.l, nb_temps);
+memset(&temps_used, 0, sizeof(temps_used));
 for (i = 0; i < nb_temps; ++i) {
 s->temps[i].state_ptr = NULL;
 }
@@ -727,7 +717,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(rotr):
 if (arg_is_const(op->args[1])
 && arg_info(op->args[1])->val == 0) {
-tcg_opt_gen_movi(s, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1050,7 +1040,7 @@ void tcg_optimize(TCGContext *s)
 
 if (partmask == 0) {
 tcg_debug_assert(nb_oargs == 1);
-tcg_opt_gen_movi(s, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
 continue;
 }
 if (affected == 0) {
@@ -1067,7 +1057,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(mulsh):
 if (arg_is_const(op->args[2])
 && arg_info(op->args[2])->val == 0) {
-tcg_opt_gen_movi(s, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1094,7 +1084,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(sub):
 CASE_OP_32_64_VEC(xor):
 if (args_are_copies(op->args[1], op->args[2])) {
-tcg_opt_gen_movi(s, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1111,14 +1101,14 @@ void tcg_optimize(TCGContext *s)
 break;
 CASE_OP_32_64(movi):
 case INDEX_op_dupi_vec:
-tcg_opt_gen_movi(s, op, op->args[0], op->args[1]);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], op->args[1]);
 break;
 
 case INDEX_op_dup_vec:
 if (arg_is_const(op->args[1])) {
 tmp = arg_info(op->args[1])->val;
 tmp = dup_const(TCGOP_VECE(op), tmp);
-tcg_opt_gen_movi(s, op, op->args[0], tmp);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 break;
 }
 goto do_default;
@@ -1141,7 +1131,7 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_extrh_i64_i32:
 if (arg_is_const(op->args[1])) {
 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
-tcg_opt_gen_movi(s, op, op->args[0], tmp);
+tcg_opt_gen_movi(s, &temps_used, op, op->args[0], tmp);
 break;
 }
  

[PATCH v2 16/36] tcg: Rename struct tcg_temp_info to TempOptInfo

2020-04-21 Thread Richard Henderson
Fix this name vs our coding style.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index effb47eefd..b86bf3d707 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -35,20 +35,20 @@
 glue(glue(case INDEX_op_, x), _i64):\
 glue(glue(case INDEX_op_, x), _vec)
 
-struct tcg_temp_info {
+typedef struct TempOptInfo {
 bool is_const;
 TCGTemp *prev_copy;
 TCGTemp *next_copy;
 tcg_target_ulong val;
 tcg_target_ulong mask;
-};
+} TempOptInfo;
 
-static inline struct tcg_temp_info *ts_info(TCGTemp *ts)
+static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
 return ts->state_ptr;
 }
 
-static inline struct tcg_temp_info *arg_info(TCGArg arg)
+static inline TempOptInfo *arg_info(TCGArg arg)
 {
 return ts_info(arg_temp(arg));
 }
@@ -71,9 +71,9 @@ static inline bool ts_is_copy(TCGTemp *ts)
 /* Reset TEMP's state, possibly removing the temp for the list of copies.  */
 static void reset_ts(TCGTemp *ts)
 {
-struct tcg_temp_info *ti = ts_info(ts);
-struct tcg_temp_info *pi = ts_info(ti->prev_copy);
-struct tcg_temp_info *ni = ts_info(ti->next_copy);
+TempOptInfo *ti = ts_info(ts);
+TempOptInfo *pi = ts_info(ti->prev_copy);
+TempOptInfo *ni = ts_info(ti->next_copy);
 
 ni->prev_copy = ti->prev_copy;
 pi->next_copy = ti->next_copy;
@@ -89,12 +89,12 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(struct tcg_temp_info *infos,
+static void init_ts_info(TempOptInfo *infos,
  TCGTempSet *temps_used, TCGTemp *ts)
 {
 size_t idx = temp_idx(ts);
 if (!test_bit(idx, temps_used->l)) {
-struct tcg_temp_info *ti = [idx];
+TempOptInfo *ti = [idx];
 
 ts->state_ptr = ti;
 ti->next_copy = ts;
@@ -114,7 +114,7 @@ static void init_ts_info(struct tcg_temp_info *infos,
 }
 }
 
-static void init_arg_info(struct tcg_temp_info *infos,
+static void init_arg_info(TempOptInfo *infos,
   TCGTempSet *temps_used, TCGArg arg)
 {
 init_ts_info(infos, temps_used, arg_temp(arg));
@@ -177,7 +177,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGOp *op, 
TCGArg dst, TCGArg val)
 const TCGOpDef *def;
 TCGOpcode new_op;
 tcg_target_ulong mask;
-struct tcg_temp_info *di = arg_info(dst);
+TempOptInfo *di = arg_info(dst);
 
 def = _op_defs[op->opc];
 if (def->flags & TCG_OPF_VECTOR) {
@@ -208,8 +208,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, 
TCGArg dst, TCGArg src)
 TCGTemp *dst_ts = arg_temp(dst);
 TCGTemp *src_ts = arg_temp(src);
 const TCGOpDef *def;
-struct tcg_temp_info *di;
-struct tcg_temp_info *si;
+TempOptInfo *di;
+TempOptInfo *si;
 tcg_target_ulong mask;
 TCGOpcode new_op;
 
@@ -242,7 +242,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, 
TCGArg dst, TCGArg src)
 di->mask = mask;
 
 if (src_ts->type == dst_ts->type) {
-struct tcg_temp_info *ni = ts_info(si->next_copy);
+TempOptInfo *ni = ts_info(si->next_copy);
 
 di->next_copy = si->next_copy;
 di->prev_copy = src_ts;
@@ -605,7 +605,7 @@ void tcg_optimize(TCGContext *s)
 {
 int nb_temps, nb_globals;
 TCGOp *op, *op_next, *prev_mb = NULL;
-struct tcg_temp_info *infos;
+TempOptInfo *infos;
 TCGTempSet temps_used;
 
 /* Array VALS has an element for each temp.
@@ -616,7 +616,7 @@ void tcg_optimize(TCGContext *s)
 nb_temps = s->nb_temps;
 nb_globals = s->nb_globals;
 bitmap_zero(temps_used.l, nb_temps);
-infos = tcg_malloc(sizeof(struct tcg_temp_info) * nb_temps);
+infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
 
 QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
 tcg_target_ulong mask, partmask, affected;
-- 
2.20.1




[PATCH v2 21/36] tcg: Use tcg_out_dupi_vec from temp_load

2020-04-21 Thread Richard Henderson
Having dupi pass through movi is confusing and arguably wrong.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c |  7 
 tcg/i386/tcg-target.inc.c| 63 
 tcg/ppc/tcg-target.inc.c |  6 
 tcg/tcg.c|  8 -
 4 files changed, 49 insertions(+), 35 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 7918aeb9d5..e5c9ab70a9 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1009,13 +1009,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, 
TCGReg rd,
 case TCG_TYPE_I64:
 tcg_debug_assert(rd < 32);
 break;
-
-case TCG_TYPE_V64:
-case TCG_TYPE_V128:
-tcg_debug_assert(rd >= 32);
-tcg_out_dupi_vec(s, type, rd, value);
-return;
-
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 320a4bddd1..07424f7ef9 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -977,30 +977,32 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 }
 }
 
-static void tcg_out_movi(TCGContext *s, TCGType type,
- TCGReg ret, tcg_target_long arg)
+static void tcg_out_movi_vec(TCGContext *s, TCGType type,
+ TCGReg ret, tcg_target_long arg)
+{
+if (arg == 0) {
+tcg_out_vex_modrm(s, OPC_PXOR, ret, ret, ret);
+return;
+}
+if (arg == -1) {
+tcg_out_vex_modrm(s, OPC_PCMPEQB, ret, ret, ret);
+return;
+}
+
+int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
+tcg_out_vex_modrm_pool(s, OPC_MOVD_VyEy + rexw, ret);
+if (TCG_TARGET_REG_BITS == 64) {
+new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
+} else {
+new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
+}
+}
+
+static void tcg_out_movi_int(TCGContext *s, TCGType type,
+ TCGReg ret, tcg_target_long arg)
 {
 tcg_target_long diff;
 
-switch (type) {
-case TCG_TYPE_I32:
-#if TCG_TARGET_REG_BITS == 64
-case TCG_TYPE_I64:
-#endif
-if (ret < 16) {
-break;
-}
-/* fallthru */
-case TCG_TYPE_V64:
-case TCG_TYPE_V128:
-case TCG_TYPE_V256:
-tcg_debug_assert(ret >= 16);
-tcg_out_dupi_vec(s, type, ret, arg);
-return;
-default:
-g_assert_not_reached();
-}
-
 if (arg == 0) {
 tgen_arithr(s, ARITH_XOR, ret, ret);
 return;
@@ -1029,6 +1031,25 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 tcg_out64(s, arg);
 }
 
+static void tcg_out_movi(TCGContext *s, TCGType type,
+ TCGReg ret, tcg_target_long arg)
+{
+switch (type) {
+case TCG_TYPE_I32:
+#if TCG_TARGET_REG_BITS == 64
+case TCG_TYPE_I64:
+#endif
+if (ret < 16) {
+tcg_out_movi_int(s, type, ret, arg);
+} else {
+tcg_out_movi_vec(s, type, ret, arg);
+}
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
 {
 if (val == (int8_t)val) {
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index fb390ad978..7ab1e32064 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -987,12 +987,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type, 
TCGReg ret,
 tcg_out_movi_int(s, type, ret, arg, false);
 break;
 
-case TCG_TYPE_V64:
-case TCG_TYPE_V128:
-tcg_debug_assert(ret >= TCG_REG_V0);
-tcg_out_dupi_vec(s, type, ret, arg);
-break;
-
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index adb71f16ae..4f1ed1d2fe 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3359,7 +3359,13 @@ static void temp_load(TCGContext *s, TCGTemp *ts, 
TCGRegSet desired_regs,
 case TEMP_VAL_CONST:
 reg = tcg_reg_alloc(s, desired_regs, allocated_regs,
 preferred_regs, ts->indirect_base);
-tcg_out_movi(s, ts->type, reg, ts->val);
+if (ts->type <= TCG_TYPE_I64) {
+tcg_out_movi(s, ts->type, reg, ts->val);
+} else if (TCG_TARGET_REG_BITS == 64) {
+tcg_out_dupi_vec(s, ts->type, reg, ts->val);
+} else {
+tcg_out_dupi_vec(s, ts->type, reg, dup_const(MO_32, ts->val));
+}
 ts->mem_coherent = 0;
 break;
 case TEMP_VAL_MEM:
-- 
2.20.1




[PATCH v2 17/36] tcg/optimize: Adjust TempOptInfo allocation

2020-04-21 Thread Richard Henderson
Do not allocate a large block for indexing.  Instead, allocate
for each temporary as they are seen.

In general, this will use less memory, if we consider that most
TBs do not touch every target register.  This also allows us to
allocate TempOptInfo for new temps created during optimization.

Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 60 --
 1 file changed, 34 insertions(+), 26 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b86bf3d707..d36d7e1d7f 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -89,35 +89,41 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(TempOptInfo *infos,
- TCGTempSet *temps_used, TCGTemp *ts)
+static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
 {
 size_t idx = temp_idx(ts);
-if (!test_bit(idx, temps_used->l)) {
-TempOptInfo *ti = &infos[idx];
+TempOptInfo *ti;
 
+if (test_bit(idx, temps_used->l)) {
+return;
+}
+set_bit(idx, temps_used->l);
+
+ti = ts->state_ptr;
+if (ti == NULL) {
+ti = tcg_malloc(sizeof(TempOptInfo));
 ts->state_ptr = ti;
-ti->next_copy = ts;
-ti->prev_copy = ts;
-if (ts->kind == TEMP_CONST) {
-ti->is_const = true;
-ti->val = ti->mask = ts->val;
-if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
-/* High bits of a 32-bit quantity are garbage.  */
-ti->mask |= ~0xull;
-}
-} else {
-ti->is_const = false;
-ti->mask = -1;
+}
+
+ti->next_copy = ts;
+ti->prev_copy = ts;
+if (ts->kind == TEMP_CONST) {
+ti->is_const = true;
+ti->val = ts->val;
+ti->mask = ts->val;
+if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
+/* High bits of a 32-bit quantity are garbage.  */
+ti->mask |= ~0xull;
 }
-set_bit(idx, temps_used->l);
+} else {
+ti->is_const = false;
+ti->mask = -1;
 }
 }
 
-static void init_arg_info(TempOptInfo *infos,
-  TCGTempSet *temps_used, TCGArg arg)
+static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
 {
-init_ts_info(infos, temps_used, arg_temp(arg));
+init_ts_info(temps_used, arg_temp(arg));
 }
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
@@ -603,9 +609,8 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
-int nb_temps, nb_globals;
+int nb_temps, nb_globals, i;
 TCGOp *op, *op_next, *prev_mb = NULL;
-TempOptInfo *infos;
 TCGTempSet temps_used;
 
 /* Array VALS has an element for each temp.
@@ -615,12 +620,15 @@ void tcg_optimize(TCGContext *s)
 
 nb_temps = s->nb_temps;
 nb_globals = s->nb_globals;
+
 bitmap_zero(temps_used.l, nb_temps);
-infos = tcg_malloc(sizeof(TempOptInfo) * nb_temps);
+for (i = 0; i < nb_temps; ++i) {
+s->temps[i].state_ptr = NULL;
+}
 
 QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
 tcg_target_ulong mask, partmask, affected;
-int nb_oargs, nb_iargs, i;
+int nb_oargs, nb_iargs;
 TCGArg tmp;
 TCGOpcode opc = op->opc;
 const TCGOpDef *def = &tcg_op_defs[opc];
@@ -633,14 +641,14 @@ void tcg_optimize(TCGContext *s)
 for (i = 0; i < nb_oargs + nb_iargs; i++) {
 TCGTemp *ts = arg_temp(op->args[i]);
 if (ts) {
-init_ts_info(infos, &temps_used, ts);
+init_ts_info(&temps_used, ts);
 }
 }
 } else {
 nb_oargs = def->nb_oargs;
 nb_iargs = def->nb_iargs;
 for (i = 0; i < nb_oargs + nb_iargs; i++) {
-init_arg_info(infos, &temps_used, op->args[i]);
+init_arg_info(&temps_used, op->args[i]);
 }
 }
 
-- 
2.20.1




[PATCH v2 09/36] tcg: Consolidate 3 bits into enum TCGTempKind

2020-04-21 Thread Richard Henderson
The temp_fixed, temp_global, temp_local bits are all related.
Combine them into a single enumeration.
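
A minimal standalone sketch of why an ordered enum replaces the separate
bits: comparisons then express lifetime directly (names are illustrative,
not the QEMU definitions).

#include <stdbool.h>

/* Ordered so that ">=" reads as "lives at least this long";
 * e.g. kind >= KIND_GLOBAL covers both globals and fixed registers. */
typedef enum Kind {
    KIND_NORMAL,    /* dead at the end of a basic block */
    KIND_LOCAL,     /* dead at the end of a translation block */
    KIND_GLOBAL,    /* saved across translation blocks */
    KIND_FIXED,     /* permanently bound to a host register */
} Kind;

/* Previously this required testing two separate bitfields. */
static inline bool lives_across_tbs(Kind k)
{
    return k >= KIND_GLOBAL;
}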

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h |  20 +---
 tcg/optimize.c|   8 +--
 tcg/tcg.c | 122 --
 3 files changed, 90 insertions(+), 60 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index c48bd76b0a..3534dce77f 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -480,23 +480,27 @@ typedef enum TCGTempVal {
 TEMP_VAL_CONST,
 } TCGTempVal;
 
+typedef enum TCGTempKind {
+/* Temp is dead at the end of all basic blocks. */
+TEMP_NORMAL,
+/* Temp is saved across basic blocks but dead at the end of TBs. */
+TEMP_LOCAL,
+/* Temp is saved across both basic blocks and translation blocks. */
+TEMP_GLOBAL,
+/* Temp is in a fixed register. */
+TEMP_FIXED,
+} TCGTempKind;
+
 typedef struct TCGTemp {
 TCGReg reg:8;
 TCGTempVal val_type:8;
 TCGType base_type:8;
 TCGType type:8;
-unsigned int fixed_reg:1;
+TCGTempKind kind:3;
 unsigned int indirect_reg:1;
 unsigned int indirect_base:1;
 unsigned int mem_coherent:1;
 unsigned int mem_allocated:1;
-/* If true, the temp is saved across both basic blocks and
-   translation blocks.  */
-unsigned int temp_global:1;
-/* If true, the temp is saved across basic blocks but dead
-   at the end of translation blocks.  If false, the temp is
-   dead at the end of basic blocks.  */
-unsigned int temp_local:1;
 unsigned int temp_allocated:1;
 
 tcg_target_long val;
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 53aa8e5329..afb4a9a5a9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -116,21 +116,21 @@ static TCGTemp *find_better_copy(TCGContext *s, TCGTemp 
*ts)
 TCGTemp *i;
 
 /* If this is already a global, we can't do better. */
-if (ts->temp_global) {
+if (ts->kind >= TEMP_GLOBAL) {
 return ts;
 }
 
 /* Search for a global first. */
 for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-if (i->temp_global) {
+if (i->kind >= TEMP_GLOBAL) {
 return i;
 }
 }
 
 /* If it is a temp, search for a temp local. */
-if (!ts->temp_local) {
+if (ts->kind == TEMP_NORMAL) {
 for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-if (ts->temp_local) {
+if (i->kind >= TEMP_LOCAL) {
 return i;
 }
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index dd4b3d7684..eaf81397a3 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1155,7 +1155,7 @@ static inline TCGTemp *tcg_global_alloc(TCGContext *s)
 tcg_debug_assert(s->nb_globals == s->nb_temps);
 s->nb_globals++;
 ts = tcg_temp_alloc(s);
-ts->temp_global = 1;
+ts->kind = TEMP_GLOBAL;
 
 return ts;
 }
@@ -1172,7 +1172,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext 
*s, TCGType type,
 ts = tcg_global_alloc(s);
 ts->base_type = type;
 ts->type = type;
-ts->fixed_reg = 1;
+ts->kind = TEMP_FIXED;
 ts->reg = reg;
 ts->name = name;
 tcg_regset_set_reg(s->reserved_regs, reg);
@@ -1199,7 +1199,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, 
TCGv_ptr base,
 bigendian = 1;
 #endif
 
-if (!base_ts->fixed_reg) {
+if (base_ts->kind != TEMP_FIXED) {
 /* We do not support double-indirect registers.  */
 tcg_debug_assert(!base_ts->indirect_reg);
 base_ts->indirect_base = 1;
@@ -1247,6 +1247,7 @@ TCGTemp *tcg_global_mem_new_internal(TCGType type, 
TCGv_ptr base,
 TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
 {
 TCGContext *s = tcg_ctx;
+TCGTempKind kind = temp_local ? TEMP_LOCAL : TEMP_NORMAL;
 TCGTemp *ts;
 int idx, k;
 
@@ -1259,7 +1260,7 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool 
temp_local)
 ts = >temps[idx];
 ts->temp_allocated = 1;
 tcg_debug_assert(ts->base_type == type);
-tcg_debug_assert(ts->temp_local == temp_local);
+tcg_debug_assert(ts->kind == kind);
 } else {
 ts = tcg_temp_alloc(s);
 if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
@@ -1268,18 +1269,18 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool 
temp_local)
 ts->base_type = type;
 ts->type = TCG_TYPE_I32;
 ts->temp_allocated = 1;
-ts->temp_local = temp_local;
+ts->kind = kind;
 
 tcg_debug_assert(ts2 == ts + 1);
 ts2->base_type = TCG_TYPE_I64;
 ts2->type = TCG_TYPE_I32;
 ts2->temp_allocated = 1;
-ts2->temp_local = temp_local;
+ts2->kind = kind;
 } else {
 ts->base_type = type;
 ts->type = type;
 ts->temp_allocated = 1;
-ts->temp_local = temp_local;
+ts->kind = kind;
 }
 }
 
@@ 

[PATCH v2 13/36] tcg: Use tcg_constant_{i32, i64} with tcg int expanders

2020-04-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |  13 +--
 tcg/tcg-op.c | 216 ---
 2 files changed, 100 insertions(+), 129 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 230db6e022..11ed9192f7 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -271,6 +271,7 @@ void tcg_gen_mb(TCGBar);
 
 /* 32 bit ops */
 
+void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg);
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 arg2);
 void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
@@ -349,11 +350,6 @@ static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 
arg)
 }
 }
 
-static inline void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
-{
-tcg_gen_op2i_i32(INDEX_op_movi_i32, ret, arg);
-}
-
 static inline void tcg_gen_ld8u_i32(TCGv_i32 ret, TCGv_ptr arg2,
 tcg_target_long offset)
 {
@@ -467,6 +463,7 @@ static inline void tcg_gen_not_i32(TCGv_i32 ret, TCGv_i32 
arg)
 
 /* 64 bit ops */
 
+void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
 void tcg_gen_addi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_subfi_i64(TCGv_i64 ret, int64_t arg1, TCGv_i64 arg2);
 void tcg_gen_subi_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
@@ -550,11 +547,6 @@ static inline void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 
arg)
 }
 }
 
-static inline void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg)
-{
-tcg_gen_op2i_i64(INDEX_op_movi_i64, ret, arg);
-}
-
 static inline void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2,
 tcg_target_long offset)
 {
@@ -698,7 +690,6 @@ static inline void tcg_gen_sub_i64(TCGv_i64 ret, TCGv_i64 
arg1, TCGv_i64 arg2)
 
 void tcg_gen_discard_i64(TCGv_i64 arg);
 void tcg_gen_mov_i64(TCGv_i64 ret, TCGv_i64 arg);
-void tcg_gen_movi_i64(TCGv_i64 ret, int64_t arg);
 void tcg_gen_ld8u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
 void tcg_gen_ld8s_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
 void tcg_gen_ld16u_i64(TCGv_i64 ret, TCGv_ptr arg2, tcg_target_long offset);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index e2e25ebf7d..07eb661a07 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -104,15 +104,18 @@ void tcg_gen_mb(TCGBar mb_type)
 
 /* 32 bit ops */
 
+void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
+{
+tcg_gen_mov_i32(ret, tcg_constant_i32(arg));
+}
+
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
 /* some cases can be optimized here */
 if (arg2 == 0) {
 tcg_gen_mov_i32(ret, arg1);
 } else {
-TCGv_i32 t0 = tcg_const_i32(arg2);
-tcg_gen_add_i32(ret, arg1, t0);
-tcg_temp_free_i32(t0);
+tcg_gen_add_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 }
 
@@ -122,9 +125,7 @@ void tcg_gen_subfi_i32(TCGv_i32 ret, int32_t arg1, TCGv_i32 
arg2)
 /* Don't recurse with tcg_gen_neg_i32.  */
 tcg_gen_op2_i32(INDEX_op_neg_i32, ret, arg2);
 } else {
-TCGv_i32 t0 = tcg_const_i32(arg1);
-tcg_gen_sub_i32(ret, t0, arg2);
-tcg_temp_free_i32(t0);
+tcg_gen_sub_i32(ret, tcg_constant_i32(arg1), arg2);
 }
 }
 
@@ -134,15 +135,12 @@ void tcg_gen_subi_i32(TCGv_i32 ret, TCGv_i32 arg1, 
int32_t arg2)
 if (arg2 == 0) {
 tcg_gen_mov_i32(ret, arg1);
 } else {
-TCGv_i32 t0 = tcg_const_i32(arg2);
-tcg_gen_sub_i32(ret, arg1, t0);
-tcg_temp_free_i32(t0);
+tcg_gen_sub_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 }
 
 void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-TCGv_i32 t0;
 /* Some cases can be optimized here.  */
 switch (arg2) {
 case 0:
@@ -165,9 +163,8 @@ void tcg_gen_andi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t 
arg2)
 }
 break;
 }
-t0 = tcg_const_i32(arg2);
-tcg_gen_and_i32(ret, arg1, t0);
-tcg_temp_free_i32(t0);
+
+tcg_gen_and_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 
 void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
@@ -178,9 +175,7 @@ void tcg_gen_ori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t 
arg2)
 } else if (arg2 == 0) {
 tcg_gen_mov_i32(ret, arg1);
 } else {
-TCGv_i32 t0 = tcg_const_i32(arg2);
-tcg_gen_or_i32(ret, arg1, t0);
-tcg_temp_free_i32(t0);
+tcg_gen_or_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 }
 
@@ -193,9 +188,7 @@ void tcg_gen_xori_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t 
arg2)
 /* Don't recurse with tcg_gen_not_i32.  */
 tcg_gen_op2_i32(INDEX_op_not_i32, ret, arg1);
 } else {
-TCGv_i32 t0 = tcg_const_i32(arg2);
-tcg_gen_xor_i32(ret, arg1, t0);
-tcg_temp_free_i32(t0);
+tcg_gen_xor_i32(ret, arg1, tcg_constant_i32(arg2));
 }
 }
 
@@ -205,9 +198,7 @@ void tcg_gen_shli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t 
arg2)
 if (arg2 == 0) {
   

[PATCH v2 15/36] tcg: Use tcg_constant_{i32,i64} with tcg plugins

2020-04-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 accel/tcg/plugin-gen.c | 49 +++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 51580d51a0..e5dc9d0ca9 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -284,8 +284,8 @@ static TCGOp *copy_extu_i32_i64(TCGOp **begin_op, TCGOp *op)
 if (TCG_TARGET_REG_BITS == 32) {
 /* mov_i32 */
 op = copy_op(begin_op, op, INDEX_op_mov_i32);
-/* movi_i32 */
-op = copy_op(begin_op, op, INDEX_op_movi_i32);
+/* mov_i32 w/ $0 */
+op = copy_op(begin_op, op, INDEX_op_mov_i32);
 } else {
 /* extu_i32_i64 */
 op = copy_op(begin_op, op, INDEX_op_extu_i32_i64);
@@ -306,39 +306,34 @@ static TCGOp *copy_mov_i64(TCGOp **begin_op, TCGOp *op)
 return op;
 }
 
-static TCGOp *copy_movi_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
-{
-if (TCG_TARGET_REG_BITS == 32) {
-/* 2x movi_i32 */
-op = copy_op(begin_op, op, INDEX_op_movi_i32);
-op->args[1] = v;
-
-op = copy_op(begin_op, op, INDEX_op_movi_i32);
-op->args[1] = v >> 32;
-} else {
-/* movi_i64 */
-op = copy_op(begin_op, op, INDEX_op_movi_i64);
-op->args[1] = v;
-}
-return op;
-}
-
 static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
 {
 if (UINTPTR_MAX == UINT32_MAX) {
-/* movi_i32 */
-op = copy_op(begin_op, op, INDEX_op_movi_i32);
-op->args[1] = (uintptr_t)ptr;
+/* mov_i32 */
+op = copy_op(begin_op, op, INDEX_op_mov_i32);
+op->args[1] = tcgv_i32_arg(tcg_constant_i32((uintptr_t)ptr));
 } else {
-/* movi_i64 */
-op = copy_movi_i64(begin_op, op, (uint64_t)(uintptr_t)ptr);
+/* mov_i64 */
+op = copy_op(begin_op, op, INDEX_op_mov_i64);
+op->args[1] = tcgv_i64_arg(tcg_constant_i64((uintptr_t)ptr));
 }
 return op;
 }
 
 static TCGOp *copy_const_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
 {
-return copy_movi_i64(begin_op, op, v);
+if (TCG_TARGET_REG_BITS == 32) {
+/* 2x mov_i32 */
+op = copy_op(begin_op, op, INDEX_op_mov_i32);
+op->args[1] = tcgv_i32_arg(tcg_constant_i32(v));
+op = copy_op(begin_op, op, INDEX_op_mov_i32);
+op->args[1] = tcgv_i32_arg(tcg_constant_i32(v >> 32));
+} else {
+/* mov_i64 */
+op = copy_op(begin_op, op, INDEX_op_mov_i64);
+op->args[1] = tcgv_i64_arg(tcg_constant_i64(v));
+}
+return op;
 }
 
 static TCGOp *copy_extu_tl_i64(TCGOp **begin_op, TCGOp *op)
@@ -486,8 +481,8 @@ static TCGOp *append_mem_cb(const struct qemu_plugin_dyn_cb 
*cb,
 
 tcg_debug_assert(type == PLUGIN_GEN_CB_MEM);
 
-/* const_i32 == movi_i32 ("info", so it remains as is) */
-op = copy_op(&begin_op, op, INDEX_op_movi_i32);
+/* const_i32 == mov_i32 ("info", so it remains as is) */
+op = copy_op(&begin_op, op, INDEX_op_mov_i32);
 
 /* const_ptr */
 op = copy_const_ptr(&begin_op, op, cb->userp);
-- 
2.20.1




[PATCH v2 34/36] tcg/ppc: Implement INDEX_op_rot[lr]v_vec

2020-04-21 Thread Richard Henderson
We already had support for rotlv, using a target-specific opcode;
convert to use the generic opcode.  Handle rotrv via simple negation.
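
The negation trick as a scalar sketch over 32-bit lanes (illustrative only,
not the TCG expansion): rotating right by n is rotating left by -n modulo
the element width, so the backend can negate the count and emit rotlv.

#include <stdint.h>
#include <assert.h>

/* Scalar rotate-left; the count is taken modulo the element width. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Rotate-right expressed as rotate-left of the negated count, the same
 * identity applied per element when the count vector is negated. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return rotl32(x, -n & 31);
}

int main(void)
{
    assert(rotr32(0x80000001u, 1) == 0xC0000000u);
    assert(rotr32(0x12345678u, 0) == 0x12345678u);
    assert(rotr32(0x00000001u, 4) == 0x10000000u);
    return 0;
}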

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  2 +-
 tcg/ppc/tcg-target.opc.h |  1 -
 tcg/ppc/tcg-target.inc.c | 23 +++
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 4a17aebc5a..be5b2901c3 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -163,7 +163,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_roti_vec 0
 #define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_rotv_vec 1
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
index 1373f77e82..db514403c3 100644
--- a/tcg/ppc/tcg-target.opc.h
+++ b/tcg/ppc/tcg-target.opc.h
@@ -30,4 +30,3 @@ DEF(ppc_msum_vec, 1, 3, 0, IMPLVEC)
 DEF(ppc_muleu_vec, 1, 2, 0, IMPLVEC)
 DEF(ppc_mulou_vec, 1, 2, 0, IMPLVEC)
 DEF(ppc_pkum_vec, 1, 2, 0, IMPLVEC)
-DEF(ppc_rotl_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index b55766..3f9690418f 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2988,6 +2988,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_rotlv_vec:
 return vece <= MO_32 || have_isa_2_07;
 case INDEX_op_ssadd_vec:
 case INDEX_op_sssub_vec:
@@ -2998,6 +2999,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
+case INDEX_op_rotli_vec:
 return vece <= MO_32 || have_isa_2_07 ? -1 : 0;
 case INDEX_op_neg_vec:
 return vece >= MO_32 && have_isa_3_00;
@@ -3012,6 +3014,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece)
 return 0;
 case INDEX_op_bitsel_vec:
 return have_vsx;
+case INDEX_op_rotrv_vec:
+return -1;
 default:
 return 0;
 }
@@ -3294,7 +3298,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ppc_pkum_vec:
 insn = pkum_op[vece];
 break;
-case INDEX_op_ppc_rotl_vec:
+case INDEX_op_rotlv_vec:
 insn = rotl_op[vece];
 break;
 case INDEX_op_ppc_msum_vec:
@@ -3401,7 +3405,7 @@ static void expand_vec_mul(TCGType type, unsigned vece, 
TCGv_vec v0,
 t3 = tcg_temp_new_vec(type);
 t4 = tcg_temp_new_vec(type);
 tcg_gen_dupi_vec(MO_8, t4, -16);
-vec_gen_3(INDEX_op_ppc_rotl_vec, type, MO_32, tcgv_vec_arg(t1),
+vec_gen_3(INDEX_op_rotlv_vec, type, MO_32, tcgv_vec_arg(t1),
   tcgv_vec_arg(v2), tcgv_vec_arg(t4));
 vec_gen_3(INDEX_op_ppc_mulou_vec, type, MO_16, tcgv_vec_arg(t2),
   tcgv_vec_arg(v1), tcgv_vec_arg(v2));
@@ -3426,7 +3430,7 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece,
TCGArg a0, ...)
 {
 va_list va;
-TCGv_vec v0, v1, v2;
+TCGv_vec v0, v1, v2, t0;
 TCGArg a2;
 
 va_start(va, a0);
@@ -3444,6 +3448,9 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece,
 case INDEX_op_sari_vec:
 expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_sarv_vec);
 break;
+case INDEX_op_rotli_vec:
+expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_rotlv_vec);
+break;
 case INDEX_op_cmp_vec:
 v2 = temp_tcgv_vec(arg_temp(a2));
 expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
@@ -3452,6 +3459,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, 
unsigned vece,
 v2 = temp_tcgv_vec(arg_temp(a2));
 expand_vec_mul(type, vece, v0, v1, v2);
 break;
+case INDEX_op_rotlv_vec:
+v2 = temp_tcgv_vec(arg_temp(a2));
+t0 = tcg_temp_new_vec(type);
+tcg_gen_neg_vec(vece, t0, v2);
+tcg_gen_rotlv_vec(vece, v0, v1, t0);
+tcg_temp_free_vec(t0);
+break;
 default:
 g_assert_not_reached();
 }
@@ -3656,12 +3670,13 @@ static const TCGTargetOpDef 
*tcg_target_op_def(TCGOpcode op)
 case INDEX_op_shlv_vec:
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
+case INDEX_op_rotlv_vec:
+case INDEX_op_rotrv_vec:
 case INDEX_op_ppc_mrgh_vec:
 case INDEX_op_ppc_mrgl_vec:
 case INDEX_op_ppc_muleu_vec:
 case INDEX_op_ppc_mulou_vec:
 case INDEX_op_ppc_pkum_vec:
-case INDEX_op_ppc_rotl_vec:
 case INDEX_op_dup2_vec:
 return _v_v;
 case INDEX_op_not_vec:
-- 
2.20.1




[PATCH v2 05/36] tcg: Use tcg_gen_gvec_dup_imm in logical simplifications

2020-04-21 Thread Richard Henderson
Replace the outgoing interface.

Reviewed-by: LIU Zhiwei 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 593bb4542e..de16c027b3 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2326,7 +2326,7 @@ void tcg_gen_gvec_xor(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 };
 
 if (aofs == bofs) {
-tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
+tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, 0);
 } else {
 tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
 }
@@ -2343,7 +2343,7 @@ void tcg_gen_gvec_andc(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 };
 
 if (aofs == bofs) {
-tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, 0);
+tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, 0);
 } else {
 tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
 }
@@ -2360,7 +2360,7 @@ void tcg_gen_gvec_orc(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 };
 
 if (aofs == bofs) {
-tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
+tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, -1);
 } else {
 tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
 }
@@ -2411,7 +2411,7 @@ void tcg_gen_gvec_eqv(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 };
 
 if (aofs == bofs) {
-tcg_gen_gvec_dup8i(dofs, oprsz, maxsz, -1);
+tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, -1);
 } else {
 tcg_gen_gvec_3(dofs, aofs, bofs, oprsz, maxsz, &g);
 }
-- 
2.20.1




[PATCH v2 14/36] tcg: Use tcg_constant_{i32, vec} with tcg vec expanders

2020-04-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-vec.c | 63 ++--
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index f3927089a7..655b3ae32d 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -233,25 +233,17 @@ void tcg_gen_mov_vec(TCGv_vec r, TCGv_vec a)
 }
 }
 
-#define MO_REG  (TCG_TARGET_REG_BITS == 64 ? MO_64 : MO_32)
-
-static void do_dupi_vec(TCGv_vec r, unsigned vece, TCGArg a)
-{
-TCGTemp *rt = tcgv_vec_temp(r);
-vec_gen_2(INDEX_op_dupi_vec, rt->base_type, vece, temp_arg(rt), a);
-}
-
 TCGv_vec tcg_const_zeros_vec(TCGType type)
 {
 TCGv_vec ret = tcg_temp_new_vec(type);
-do_dupi_vec(ret, MO_REG, 0);
+tcg_gen_mov_vec(ret, tcg_constant_vec(type, MO_8, 0));
 return ret;
 }
 
 TCGv_vec tcg_const_ones_vec(TCGType type)
 {
 TCGv_vec ret = tcg_temp_new_vec(type);
-do_dupi_vec(ret, MO_REG, -1);
+tcg_gen_mov_vec(ret, tcg_constant_vec(type, MO_8, -1));
 return ret;
 }
 
@@ -267,37 +259,50 @@ TCGv_vec tcg_const_ones_vec_matching(TCGv_vec m)
 return tcg_const_ones_vec(t->base_type);
 }
 
-void tcg_gen_dup64i_vec(TCGv_vec r, uint64_t a)
+void tcg_gen_dupi_vec(unsigned vece, TCGv_vec dest, uint64_t val)
 {
-if (TCG_TARGET_REG_BITS == 32 && a == deposit64(a, 32, 32, a)) {
-do_dupi_vec(r, MO_32, a);
-} else if (TCG_TARGET_REG_BITS == 64 || a == (uint64_t)(int32_t)a) {
-do_dupi_vec(r, MO_64, a);
-} else {
-TCGv_i64 c = tcg_const_i64(a);
-tcg_gen_dup_i64_vec(MO_64, r, c);
-tcg_temp_free_i64(c);
+TCGType type = tcgv_vec_temp(dest)->base_type;
+
+/*
+ * For MO_64 constants that can't be represented in tcg_target_long,
+ * we must use INDEX_op_dup2_vec.
+ */
+if (TCG_TARGET_REG_BITS == 32) {
+val = dup_const(vece, val);
+if (val != deposit64(val, 32, 32, val) &&
+val != (uint64_t)(int32_t)val) {
+uint32_t vl = extract64(val, 0, 32);
+uint32_t vh = extract64(val, 32, 32);
+TCGArg al = tcgv_i32_arg(tcg_constant_i32(vl));
+TCGArg ah = tcgv_i32_arg(tcg_constant_i32(vh));
+TCGArg di = tcgv_vec_arg(dest);
+
+vec_gen_3(INDEX_op_dup2_vec, type, MO_64, di, al, ah);
+return;
+}
 }
+
+tcg_gen_mov_vec(dest, tcg_constant_vec(type, vece, val));
 }
 
-void tcg_gen_dup32i_vec(TCGv_vec r, uint32_t a)
+void tcg_gen_dup64i_vec(TCGv_vec dest, uint64_t val)
 {
-do_dupi_vec(r, MO_REG, dup_const(MO_32, a));
+tcg_gen_dupi_vec(MO_64, dest, val);
 }
 
-void tcg_gen_dup16i_vec(TCGv_vec r, uint32_t a)
+void tcg_gen_dup32i_vec(TCGv_vec dest, uint32_t val)
 {
-do_dupi_vec(r, MO_REG, dup_const(MO_16, a));
+tcg_gen_dupi_vec(MO_32, dest, val);
 }
 
-void tcg_gen_dup8i_vec(TCGv_vec r, uint32_t a)
+void tcg_gen_dup16i_vec(TCGv_vec dest, uint32_t val)
 {
-do_dupi_vec(r, MO_REG, dup_const(MO_8, a));
+tcg_gen_dupi_vec(MO_16, dest, val);
 }
 
-void tcg_gen_dupi_vec(unsigned vece, TCGv_vec r, uint64_t a)
+void tcg_gen_dup8i_vec(TCGv_vec dest, uint32_t val)
 {
-do_dupi_vec(r, MO_REG, dup_const(vece, a));
+tcg_gen_dupi_vec(MO_8, dest, val);
 }
 
 void tcg_gen_dup_i64_vec(unsigned vece, TCGv_vec r, TCGv_i64 a)
@@ -502,8 +507,8 @@ void tcg_gen_abs_vec(unsigned vece, TCGv_vec r, TCGv_vec a)
 if (tcg_can_emit_vec_op(INDEX_op_sari_vec, type, vece) > 0) {
 tcg_gen_sari_vec(vece, t, a, (8 << vece) - 1);
 } else {
-do_dupi_vec(t, MO_REG, 0);
-tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a, t);
+tcg_gen_cmp_vec(TCG_COND_LT, vece, t, a,
+tcg_constant_vec(type, vece, 0));
 }
 tcg_gen_xor_vec(vece, r, a, t);
 tcg_gen_sub_vec(vece, r, r, t);
-- 
2.20.1




[PATCH v2 12/36] tcg: Use tcg_constant_i32 with icount expander

2020-04-21 Thread Richard Henderson
We must do this before we adjust how tcg_out_movi_i32 works,
lest the under-the-hood poking that we do be broken.

Signed-off-by: Richard Henderson 
---
 include/exec/gen-icount.h | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 822c43cfd3..404732518a 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -34,7 +34,7 @@ static inline void gen_io_end(void)
 
 static inline void gen_tb_start(TranslationBlock *tb)
 {
-TCGv_i32 count, imm;
+TCGv_i32 count;
 
 tcg_ctx->exitreq_label = gen_new_label();
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
@@ -48,15 +48,13 @@ static inline void gen_tb_start(TranslationBlock *tb)
offsetof(ArchCPU, env));
 
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
-imm = tcg_temp_new_i32();
-/* We emit a movi with a dummy immediate argument. Keep the insn index
- * of the movi so that we later (when we know the actual insn count)
- * can update the immediate argument with the actual insn count.  */
-tcg_gen_movi_i32(imm, 0xdeadbeef);
+/*
+ * We emit a sub with a dummy immediate argument. Keep the insn index
+ * of the sub so that we later (when we know the actual insn count)
+ * can update the argument with the actual insn count.
+ */
+tcg_gen_sub_i32(count, count, tcg_constant_i32(0));
 icount_start_insn = tcg_last_op();
-
-tcg_gen_sub_i32(count, count, imm);
-tcg_temp_free_i32(imm);
 }
 
 tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
@@ -74,9 +72,12 @@ static inline void gen_tb_start(TranslationBlock *tb)
 static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
-/* Update the num_insn immediate parameter now that we know
- * the actual insn count.  */
-tcg_set_insn_param(icount_start_insn, 1, num_insns);
+/*
+ * Update the num_insn immediate parameter now that we know
+ * the actual insn count.
+ */
+tcg_set_insn_param(icount_start_insn, 2,
+   tcgv_i32_arg(tcg_constant_i32(num_insns)));
 }
 
 gen_set_label(tcg_ctx->exitreq_label);
-- 
2.20.1




[PATCH v2 31/36] tcg: Implement gvec support for rotate by scalar

2020-04-21 Thread Richard Henderson
No host backend support yet, but the interfaces for rotls
are in place.  Only implement left-rotate for now, as the
only known use of vector rotate by scalar is s390x, so any
right-rotate would be unused and untestable.
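
For reference, the per-lane semantics of rotate-by-scalar as a plain C
sketch over 32-bit elements (illustrative only; the real path goes through
the gvec expanders): every lane is rotated by the same count, modulo the
element width.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static void rotls32(uint32_t *d, const uint32_t *a, unsigned count, size_t n)
{
    count &= 31;
    for (size_t i = 0; i < n; i++) {
        d[i] = count ? (a[i] << count) | (a[i] >> (32 - count)) : a[i];
    }
}

int main(void)
{
    uint32_t v[4] = { 1, 0x80000000u, 0xdeadbeefu, 0 };
    uint32_t r[4];

    rotls32(r, v, 4, 4);
    for (int i = 0; i < 4; i++) {
        printf("0x%08x\n", r[i]);
    }
    return 0;
}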

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op-gvec.h |  2 ++
 include/tcg/tcg-op.h  |  1 +
 include/tcg/tcg-opc.h |  1 +
 include/tcg/tcg.h |  1 +
 tcg/aarch64/tcg-target.h  |  1 +
 tcg/i386/tcg-target.h |  1 +
 tcg/ppc/tcg-target.h  |  1 +
 tcg/tcg-op-gvec.c | 22 ++
 tcg/tcg-op-vec.c  |  5 +
 tcg/tcg.c |  2 ++
 10 files changed, 37 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 2d768f1160..c69a7de984 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -345,6 +345,8 @@ void tcg_gen_gvec_shrs(unsigned vece, uint32_t dofs, 
uint32_t aofs,
TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, uint32_t aofs,
TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
+void tcg_gen_gvec_rotls(unsigned vece, uint32_t dofs, uint32_t aofs,
+TCGv_i32 shift, uint32_t oprsz, uint32_t maxsz);
 
 /*
  * Perform vector shift by vector element, modulo the element size.
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 0468009713..d0319692ec 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -992,6 +992,7 @@ void tcg_gen_rotri_vec(unsigned vece, TCGv_vec r, TCGv_vec 
a, int64_t i);
 void tcg_gen_shls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_shrs_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 void tcg_gen_sars_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
+void tcg_gen_rotls_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_i32 s);
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index d80335ba0d..d63c6bcb3d 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -250,6 +250,7 @@ DEF(rotli_vec, 1, 1, 1, IMPLVEC | 
IMPL(TCG_TARGET_HAS_roti_vec))
 DEF(shls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(shrs_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
 DEF(sars_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shs_vec))
+DEF(rotls_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_rots_vec))
 
 DEF(shlv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(shrv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 6bb2e3fe3c..57d6b0216c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -183,6 +183,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_andc_vec 0
 #define TCG_TARGET_HAS_orc_vec  0
 #define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index a5477bbc07..9bc2a5ecbe 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -134,6 +134,7 @@ typedef enum {
 #define TCG_TARGET_HAS_neg_vec  1
 #define TCG_TARGET_HAS_abs_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  0
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 4c806c97db..99ac1e3958 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -184,6 +184,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_abs_vec  1
 #define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_shi_vec  1
 #define TCG_TARGET_HAS_shs_vec  1
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7993422526..4a17aebc5a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -162,6 +162,7 @@ extern bool have_vsx;
 #define TCG_TARGET_HAS_neg_vec  have_isa_3_00
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
 #define TCG_TARGET_HAS_rotv_vec 0
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 2b71725883..3707c0effb 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2976,6 +2976,28 @@ void tcg_gen_gvec_sars(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 do_gvec_shifts(vece, dofs, aofs, shift, oprsz, maxsz, &g);
 }
 
+void tcg_gen_gvec_rotls(unsigned vece, uint32_t 

[PATCH v2 04/36] target/arm: Use tcg_gen_gvec_dup_imm

2020-04-21 Thread Richard Henderson
In a few cases, we're able to remove some manual replication.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 10 +-
 target/arm/translate-sve.c | 12 +---
 target/arm/translate.c |  9 ++---
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7580e46367..095638e09a 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -519,7 +519,7 @@ static void clear_vec_high(DisasContext *s, bool is_q, int 
rd)
 tcg_temp_free_i64(tcg_zero);
 }
 if (vsz > 16) {
-tcg_gen_gvec_dup8i(ofs + 16, vsz - 16, vsz - 16, 0);
+tcg_gen_gvec_dup_imm(MO_64, ofs + 16, vsz - 16, vsz - 16, 0);
 }
 }
 
@@ -7794,8 +7794,8 @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t 
insn)
 
 if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
 /* MOVI or MVNI, with MVNI negation handled above.  */
-tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), is_q ? 16 : 8,
-vec_full_reg_size(s), imm);
+tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), is_q ? 16 : 8,
+ vec_full_reg_size(s), imm);
 } else {
 /* ORR or BIC, with BIC negation to AND handled above.  */
 if (is_neg) {
@@ -10223,8 +10223,8 @@ static void handle_vec_simd_shri(DisasContext *s, bool 
is_q, bool is_u,
 if (is_u) {
 if (shift == 8 << size) {
 /* Shift count the same size as element size produces zero.  */
-tcg_gen_gvec_dup8i(vec_full_reg_offset(s, rd),
-   is_q ? 16 : 8, vec_full_reg_size(s), 0);
+tcg_gen_gvec_dup_imm(size, vec_full_reg_offset(s, rd),
+ is_q ? 16 : 8, vec_full_reg_size(s), 0);
 } else {
 gen_gvec_fn2i(s, is_q, rd, rn, shift, tcg_gen_gvec_shri, size);
 }
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b35bad245e..6c8bda4e4c 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -177,7 +177,7 @@ static bool do_mov_z(DisasContext *s, int rd, int rn)
 static void do_dupi_z(DisasContext *s, int rd, uint64_t word)
 {
 unsigned vsz = vec_full_reg_size(s);
-tcg_gen_gvec_dup64i(vec_full_reg_offset(s, rd), vsz, vsz, word);
+tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), vsz, vsz, word);
 }
 
 /* Invoke a vector expander on two Pregs.  */
@@ -1453,7 +1453,7 @@ static bool do_predset(DisasContext *s, int esz, int rd, 
int pat, bool setflag)
 unsigned oprsz = size_for_gvec(setsz / 8);
 
 if (oprsz * 8 == setsz) {
-tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
+tcg_gen_gvec_dup_imm(MO_64, ofs, oprsz, maxsz, word);
 goto done;
 }
 }
@@ -2044,7 +2044,7 @@ static bool trans_DUP_x(DisasContext *s, arg_DUP_x *a)
 unsigned nofs = vec_reg_offset(s, a->rn, index, esz);
 tcg_gen_gvec_dup_mem(esz, dofs, nofs, vsz, vsz);
 } else {
-tcg_gen_gvec_dup64i(dofs, vsz, vsz, 0);
+tcg_gen_gvec_dup_imm(esz, dofs, vsz, vsz, 0);
 }
 }
 return true;
@@ -3260,9 +3260,7 @@ static bool trans_FDUP(DisasContext *s, arg_FDUP *a)
 
 /* Decode the VFP immediate.  */
 imm = vfp_expand_imm(a->esz, a->imm);
-imm = dup_const(a->esz, imm);
-
-tcg_gen_gvec_dup64i(dofs, vsz, vsz, imm);
+tcg_gen_gvec_dup_imm(a->esz, dofs, vsz, vsz, imm);
 }
 return true;
 }
@@ -3276,7 +3274,7 @@ static bool trans_DUP_i(DisasContext *s, arg_DUP_i *a)
 unsigned vsz = vec_full_reg_size(s);
 int dofs = vec_full_reg_offset(s, a->rd);
 
-tcg_gen_gvec_dup64i(dofs, vsz, vsz, dup_const(a->esz, a->imm));
+tcg_gen_gvec_dup_imm(a->esz, dofs, vsz, vsz, a->imm);
 }
 return true;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9f9f4e19e0..af4d3ff4c9 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5386,7 +5386,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t 
insn)
   MIN(shift, (8 << size) - 1),
   vec_size, vec_size);
 } else if (shift >= 8 << size) {
-tcg_gen_gvec_dup8i(rd_ofs, vec_size, vec_size, 0);
+tcg_gen_gvec_dup_imm(MO_8, rd_ofs, vec_size,
+ vec_size, 0);
 } else {
 tcg_gen_gvec_shri(size, rd_ofs, rm_ofs, shift,
   vec_size, vec_size);
@@ -5437,7 +5438,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t 
insn)
  * architecturally valid and results in zero.
  */
 

[PATCH v2 11/36] tcg: Introduce TYPE_CONST temporaries

2020-04-21 Thread Richard Henderson
These will hold a single constant for the duration of the TB.
They are hashed, so that each value has one temp across the TB.

Not used yet, this is all infrastructure.
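
A toy sketch of the lookup-or-create behaviour described above; the real
code keys a hash table per TCGType, a linear list stands in here, and every
name below is illustrative rather than QEMU's.

#include <stdlib.h>

typedef struct ConstTemp {
    long val;
    struct ConstTemp *next;
} ConstTemp;

static ConstTemp *const_list;

/* Return the unique read-only temp holding @val, creating it on demand. */
static ConstTemp *constant_temp(long val)
{
    ConstTemp *t;

    for (t = const_list; t; t = t->next) {
        if (t->val == val) {
            return t;               /* same value => same temp */
        }
    }
    t = malloc(sizeof(*t));
    t->val = val;
    t->next = const_list;
    const_list = t;
    return t;                       /* read-only, never freed within the TB */
}

int main(void)
{
    return constant_temp(5) == constant_temp(5) ? 0 : 1;
}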

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h |  27 ++-
 tcg/optimize.c|  40 ++---
 tcg/tcg-op-vec.c  |  17 +++
 tcg/tcg.c | 111 +-
 4 files changed, 166 insertions(+), 29 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 27e1b509a6..f72530dfda 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -489,6 +489,8 @@ typedef enum TCGTempKind {
 TEMP_GLOBAL,
 /* Temp is in a fixed register. */
 TEMP_FIXED,
+/* Temp is a fixed constant. */
+TEMP_CONST,
 } TCGTempKind;
 
 typedef struct TCGTemp {
@@ -664,6 +666,7 @@ struct TCGContext {
 QSIMPLEQ_HEAD(, TCGOp) plugin_ops;
 #endif
 
+GHashTable *const_table[TCG_TYPE_COUNT];
 TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
 TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
 
@@ -680,7 +683,7 @@ struct TCGContext {
 
 static inline bool temp_readonly(TCGTemp *ts)
 {
-return ts->kind == TEMP_FIXED;
+return ts->kind >= TEMP_FIXED;
 }
 
 extern TCGContext tcg_init_ctx;
@@ -1038,6 +1041,7 @@ TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, 
TCGOpcode opc);
 
 void tcg_optimize(TCGContext *s);
 
+/* Allocate a new temporary and initialize it with a constant. */
 TCGv_i32 tcg_const_i32(int32_t val);
 TCGv_i64 tcg_const_i64(int64_t val);
 TCGv_i32 tcg_const_local_i32(int32_t val);
@@ -1047,6 +1051,27 @@ TCGv_vec tcg_const_ones_vec(TCGType);
 TCGv_vec tcg_const_zeros_vec_matching(TCGv_vec);
 TCGv_vec tcg_const_ones_vec_matching(TCGv_vec);
 
+/*
+ * Locate or create a read-only temporary that is a constant.
+ * This kind of temporary need not and should not be freed.
+ */
+TCGTemp *tcg_constant_internal(TCGType type, tcg_target_long val);
+
+static inline TCGv_i32 tcg_constant_i32(int32_t val)
+{
+return temp_tcgv_i32(tcg_constant_internal(TCG_TYPE_I32, val));
+}
+
+static inline TCGv_i64 tcg_constant_i64(int64_t val)
+{
+if (TCG_TARGET_REG_BITS == 32) {
+qemu_build_not_reached();
+}
+return temp_tcgv_i64(tcg_constant_internal(TCG_TYPE_I64, val));
+}
+
+TCGv_vec tcg_constant_vec(TCGType type, unsigned vece, int64_t val);
+
 #if UINTPTR_MAX == UINT32_MAX
 # define tcg_const_ptr(x)((TCGv_ptr)tcg_const_i32((intptr_t)(x)))
 # define tcg_const_local_ptr(x)  ((TCGv_ptr)tcg_const_local_i32((intptr_t)(x)))
diff --git a/tcg/optimize.c b/tcg/optimize.c
index afb4a9a5a9..effb47eefd 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -99,8 +99,17 @@ static void init_ts_info(struct tcg_temp_info *infos,
 ts->state_ptr = ti;
 ti->next_copy = ts;
 ti->prev_copy = ts;
-ti->is_const = false;
-ti->mask = -1;
+if (ts->kind == TEMP_CONST) {
+ti->is_const = true;
+ti->val = ti->mask = ts->val;
+if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
+/* High bits of a 32-bit quantity are garbage.  */
+ti->mask |= ~0xffffffffull;
+}
+} else {
+ti->is_const = false;
+ti->mask = -1;
+}
 set_bit(idx, temps_used->l);
 }
 }
@@ -113,31 +122,28 @@ static void init_arg_info(struct tcg_temp_info *infos,
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
 {
-TCGTemp *i;
+TCGTemp *i, *g, *l;
 
-/* If this is already a global, we can't do better. */
-if (ts->kind >= TEMP_GLOBAL) {
+/* If this is already readonly, we can't do better. */
+if (temp_readonly(ts)) {
 return ts;
 }
 
-/* Search for a global first. */
+g = l = NULL;
 for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-if (i->kind >= TEMP_GLOBAL) {
+if (temp_readonly(i)) {
 return i;
-}
-}
-
-/* If it is a temp, search for a temp local. */
-if (ts->kind == TEMP_NORMAL) {
-for (i = ts_info(ts)->next_copy; i != ts; i = ts_info(i)->next_copy) {
-if (i->kind >= TEMP_LOCAL) {
-return i;
+} else if (i->kind > ts->kind) {
+if (i->kind == TEMP_GLOBAL) {
+g = i;
+} else if (i->kind == TEMP_LOCAL) {
+l = i;
 }
 }
 }
 
-/* Failure to find a better representation, return the same temp. */
-return ts;
+/* If we didn't find a better representation, return the same temp. */
+return g ? g : l ? l : ts;
 }
 
 static bool ts_are_copies(TCGTemp *ts1, TCGTemp *ts2)
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index b6937e8d64..f3927089a7 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -209,6 +209,23 @@ static void vec_gen_op3(TCGOpcode opc, unsigned vece,
 vec_gen_3(opc, type, vece, temp_arg(rt), temp_arg(at), temp_arg(bt));
 }
 
+TCGv_vec 

[PATCH v2 08/36] tcg: Improve vector tail clearing

2020-04-21 Thread Richard Henderson
Better handling of non-power-of-2 tails as seen with Arm 8-byte
vector operations.
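
A small standalone sketch of the counting rule this introduces (assuming
the GCC/Clang __builtin_popcount; not the QEMU helper itself): the
remainder contributes one extra operation per power-of-2 chunk, so an
80-byte vector handled with 32-byte lines costs 2x32 + 1x16 = 3 stores.

#include <stdio.h>

static unsigned count_ops(unsigned oprsz, unsigned lnsz)
{
    unsigned q = oprsz / lnsz;
    unsigned r = oprsz % lnsz;

    /* One extra operation per set bit of the tail. */
    return q + (unsigned)__builtin_popcount(r);
}

int main(void)
{
    printf("%u\n", count_ops(80, 32));  /* 3: 2x32 + 1x16 */
    printf("%u\n", count_ops(24, 16));  /* 2: 1x16 + 1x8 */
    return 0;
}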

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 82 ---
 1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 5a6cc19812..43cac1a0bf 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -326,11 +326,34 @@ void tcg_gen_gvec_5_ptr(uint32_t dofs, uint32_t aofs, 
uint32_t bofs,
in units of LNSZ.  This limits the expansion of inline code.  */
 static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
 {
-if (oprsz % lnsz == 0) {
-uint32_t lnct = oprsz / lnsz;
-return lnct >= 1 && lnct <= MAX_UNROLL;
+uint32_t q, r;
+
+if (oprsz < lnsz) {
+return false;
 }
-return false;
+
+q = oprsz / lnsz;
+r = oprsz % lnsz;
+tcg_debug_assert((r & 7) == 0);
+
+if (lnsz < 16) {
+/* For sizes below 16, accept no remainder. */
+if (r != 0) {
+return false;
+}
+} else {
+/*
+ * Recall that ARM SVE allows vector sizes that are not a
+ * power of 2, but always a multiple of 16.  The intent is
+ * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+ * In addition, expand_clr needs to handle a multiple of 8.
+ * Thus we can handle the tail with one more operation per
+ * diminishing power of 2.
+ */
+q += ctpop32(r);
+}
+
+return q <= MAX_UNROLL;
 }
 
 static void expand_clr(uint32_t dofs, uint32_t maxsz);
@@ -402,22 +425,31 @@ static void gen_dup_i64(unsigned vece, TCGv_i64 out, 
TCGv_i64 in)
 static TCGType choose_vector_type(const TCGOpcode *list, unsigned vece,
   uint32_t size, bool prefer_i64)
 {
-if (TCG_TARGET_HAS_v256 && check_size_impl(size, 32)) {
-/*
- * Recall that ARM SVE allows vector sizes that are not a
- * power of 2, but always a multiple of 16.  The intent is
- * that e.g. size == 80 would be expanded with 2x32 + 1x16.
- * It is hard to imagine a case in which v256 is supported
- * but v128 is not, but check anyway.
- */
-if (tcg_can_emit_vecop_list(list, TCG_TYPE_V256, vece)
-&& (size % 32 == 0
-|| tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece))) {
-return TCG_TYPE_V256;
-}
+/*
+ * Recall that ARM SVE allows vector sizes that are not a
+ * power of 2, but always a multiple of 16.  The intent is
+ * that e.g. size == 80 would be expanded with 2x32 + 1x16.
+ * It is hard to imagine a case in which v256 is supported
+ * but v128 is not, but check anyway.
+ * In addition, expand_clr needs to handle a multiple of 8.
+ */
+if (TCG_TARGET_HAS_v256 &&
+check_size_impl(size, 32) &&
+tcg_can_emit_vecop_list(list, TCG_TYPE_V256, vece) &&
+(!(size & 16) ||
+ (TCG_TARGET_HAS_v128 &&
+  tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece))) &&
+(!(size & 8) ||
+ (TCG_TARGET_HAS_v64 &&
+  tcg_can_emit_vecop_list(list, TCG_TYPE_V64, vece)))) {
+return TCG_TYPE_V256;
 }
-if (TCG_TARGET_HAS_v128 && check_size_impl(size, 16)
-&& tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece)) {
+if (TCG_TARGET_HAS_v128 &&
+check_size_impl(size, 16) &&
+tcg_can_emit_vecop_list(list, TCG_TYPE_V128, vece) &&
+(!(size & 8) ||
+ (TCG_TARGET_HAS_v64 &&
+  tcg_can_emit_vecop_list(list, TCG_TYPE_V64, vece)))) {
 return TCG_TYPE_V128;
 }
 if (TCG_TARGET_HAS_v64 && !prefer_i64 && check_size_impl(size, 8)
@@ -432,6 +464,18 @@ static void do_dup_store(TCGType type, uint32_t dofs, 
uint32_t oprsz,
 {
 uint32_t i = 0;
 
+tcg_debug_assert(oprsz >= 8);
+
+/*
+ * This may be expand_clr for the tail of an operation, e.g.
+ * oprsz == 8 && maxsz == 64.  The first 8 bytes of this store
+ * are misaligned wrt the maximum vector size, so do that first.
+ */
+if (dofs & 8) {
+tcg_gen_stl_vec(t_vec, cpu_env, dofs + i, TCG_TYPE_V64);
+i += 8;
+}
+
 switch (type) {
 case TCG_TYPE_V256:
 /*
-- 
2.20.1




[PATCH v2 02/36] target/s390x: Use tcg_gen_gvec_dup_imm

2020-04-21 Thread Richard Henderson
The gen_gvec_dupi switch is unnecessary with the new function.
Replace it with a local gen_gvec_dup_imm that takes care of the
register to offset conversion and length arguments.

Drop zero_vec and use gen_gvec_dup_imm with 0.

Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/s390x/translate_vx.inc.c | 41 +++--
 1 file changed, 8 insertions(+), 33 deletions(-)

diff --git a/target/s390x/translate_vx.inc.c b/target/s390x/translate_vx.inc.c
index 24558cce80..12347f8a03 100644
--- a/target/s390x/translate_vx.inc.c
+++ b/target/s390x/translate_vx.inc.c
@@ -231,8 +231,8 @@ static void get_vec_element_ptr_i64(TCGv_ptr ptr, uint8_t 
reg, TCGv_i64 enr,
 #define gen_gvec_mov(v1, v2) \
 tcg_gen_gvec_mov(0, vec_full_reg_offset(v1), vec_full_reg_offset(v2), 16, \
  16)
-#define gen_gvec_dup64i(v1, c) \
-tcg_gen_gvec_dup64i(vec_full_reg_offset(v1), 16, 16, c)
+#define gen_gvec_dup_imm(es, v1, c) \
+tcg_gen_gvec_dup_imm(es, vec_full_reg_offset(v1), 16, 16, c);
 #define gen_gvec_fn_2(fn, es, v1, v2) \
 tcg_gen_gvec_##fn(es, vec_full_reg_offset(v1), vec_full_reg_offset(v2), \
   16, 16)
@@ -316,31 +316,6 @@ static void gen_gvec128_4_i64(gen_gvec128_4_i64_fn fn, 
uint8_t d, uint8_t a,
 tcg_temp_free_i64(cl);
 }
 
-static void gen_gvec_dupi(uint8_t es, uint8_t reg, uint64_t c)
-{
-switch (es) {
-case ES_8:
-tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, c);
-break;
-case ES_16:
-tcg_gen_gvec_dup16i(vec_full_reg_offset(reg), 16, 16, c);
-break;
-case ES_32:
-tcg_gen_gvec_dup32i(vec_full_reg_offset(reg), 16, 16, c);
-break;
-case ES_64:
-gen_gvec_dup64i(reg, c);
-break;
-default:
-g_assert_not_reached();
-}
-}
-
-static void zero_vec(uint8_t reg)
-{
-tcg_gen_gvec_dup8i(vec_full_reg_offset(reg), 16, 16, 0);
-}
-
 static void gen_addi2_i64(TCGv_i64 dl, TCGv_i64 dh, TCGv_i64 al, TCGv_i64 ah,
   uint64_t b)
 {
@@ -396,8 +371,8 @@ static DisasJumpType op_vgbm(DisasContext *s, DisasOps *o)
  * Masks for both 64 bit elements of the vector are the same.
  * Trust tcg to produce a good constant loading.
  */
-gen_gvec_dup64i(get_field(s, v1),
-generate_byte_mask(i2 & 0xff));
+gen_gvec_dup_imm(ES_64, get_field(s, v1),
+ generate_byte_mask(i2 & 0xff));
 } else {
 TCGv_i64 t = tcg_temp_new_i64();
 
@@ -432,7 +407,7 @@ static DisasJumpType op_vgm(DisasContext *s, DisasOps *o)
 }
 }
 
-gen_gvec_dupi(es, get_field(s, v1), mask);
+gen_gvec_dup_imm(es, get_field(s, v1), mask);
 return DISAS_NEXT;
 }
 
@@ -585,7 +560,7 @@ static DisasJumpType op_vllez(DisasContext *s, DisasOps *o)
 
 t = tcg_temp_new_i64();
 tcg_gen_qemu_ld_i64(t, o->addr1, get_mem_index(s), MO_TE | es);
-zero_vec(get_field(s, v1));
+gen_gvec_dup_imm(es, get_field(s, v1), 0);
 write_vec_element_i64(t, get_field(s, v1), enr, es);
 tcg_temp_free_i64(t);
 return DISAS_NEXT;
@@ -892,7 +867,7 @@ static DisasJumpType op_vrepi(DisasContext *s, DisasOps *o)
 return DISAS_NORETURN;
 }
 
-gen_gvec_dupi(es, get_field(s, v1), data);
+gen_gvec_dup_imm(es, get_field(s, v1), data);
 return DISAS_NEXT;
 }
 
@@ -1372,7 +1347,7 @@ static DisasJumpType op_vcksm(DisasContext *s, DisasOps 
*o)
 read_vec_element_i32(tmp, get_field(s, v2), i, ES_32);
 tcg_gen_add2_i32(tmp, sum, sum, sum, tmp, tmp);
 }
-zero_vec(get_field(s, v1));
+gen_gvec_dup_imm(ES_32, get_field(s, v1), 0);
 write_vec_element_i32(sum, get_field(s, v1), 1, ES_32);
 
 tcg_temp_free_i32(tmp);
-- 
2.20.1




[PATCH v2 07/36] tcg: Add tcg_gen_gvec_dup_tl

2020-04-21 Thread Richard Henderson
For use when a target needs to pass a configure-specific
target_ulong value to duplicate.

Reviewed-by: LIU Zhiwei 
Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op-gvec.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index fa8a0c8d03..d89f91f40e 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -320,6 +320,12 @@ void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, 
uint32_t s,
 void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
   uint32_t m, TCGv_i64);
 
+#if TARGET_LONG_BITS == 64
+# define tcg_gen_gvec_dup_tl  tcg_gen_gvec_dup_i64
+#else
+# define tcg_gen_gvec_dup_tl  tcg_gen_gvec_dup_i32
+#endif
+
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
int64_t shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
-- 
2.20.1




[PATCH v2 03/36] target/ppc: Use tcg_gen_gvec_dup_imm

2020-04-21 Thread Richard Henderson
We can now unify the implementation of the 3 VSPLTI instructions.

Acked-by: David Gibson 
Signed-off-by: Richard Henderson 
---
 target/ppc/translate/vmx-impl.inc.c | 32 -
 target/ppc/translate/vsx-impl.inc.c |  2 +-
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c 
b/target/ppc/translate/vmx-impl.inc.c
index 81d5a7a341..403ed3a01c 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -1035,21 +1035,25 @@ GEN_VXRFORM_DUAL(vcmpbfp, PPC_ALTIVEC, PPC_NONE, \
 GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
  vcmpgtud, PPC_NONE, PPC2_ALTIVEC_207)
 
-#define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)   \
-static void glue(gen_, name)(DisasContext *ctx) \
-{   \
-int simm;   \
-if (unlikely(!ctx->altivec_enabled)) {  \
-gen_exception(ctx, POWERPC_EXCP_VPU);   \
-return; \
-}   \
-simm = SIMM5(ctx->opcode);  \
-tcg_op(avr_full_offset(rD(ctx->opcode)), 16, 16, simm); \
+static void gen_vsplti(DisasContext *ctx, int vece)
+{
+int simm;
+
+if (unlikely(!ctx->altivec_enabled)) {
+gen_exception(ctx, POWERPC_EXCP_VPU);
+return;
 }
 
-GEN_VXFORM_DUPI(vspltisb, tcg_gen_gvec_dup8i, 6, 12);
-GEN_VXFORM_DUPI(vspltish, tcg_gen_gvec_dup16i, 6, 13);
-GEN_VXFORM_DUPI(vspltisw, tcg_gen_gvec_dup32i, 6, 14);
+simm = SIMM5(ctx->opcode);
+tcg_gen_gvec_dup_imm(vece, avr_full_offset(rD(ctx->opcode)), 16, 16, simm);
+}
+
+#define GEN_VXFORM_VSPLTI(name, vece, opc2, opc3) \
+static void glue(gen_, name)(DisasContext *ctx) { gen_vsplti(ctx, vece); }
+
+GEN_VXFORM_VSPLTI(vspltisb, MO_8, 6, 12);
+GEN_VXFORM_VSPLTI(vspltish, MO_16, 6, 13);
+GEN_VXFORM_VSPLTI(vspltisw, MO_32, 6, 14);
 
 #define GEN_VXFORM_NOA(name, opc2, opc3)\
 static void glue(gen_, name)(DisasContext *ctx) \
@@ -1559,7 +1563,7 @@ GEN_VXFORM_DUAL(vsldoi, PPC_ALTIVEC, PPC_NONE,
 #undef GEN_VXRFORM_DUAL
 #undef GEN_VXRFORM1
 #undef GEN_VXRFORM
-#undef GEN_VXFORM_DUPI
+#undef GEN_VXFORM_VSPLTI
 #undef GEN_VXFORM_NOA
 #undef GEN_VXFORM_UIMM
 #undef GEN_VAFORM_PAIRED
diff --git a/target/ppc/translate/vsx-impl.inc.c 
b/target/ppc/translate/vsx-impl.inc.c
index 8287e272f5..b518de46db 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1579,7 +1579,7 @@ static void gen_xxspltib(DisasContext *ctx)
 return;
 }
 }
-tcg_gen_gvec_dup8i(vsr_full_offset(rt), 16, 16, uim8);
+tcg_gen_gvec_dup_imm(MO_8, vsr_full_offset(rt), 16, 16, uim8);
 }
 
 static void gen_xxsldwi(DisasContext *ctx)
-- 
2.20.1




[PATCH v2 27/36] tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32, 64}

2020-04-21 Thread Richard Henderson
For the benefit of compatibility of function pointer types,
we have standardized on int32_t and int64_t as the integral
argument to tcg expanders.

We converted most of them in 474b2e8f0f7, but missed the rotates.
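
A standalone sketch of the constraint (the typedef and helpers are invented
for illustration): expanders stored behind one function-pointer type must
agree on the immediate's type, which is what standardizing on int32_t and
int64_t buys.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t ExpandFn(uint64_t arg, int64_t imm);

static uint64_t shli64(uint64_t arg, int64_t imm)
{
    return arg << imm;
}

static uint64_t rotli64(uint64_t arg, int64_t imm)
{
    /* Declared with "unsigned imm" this could not share ExpandFn
     * without a cast; int64_t keeps every expander uniform. */
    return (arg << (imm & 63)) | (arg >> (-imm & 63));
}

int main(void)
{
    ExpandFn *const table[2] = { shli64, rotli64 };

    printf("%llx\n", (unsigned long long)table[1](0x8000000000000001ull, 4));
    return 0;
}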

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |  8 
 tcg/tcg-op.c | 16 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index a39eb13ff0..b07bf7b524 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -298,9 +298,9 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t 
arg2);
 void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_ctpop_i32(TCGv_i32 a1, TCGv_i32 a2);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
-void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
+void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
-void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
+void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2);
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
  unsigned int ofs, unsigned int len);
 void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
@@ -490,9 +490,9 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t 
arg2);
 void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_ctpop_i64(TCGv_i64 a1, TCGv_i64 a2);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
-void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
+void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
-void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
+void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2);
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
  unsigned int ofs, unsigned int len);
 void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 07eb661a07..202d8057c5 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -516,9 +516,9 @@ void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 
arg2)
 }
 }
 
-void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
+void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-tcg_debug_assert(arg2 < 32);
+tcg_debug_assert(arg2 >= 0 && arg2 < 32);
 /* some cases can be optimized here */
 if (arg2 == 0) {
 tcg_gen_mov_i32(ret, arg1);
@@ -554,9 +554,9 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 
arg2)
 }
 }
 
-void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2)
+void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
-tcg_debug_assert(arg2 < 32);
+tcg_debug_assert(arg2 >= 0 && arg2 < 32);
 /* some cases can be optimized here */
 if (arg2 == 0) {
 tcg_gen_mov_i32(ret, arg1);
@@ -1949,9 +1949,9 @@ void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, 
TCGv_i64 arg2)
 }
 }
 
-void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
+void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-tcg_debug_assert(arg2 < 64);
+tcg_debug_assert(arg2 >= 0 && arg2 < 64);
 /* some cases can be optimized here */
 if (arg2 == 0) {
 tcg_gen_mov_i64(ret, arg1);
@@ -1986,9 +1986,9 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, 
TCGv_i64 arg2)
 }
 }
 
-void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2)
+void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
-tcg_debug_assert(arg2 < 64);
+tcg_debug_assert(arg2 >= 0 && arg2 < 64);
 /* some cases can be optimized here */
 if (arg2 == 0) {
 tcg_gen_mov_i64(ret, arg1);
-- 
2.20.1




[PATCH v2 01/36] tcg: Add tcg_gen_gvec_dup_imm

2020-04-21 Thread Richard Henderson
Add a version of tcg_gen_dup_* that takes both immediate and
a vector element size operand.  This will replace the set of
tcg_gen_gvec_dup{8,16,32,64}i functions that encode the element
size within the function name.
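
A call-site fragment showing the intended conversion; dofs/oprsz/maxsz
stand for whatever offsets and sizes the caller already has.

/* Before: the element size is baked into the function name. */
tcg_gen_gvec_dup32i(dofs, oprsz, maxsz, 0xdeadbeef);

/* After: one entry point, with the element size as an explicit MemOp,
 * which also lets callers pass a size chosen at runtime. */
tcg_gen_gvec_dup_imm(MO_32, dofs, oprsz, maxsz, 0xdeadbeef);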

Reviewed-by: LIU Zhiwei 
Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op-gvec.h | 2 ++
 tcg/tcg-op-gvec.c | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index 74534e2480..eb0d47a42b 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -313,6 +313,8 @@ void tcg_gen_gvec_ors(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
   uint32_t s, uint32_t m);
+void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t s,
+  uint32_t m, uint64_t imm);
 void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, uint32_t s,
   uint32_t m, TCGv_i32);
 void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 327d9588e0..593bb4542e 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1569,6 +1569,13 @@ void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t oprsz,
 do_dup(MO_8, dofs, oprsz, maxsz, NULL, NULL, x);
 }
 
+void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t oprsz,
+  uint32_t maxsz, uint64_t x)
+{
+check_size_align(oprsz, maxsz, dofs);
+do_dup(vece, dofs, oprsz, maxsz, NULL, NULL, x);
+}
+
 void tcg_gen_gvec_not(unsigned vece, uint32_t dofs, uint32_t aofs,
   uint32_t oprsz, uint32_t maxsz)
 {
-- 
2.20.1




[PATCH v2 22/36] tcg: Increase tcg_out_dupi_vec immediate to int64_t

2020-04-21 Thread Richard Henderson
While we don't store more than tcg_target_long in TCGTemp,
we shouldn't be limited to that for code generation.  We will
be able to use this for INDEX_op_dup2_vec with 2 constants.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c |  2 +-
 tcg/i386/tcg-target.inc.c| 20 
 tcg/ppc/tcg-target.inc.c | 15 ---
 tcg/tcg.c|  4 ++--
 4 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index e5c9ab70a9..3b5a5d78c7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -856,7 +856,7 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn 
insn, TCGType ext,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
- TCGReg rd, tcg_target_long v64)
+ TCGReg rd, int64_t v64)
 {
 bool q = type == TCG_TYPE_V128;
 int cmode, imm8, i;
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 07424f7ef9..9cb627d6eb 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -945,7 +945,7 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, 
unsigned vece,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
- TCGReg ret, tcg_target_long arg)
+ TCGReg ret, int64_t arg)
 {
 int vex_l = (type == TCG_TYPE_V256 ? P_VEXL : 0);
 
@@ -958,7 +958,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 return;
 }
 
-if (TCG_TARGET_REG_BITS == 64) {
+if (TCG_TARGET_REG_BITS == 32 && arg == dup_const(MO_32, arg)) {
+if (have_avx2) {
+tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
+} else {
+tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+}
+new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
+} else {
 if (type == TCG_TYPE_V64) {
 tcg_out_vex_modrm_pool(s, OPC_MOVQ_VqWq, ret);
 } else if (have_avx2) {
@@ -966,14 +973,11 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 } else {
 tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
 }
-new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
-} else {
-if (have_avx2) {
-tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
+if (TCG_TARGET_REG_BITS == 64) {
+new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
 } else {
-tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
+new_pool_l2(s, R_386_32, s->code_ptr - 4, 0, arg, arg >> 32);
 }
-new_pool_label(s, arg, R_386_32, s->code_ptr - 4, 0);
 }
 }
 
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 7ab1e32064..b55766 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -913,7 +913,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, 
TCGReg ret,
 }
 
 static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
- tcg_target_long val)
+ int64_t val)
 {
 uint32_t load_insn;
 int rel, low;
@@ -921,20 +921,20 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, 
TCGReg ret,
 
 low = (int8_t)val;
 if (low >= -16 && low < 16) {
-if (val == (tcg_target_long)dup_const(MO_8, low)) {
+if (val == dup_const(MO_8, low)) {
 tcg_out32(s, VSPLTISB | VRT(ret) | ((val & 31) << 16));
 return;
 }
-if (val == (tcg_target_long)dup_const(MO_16, low)) {
+if (val == dup_const(MO_16, low)) {
 tcg_out32(s, VSPLTISH | VRT(ret) | ((val & 31) << 16));
 return;
 }
-if (val == (tcg_target_long)dup_const(MO_32, low)) {
+if (val == dup_const(MO_32, low)) {
 tcg_out32(s, VSPLTISW | VRT(ret) | ((val & 31) << 16));
 return;
 }
 }
-if (have_isa_3_00 && val == (tcg_target_long)dup_const(MO_8, val)) {
+if (have_isa_3_00 && val == dup_const(MO_8, val)) {
 tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11));
 return;
 }
@@ -956,14 +956,15 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, 
TCGReg ret,
 if (TCG_TARGET_REG_BITS == 64) {
 new_pool_label(s, val, rel, s->code_ptr, add);
 } else {
-new_pool_l2(s, rel, s->code_ptr, add, val, val);
+new_pool_l2(s, rel, s->code_ptr, add, val >> 32, val);
 }
 } else {
 load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
 if (TCG_TARGET_REG_BITS == 64) {
 new_pool_l2(s, rel, s->code_ptr, add, val, val);
 } else {
-new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+new_pool_l4(s, rel, s->code_ptr, add,
+val >> 32, val, val >> 32, val);
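The dup_const() comparisons above all hinge on one idea: a constant qualifies
for a splat instruction when it is a small pattern replicated across the full
64 bits.  A stand-alone sketch of that check follows; it illustrates the
semantics the hunk relies on and is not QEMU's actual dup_const()
implementation:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Replicate the low 1/2/4 bytes of `c` across 64 bits.
 * vece: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit, 3 = 64-bit elements. */
static uint64_t dup_const_example(unsigned vece, uint64_t c)
{
    switch (vece) {
    case 0:  return 0x0101010101010101ull * (uint8_t)c;
    case 1:  return 0x0001000100010001ull * (uint16_t)c;
    case 2:  return 0x0000000100000001ull * (uint32_t)c;
    default: return c;
    }
}

int main(void)
{
    /* 0xab as bytes -> 0xabababababababab */
    printf("%016" PRIx64 "\n", dup_const_example(0, 0xab));
    /* A 32-bit pattern repeated twice passes the MO_32 test, which is
     * why a 64-bit value can be emitted as two 32-bit pool halves
     * (val >> 32 and val) on a 32-bit host, as in the hunk above. */
    printf("%016" PRIx64 "\n", dup_const_example(2, 0x12345678));
    return 0;
}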

[PATCH v2 06/36] tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i

2020-04-21 Thread Richard Henderson
These interfaces are now unused.

Reviewed-by: LIU Zhiwei 
Reviewed-by: David Hildenbrand 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op-gvec.h |  5 -
 tcg/tcg-op-gvec.c | 28 
 2 files changed, 33 deletions(-)

diff --git a/include/tcg/tcg-op-gvec.h b/include/tcg/tcg-op-gvec.h
index eb0d47a42b..fa8a0c8d03 100644
--- a/include/tcg/tcg-op-gvec.h
+++ b/include/tcg/tcg-op-gvec.h
@@ -320,11 +320,6 @@ void tcg_gen_gvec_dup_i32(unsigned vece, uint32_t dofs, 
uint32_t s,
 void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t s,
   uint32_t m, TCGv_i64);
 
-void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t s, uint32_t m, uint8_t x);
-void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t s, uint32_t m, uint16_t x);
-void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t s, uint32_t m, uint32_t x);
-void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t s, uint32_t m, uint64_t x);
-
 void tcg_gen_gvec_shli(unsigned vece, uint32_t dofs, uint32_t aofs,
int64_t shift, uint32_t oprsz, uint32_t maxsz);
 void tcg_gen_gvec_shri(unsigned vece, uint32_t dofs, uint32_t aofs,
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index de16c027b3..5a6cc19812 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1541,34 +1541,6 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, 
uint32_t aofs,
 }
 }
 
-void tcg_gen_gvec_dup64i(uint32_t dofs, uint32_t oprsz,
- uint32_t maxsz, uint64_t x)
-{
-check_size_align(oprsz, maxsz, dofs);
-do_dup(MO_64, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
-void tcg_gen_gvec_dup32i(uint32_t dofs, uint32_t oprsz,
- uint32_t maxsz, uint32_t x)
-{
-check_size_align(oprsz, maxsz, dofs);
-do_dup(MO_32, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
-void tcg_gen_gvec_dup16i(uint32_t dofs, uint32_t oprsz,
- uint32_t maxsz, uint16_t x)
-{
-check_size_align(oprsz, maxsz, dofs);
-do_dup(MO_16, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
-void tcg_gen_gvec_dup8i(uint32_t dofs, uint32_t oprsz,
- uint32_t maxsz, uint8_t x)
-{
-check_size_align(oprsz, maxsz, dofs);
-do_dup(MO_8, dofs, oprsz, maxsz, NULL, NULL, x);
-}
-
 void tcg_gen_gvec_dup_imm(unsigned vece, uint32_t dofs, uint32_t oprsz,
   uint32_t maxsz, uint64_t x)
 {
-- 
2.20.1




[PATCH v2 00/36] tcg 5.1 omnibus patch set

2020-04-21 Thread Richard Henderson
For v1, I had split this into 4 logically distinct parts.  But
apparently there are minor interdependencies, because the later
sets would not apply standalone, says Alex.

Rather than tease them apart, and then have to undo that work
in order to actually apply them later, I'll just lump them.

So:

  Part 1, patches 1-7, tcg_gen_gvec_dup_imm, is reviewed.

  Part 2, patch 8, vector tail clearing, is reviewed, and I have
  moved the target/arm patches into a different queue.

  Part 3, patches 9-25, TYPE_CONST temporaries, is mostly unreviewed.

  Part 4, patch 26, load_dest for GVecGen2, a support patch for SVE2.

  Part 5, patches 27-36, add vector rotate patterns, is brand new.
  I include two demonstrators for target/ppc and target/s390x.
  It will also be used by SVE2.


r~

Richard Henderson (36):
  tcg: Add tcg_gen_gvec_dup_imm
  target/s390x: Use tcg_gen_gvec_dup_imm
  target/ppc: Use tcg_gen_gvec_dup_imm
  target/arm: Use tcg_gen_gvec_dup_imm
  tcg: Use tcg_gen_gvec_dup_imm in logical simplifications
  tcg: Remove tcg_gen_gvec_dup{8,16,32,64}i
  tcg: Add tcg_gen_gvec_dup_tl
  tcg: Improve vector tail clearing
  tcg: Consolidate 3 bits into enum TCGTempKind
  tcg: Add temp_readonly
  tcg: Introduce TYPE_CONST temporaries
  tcg: Use tcg_constant_i32 with icount expander
  tcg: Use tcg_constant_{i32,i64} with tcg int expanders
  tcg: Use tcg_constant_{i32,vec} with tcg vec expanders
  tcg: Use tcg_constant_{i32,i64} with tcg plugins
  tcg: Rename struct tcg_temp_info to TempOptInfo
  tcg/optimize: Adjust TempOptInfo allocation
  tcg/optimize: Use tcg_constant_internal with constant folding
  tcg/tci: Add special tci_movi_{i32,i64} opcodes
  tcg: Remove movi and dupi opcodes
  tcg: Use tcg_out_dupi_vec from temp_load
  tcg: Increase tcg_out_dupi_vec immediate to int64_t
  tcg: Add tcg_reg_alloc_dup2
  tcg/i386: Use tcg_constant_vec with tcg vec expanders
  tcg: Remove tcg_gen_dup{8,16,32,64}i_vec
  tcg: Add load_dest parameter to GVecGen2
  tcg: Fix integral argument type to tcg_gen_rot[rl]i_i{32,64}
  tcg: Implement gvec support for rotate by immediate
  tcg: Implement gvec support for rotate by vector
  tcg: Remove expansion to shift by vector from do_shifts
  tcg: Implement gvec support for rotate by scalar
  tcg/i386: Implement INDEX_op_rotl[is]_vec
  tcg/aarch64: Implement INDEX_op_rotli_vec
  tcg/ppc: Implement INDEX_op_rot[lr]v_vec
  target/ppc: Use tcg_gen_gvec_rotlv
  target/s390x: Use tcg_gen_gvec_rotl{i,s,v}

 accel/tcg/tcg-runtime.h |  15 ++
 include/exec/gen-icount.h   |  25 +-
 include/tcg/tcg-op-gvec.h   |  25 +-
 include/tcg/tcg-op.h|  30 +--
 include/tcg/tcg-opc.h   |  15 +-
 include/tcg/tcg.h   |  53 +++-
 target/ppc/helper.h |   4 -
 target/s390x/helper.h   |   4 -
 tcg/aarch64/tcg-target.h|   3 +
 tcg/aarch64/tcg-target.opc.h|   1 +
 tcg/i386/tcg-target.h   |   3 +
 tcg/ppc/tcg-target.h|   3 +
 tcg/ppc/tcg-target.opc.h|   1 -
 accel/tcg/plugin-gen.c  |  49 ++--
 accel/tcg/tcg-runtime-gvec.c| 144 +++
 target/arm/translate-a64.c  |  10 +-
 target/arm/translate-sve.c  |  12 +-
 target/arm/translate.c  |   9 +-
 target/ppc/int_helper.c |  17 --
 target/ppc/translate/vmx-impl.inc.c |  40 +--
 target/ppc/translate/vsx-impl.inc.c |   2 +-
 target/s390x/translate_vx.inc.c | 107 ++--
 target/s390x/vec_int_helper.c   |  31 ---
 tcg/aarch64/tcg-target.inc.c|  32 ++-
 tcg/arm/tcg-target.inc.c|   1 -
 tcg/i386/tcg-target.inc.c   | 195 ++-
 tcg/mips/tcg-target.inc.c   |   2 -
 tcg/optimize.c  | 204 +++
 tcg/ppc/tcg-target.inc.c|  47 ++--
 tcg/riscv/tcg-target.inc.c  |   2 -
 tcg/s390/tcg-target.inc.c   |   2 -
 tcg/sparc/tcg-target.inc.c  |   2 -
 tcg/tcg-op-gvec.c   | 374 +++-
 tcg/tcg-op-vec.c| 218 +++-
 tcg/tcg-op.c| 232 -
 tcg/tcg.c   | 347 --
 tcg/tci.c   |   4 +-
 tcg/tci/tcg-target.inc.c|   6 +-
 target/s390x/insn-data.def  |   4 +-
 tcg/README  |   7 +-
 40 files changed, 1490 insertions(+), 792 deletions(-)

-- 
2.20.1




[PATCH] fixup! qemu-img: Add bitmap sub-command

2020-04-21 Thread Eric Blake
Signed-off-by: Eric Blake 
---

Squash this into patch 3/6 to fix docker-test-mingw@fedora

 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 6cfc1f52ef98..cc87eaf12778 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4529,7 +4529,7 @@ static int img_bitmap(int argc, char **argv)
 BlockBackend *blk = NULL, *src = NULL;
 BlockDriverState *bs = NULL, *src_bs = NULL;
 bool image_opts = false;
-unsigned long granularity = 0;
+uint64_t granularity = 0;
 bool add = false, remove = false, clear = false;
 bool enable = false, disable = false, add_disabled = false;
 const char *merge = NULL;
-- 
2.26.2
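The reason this one-line type change is needed only shows up on the mingw
target, where unsigned long is 32 bits wide while qemu_strtosz() writes
through a uint64_t pointer (see the prototype quoted in the failure report
that follows).  A minimal sketch of the corrected usage; the helper name is
made up, and the prototype is copied from the build log:

#include <stdint.h>

/* Prototype as quoted in the build log below (declared in QEMU's
 * "qemu/cutils.h"): */
int qemu_strtosz(const char *nptr, const char **end, uint64_t *result);

/* Hypothetical helper showing why the local must be uint64_t: only then
 * does &granularity match the parameter type on every host, including
 * LLP64 Windows where long is 32 bits. */
static int parse_granularity_example(const char *arg, uint64_t *out)
{
    uint64_t granularity = 0;

    if (qemu_strtosz(arg, NULL, &granularity)) {
        return -1;              /* not a valid size string */
    }
    *out = granularity;
    return 0;
}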




Re: [PATCH v2 0/6] qemu-img: Add convert --bitmaps

2020-04-21 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200421212019.170707-1-ebl...@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  LINK    qemu-edid.exe
  LINK    qemu-ga.exe
/tmp/qemu-test/src/qemu-img.c: In function 'img_bitmap':
/tmp/qemu-test/src/qemu-img.c:4579:44: error: passing argument 3 of 
'qemu_strtosz' from incompatible pointer type 
[-Werror=incompatible-pointer-types]
 if (qemu_strtosz(optarg, NULL, &granularity)) {
^~~~
In file included from /tmp/qemu-test/src/qemu-img.c:37:
---
 int qemu_strtosz(const char *nptr, const char **end, uint64_t *result);
  ~~^~
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: qemu-img.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=e06abfa47fc2454097264ac821c5dceb', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-d9esijcl/src/docker-src.2020-04-21-18.18.59.19445:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=e06abfa47fc2454097264ac821c5dceb
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-d9esijcl/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    3m19.767s
user    0m8.957s


The full log is available at
http://patchew.org/logs/20200421212019.170707-1-ebl...@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 2/6] block/nbd-client: drop max_block restriction from discard

2020-04-21 Thread Eric Blake

On 4/1/20 10:01 AM, Vladimir Sementsov-Ogievskiy wrote:

The NBD spec has been updated so that max_block no longer relates to
NBD_CMD_TRIM. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/nbd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Eric Blake 

I might tweak the commit message of 1/6 and here to call out the NBD 
spec commit id (nbd.git 9f30fedb), but that doesn't change the patch proper.




diff --git a/block/nbd.c b/block/nbd.c
index d4d518a780..4ac23c8f62 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1955,7 +1955,7 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
  }
  
  bs->bl.request_alignment = min;

-bs->bl.max_pdiscard = max;
+bs->bl.max_pdiscard = QEMU_ALIGN_DOWN(INT_MAX, min);
  bs->bl.max_pwrite_zeroes = max;
  bs->bl.max_transfer = max;
  



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
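To put a number on the new limit: max_pdiscard becomes the largest multiple of
the request alignment that still fits a signed 32-bit length.  A tiny
stand-alone calculation follows, with a 512-byte alignment assumed purely for
the example and a local macro standing in for QEMU_ALIGN_DOWN:

#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for QEMU_ALIGN_DOWN(n, m): round n down to a multiple of m. */
#define ALIGN_DOWN_EXAMPLE(n, m) ((n) / (m) * (m))

int main(void)
{
    uint32_t min = 512;                        /* assumed bs->bl.request_alignment */
    int64_t max_pdiscard = ALIGN_DOWN_EXAMPLE(INT_MAX, min);

    printf("%" PRId64 "\n", max_pdiscard);     /* 2147483136 */
    return 0;
}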




[PATCH] virtio-vga: fix virtio-vga bar ordering

2020-04-21 Thread Anthoine Bourgeois
With virtio-vga, the PCI BARs are reordered: BAR #2 is used for
compatibility with stdvga.  By default, however, BAR #2 is used by the
virtio modern I/O bar.  That bar is the last one introduced in the
virtio PCI bar layout, and it is clobbered by the virtio-vga
reordering.  As a result, virtio-vga and modern-pio-notify are
incompatible, because virtio-vga fails to initialize with this option.

This fix swaps the modern I/O bar with the modern memory bar instead
of relocating the MSI-X bar, which is never impacted anyway.

Signed-off-by: Anthoine Bourgeois 
---
 hw/display/virtio-vga.c | 2 +-
 hw/virtio/virtio-pci.c  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index 2b4c2aa126..f5f8737c60 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -113,7 +113,7 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
  * the stdvga mmio registers at the start of bar #2.
  */
 vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
+vpci_dev->modern_io_bar_idx = 4;
 
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 4cb784389c..9c5efaa06e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1705,6 +1705,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
  *
  *   region 0   --  virtio legacy io bar
  *   region 1   --  msi-x bar
+ *   region 2   --  virtio modern io bar
  *   region 4+5 --  virtio modern memory (64bit) bar
  *
  */
-- 
2.20.1
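Read together with the layout comment touched in virtio_pci_realize() above,
the BAR usage that virtio-vga ends up with after this patch can be summarized
as follows; this is a reviewer-style summary of the two hunks, and the enum
and its names are purely illustrative, not taken from any header:

/* Illustrative only; derived from the comment block and the virtio-vga
 * overrides, not from the source tree. */
enum virtio_vga_bar_layout_example {
    EXAMPLE_BAR_LEGACY_IO  = 0,  /* virtio legacy I/O */
    EXAMPLE_BAR_MSIX       = 1,  /* MSI-X, left at its default slot */
    EXAMPLE_BAR_MODERN_MEM = 2,  /* 64-bit modern memory BAR (pairs with BAR 3);
                                  * stdvga MMIO registers sit at its start */
    EXAMPLE_BAR_MODERN_IO  = 4,  /* modern I/O, relocated here by this patch */
};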




Re: [PATCH 2/2] tools: Fix use of fcntl(F_SETFD) during socket activation

2020-04-21 Thread Eric Blake

On 4/20/20 12:53 PM, Eric Blake wrote:

Blindly setting FD_CLOEXEC without a read-modify-write will
inadvertently clear any other intentionally-set bits, such as a
proposed new bit for designating a fd that must behave in 32-bit mode.
However, we cannot use our wrapper qemu_set_cloexec(), because that
wrapper intentionally abort()s on failure, whereas the probe here
intentionally tolerates failure to deal with incorrect socket
activation gracefully.  Instead, fix the code to do the proper
read-modify-write.

Signed-off-by: Eric Blake 
---
  util/systemd.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)


As qemu-nbd is impacted, I'll queue 2/2 through my NBD tree if 
qemu-trivial doesn't pick it up first.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
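For reference, the read-modify-write pattern described in the commit message
looks roughly like the sketch below; it is an illustration of the approach,
not the exact hunk from util/systemd.c, and it deliberately reports failure
instead of aborting, unlike qemu_set_cloexec():

#include <fcntl.h>
#include <stdbool.h>

/* Set FD_CLOEXEC while preserving any other descriptor flags that may
 * already be set, tolerating failure instead of aborting. */
static bool set_cloexec_tolerant(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return false;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) >= 0;
}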




Re: [PATCH v2] qcow2: Expose bitmaps' size during measure

2020-04-21 Thread Eric Blake

On 4/17/20 11:34 AM, Eric Blake wrote:

It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data.  Report this value as an additional field.  Update iotest 190 to
cover it.

The addition of a new field demonstrates why we should always
zero-initialize qapi C structs; while the qcow2 driver still fully
populates all fields, the raw and crypto drivers had to be tweaked.

See also: https://bugzilla.redhat.com/1779904

Reported-by: Nir Soffer 
Signed-off-by: Eric Blake 
---

In v2:
- fix uninit memory [patchew]
- add BZ number [Nir]
- add iotest coverage


Now subsumed into a larger series:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg03464.html

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH v2 6/6] iotests: Add test 291 for qemu-img bitmap coverage

2020-04-21 Thread Eric Blake
Add a new test covering the 'qemu-img bitmap' subcommand, as well as
'qemu-img convert --bitmaps', both added in recent patches.

Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/291 | 103 +
 tests/qemu-iotests/291.out |  78 
 tests/qemu-iotests/group   |   1 +
 3 files changed, 182 insertions(+)
 create mode 100755 tests/qemu-iotests/291
 create mode 100644 tests/qemu-iotests/291.out

diff --git a/tests/qemu-iotests/291 b/tests/qemu-iotests/291
new file mode 100755
index ..77713c0cfea7
--- /dev/null
+++ b/tests/qemu-iotests/291
@@ -0,0 +1,103 @@
+#!/usr/bin/env bash
+#
+# Test qemu-img bitmap handling
+#
+# Copyright (C) 2018-2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+nbd_server_stop
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.nbd
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_require_command QEMU_NBD
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+# Create backing image with one bitmap
+TEST_IMG="$TEST_IMG.base" _make_test_img 10M
+$QEMU_IMG bitmap --add -f $IMGFMT "$TEST_IMG.base" b0
+$QEMU_IO -c 'w 3M 1M' -f $IMGFMT "$TEST_IMG.base" | _filter_qemu_io
+# Create initial image and populate two bitmaps: one active, one inactive.
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT 10M
+$QEMU_IO -c 'w 0 1M' -f $IMGFMT "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG bitmap --add -g 512k -f $IMGFMT "$TEST_IMG" b1
+$QEMU_IMG bitmap --add --disabled -f $IMGFMT "$TEST_IMG" b2
+$QEMU_IO -c 'w 3M 1M' -f $IMGFMT "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG bitmap --clear -f $IMGFMT "$TEST_IMG" b1
+$QEMU_IO -c 'w 1M 1M' -f $IMGFMT "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG bitmap --disable -f $IMGFMT "$TEST_IMG" b1
+$QEMU_IMG bitmap --enable -f $IMGFMT "$TEST_IMG" b2
+$QEMU_IO -c 'w 2M 1M' -f $IMGFMT "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo "=== Bitmap preservation not possible to non-qcow2 ==="
+echo
+
+mv "$TEST_IMG" "$TEST_IMG.orig"
+$QEMU_IMG convert --bitmaps -O raw "$TEST_IMG.orig" "$TEST_IMG"
+
+echo
+echo "=== Convert with bitmap preservation ==="
+echo
+
+# Only bitmaps from the active layer are copied
+$QEMU_IMG convert --bitmaps -O qcow2 "$TEST_IMG.orig" "$TEST_IMG"
+$QEMU_IMG info "$TEST_IMG" | _filter_img_info --format-specific
+# But we can also merge in bitmaps from other layers
+$QEMU_IMG bitmap --add --disabled -f $IMGFMT "$TEST_IMG" b0
+$QEMU_IMG bitmap --add -f $IMGFMT "$TEST_IMG" tmp
+$QEMU_IMG bitmap --merge b0 -b "$TEST_IMG.base" -F $IMGFMT "$TEST_IMG" tmp
+$QEMU_IMG bitmap --merge tmp "$TEST_IMG" b0
+$QEMU_IMG bitmap --remove -f $IMGFMT "$TEST_IMG" tmp
+$QEMU_IMG info "$TEST_IMG" | _filter_img_info --format-specific
+
+echo
+echo "=== Check bitmap contents ==="
+echo
+
+IMG="driver=nbd,server.type=unix,server.path=$nbd_unix_socket"
+nbd_server_start_unix_socket -r -f qcow2 -B b0 "$TEST_IMG"
+$QEMU_IMG map --output=json --image-opts \
+"$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b0" | _filter_qemu_img_map
+nbd_server_start_unix_socket -r -f qcow2 -B b1 "$TEST_IMG"
+$QEMU_IMG map --output=json --image-opts \
+"$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b1" | _filter_qemu_img_map
+nbd_server_start_unix_socket -r -f qcow2 -B b2 "$TEST_IMG"
+$QEMU_IMG map --output=json --image-opts \
+"$IMG,x-dirty-bitmap=qemu:dirty-bitmap:b2" | _filter_qemu_img_map
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/291.out b/tests/qemu-iotests/291.out
new file mode 100644
index ..d716c0c7cc0b
--- /dev/null
+++ b/tests/qemu-iotests/291.out
@@ -0,0 +1,78 @@
+QA output created by 291
+
+=== Initial image setup ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=10485760
+wrote 1048576/1048576 bytes at offset 3145728
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=10485760 
backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 3145728
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 

[PATCH v2 2/6] blockdev: Split off basic bitmap operations for qemu-img

2020-04-21 Thread Eric Blake
Upcoming patches want to add some basic bitmap manipulation abilities
to qemu-img.  But blockdev.o is too heavyweight to link into qemu-img
(among other things, it would drag in block jobs and transaction
support - qemu-img does offline manipulation, where atomicity is less
important because there are no concurrent modifications to compete
with), so it's time to split off the bare bones of what we will need
into a new file blockbitmaps.o.

In addition to exposing 6 QMP commands for use by qemu-img (add,
remove, clear, enable, disable, merge), this also has to export three
previously-static functions for use by blockdev.c transactions.

Signed-off-by: Eric Blake 
---
 Makefile.objs |   2 +-
 include/sysemu/blockdev.h |  14 ++
 blockbitmaps.c| 324 ++
 blockdev.c| 293 --
 MAINTAINERS   |   1 +
 5 files changed, 340 insertions(+), 294 deletions(-)
 create mode 100644 blockbitmaps.c

diff --git a/Makefile.objs b/Makefile.objs
index a7c967633acf..44e30fa9a6e3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ chardev-obj-y = chardev/
 authz-obj-y = authz/

 block-obj-y = nbd/
-block-obj-y += block.o blockjob.o job.o
+block-obj-y += block.o blockbitmaps.o blockjob.o job.o
 block-obj-y += block/ scsi/
 block-obj-y += qemu-io-cmds.o
 block-obj-$(CONFIG_REPLICATION) += replication.o
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index a86d99b3d875..523b7493b1cd 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -57,4 +57,18 @@ QemuOpts *drive_add(BlockInterfaceType type, int index, 
const char *file,
 DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type,
  Error **errp);

+BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node,
+   const char *name,
+   BlockDriverState **pbs,
+   Error **errp);
+BdrvDirtyBitmap *do_block_dirty_bitmap_remove(const char *node,
+  const char *name, bool release,
+  BlockDriverState **bitmap_bs,
+  Error **errp);
+BdrvDirtyBitmap *do_block_dirty_bitmap_merge(const char *node,
+ const char *target,
+ BlockDirtyBitmapMergeSourceList 
*bitmaps,
+ HBitmap **backup, Error **errp);
+
+
 #endif
diff --git a/blockbitmaps.c b/blockbitmaps.c
new file mode 100644
index ..fea80efcd03a
--- /dev/null
+++ b/blockbitmaps.c
@@ -0,0 +1,324 @@
+/*
+ * QEMU host block device bitmaps
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ *
+ * This file incorporates work covered by the following copyright and
+ * permission notice:
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/blockdev.h"
+#include "block/block.h"
+#include "block/block_int.h"
+#include "qapi/qapi-commands-block.h"
+#include "qapi/error.h"
+
+/**
+ * block_dirty_bitmap_lookup:
+ * Return a dirty bitmap (if present), after validating
+ * the node reference and bitmap names.
+ *
+ * @node: The name of the BDS node to search for bitmaps
+ * @name: The name of the bitmap to search for
+ * @pbs: Output pointer for BDS lookup, if desired. Can be NULL.
+ * @errp: Output pointer for error information. Can be NULL.
+ *
+ * @return: A bitmap object on success, or NULL on failure.
+ */
+BdrvDirtyBitmap *block_dirty_bitmap_lookup(const char *node,
+ 

[PATCH v2 4/6] qcow2: Expose bitmaps' size during measure

2020-04-21 Thread Eric Blake
It's useful to know how much space can be occupied by qcow2 persistent
bitmaps, even though such metadata is unrelated to the guest-visible
data.  Report this value as an additional field.  Update iotest 190 to
cover it and a portion of the just-added qemu-img bitmap command.

The addition of a new field demonstrates why we should always
zero-initialize qapi C structs; while the qcow2 driver still fully
populates all fields, the raw and crypto drivers had to be tweaked.

See also: https://bugzilla.redhat.com/1779904

Reported-by: Nir Soffer 
Signed-off-by: Eric Blake 
---
 qapi/block-core.json   | 15 ++-
 block/crypto.c |  2 +-
 block/qcow2.c  | 29 -
 block/raw-format.c |  2 +-
 qemu-img.c |  3 +++
 tests/qemu-iotests/190 | 15 +--
 tests/qemu-iotests/190.out | 13 -
 7 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a91..b47c6d69ba27 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -633,18 +633,23 @@
 # efficiently so file size may be smaller than virtual disk size.
 #
 # The values are upper bounds that are guaranteed to fit the new image file.
-# Subsequent modification, such as internal snapshot or bitmap creation, may
-# require additional space and is not covered here.
+# Subsequent modification, such as internal snapshot or further bitmap
+# creation, may require additional space and is not covered here.
 #
-# @required: Size required for a new image file, in bytes.
+# @required: Size required for a new image file, in bytes, when copying just
+#guest-visible contents.
 #
 # @fully-allocated: Image file size, in bytes, once data has been written
-#   to all sectors.
+#   to all sectors, when copying just guest-visible contents.
+#
+# @bitmaps: Additional size required for bitmap metadata not directly used
+#   for guest contents, when that metadata can be copied in addition
+#   to guest contents. (since 5.1)
 #
 # Since: 2.10
 ##
 { 'struct': 'BlockMeasureInfo',
-  'data': {'required': 'int', 'fully-allocated': 'int'} }
+  'data': {'required': 'int', 'fully-allocated': 'int', '*bitmaps': 'int'} }

 ##
 # @query-block:
diff --git a/block/crypto.c b/block/crypto.c
index d577f89659fa..4e0f3ec97f0e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -535,7 +535,7 @@ static BlockMeasureInfo *block_crypto_measure(QemuOpts 
*opts,
  * Unallocated blocks are still encrypted so allocation status makes no
  * difference to the file size.
  */
-info = g_new(BlockMeasureInfo, 1);
+info = g_new0(BlockMeasureInfo, 1);
 info->fully_allocated = luks_payload_size + size;
 info->required = luks_payload_size + size;
 return info;
diff --git a/block/qcow2.c b/block/qcow2.c
index b524b0c53f84..9fd650928016 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4657,6 +4657,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 PreallocMode prealloc;
 bool has_backing_file;
 bool has_luks;
+uint64_t bitmaps_size = 0; /* size occupied by bitmaps in in_bs */

 /* Parse image creation options */
 cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
@@ -4732,6 +4733,8 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,

 /* Account for input image */
 if (in_bs) {
+BdrvDirtyBitmap *bm;
+size_t bitmap_overhead = 0;
 int64_t ssize = bdrv_getlength(in_bs);
 if (ssize < 0) {
 error_setg_errno(&local_err, -ssize,
@@ -4739,6 +4742,28 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 goto err;
 }

+FOR_EACH_DIRTY_BITMAP(in_bs, bm) {
+if (bdrv_dirty_bitmap_get_persistence(bm)) {
+const char *name = bdrv_dirty_bitmap_name(bm);
+uint32_t granularity = bdrv_dirty_bitmap_granularity(bm);
+uint64_t bmbits = DIV_ROUND_UP(bdrv_dirty_bitmap_size(bm),
+   granularity);
+uint64_t bmclusters = DIV_ROUND_UP(DIV_ROUND_UP(bmbits,
+CHAR_BIT),
+   cluster_size);
+
+/* Assume the entire bitmap is allocated */
+bitmaps_size += bmclusters * cluster_size;
+/* Also reserve space for the bitmap table entries */
+bitmaps_size += ROUND_UP(bmclusters * sizeof(uint64_t),
+ cluster_size);
+/* Guess at contribution to bitmap directory size */
+bitmap_overhead += ROUND_UP(strlen(name) + 24,
+sizeof(uint64_t));
+}
+}
+bitmaps_size += 
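As a rough sanity check of the estimate built in this hunk, here is a small
stand-alone version of the per-bitmap arithmetic with assumed parameters
(10 MiB image, 64 KiB bitmap granularity, 64 KiB qcow2 clusters); the
bitmap-directory overhead from the hunk is omitted, and the macros merely
mirror DIV_ROUND_UP/ROUND_UP:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DIV_ROUND_UP_EX(a, b)  (((a) + (b) - 1) / (b))
#define ROUND_UP_EX(a, b)      (DIV_ROUND_UP_EX(a, b) * (b))

int main(void)
{
    uint64_t image_size   = 10 * 1024 * 1024;   /* assumed */
    uint64_t granularity  = 64 * 1024;          /* assumed bitmap granularity */
    uint64_t cluster_size = 64 * 1024;          /* assumed qcow2 cluster size */

    uint64_t bits     = DIV_ROUND_UP_EX(image_size, granularity);        /* 160 */
    uint64_t clusters = DIV_ROUND_UP_EX(DIV_ROUND_UP_EX(bits, 8),
                                        cluster_size);                   /* 1 */

    uint64_t size = clusters * cluster_size;             /* bitmap data: 64 KiB */
    size += ROUND_UP_EX(clusters * 8, cluster_size);     /* bitmap table: 64 KiB */

    printf("estimated bitmap space: %" PRIu64 " bytes\n", size);  /* 131072 */
    return 0;
}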

[PATCH v2 3/6] qemu-img: Add bitmap sub-command

2020-04-21 Thread Eric Blake
Include actions for --add, --remove, --clear, --enable, --disable, and
--merge (note that --clear is a bit of fluff, because the same can be
accomplished by removing a bitmap and then adding a new one in its
place, but it matches what QMP commands exist).  Listing is omitted,
because it does not require a bitmap name and because it was already
possible with 'qemu-img info'.  Merge can work either from another
bitmap in the same image, or from a bitmap in a distinct image.

While this supports --image-opts for the file being modified, I did
not think it worth the extra complexity to support that for the source
file in a cross-file bitmap merge.  Likewise, I chose to have --merge
only take a single source rather than following the QMP support for
multiple merges in one go; in part to simplify the command line, and
in part because an offline image can achieve the same effect by
multiple qemu-img bitmap --merge calls.  We can enhance that if needed
in the future (the same way that 'qemu-img convert' has a mode that
concatenates multiple sources into one destination).

Upcoming patches will add iotest coverage of these commands while
also testing other features.

Signed-off-by: Eric Blake 
---
 docs/tools/qemu-img.rst |  24 +
 qemu-img.c  | 198 
 qemu-img-cmds.hx|   7 ++
 3 files changed, 229 insertions(+)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 7d08c48d308f..4f3b0e2c9ace 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -281,6 +281,30 @@ Command description:
   For write tests, by default a buffer filled with zeros is written. This can 
be
   overridden with a pattern byte specified by *PATTERN*.

+.. option:: bitmap {--add [-g GRANULARITY] [--disabled] | --remove | --clear | 
--enable | --disable | --merge SOURCE_BITMAP [-b SOURCE_FILE [-F SOURCE_FMT]]} 
[--object OBJECTDEF] [--image-opts] [-f FMT] FILENAME BITMAP
+
+  Perform a modification of the persistent bitmap *BITMAP* in the disk
+  image *FILENAME*.  The various modifications are:
+
+  ``--add`` to create *BITMAP*, with additional options ``-g`` to
+  specify a non-default *GRANULARITY*, or whether the bitmap should be
+  ``--disabled`` instead of enabled.
+
+  ``--remove`` to remove *BITMAP*.
+
+  ``--clear`` to clear *BITMAP*.
+
+  ``--enable`` to change *BITMAP* to start recording future edits.
+
+  ``--disable`` to change *BITMAP* to stop recording future edits.
+
+  ``--merge`` to merge the contents of *SOURCE_BITMAP* into *BITMAP*.
+  This defaults to requiring a source bitmap from the same *FILENAME*,
+  but can also be used for cross-image merge by supplying ``-b`` to
+  specify a different *SOURCE_FILE*.
+
+  To see what bitmaps are present in an image, use ``qemu-img info``.
+
 .. option:: check [--object OBJECTDEF] [--image-opts] [-q] [-f FMT] 
[--output=OFMT] [-r [leaks | all]] [-T SRC_CACHE] [-U] FILENAME

   Perform a consistency check on the disk image *FILENAME*. The command can
diff --git a/qemu-img.c b/qemu-img.c
index 821cbf610e5f..02ebd870faa1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -28,6 +28,7 @@
 #include "qemu-common.h"
 #include "qemu-version.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-block-core.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qobject-output-visitor.h"
 #include "qapi/qmp/qjson.h"
@@ -71,6 +72,12 @@ enum {
 OPTION_SHRINK = 266,
 OPTION_SALVAGE = 267,
 OPTION_TARGET_IS_ZERO = 268,
+OPTION_ADD = 269,
+OPTION_REMOVE = 270,
+OPTION_CLEAR = 271,
+OPTION_ENABLE = 272,
+OPTION_DISABLE = 273,
+OPTION_MERGE = 274,
 };

 typedef enum OutputFormat {
@@ -4438,6 +4445,197 @@ out:
 return 0;
 }

+static int img_bitmap(int argc, char **argv)
+{
+Error *err = NULL;
+int c, ret = -1;
+QemuOpts *opts = NULL;
+const char *fmt = NULL, *src_fmt = NULL, *src_filename = NULL;
+const char *filename, *bitmap;
+BlockBackend *blk = NULL, *src = NULL;
+BlockDriverState *bs = NULL, *src_bs = NULL;
+bool image_opts = false;
+unsigned long granularity = 0;
+bool add = false, remove = false, clear = false;
+bool enable = false, disable = false, add_disabled = false;
+const char *merge = NULL;
+
+for (;;) {
+static const struct option long_options[] = {
+{"help", no_argument, 0, 'h'},
+{"object", required_argument, 0, OPTION_OBJECT},
+{"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
+{"add", no_argument, 0, OPTION_ADD},
+{"remove", no_argument, 0, OPTION_REMOVE},
+{"clear", no_argument, 0, OPTION_CLEAR},
+{"enable", no_argument, 0, OPTION_ENABLE},
+{"disable", no_argument, 0, OPTION_DISABLE},
+{"disabled", no_argument, 0, OPTION_DISABLE},
+{"merge", required_argument, 0, OPTION_MERGE},
+{"granularity", required_argument, 0, 'g'},
+{"source-file", 

[PATCH v2 5/6] qemu-img: Add convert --bitmaps option

2020-04-21 Thread Eric Blake
Make it easier to copy all the persistent bitmaps of a source image
along with the contents, by adding a boolean flag for use with
qemu-img convert.  This is basically shorthand, as the same effect
could be accomplished with a series of 'qemu-img bitmap --add' and
'qemu-img bitmap --merge -b source' commands, or by QMP commands.

See also https://bugzilla.redhat.com/show_bug.cgi?id=1779893

Signed-off-by: Eric Blake 
---
 docs/tools/qemu-img.rst |  6 +++-
 qemu-img.c  | 80 +++--
 qemu-img-cmds.hx|  4 +--
 3 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 4f3b0e2c9ace..430fb5b46e43 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -162,6 +162,10 @@ Parameters to convert subcommand:

 .. program:: qemu-img-convert

+.. option:: --bitmaps
+
+  Additionally copy all bitmaps
+
 .. option:: -n

   Skip the creation of the target volume
@@ -397,7 +401,7 @@ Command description:
   4
 Error on reading data

-.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] 
[--target-is-zero] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t CACHE] [-T 
SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l SNAPSHOT_PARAM] 
[-S SPARSE_SIZE] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 [...]] 
OUTPUT_FILENAME
+.. option:: convert [--object OBJECTDEF] [--image-opts] [--target-image-opts] 
[--target-is-zero] [--bitmaps] [-U] [-C] [-c] [-p] [-q] [-n] [-f FMT] [-t 
CACHE] [-T SRC_CACHE] [-O OUTPUT_FMT] [-B BACKING_FILE] [-o OPTIONS] [-l 
SNAPSHOT_PARAM] [-S SPARSE_SIZE] [-m NUM_COROUTINES] [-W] FILENAME [FILENAME2 
[...]] OUTPUT_FILENAME

   Convert the disk image *FILENAME* or a snapshot *SNAPSHOT_PARAM*
   to disk image *OUTPUT_FILENAME* using format *OUTPUT_FMT*. It can
diff --git a/qemu-img.c b/qemu-img.c
index e1127273f21e..6cfc1f52ef98 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -78,6 +78,7 @@ enum {
 OPTION_ENABLE = 272,
 OPTION_DISABLE = 273,
 OPTION_MERGE = 274,
+OPTION_BITMAPS = 275,
 };

 typedef enum OutputFormat {
@@ -183,6 +184,7 @@ static void QEMU_NORETURN help(void)
"   hiding corruption that has already occurred.\n"
"\n"
"Parameters to convert subcommand:\n"
+   "  '--bitmaps' copies all persistent bitmaps to destination\n"
"  '-m' specifies how many coroutines work in parallel during the 
convert\n"
"   process (defaults to 8)\n"
"  '-W' allow to write to the target out of order rather than 
sequential\n"
@@ -2061,6 +2063,47 @@ static int convert_do_copy(ImgConvertState *s)
 return s->ret;
 }

+static int convert_copy_bitmaps(BlockDriverState *src, BlockDriverState *dst)
+{
+BdrvDirtyBitmap *bm;
+Error *err = NULL;
+BlockDirtyBitmapMergeSource *merge;
+BlockDirtyBitmapMergeSourceList *list;
+
+FOR_EACH_DIRTY_BITMAP(src, bm) {
+const char *name;
+
+if (!bdrv_dirty_bitmap_get_persistence(bm)) {
+continue;
+}
+name = bdrv_dirty_bitmap_name(bm);
+qmp_block_dirty_bitmap_add(dst->node_name, name,
+   true, bdrv_dirty_bitmap_granularity(bm),
+   true, true,
+   true, !bdrv_dirty_bitmap_enabled(bm),
+   &err);
+if (err) {
+error_reportf_err(err, "Failed to create bitmap %s: ", name);
+return -1;
+}
+
+merge = g_new0(BlockDirtyBitmapMergeSource, 1);
+merge->type = QTYPE_QDICT;
+merge->u.external.node = g_strdup(src->node_name);
+merge->u.external.name = g_strdup(name);
+list = g_new0(BlockDirtyBitmapMergeSourceList, 1);
+list->value = merge;
+qmp_block_dirty_bitmap_merge(dst->node_name, name, list, &err);
+qapi_free_BlockDirtyBitmapMergeSourceList(list);
+if (err) {
+error_reportf_err(err, "Failed to populate bitmap %s: ", name);
+return -1;
+}
+}
+
+return 0;
+}
+
 #define MAX_BUF_SECTORS 32768

 static int img_convert(int argc, char **argv)
@@ -2082,6 +2125,8 @@ static int img_convert(int argc, char **argv)
 int64_t ret = -EINVAL;
 bool force_share = false;
 bool explict_min_sparse = false;
+bool bitmaps = false;
+size_t nbitmaps = 0;

 ImgConvertState s = (ImgConvertState) {
 /* Need at least 4k of zeros for sparse detection */
@@ -2101,6 +2146,7 @@ static int img_convert(int argc, char **argv)
 {"target-image-opts", no_argument, 0, OPTION_TARGET_IMAGE_OPTS},
 {"salvage", no_argument, 0, OPTION_SALVAGE},
 {"target-is-zero", no_argument, 0, OPTION_TARGET_IS_ZERO},
+{"bitmaps", no_argument, 0, OPTION_BITMAPS},
 {0, 0, 0, 0}
 };
 c = getopt_long(argc, argv, ":hf:O:B:Cco:l:S:pt:T:qnm:WU",
@@ -2232,6 

[PATCH v2 1/6] docs: Sort sections on qemu-img subcommand parameters

2020-04-21 Thread Eric Blake
We already list the subcommand summaries alphabetically, we should do
the same for the documentation related to subcommand-specific
parameters.

Signed-off-by: Eric Blake 
---
 docs/tools/qemu-img.rst | 48 -
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 0080f83a76c9..7d08c48d308f 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -142,30 +142,6 @@ by the used format or see the format descriptions below 
for details.
   the documentation of the emulator's ``-drive cache=...`` option for allowed
   values.

-Parameters to snapshot subcommand:
-
-.. program:: qemu-img-snapshot
-
-.. option:: snapshot
-
-  Is the name of the snapshot to create, apply or delete
-
-.. option:: -a
-
-  Applies a snapshot (revert disk to saved state)
-
-.. option:: -c
-
-  Creates a snapshot
-
-.. option:: -d
-
-  Deletes a snapshot
-
-.. option:: -l
-
-  Lists all snapshots in the given image
-
 Parameters to compare subcommand:

 .. program:: qemu-img-compare
@@ -245,6 +221,30 @@ Parameters to dd subcommand:

   Sets the number of input blocks to skip

+Parameters to snapshot subcommand:
+
+.. program:: qemu-img-snapshot
+
+.. option:: snapshot
+
+  Is the name of the snapshot to create, apply or delete
+
+.. option:: -a
+
+  Applies a snapshot (revert disk to saved state)
+
+.. option:: -c
+
+  Creates a snapshot
+
+.. option:: -d
+
+  Deletes a snapshot
+
+.. option:: -l
+
+  Lists all snapshots in the given image
+
 Command description:

 .. program:: qemu-img-commands
-- 
2.26.2




[PATCH v2 0/6] qemu-img: Add convert --bitmaps

2020-04-21 Thread Eric Blake
Without this series, the process for copying one qcow2 image to
another including all of its bitmaps involves running qemu and doing
the copying by hand with a series of QMP commands.  This makes the
process a bit more convenient.

Series can also be downloaded at:
https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/qemu-img-bitmaps-v2

Since v1:
- patches 1, 3 are new
- patch 2 (old 1/3) moves more operations into new file
- patch 4 no longer a separate thread (was at v2 in isolation)
- patches 4, 6 rebased to take advantage of 3
- patch 5, 6 (old 2-3/3) have other minor tweaks

Eric Blake (6):
  docs: Sort sections on qemu-img subcommand parameters
  blockdev: Split off basic bitmap operations for qemu-img
  qemu-img: Add bitmap sub-command
  qcow2: Expose bitmaps' size during measure
  qemu-img: Add convert --bitmaps option
  iotests: Add test 291 for qemu-img bitmap coverage

 docs/tools/qemu-img.rst|  78 ++---
 Makefile.objs  |   2 +-
 qapi/block-core.json   |  15 +-
 include/sysemu/blockdev.h  |  14 ++
 block/crypto.c |   2 +-
 block/qcow2.c  |  29 +++-
 block/raw-format.c |   2 +-
 blockbitmaps.c | 324 +
 blockdev.c | 293 -
 qemu-img.c | 281 +++-
 MAINTAINERS|   1 +
 qemu-img-cmds.hx   |  11 +-
 tests/qemu-iotests/190 |  15 +-
 tests/qemu-iotests/190.out |  13 +-
 tests/qemu-iotests/291 | 103 
 tests/qemu-iotests/291.out |  78 +
 tests/qemu-iotests/group   |   1 +
 17 files changed, 927 insertions(+), 335 deletions(-)
 create mode 100644 blockbitmaps.c
 create mode 100755 tests/qemu-iotests/291
 create mode 100644 tests/qemu-iotests/291.out

-- 
2.26.2




Re: [PULL] RISC-V Patches for 5.0-rc4

2020-04-21 Thread Palmer Dabbelt

On Tue, 21 Apr 2020 12:27:50 PDT (-0700), Peter Maydell wrote:

On Tue, 21 Apr 2020 at 20:19, Palmer Dabbelt  wrote:


RISC-V Patches for 5.0-rc4

This contains a handful of patches that I'd like to target for 5.0.  I know it's
a bit late, I thought I'd already sent these out but must have managed to miss
doing so.  The patches include:

* A handful of fixes to PTE lookups related to H-mode support.
* The addition of a serial number for the SiFive U implementation, which allows
  bootloaders to generate a sane MAC address.

These pass "make check" and boot Linux for me.


Peter: Sorry I dropped the ball here.  I can understand if it's too late for
5.0, but if there's still going to be an rc4 then I'd love to have these
included.



Nope, sorry. rc4 has technically not been tagged yet, but especially
the serial-property stuff is too big a code change at this point
(it includes one "let's just refactor and rearrange some code"
patch which is really not rc4 material.)
Also these patches have been on the list for over a month -- if
they were 5.0-worthy there's been plenty of time for them to be
put in.

Plus the last email from Alistair on the "target/riscv: Don't set
write permissions on dirty PTEs" patch thread is a note saying
it shouldn't be applied, unless I've got confused.


OK, no problem.



thanks
-- PMM




Re: [PULL 1/6] target/riscv: Don't set write permissions on dirty PTEs

2020-04-21 Thread Alistair Francis
On Tue, Apr 21, 2020 at 12:19 PM Palmer Dabbelt
 wrote:
>
> From: Alistair Francis 
>
> The RISC-V spec specifies that when a write happens and the D bit is
> clear the implementation will set the bit in the PTE. It does not
> describe that the PTE being dirty means that we should provide write
> access. This patch removes the write access granted to pages when the
> dirty bit is set.
>
> Following the prot variable we can see that it affects all of these
> functions:
>  riscv_cpu_tlb_fill()
>    tlb_set_page()
>      tlb_set_page_with_attrs()
>        address_space_translate_for_iotlb()
>
> Looking at the cputlb code (tlb_set_page_with_attrs() and
> address_space_translate_for_iotlb()) it looks like the main effect of
> setting write permissions is that the page can be marked as TLB_NOTDIRTY.
>
> I don't see any other impacts (related to the dirty bit) for giving a
> page write permissions.
>
> Setting write permission on dirty PTEs results in userspace inside a
> Hypervisor guest (VU) becoming corrupted. This appears to be because it
> ends up with write permission in the second stage translation in cases
> where we aren't doing a store.
>
> Signed-off-by: Alistair Francis 
> Reviewed-by: Bin Meng 
> Signed-off-by: Palmer Dabbelt 

This commit is wrong. Please do not apply this.

Alistair

> ---
>  target/riscv/cpu_helper.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index d3ba9efb02..e2da2a4787 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -572,10 +572,8 @@ restart:
>  if ((pte & PTE_X)) {
>  *prot |= PAGE_EXEC;
>  }
> -/* add write permission on stores or if the page is already 
> dirty,
> -   so that we TLB miss on later writes to update the dirty bit */
> -if ((pte & PTE_W) &&
> -(access_type == MMU_DATA_STORE || (pte & PTE_D))) {
> +/* add write permission on stores */
> +if ((pte & PTE_W) && (access_type == MMU_DATA_STORE)) {
>  *prot |= PAGE_WRITE;
>  }
>  return TRANSLATE_SUCCESS;
> --
> 2.26.1.301.g55bc3eb7cb9-goog
>
>



Re: [PULL] RISC-V Patches for 5.0-rc4

2020-04-21 Thread Peter Maydell
On Tue, 21 Apr 2020 at 20:19, Palmer Dabbelt  wrote:
> 
> RISC-V Patches for 5.0-rc4
>
> This contains a handful of patches that I'd like to target for 5.0.  I know it's
> a bit late, I thought I'd already sent these out but must have managed to miss
> doing so.  The patches include:
>
> * A handful of fixes to PTE lookups related to H-mode support.
> * The addition of a serial number for the SiFive U implementation, which 
> allows
>   bootloaders to generate a sane MAC address.
>
> These pass "make check" and boot Linux for me.
>
> 
> Peter: Sorry I dropped the ball here.  I can understand if it's too late for
> 5.0, but if there's still going to be an rc4 then I'd love to have these
> included.
> 

Nope, sorry. rc4 has technically not been tagged yet, but especially
the serial-property stuff is too big a code change at this point
(it includes one "let's just refactor and rearrange some code"
patch which is really not rc4 material.)
Also these patches have been on the list for over a month -- if
they were 5.0-worthy there's been plenty of time for them to be
put in.

Plus the last email from Alistair on the "target/riscv: Don't set
write permissions on dirty PTEs" patch thread is a note saying
it shouldn't be applied, unless I've got confused.

thanks
-- PMM



[PULL 4/6] riscv/sifive_u: Fix up file ordering

2020-04-21 Thread Palmer Dabbelt
From: Alistair Francis 

Split the file into clear machine and SoC sections.

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
Signed-off-by: Palmer Dabbelt 
---
 hw/riscv/sifive_u.c | 109 ++--
 1 file changed, 55 insertions(+), 54 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 56351c4faa..d0ea6803db 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -312,7 +312,7 @@ static void create_fdt(SiFiveUState *s, const struct 
MemmapEntry *memmap,
 g_free(nodename);
 }
 
-static void riscv_sifive_u_init(MachineState *machine)
+static void sifive_u_machine_init(MachineState *machine)
 {
 const struct MemmapEntry *memmap = sifive_u_memmap;
 SiFiveUState *s = RISCV_U_MACHINE(machine);
@@ -403,6 +403,60 @@ static void riscv_sifive_u_init(MachineState *machine)
  &address_space_memory);
 }
 
+static bool sifive_u_machine_get_start_in_flash(Object *obj, Error **errp)
+{
+SiFiveUState *s = RISCV_U_MACHINE(obj);
+
+return s->start_in_flash;
+}
+
+static void sifive_u_machine_set_start_in_flash(Object *obj, bool value, Error 
**errp)
+{
+SiFiveUState *s = RISCV_U_MACHINE(obj);
+
+s->start_in_flash = value;
+}
+
+static void sifive_u_machine_instance_init(Object *obj)
+{
+SiFiveUState *s = RISCV_U_MACHINE(obj);
+
+s->start_in_flash = false;
+object_property_add_bool(obj, "start-in-flash", 
sifive_u_machine_get_start_in_flash,
+ sifive_u_machine_set_start_in_flash, NULL);
+object_property_set_description(obj, "start-in-flash",
+"Set on to tell QEMU's ROM to jump to " \
+"flash. Otherwise QEMU will jump to DRAM",
+NULL);
+}
+
+
+static void sifive_u_machine_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+
+mc->desc = "RISC-V Board compatible with SiFive U SDK";
+mc->init = sifive_u_machine_init;
+mc->max_cpus = SIFIVE_U_MANAGEMENT_CPU_COUNT + SIFIVE_U_COMPUTE_CPU_COUNT;
+mc->min_cpus = SIFIVE_U_MANAGEMENT_CPU_COUNT + 1;
+mc->default_cpus = mc->min_cpus;
+}
+
+static const TypeInfo sifive_u_machine_typeinfo = {
+.name   = MACHINE_TYPE_NAME("sifive_u"),
+.parent = TYPE_MACHINE,
+.class_init = sifive_u_machine_class_init,
+.instance_init = sifive_u_machine_instance_init,
+.instance_size = sizeof(SiFiveUState),
+};
+
+static void sifive_u_machine_init_register_types(void)
+{
+type_register_static(&sifive_u_machine_typeinfo);
+}
+
+type_init(sifive_u_machine_init_register_types)
+
 static void riscv_sifive_u_soc_init(Object *obj)
 {
 MachineState *ms = MACHINE(qdev_get_machine());
@@ -443,33 +497,6 @@ static void riscv_sifive_u_soc_init(Object *obj)
   TYPE_CADENCE_GEM);
 }
 
-static bool sifive_u_get_start_in_flash(Object *obj, Error **errp)
-{
-SiFiveUState *s = RISCV_U_MACHINE(obj);
-
-return s->start_in_flash;
-}
-
-static void sifive_u_set_start_in_flash(Object *obj, bool value, Error **errp)
-{
-SiFiveUState *s = RISCV_U_MACHINE(obj);
-
-s->start_in_flash = value;
-}
-
-static void riscv_sifive_u_machine_instance_init(Object *obj)
-{
-SiFiveUState *s = RISCV_U_MACHINE(obj);
-
-s->start_in_flash = false;
-object_property_add_bool(obj, "start-in-flash", 
sifive_u_get_start_in_flash,
- sifive_u_set_start_in_flash, NULL);
-object_property_set_description(obj, "start-in-flash",
-"Set on to tell QEMU's ROM to jump to " \
-"flash. Otherwise QEMU will jump to DRAM",
-NULL);
-}
-
 static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
 {
 MachineState *ms = MACHINE(qdev_get_machine());
@@ -607,29 +634,3 @@ static void riscv_sifive_u_soc_register_types(void)
 }
 
 type_init(riscv_sifive_u_soc_register_types)
-
-static void riscv_sifive_u_machine_class_init(ObjectClass *oc, void *data)
-{
-MachineClass *mc = MACHINE_CLASS(oc);
-
-mc->desc = "RISC-V Board compatible with SiFive U SDK";
-mc->init = riscv_sifive_u_init;
-mc->max_cpus = SIFIVE_U_MANAGEMENT_CPU_COUNT + SIFIVE_U_COMPUTE_CPU_COUNT;
-mc->min_cpus = SIFIVE_U_MANAGEMENT_CPU_COUNT + 1;
-mc->default_cpus = mc->min_cpus;
-}
-
-static const TypeInfo riscv_sifive_u_machine_typeinfo = {
-.name   = MACHINE_TYPE_NAME("sifive_u"),
-.parent = TYPE_MACHINE,
-.class_init = riscv_sifive_u_machine_class_init,
-.instance_init = riscv_sifive_u_machine_instance_init,
-.instance_size = sizeof(SiFiveUState),
-};
-
-static void riscv_sifive_u_machine_init_register_types(void)
-{
-type_register_static(&riscv_sifive_u_machine_typeinfo);
-}
-
-type_init(riscv_sifive_u_machine_init_register_types)
-- 
2.26.1.301.g55bc3eb7cb9-goog




[PULL 3/6] riscv: AND stage-1 and stage-2 protection flags

2020-04-21 Thread Palmer Dabbelt
From: Alistair Francis 

Take the result of stage-1 and stage-2 page table walks and AND the two
protection flags together. This way we require both stages to grant a
permission instead of relying on stage-2 alone.

Signed-off-by: Alistair Francis 
Reviewed-by: Richard Henderson 
Signed-off-by: Palmer Dabbelt 
---
 target/riscv/cpu_helper.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 48e112808b..700ef052b0 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -705,7 +705,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 #ifndef CONFIG_USER_ONLY
 vaddr im_address;
 hwaddr pa = 0;
-int prot;
+int prot, prot2;
 bool pmp_violation = false;
 bool m_mode_two_stage = false;
 bool hs_mode_two_stage = false;
@@ -755,13 +755,15 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 /* Second stage lookup */
 im_address = pa;
 
-ret = get_physical_address(env, &pa, &prot, im_address,
+ret = get_physical_address(env, &pa, &prot2, im_address,
access_type, mmu_idx, false, true);
 
 qemu_log_mask(CPU_LOG_MMU,
 "%s 2nd-stage address=%" VADDR_PRIx " ret %d physical "
 TARGET_FMT_plx " prot %d\n",
-__func__, im_address, ret, pa, prot);
+__func__, im_address, ret, pa, prot2);
+
+prot &= prot2;
 
 if (riscv_feature(env, RISCV_FEATURE_PMP) &&
 (ret == TRANSLATE_SUCCESS) &&
-- 
2.26.1.301.g55bc3eb7cb9-goog
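A tiny illustration of what the new `prot &= prot2` achieves: the permissions
actually granted are the intersection of what the two translation stages
allow.  The flag values below are assumed for the example and are not QEMU's
PAGE_* constants:

#include <stdio.h>

enum { EX_PROT_READ = 1, EX_PROT_WRITE = 2, EX_PROT_EXEC = 4 };

int main(void)
{
    int stage1 = EX_PROT_READ | EX_PROT_WRITE;  /* guest page table: read/write */
    int stage2 = EX_PROT_READ;                  /* hypervisor stage 2: read-only */
    int prot   = stage1 & stage2;               /* combined: read-only */

    printf("combined prot = %d\n", prot);       /* prints 1 */
    return 0;
}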




[PULL 5/6] riscv/sifive_u: Add a serial property to the sifive_u SoC

2020-04-21 Thread Palmer Dabbelt
From: Alistair Francis 

At present the board serial number is hard-coded to 1 and passed to
the OTP model during initialization. Firmware (FSBL, U-Boot) uses
the serial number to generate a unique MAC address for the on-chip
ethernet controller. When multiple QEMU 'sifive_u' instances are
created and connected to the same subnet, they all have the same
MAC address, which makes the network unusable.

A new "serial" property is introduced to the sifive_u SoC to specify
the board serial number. When not given, the default serial number
1 is used.

Suggested-by: Bin Meng 
Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
Tested-by: Bin Meng 
Signed-off-by: Palmer Dabbelt 
---
 hw/riscv/sifive_u.c | 8 +++-
 include/hw/riscv/sifive_u.h | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index d0ea6803db..9bfd16d2bb 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -492,7 +492,6 @@ static void riscv_sifive_u_soc_init(Object *obj)
   TYPE_SIFIVE_U_PRCI);
 sysbus_init_child_obj(obj, "otp", >otp, sizeof(s->otp),
   TYPE_SIFIVE_U_OTP);
-qdev_prop_set_uint32(DEVICE(>otp), "serial", OTP_SERIAL);
 sysbus_init_child_obj(obj, "gem", >gem, sizeof(s->gem),
   TYPE_CADENCE_GEM);
 }
@@ -585,6 +584,7 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, 
Error **errp)
 object_property_set_bool(OBJECT(&s->prci), true, "realized", &err);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->prci), 0, memmap[SIFIVE_U_PRCI].base);
 
+qdev_prop_set_uint32(DEVICE(&s->otp), "serial", s->serial);
 object_property_set_bool(OBJECT(&s->otp), true, "realized", &err);
 sysbus_mmio_map(SYS_BUS_DEVICE(&s->otp), 0, memmap[SIFIVE_U_OTP].base);
 
@@ -611,10 +611,16 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, 
Error **errp)
 memmap[SIFIVE_U_GEM_MGMT].base, memmap[SIFIVE_U_GEM_MGMT].size);
 }
 
+static Property riscv_sifive_u_soc_props[] = {
+DEFINE_PROP_UINT32("serial", SiFiveUSoCState, serial, OTP_SERIAL),
+DEFINE_PROP_END_OF_LIST()
+};
+
 static void riscv_sifive_u_soc_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 
+device_class_set_props(dc, riscv_sifive_u_soc_props);
 dc->realize = riscv_sifive_u_soc_realize;
 /* Reason: Uses serial_hds in realize function, thus can't be used twice */
 dc->user_creatable = false;
diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index 82667b5746..a2baa1de5f 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -42,6 +42,8 @@ typedef struct SiFiveUSoCState {
 SiFiveUPRCIState prci;
 SiFiveUOTPState otp;
 CadenceGEMState gem;
+
+uint32_t serial;
 } SiFiveUSoCState;
 
 #define TYPE_RISCV_U_MACHINE MACHINE_TYPE_NAME("sifive_u")
-- 
2.26.1.301.g55bc3eb7cb9-goog
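For context on why a per-board serial matters: a bootloader can fold the
serial number into an otherwise fixed MAC prefix so that every instance gets
a distinct address.  The sketch below is a purely hypothetical illustration
(the prefix is QEMU's conventional 52:54:00 and the layout is invented; it is
not the FSBL's or U-Boot's actual algorithm):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical: place the 24 low bits of the serial in the low three
 * octets of a fixed, locally administered prefix. */
static void serial_to_mac_example(uint32_t serial, uint8_t mac[6])
{
    mac[0] = 0x52;
    mac[1] = 0x54;
    mac[2] = 0x00;
    mac[3] = (serial >> 16) & 0xff;
    mac[4] = (serial >> 8) & 0xff;
    mac[5] = serial & 0xff;
}

int main(void)
{
    uint8_t mac[6];

    serial_to_mac_example(42, mac);
    printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return 0;
}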




Re: [PATCH for-5.0?] slirp: update to fix CVE-2020-1983

2020-04-21 Thread Peter Maydell
On Tue, 21 Apr 2020 at 18:03, Marc-André Lureau
 wrote:
>
> This is an update on the stable-4.2 branch of libslirp.git:
>
> git shortlog 55ab21c9a3..2faae0f778f81
>
> Marc-André Lureau (1):
>   Fix use-after-free in ip_reass() (CVE-2020-1983)
>
> CVE-2020-1983 is actually a follow up fix for commit
> 126c04acbabd7ad32c2b018fe10dfac2a3bc1210 ("Fix heap overflow in
> ip_reass on big packet input") which was included in qemu
> v4.1 (commit e1a4a24d262ba5ac74ea1795adb3ab1cd574c7fb "slirp: update
> with CVE-2019-14378 fix").
>
> Signed-off-by: Marc-André Lureau 

Hi; thanks for putting together this stable-branch update.
I've run it through my test setup and it's fine; I'm just
going to wait a little before pushing it to master, in case
anybody wants to speak up with an opinion or objection.
I'll do that tomorrow afternoon UK time and then tag rc4.

thanks
-- PMM



Re: [PATCH v1 1/2] migration/xbzrle: replace transferred xbzrle bytes with encoded bytes

2020-04-21 Thread Dr. David Alan Gilbert
* Wei Wang (wei.w.w...@intel.com) wrote:
> Like compressed_size which indicates how many bytes are compressed, we
> need encoded_size to understand how many bytes are encoded with xbzrle
> during migration.
> 
> Replace the old xbzrle_counter.bytes, instead of adding a new counter,
> because we don't find a usage of xbzrle_counter.bytes currently, which
> includes 3 more bytes of the migration transfer protocol header (in
> addition to the encoding header). The encoded_size will further be used
> to calculate the encoding rate.
> 
> Signed-off-by: Yi Sun 
> Signed-off-by: Wei Wang 

Can you explain why these 3 bytes matter?  Certainly the 2 bytes of the
encoded_len are an overhead that's a cost of using XBZRLE; so if you're
trying to figure out whether xbzrle is worth it, then you should include
those 2 bytes in the cost.
That other byte, which holds ENCODING_FLAG_XBZRLE, also seems to be pure
overhead of XBZRLE; so your cost of using XBZRLE really does include
those 3 bytes.

So to me it makes sense to include the 3 bytes as it currently does.
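
To make the byte accounting concrete, here is a standalone illustration of
the per-page cost being discussed (the helper name is a placeholder, not
QEMU code):

    #include <stddef.h>

    /* Illustrative only: the wire cost of one XBZRLE page.  On top of the
     * encoded payload there is one ENCODING_FLAG_XBZRLE byte and a
     * big-endian 16-bit encoded length -- the "+ 1 + 2" in
     * save_xbzrle_page().  header_len stands for the RAM_SAVE_FLAG_XBZRLE
     * page header written by save_page_header(). */
    static size_t xbzrle_page_wire_cost(size_t header_len, size_t encoded_len)
    {
        return header_len + 1 /* flag byte */ + 2 /* be16 length */ + encoded_len;
    }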

Dave

> ---
>  migration/migration.c |  2 +-
>  migration/ram.c   | 18 +-
>  monitor/hmp-cmds.c|  4 ++--
>  qapi/migration.json   |  6 +++---
>  4 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 187ac0410c..8e7ee0d76b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -926,7 +926,7 @@ static void populate_ram_info(MigrationInfo *info, 
> MigrationState *s)
>  info->has_xbzrle_cache = true;
>  info->xbzrle_cache = g_malloc0(sizeof(*info->xbzrle_cache));
>  info->xbzrle_cache->cache_size = migrate_xbzrle_cache_size();
> -info->xbzrle_cache->bytes = xbzrle_counters.bytes;
> +info->xbzrle_cache->encoded_size = xbzrle_counters.encoded_size;
>  info->xbzrle_cache->pages = xbzrle_counters.pages;
>  info->xbzrle_cache->cache_miss = xbzrle_counters.cache_miss;
>  info->xbzrle_cache->cache_miss_rate = 
> xbzrle_counters.cache_miss_rate;
> diff --git a/migration/ram.c b/migration/ram.c
> index 04f13feb2e..bca5878f25 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -677,7 +677,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
> **current_data,
>  ram_addr_t current_addr, RAMBlock *block,
>  ram_addr_t offset, bool last_stage)
>  {
> -int encoded_len = 0, bytes_xbzrle;
> +int encoded_size = 0, bytes_xbzrle;
>  uint8_t *prev_cached_page;
>  
>  if (!cache_is_cached(XBZRLE.cache, current_addr,
> @@ -702,7 +702,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
> **current_data,
>  memcpy(XBZRLE.current_buf, *current_data, TARGET_PAGE_SIZE);
>  
>  /* XBZRLE encoding (if there is no overflow) */
> -encoded_len = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
> +encoded_size = xbzrle_encode_buffer(prev_cached_page, XBZRLE.current_buf,
> TARGET_PAGE_SIZE, XBZRLE.encoded_buf,
> TARGET_PAGE_SIZE);
>  
> @@ -710,7 +710,7 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
> **current_data,
>   * Update the cache contents, so that it corresponds to the data
>   * sent, in all cases except where we skip the page.
>   */
> -if (!last_stage && encoded_len != 0) {
> +if (!last_stage && encoded_size != 0) {
>  memcpy(prev_cached_page, XBZRLE.current_buf, TARGET_PAGE_SIZE);
>  /*
>   * In the case where we couldn't compress, ensure that the caller
> @@ -720,10 +720,10 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
> **current_data,
>  *current_data = prev_cached_page;
>  }
>  
> -if (encoded_len == 0) {
> +if (encoded_size == 0) {
>  trace_save_xbzrle_page_skipping();
>  return 0;
> -} else if (encoded_len == -1) {
> +} else if (encoded_size == -1) {
>  trace_save_xbzrle_page_overflow();
>  xbzrle_counters.overflow++;
>  return -1;
> @@ -733,11 +733,11 @@ static int save_xbzrle_page(RAMState *rs, uint8_t 
> **current_data,
>  bytes_xbzrle = save_page_header(rs, rs->f, block,
>  offset | RAM_SAVE_FLAG_XBZRLE);
>  qemu_put_byte(rs->f, ENCODING_FLAG_XBZRLE);
> -qemu_put_be16(rs->f, encoded_len);
> -qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_len);
> -bytes_xbzrle += encoded_len + 1 + 2;
> +qemu_put_be16(rs->f, encoded_size);
> +qemu_put_buffer(rs->f, XBZRLE.encoded_buf, encoded_size);
> +bytes_xbzrle += encoded_size + 1 + 2;
>  xbzrle_counters.pages++;
> -xbzrle_counters.bytes += bytes_xbzrle;
> +xbzrle_counters.encoded_size += encoded_size;
>  ram_counters.transferred += bytes_xbzrle;
>  
>  return 1;
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 9b94e67879..6d3b35ca64 100644
> --- 

[PULL 2/6] riscv: Don't use stage-2 PTE lookup protection flags

2020-04-21 Thread Palmer Dabbelt
From: Alistair Francis 

When doing the first of a two-stage lookup (Hypervisor extensions), don't
set the current protection flags from the second-stage lookup of the
base address PTE.

Signed-off-by: Alistair Francis 
Reviewed-by: Richard Henderson 
Signed-off-by: Palmer Dabbelt 
---
 target/riscv/cpu_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index e2da2a4787..48e112808b 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -452,10 +452,11 @@ restart:
 hwaddr pte_addr;
 
 if (two_stage && first_stage) {
+int vbase_prot;
 hwaddr vbase;
 
 /* Do the second stage translation on the base PTE address. */
-get_physical_address(env, &vbase, prot, base, access_type,
+get_physical_address(env, &vbase, &vbase_prot, base, access_type,
  mmu_idx, false, true);
 
 pte_addr = vbase + idx * ptesize;
-- 
2.26.1.301.g55bc3eb7cb9-goog




[PULL] RISC-V Patches for 5.0-rc4

2020-04-21 Thread Palmer Dabbelt
The following changes since commit 20038cd7a8412feeb49c01f6ede89e36c8995472:

  Update version for v5.0.0-rc3 release (2020-04-15 20:51:54 +0100)

are available in the Git repository at:

  g...@github.com:palmer-dabbelt/qemu.git tags/riscv-for-master-5.0-rc4

for you to fetch changes up to 8a7ce6ac908a8ef30d6a81fe8334c3c942670949:

  riscv/sifive_u: Add a serial property to the sifive_u machine (2020-04-21 
10:54:59 -0700)


RISC-V Patches for 5.0-rc4

This contains a handful of patches that I'd like to target for 5.0.  I know it's
a bit late; I thought I'd already sent these out but must have missed
doing so.  The patches include:

* A handful of fixes to PTE lookups related to H-mode support.
* The addition of a serial number for the SiFive U implementation, which allows
  bootloaders to generate a sane MAC address.

These pass "make check" and boot Linux for me.


Peter: Sorry I dropped the ball here.  I can understand if it's too late for
5.0, but if there's still going to be an rc4 then I'd love to have these
included.

Alistair Francis (5):
  target/riscv: Don't set write permissions on dirty PTEs
  riscv: Don't use stage-2 PTE lookup protection flags
  riscv: AND stage-1 and stage-2 protection flags
  riscv/sifive_u: Fix up file ordering
  riscv/sifive_u: Add a serial property to the sifive_u SoC

Bin Meng (1):
  riscv/sifive_u: Add a serial property to the sifive_u machine

 hw/riscv/sifive_u.c | 137 ++--
 include/hw/riscv/sifive_u.h |   3 +
 target/riscv/cpu_helper.c   |  17 +++---
 3 files changed, 94 insertions(+), 63 deletions(-)




[PULL 1/6] target/riscv: Don't set write permissions on dirty PTEs

2020-04-21 Thread Palmer Dabbelt
From: Alistair Francis 

The RISC-V spec specifies that when a write happens and the D bit is
clear the implementation will set the bit in the PTE. It does not
describe that the PTE being dirty means that we should provide write
access. This patch removes the write access granted to pages when the
dirty bit is set.

Following the prot variable we can see that it affects all of these
functions:
 riscv_cpu_tlb_fill()
   tlb_set_page()
 tlb_set_page_with_attrs()
   address_space_translate_for_iotlb()

Looking at the cputlb code (tlb_set_page_with_attrs() and
address_space_translate_for_iotlb()) it looks like the main effect of
setting write permissions is that the page can be marked as TLB_NOTDIRTY.

I don't see any other impacts (related to the dirty bit) for giving a
page write permissions.

Setting write permission on dirty PTEs results in userspace inside a
Hypervisor guest (VU) becoming corrupted. This appears to be because it
ends up with write permission in the second stage translation in cases
where we aren't doing a store.
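
For illustration, a self-contained sketch of the new rule (the PTE bit
positions follow the RISC-V privileged spec; the helper itself is not QEMU
code):

    #include <stdbool.h>

    #define PTE_W (1 << 2)  /* page is writable */
    #define PTE_D (1 << 7)  /* page is dirty */

    /* Sketch only: write permission is now granted solely for an actual
     * store.  Previously the condition was
     *     (pte & PTE_W) && (access_type == MMU_DATA_STORE || (pte & PTE_D))
     * so a dirty, writable page also got PAGE_WRITE on loads and fetches. */
    static bool pte_grants_write(unsigned long pte, bool is_store)
    {
        return (pte & PTE_W) && is_store;
    }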

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
Signed-off-by: Palmer Dabbelt 
---
 target/riscv/cpu_helper.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index d3ba9efb02..e2da2a4787 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -572,10 +572,8 @@ restart:
 if ((pte & PTE_X)) {
 *prot |= PAGE_EXEC;
 }
-/* add write permission on stores or if the page is already dirty,
-   so that we TLB miss on later writes to update the dirty bit */
-if ((pte & PTE_W) &&
-(access_type == MMU_DATA_STORE || (pte & PTE_D))) {
+/* add write permission on stores */
+if ((pte & PTE_W) && (access_type == MMU_DATA_STORE)) {
 *prot |= PAGE_WRITE;
 }
 return TRANSLATE_SUCCESS;
-- 
2.26.1.301.g55bc3eb7cb9-goog



