Re: [PATCH for-8.0] target/ppc: Fix temp usage in gen_op_arith_modw

2023-04-07 Thread Cédric Le Goater

On 4/7/23 20:36, Richard Henderson wrote:

Fix a crash writing to 't3', which is now a constant.
Instead, write the result of the remu to 'ret'.

Fixes: 7058ff5231a ("target/ppc: Avoid tcg_const_* in translate.c")
Reported-by: Nicholas Piggin 
Signed-off-by: Richard Henderson 
---
  target/ppc/translate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 9d05357d03..906fc46723 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1807,8 +1807,8 @@ static inline void gen_op_arith_modw(DisasContext *ctx, 
TCGv ret, TCGv arg1,
  TCGv_i32 t2 = tcg_constant_i32(1);
  TCGv_i32 t3 = tcg_constant_i32(0);
  tcg_gen_movcond_i32(TCG_COND_EQ, t1, t1, t3, t2, t1);
-tcg_gen_remu_i32(t3, t0, t1);
-tcg_gen_extu_i32_tl(ret, t3);
+tcg_gen_remu_i32(ret, t0, t1);
+tcg_gen_extu_i32_tl(ret, ret);


These routines require a TCGv_i32, and 'ret' is not one on ppc64:

../target/ppc/translate.c: In function ‘gen_op_arith_modw’:
../target/ppc/translate.c:1810:26: error: passing argument 1 of 
‘tcg_gen_remu_i32’ from incompatible pointer type 
[-Werror=incompatible-pointer-types]
 1810 |         tcg_gen_remu_i32(ret, t0, t1);
      |                          ^~~
      |                          |
      |                          TCGv_i64 {aka struct TCGv_i64_d *}

and

../target/ppc/translate.c:1811:34: error: passing argument 2 of 
‘tcg_gen_extu_i32_i64’ from incompatible pointer type 
[-Werror=incompatible-pointer-types]
 1811 |         tcg_gen_extu_i32_tl(ret, ret);
      |                                  ^~~
      |                                  |
      |                                  TCGv_i64 {aka struct TCGv_i64_d *}
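
One way out (a sketch; the actual respin may differ): keep the remainder in
the existing TCGv_i32 temporary 't0' and extend from there, so the i32 op
never sees the target-long 'ret':

    tcg_gen_remu_i32(t0, t0, t1);    /* t0/t1 are the TCGv_i32 temps above */
    tcg_gen_extu_i32_tl(ret, t0);    /* single extension into the TCGv ret */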


C.




[PATCH for-8.0] tcg/mips: Fix TCG_TARGET_CALL_RET_I128 for o32 abi

2023-04-07 Thread Richard Henderson
The return is by reference, not in 4 integer registers.

This error resulted in

  qemu-system-i386: tcg/mips/tcg-target.c.inc:140: \
tcg_target_call_oarg_reg: Assertion `slot >= 0 && slot <= 1' failed.
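
o32 has only four 32-bit integer argument/return registers, and a 16-byte
value does not come back in them; the caller passes a hidden result pointer
instead. Roughly (an illustration, not actual QEMU code):

    /* Int128 helper_foo(CPUArchState *env); is called as if it were: */
    void helper_foo(Int128 *ret, CPUArchState *env);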

Fixes: 5427a9a7604 ("tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128")
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 88a8137bcc..88d45245e8 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -84,13 +84,14 @@ typedef enum {
 #if _MIPS_SIM == _ABIO32
 # define TCG_TARGET_CALL_STACK_OFFSET 16
 # define TCG_TARGET_CALL_ARG_I64  TCG_CALL_ARG_EVEN
+# define TCG_TARGET_CALL_RET_I128 TCG_CALL_RET_BY_REF
 #else
 # define TCG_TARGET_CALL_STACK_OFFSET 0
 # define TCG_TARGET_CALL_ARG_I64  TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_RET_I128 TCG_CALL_RET_NORMAL
 #endif
 #define TCG_TARGET_CALL_ARG_I32   TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I128  TCG_CALL_ARG_EVEN
-#define TCG_TARGET_CALL_RET_I128  TCG_CALL_RET_NORMAL
 
 /* MOVN/MOVZ instructions detection */
 #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 1)) || \
-- 
2.34.1




Re: [RFC PATCH 09/10] target/riscv: Restrict KVM-specific fields from ArchCPU

2023-04-07 Thread Richard Henderson

On 4/7/23 21:28, Richard Henderson wrote:

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

These fields shouldn't be accessed when KVM is not available.

Signed-off-by: Philippe Mathieu-Daudé
---
RFC: The migration part is likely invalid...

kvmtimer_needed() is defined in target/riscv/machine.c as

   static bool kvmtimer_needed(void *opaque)
   {
   return kvm_enabled();
   }

which depends on a host feature.
---
  target/riscv/cpu.h | 2 ++
  target/riscv/machine.c | 4 
  2 files changed, 6 insertions(+)


Yeah, the kvm parts need to be extracted to their own subsection.


Oh, but they are.  Ho hum, it's getting late.


r~




Re: [RFC PATCH 09/10] target/riscv: Restrict KVM-specific fields from ArchCPU

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

These fields shouldn't be accessed when KVM is not available.

Signed-off-by: Philippe Mathieu-Daudé
---
RFC: The migration part is likely invalid...

kvmtimer_needed() is defined in target/riscv/machine.c as

   static bool kvmtimer_needed(void *opaque)
   {
   return kvm_enabled();
   }

which depends on a host feature.
---
  target/riscv/cpu.h | 2 ++
  target/riscv/machine.c | 4 
  2 files changed, 6 insertions(+)


Yeah, the kvm parts need to be extracted to their own subsection.
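
For reference, the guard is the usual optional-subsection shape (a sketch;
the real field list in target/riscv/machine.c is elided here):

    static const VMStateDescription vmstate_kvmtimer = {
        .name = "cpu/kvmtimer",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = kvmtimer_needed,
        .fields = (VMStateField[]) {
            /* kvm timer fields ... */
            VMSTATE_END_OF_LIST()
        }
    };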


r~



Re: [PATCH 07/10] target/arm: Restrict KVM-specific fields from ArchCPU

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

These fields shouldn't be accessed when KVM is not available.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/arm/cpu.h | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 06/10] target/arm: Reduce QMP header pressure by not including 'kvm_arm.h'

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

We only need "sysemu/kvm.h" for kvm_enabled() and "cpu.h"
for the QOM type definitions (TYPE_ARM_CPU). Avoid including
the heavy "kvm_arm.h" header.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/arm/arm-qmp-cmds.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 05/10] hw/arm/sbsa-ref: Include missing 'sysemu/kvm.h' header

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

"sysemu/kvm.h" is indirectly pulled in. Explicit its
inclusion to avoid when refactoring include/:

   hw/arm/sbsa-ref.c:693:9: error: implicit declaration of function 
'kvm_enabled' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
 if (kvm_enabled()) {
 ^

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/sbsa-ref.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 04/10] hw/intc/arm_gic: Rename 'first_cpu' argument

2023-04-07 Thread Richard Henderson

On 4/7/23 21:23, Richard Henderson wrote:

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

"hw/core/cpu.h" defines 'first_cpu' as QTAILQ_FIRST_RCU().

arm_gic_common_reset_irq_state() calls its second argument
'first_cpu', producing a build failure when "hw/core/cpu.h"
is included:

   hw/intc/arm_gic_common.c:238:68: warning: omitting the parameter name in a function 
definition is a C2x extension [-Wc2x-extensions]

 static inline void arm_gic_common_reset_irq_state(GICState *s, int 
first_cpu,
    ^
   include/hw/core/cpu.h:451:26: note: expanded from macro 'first_cpu'
  #define first_cpu    QTAILQ_FIRST_RCU(&cpus)
  ^

KISS, rename the function argument.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/intc/arm_gic_common.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)


Wow, that's ugly.  But a reasonable work-around.


Duh.
Reviewed-by: Richard Henderson 


r~




Re: [PATCH 04/10] hw/intc/arm_gic: Rename 'first_cpu' argument

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

"hw/core/cpu.h" defines 'first_cpu' as QTAILQ_FIRST_RCU().

arm_gic_common_reset_irq_state() calls its second argument
'first_cpu', producing a build failure when "hw/core/cpu.h"
is included:

   hw/intc/arm_gic_common.c:238:68: warning: omitting the parameter name in a 
function definition is a C2x extension [-Wc2x-extensions]
 static inline void arm_gic_common_reset_irq_state(GICState *s, int 
first_cpu,
^
   include/hw/core/cpu.h:451:26: note: expanded from macro 'first_cpu'
  #define first_cpu    QTAILQ_FIRST_RCU(&cpus)
  ^

KISS, rename the function argument.

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/intc/arm_gic_common.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)


Wow, that's ugly.  But a reasonable work-around.


r~



Re: [PATCH 03/10] hw/intc/arm_gic: Un-inline GIC*/ITS class_name() helpers

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

"kvm_arm.h" contains external and internal prototype declarations.
Files under the hw/ directory should only access the KVM external
API.

In order to avoid machine / device models having to include "kvm_arm.h"
simply to get the QOM GIC/ITS class name, un-inline each class-name
getter into the proper device model file.

Signed-off-by: Philippe Mathieu-Daudé
---
  include/hw/intc/arm_gic.h  |  2 ++
  include/hw/intc/arm_gicv3_common.h | 10 ++
  include/hw/intc/arm_gicv3_its_common.h |  9 ++
  target/arm/kvm_arm.h   | 45 --
  hw/arm/virt-acpi-build.c   |  2 +-
  hw/arm/virt.c  |  1 +
  hw/intc/arm_gic_common.c   |  7 
  hw/intc/arm_gicv3_common.c | 14 
  hw/intc/arm_gicv3_its_common.c | 12 +++
  9 files changed, 56 insertions(+), 46 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] target/arm: Fix debugging of ARMv8M Secure code

2023-04-07 Thread Richard Henderson

On 4/7/23 17:01, pbart...@amazon.com wrote:

From: Paul Bartell 

Revert changes to arm_cpu_get_phys_page_attrs_debug made in commit
4a35855682cebb89f9630b07aa9fd37c4e8c733b.

Commit 4a35855682 modifies the arm_cpu_get_phys_page_attrs_debug function
so that it calls get_phys_addr_with_struct rather than get_phys_addr, which
leads to a variety of memory access errors when debugging secure state
code on qemu ARMv8M targets with gdb.

This commit fixes a variety of gdb memory access errors including:
"error reading variable" and "Cannot access memory at address" when
attempting to read any memory address via gdb.

Signed-off-by: Paul Bartell 
---
  target/arm/ptw.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index ec3f51782a..5a1339d38f 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2999,16 +2999,12 @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, 
vaddr addr,
  {
  ARMCPU *cpu = ARM_CPU(cs);
CPUARMState *env = &cpu->env;
-S1Translate ptw = {
-.in_mmu_idx = arm_mmu_idx(env),
-.in_secure = arm_is_secure(env),
-.in_debug = true,


Nack.  This will now affect vcpu state by changing the contents of the softmmu tlb, as 
well as changing the contents of memory (!) via PTE access/dirty bit updates.


A more complete description of "a variety of ... errors", and the conditions under which 
they are produced, would be appreciated.


r~



Re: [PATCH 1/2] accel/tcg/plugin: export host insn size

2023-04-07 Thread Richard Henderson

On 4/6/23 00:46, Alex Bennée wrote:

If your aim is to examine JIT efficiency, what is wrong with the current
"info jit" that you can access via the HMP? Also I'm wondering if it's
time to remove the #ifdefs from CONFIG_PROFILER because I doubt the
extra data it collects is that expensive.

Richard, what do you think?


What is it that you want from CONFIG_PROFILER that you can't get from perf?
I've been tempted to remove CONFIG_PROFILER entirely.


r~



[PATCH 11/12] tcg/mips: Use qemu_build_not_reached for LO/HI_OFF

2023-04-07 Thread Richard Henderson
The new(ish) macro produces a compile-time error instead
of a link-time error.
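
For context, the macro is built on the compiler's error attribute, so any
call that survives dead-code elimination fails the compile rather than the
link (abridged from include/qemu/osdep.h; the exact spelling may differ
between versions):

    extern G_NORETURN void QEMU_ERROR("code path is reachable")
        qemu_build_not_reached_always(void);
    #if defined(__OPTIMIZE__)
    # define qemu_build_not_reached()  qemu_build_not_reached_always()
    #else
    # define qemu_build_not_reached()  g_assert_not_reached()
    #endif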

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index e7930963fc..1df00bf027 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -37,11 +37,9 @@
 # define LO_OFF  (MIPS_BE * 4)
 # define HI_OFF  (4 - LO_OFF)
 #else
-/* To assert at compile-time that these values are never used
-   for TCG_TARGET_REG_BITS == 64.  */
-int link_error(void);
-# define LO_OFF  link_error()
-# define HI_OFF  link_error()
+/* Assert at compile-time that these values are never used for 64-bit. */
+# define LO_OFF  ({ qemu_build_not_reached(); 0; })
+# define HI_OFF  ({ qemu_build_not_reached(); 0; })
 #endif
 
 #ifdef CONFIG_DEBUG_TCG
-- 
2.34.1




[PATCH 06/12] tcg/mips: Split out tcg_out_movi_two

2023-04-07 Thread Richard Henderson
Emit all 32-bit signed constants, which can be loaded in two insns.
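
For instance (an illustration), arg = 0x12345678 has no single-insn form,
so it is built as LUI + ORI:

    tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, 0x1234);   /* 0x12340000 */
    tcg_out_opc_imm(s, OPC_ORI, ret, ret, 0x5678);            /* 0x12345678 */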

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index f0ae418ba6..78710a25bf 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -531,6 +531,22 @@ static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, 
tcg_target_long arg)
 return false;
 }
 
+static bool tcg_out_movi_two(TCGContext *s, TCGReg ret, tcg_target_long arg)
+{
+/*
+ * All signed 32-bit constants are loadable with two immediates,
+ * and everything else requires more work.
+ */
+if (arg == (int32_t)arg) {
+if (!tcg_out_movi_one(s, ret, arg)) {
+tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
+tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg & 0xffff);
+}
+return true;
+}
+return false;
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg)
 {
@@ -538,21 +554,18 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 arg = (int32_t)arg;
 }
 
-if (tcg_out_movi_one(s, ret, arg)) {
+/* Load all 32-bit constants. */
+if (tcg_out_movi_two(s, ret, arg)) {
 return;
 }
 
-if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
-tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
+tcg_out_movi(s, TCG_TYPE_I32, ret, arg >> 31 >> 1);
+if (arg & 0xffff00000000ull) {
+tcg_out_dsll(s, ret, ret, 16);
+tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg >> 16);
+tcg_out_dsll(s, ret, ret, 16);
 } else {
-tcg_out_movi(s, TCG_TYPE_I32, ret, arg >> 31 >> 1);
-if (arg & 0xffff00000000ull) {
-tcg_out_dsll(s, ret, ret, 16);
-tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg >> 16);
-tcg_out_dsll(s, ret, ret, 16);
-} else {
-tcg_out_dsll(s, ret, ret, 32);
-}
+tcg_out_dsll(s, ret, ret, 32);
 }
if (arg & 0xffff) {
tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg & 0xffff);
-- 
2.34.1




[PATCH 08/12] tcg/mips: Aggressively use the constant pool for n64 calls

2023-04-07 Thread Richard Henderson
Repeated calls to a single helper are common -- especially
the ones for softmmu memory access.  Prefer the constant pool
to longer sequences to increase sharing.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 235295c689..e37aca5986 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1105,9 +1105,19 @@ static void tcg_out_movcond(TCGContext *s, TCGCond cond, 
TCGReg ret,
 
 static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *arg, bool 
tail)
 {
-/* Note that the ABI requires the called function's address to be
-   loaded into T9, even if a direct branch is in range.  */
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T9, (uintptr_t)arg);
+/*
+ * Note that __mips_abicalls requires the called function's address
+ * to be loaded into $25 (t9), even if a direct branch is in range.
+ *
+ * For n64, always drop the pointer into the constant pool.
+ * We can re-use helper addresses often and do not want any
+ * of the longer sequences tcg_out_movi may try.
+ */
+if (sizeof(uintptr_t) == 8) {
+tcg_out_movi_pool(s, TCG_REG_T9, (uintptr_t)arg, TCG_REG_TB);
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_T9, (uintptr_t)arg);
+}
 
 /* But do try a direct branch, allowing the cpu better insn prefetch.  */
 if (tail) {
-- 
2.34.1




[PATCH 07/12] tcg/mips: Use the constant pool for 64-bit constants

2023-04-07 Thread Richard Henderson
During normal processing, the constant pool is accessible via
TCG_REG_TB.  During the prologue, it is accessible via TCG_REG_T9.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.h |  1 +
 tcg/mips/tcg-target.c.inc | 65 +--
 2 files changed, 49 insertions(+), 17 deletions(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 8cdc803523..88a8137bcc 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -207,5 +207,6 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
 #define TCG_TARGET_NEED_LDST_LABELS
+#define TCG_TARGET_NEED_POOL_LABELS
 
 #endif
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 78710a25bf..235295c689 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -25,6 +25,7 @@
  */
 
 #include "../tcg-ldst.c.inc"
+#include "../tcg-pool.c.inc"
 
 #if HOST_BIG_ENDIAN
 # define MIPS_BE  1
@@ -168,9 +169,18 @@ static bool reloc_pc16(tcg_insn_unit *src_rw, const 
tcg_insn_unit *target)
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
-tcg_debug_assert(type == R_MIPS_PC16);
-tcg_debug_assert(addend == 0);
-return reloc_pc16(code_ptr, (const tcg_insn_unit *)value);
+value += addend;
+switch (type) {
+case R_MIPS_PC16:
+return reloc_pc16(code_ptr, (const tcg_insn_unit *)value);
+case R_MIPS_16:
+if (value != (int16_t)value) {
+return false;
+}
+*code_ptr = deposit32(*code_ptr, 0, 16, value);
+return true;
+}
+g_assert_not_reached();
 }
 
 #define TCG_CT_CONST_ZERO 0x100
@@ -490,6 +500,11 @@ static void tcg_out_nop(TCGContext *s)
 tcg_out32(s, 0);
 }
 
+static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
+{
+memset(p, 0, count * sizeof(tcg_insn_unit));
+}
+
 static void tcg_out_dsll(TCGContext *s, TCGReg rd, TCGReg rt, TCGArg sa)
 {
 tcg_out_opc_sa64(s, OPC_DSLL, OPC_DSLL32, rd, rt, sa);
@@ -547,8 +562,15 @@ static bool tcg_out_movi_two(TCGContext *s, TCGReg ret, 
tcg_target_long arg)
 return false;
 }
 
-static void tcg_out_movi(TCGContext *s, TCGType type,
- TCGReg ret, tcg_target_long arg)
+static void tcg_out_movi_pool(TCGContext *s, TCGReg ret,
+  tcg_target_long arg, TCGReg tbreg)
+{
+new_pool_label(s, arg, R_MIPS_16, s->code_ptr, tcg_tbrel_diff(s, NULL));
+tcg_out_opc_imm(s, OPC_LD, ret, tbreg, 0);
+}
+
+static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
+ tcg_target_long arg, TCGReg tbreg)
 {
 if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
 arg = (int32_t)arg;
@@ -558,18 +580,17 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 if (tcg_out_movi_two(s, ret, arg)) {
 return;
 }
+assert(TCG_TARGET_REG_BITS == 64);
 
-tcg_out_movi(s, TCG_TYPE_I32, ret, arg >> 31 >> 1);
-if (arg & 0xffff00000000ull) {
-tcg_out_dsll(s, ret, ret, 16);
-tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg >> 16);
-tcg_out_dsll(s, ret, ret, 16);
-} else {
-tcg_out_dsll(s, ret, ret, 32);
-}
-if (arg & 0xffff) {
-tcg_out_opc_imm(s, OPC_ORI, ret, ret, arg & 0xffff);
-}
+/* Otherwise, put 64-bit constants into the constant pool. */
+tcg_out_movi_pool(s, ret, arg, tbreg);
+}
+
+static void tcg_out_movi(TCGContext *s, TCGType type,
+ TCGReg ret, tcg_target_long arg)
+{
+TCGReg tbreg = TCG_TARGET_REG_BITS == 64 ? TCG_REG_TB : 0;
+tcg_out_movi_int(s, type, ret, arg, tbreg);
 }
 
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rs)
@@ -2693,10 +2714,20 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 #ifndef CONFIG_SOFTMMU
 if (guest_base != (int16_t)guest_base) {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+/*
+ * The function call abi for n32 and n64 will have loaded $25 (t9)
+ * with the address of the prologue, so we can use that instead
+ * of TCG_REG_TB.
+ */
+#if TCG_TARGET_REG_BITS == 64 && !defined(__mips_abicalls)
+# error "Unknown mips abi"
+#endif
+tcg_out_movi_int(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base,
+ TCG_TARGET_REG_BITS == 64 ? TCG_REG_T9 : 0);
 tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
 }
 #endif
+
 if (TCG_TARGET_REG_BITS == 64) {
 tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, tcg_target_call_iarg_regs[1]);
 }
-- 
2.34.1




[PATCH 10/12] tcg/mips: Try three insns with shift and add in tcg_out_movi

2023-04-07 Thread Richard Henderson
These sequences are inexpensive to test.  Maxing out at three insns
results in the same space as a load plus the constant pool entry.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 8c9a4cba9b..e7930963fc 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -573,6 +573,7 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, 
TCGReg ret,
  tcg_target_long arg, TCGReg tbreg)
 {
 tcg_target_long tmp;
+int sh, lo;
 
 if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
 arg = (int32_t)arg;
@@ -595,6 +596,49 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, 
TCGReg ret,
 return;
 }
 
+/*
+ * Load bitmasks with a right-shift.  This is good for things
+ * like 0x0fff_ffff_ffff_fff0: ADDUI r,0,0xff00 + DSRL r,r,4.
+ * or similarly using LUI.  For this to work, bit 31 must be set.
+ */
+if (arg > 0 && (int32_t)arg < 0) {
+sh = clz64(arg);
+if (tcg_out_movi_one(s, ret, arg << sh)) {
+tcg_out_dsrl(s, ret, ret, sh);
+return;
+}
+}
+
+/*
+ * Load slightly larger constants using left-shift.
+ * Limit this sequence to 3 insns to avoid too much expansion.
+ */
+sh = ctz64(arg);
+if (sh && tcg_out_movi_two(s, ret, arg >> sh)) {
+tcg_out_dsll(s, ret, ret, sh);
+return;
+}
+
+/*
+ * Load slightly larger constants using left-shift and add/or.
+ * Prefer addi with a negative immediate when that would produce
+ * a larger shift.  For this to work, bits 15 and 16 must be set.
+ */
+lo = arg & 0xffff;
+if (lo) {
+if ((arg & 0x18000) == 0x18000) {
+lo = (int16_t)arg;
+}
+tmp = arg - lo;
+sh = ctz64(tmp);
+tmp >>= sh;
+if (tcg_out_movi_one(s, ret, tmp)) {
+tcg_out_dsll(s, ret, ret, sh);
+tcg_out_opc_imm(s, lo < 0 ? OPC_DADDIU : OPC_ORI, ret, ret, lo);
+return;
+}
+}
+
 /* Otherwise, put 64-bit constants into the constant pool. */
 tcg_out_movi_pool(s, ret, arg, tbreg);
 }
-- 
2.34.1




[PATCH 09/12] tcg/mips: Try tb-relative addresses in tcg_out_movi

2023-04-07 Thread Richard Henderson
These addresses are often loaded by the qemu_ld/st slow path,
for loading the retaddr value.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index e37aca5986..8c9a4cba9b 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -572,6 +572,8 @@ static void tcg_out_movi_pool(TCGContext *s, TCGReg ret,
 static void tcg_out_movi_int(TCGContext *s, TCGType type, TCGReg ret,
  tcg_target_long arg, TCGReg tbreg)
 {
+tcg_target_long tmp;
+
 if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
 arg = (int32_t)arg;
 }
@@ -582,6 +584,17 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, 
TCGReg ret,
 }
 assert(TCG_TARGET_REG_BITS == 64);
 
+/* Load addresses within 2GB of TB with 1 or 3 insns. */
+tmp = tcg_tbrel_diff(s, (void *)arg);
+if (tmp == (int16_t)tmp) {
+tcg_out_opc_imm(s, OPC_DADDIU, ret, tbreg, tmp);
+return;
+}
+if (tcg_out_movi_two(s, ret, tmp)) {
+tcg_out_opc_reg(s, OPC_DADDU, ret, ret, tbreg);
+return;
+}
+
 /* Otherwise, put 64-bit constants into the constant pool. */
 tcg_out_movi_pool(s, ret, arg, tbreg);
 }
-- 
2.34.1




[PATCH 12/12] tcg/mips: Replace MIPS_BE with HOST_BIG_ENDIAN

2023-04-07 Thread Richard Henderson
Since e03b56863d2b, which replaced HOST_WORDS_BIGENDIAN
with HOST_BIG_ENDIAN, there is no need to define a second
symbol which is [0,1].

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 88 ++-
 1 file changed, 41 insertions(+), 47 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 1df00bf027..9767065af0 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -27,14 +27,8 @@
 #include "../tcg-ldst.c.inc"
 #include "../tcg-pool.c.inc"
 
-#if HOST_BIG_ENDIAN
-# define MIPS_BE  1
-#else
-# define MIPS_BE  0
-#endif
-
 #if TCG_TARGET_REG_BITS == 32
-# define LO_OFF  (MIPS_BE * 4)
+# define LO_OFF  (HOST_BIG_ENDIAN * 4)
 # define HI_OFF  (4 - LO_OFF)
 #else
 /* Assert at compile-time that these values are never used for 64-bit. */
@@ -1354,7 +1348,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
 /* We eliminated V0 from the possible output registers, so it
cannot be clobbered here.  So we must move V1 first.  */
-if (MIPS_BE) {
+if (HOST_BIG_ENDIAN) {
 tcg_out_mov(s, TCG_TYPE_I32, v0, TCG_REG_V1);
 v0 = l->datahi_reg;
 } else {
@@ -1438,8 +1432,8 @@ static bool tcg_out_fail_alignment(TCGContext *s, 
TCGLabelQemuLdst *l)
 
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
 /* A0 is env, A1 is skipped, A2:A3 is the uint64_t address. */
-TCGReg a2 = MIPS_BE ? l->addrhi_reg : l->addrlo_reg;
-TCGReg a3 = MIPS_BE ? l->addrlo_reg : l->addrhi_reg;
+TCGReg a2 = HOST_BIG_ENDIAN ? l->addrhi_reg : l->addrlo_reg;
+TCGReg a3 = HOST_BIG_ENDIAN ? l->addrlo_reg : l->addrhi_reg;
 
 if (a3 != TCG_REG_A2) {
 tcg_out_mov(s, TCG_TYPE_I32, TCG_REG_A2, a2);
@@ -1551,8 +1545,8 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg 
lo, TCGReg hi,
 tcg_out_opc_imm(s, OPC_LW, TCG_TMP1, base, 4);
 tcg_out_opc_reg(s, OPC_WSBH, TCG_TMP0, 0, TCG_TMP0);
 tcg_out_opc_reg(s, OPC_WSBH, TCG_TMP1, 0, TCG_TMP1);
-tcg_out_opc_sa(s, OPC_ROTR, MIPS_BE ? lo : hi, TCG_TMP0, 16);
-tcg_out_opc_sa(s, OPC_ROTR, MIPS_BE ? hi : lo, TCG_TMP1, 16);
+tcg_out_opc_sa(s, OPC_ROTR, HOST_BIG_ENDIAN ? lo : hi, TCG_TMP0, 16);
+tcg_out_opc_sa(s, OPC_ROTR, HOST_BIG_ENDIAN ? hi : lo, TCG_TMP1, 16);
 } else {
 tcg_out_bswap_subr(s, bswap32_addr);
 /* delay slot */
@@ -1560,15 +1554,15 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, 
TCGReg lo, TCGReg hi,
 tcg_out_opc_imm(s, OPC_LW, TCG_TMP0, base, 4);
 tcg_out_bswap_subr(s, bswap32_addr);
 /* delay slot */
-tcg_out_mov(s, TCG_TYPE_I32, MIPS_BE ? lo : hi, TCG_TMP3);
-tcg_out_mov(s, TCG_TYPE_I32, MIPS_BE ? hi : lo, TCG_TMP3);
+tcg_out_mov(s, TCG_TYPE_I32, HOST_BIG_ENDIAN ? lo : hi, TCG_TMP3);
+tcg_out_mov(s, TCG_TYPE_I32, HOST_BIG_ENDIAN ? hi : lo, TCG_TMP3);
 }
 break;
 case MO_UQ:
 /* Prefer to load from offset 0 first, but allow for overlap.  */
 if (TCG_TARGET_REG_BITS == 64) {
 tcg_out_opc_imm(s, OPC_LD, lo, base, 0);
-} else if (MIPS_BE ? hi != base : lo == base) {
+} else if (HOST_BIG_ENDIAN ? hi != base : lo == base) {
 tcg_out_opc_imm(s, OPC_LW, hi, base, HI_OFF);
 tcg_out_opc_imm(s, OPC_LW, lo, base, LO_OFF);
 } else {
@@ -1584,10 +1578,10 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, 
TCGReg lo, TCGReg hi,
 static void tcg_out_qemu_ld_unalign(TCGContext *s, TCGReg lo, TCGReg hi,
 TCGReg base, MemOp opc, TCGType type)
 {
-const MIPSInsn lw1 = MIPS_BE ? OPC_LWL : OPC_LWR;
-const MIPSInsn lw2 = MIPS_BE ? OPC_LWR : OPC_LWL;
-const MIPSInsn ld1 = MIPS_BE ? OPC_LDL : OPC_LDR;
-const MIPSInsn ld2 = MIPS_BE ? OPC_LDR : OPC_LDL;
+const MIPSInsn lw1 = HOST_BIG_ENDIAN ? OPC_LWL : OPC_LWR;
+const MIPSInsn lw2 = HOST_BIG_ENDIAN ? OPC_LWR : OPC_LWL;
+const MIPSInsn ld1 = HOST_BIG_ENDIAN ? OPC_LDL : OPC_LDR;
+const MIPSInsn ld2 = HOST_BIG_ENDIAN ? OPC_LDR : OPC_LDL;
 
 bool sgn = (opc & MO_SIGN);
 
@@ -1653,10 +1647,10 @@ static void tcg_out_qemu_ld_unalign(TCGContext *s, 
TCGReg lo, TCGReg hi,
 tcg_out_opc_imm(s, ld1, lo, base, 0);
 tcg_out_opc_imm(s, ld2, lo, base, 7);
 } else {
-tcg_out_opc_imm(s, lw1, MIPS_BE ? hi : lo, base, 0 + 0);
-tcg_out_opc_imm(s, lw2, MIPS_BE ? hi : lo, base, 0 + 3);
-tcg_out_opc_imm(s, lw1, MIPS_BE ? lo : hi, base, 4 + 0);
-tcg_out_opc_imm(s, lw2, MIPS_BE ? lo : hi, base, 4 + 3);
+tcg_out_opc_imm(s, lw1, HOST_BIG_ENDIAN ? hi : lo, base, 0 + 0);
+tcg_out_opc_imm(s, lw2, HOST_BIG_ENDIAN ? hi 

[PATCH 03/12] tcg/mips: Unify TCG_GUEST_BASE_REG tests

2023-04-07 Thread Richard Henderson
In tcg_out_qemu_ld/st, we already check for guest_base matching int16_t.
Mirror that when setting up TCG_GUEST_BASE_REG in the prologue.
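
A guest_base that fits in int16_t is applied as an immediate add in the
fast path, so no reserved register is needed for it; roughly (a sketch
mirroring the qemu_ld/st code, with 'base'/'addrlo' as used there):

    if (guest_base == (int16_t)guest_base) {
        tcg_out_opc_imm(s, ALIAS_PADDI, base, addrlo, guest_base);
    } else {
        tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_GUEST_BASE_REG, addrlo);
    }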

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index cc4ca2ddbe..0ade890ade 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2622,7 +2622,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 }
 
 #ifndef CONFIG_SOFTMMU
-if (guest_base) {
+if (guest_base != (int16_t)guest_base) {
 tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
 tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
 }
-- 
2.34.1




[PATCH 01/12] tcg/mips: Move TCG_AREG0 to S8

2023-04-07 Thread Richard Henderson
No functional change; just moving the saved reserved regs to the end.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.h | 2 +-
 tcg/mips/tcg-target.c.inc | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 68b11e4d48..8cdc803523 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -76,7 +76,7 @@ typedef enum {
 TCG_REG_RA,
 
 TCG_REG_CALL_STACK = TCG_REG_SP,
-TCG_AREG0 = TCG_REG_S0,
+TCG_AREG0 = TCG_REG_S8,
 } TCGReg;
 
 /* used for function call generation */
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 34908c799a..c24b780818 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -2493,7 +2493,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 }
 
 static const int tcg_target_callee_save_regs[] = {
-TCG_REG_S0,   /* used for the global env (TCG_AREG0) */
+TCG_REG_S0,
 TCG_REG_S1,
 TCG_REG_S2,
 TCG_REG_S3,
@@ -2501,7 +2501,7 @@ static const int tcg_target_callee_save_regs[] = {
 TCG_REG_S5,
 TCG_REG_S6,
 TCG_REG_S7,
-TCG_REG_S8,
+TCG_REG_S8,   /* used for the global env (TCG_AREG0) */
 TCG_REG_RA,   /* should be last for ABI compliance */
 };
 
-- 
2.34.1




[PATCH 05/12] tcg/mips: Split out tcg_out_movi_one

2023-04-07 Thread Richard Henderson
Emit all constants that can be loaded in exactly one insn.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index c2f8d6550b..f0ae418ba6 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -514,20 +514,34 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, 
TCGReg ret, TCGReg arg)
 return true;
 }
 
+static bool tcg_out_movi_one(TCGContext *s, TCGReg ret, tcg_target_long arg)
+{
+if (arg == (int16_t)arg) {
+tcg_out_opc_imm(s, OPC_ADDIU, ret, TCG_REG_ZERO, arg);
+return true;
+}
+if (arg == (uint16_t)arg) {
+tcg_out_opc_imm(s, OPC_ORI, ret, TCG_REG_ZERO, arg);
+return true;
+}
+if (arg == (int32_t)arg && (arg & 0xffff) == 0) {
+tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
+return true;
+}
+return false;
+}
+
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg)
 {
 if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
 arg = (int32_t)arg;
 }
-if (arg == (int16_t)arg) {
-tcg_out_opc_imm(s, OPC_ADDIU, ret, TCG_REG_ZERO, arg);
-return;
-}
-if (arg == (uint16_t)arg) {
-tcg_out_opc_imm(s, OPC_ORI, ret, TCG_REG_ZERO, arg);
+
+if (tcg_out_movi_one(s, ret, arg)) {
 return;
 }
+
 if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
 tcg_out_opc_imm(s, OPC_LUI, ret, TCG_REG_ZERO, arg >> 16);
 } else {
-- 
2.34.1




[PATCH 02/12] tcg/mips: Move TCG_GUEST_BASE_REG to S7

2023-04-07 Thread Richard Henderson
No functional change; just moving the saved reserved regs to the end.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index c24b780818..cc4ca2ddbe 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -86,7 +86,7 @@ static const char * const 
tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #define TCG_TMP3  TCG_REG_T7
 
 #ifndef CONFIG_SOFTMMU
-#define TCG_GUEST_BASE_REG TCG_REG_S1
+#define TCG_GUEST_BASE_REG TCG_REG_S7
 #endif
 
 /* check if we really need so many registers :P */
@@ -2500,7 +2500,7 @@ static const int tcg_target_callee_save_regs[] = {
 TCG_REG_S4,
 TCG_REG_S5,
 TCG_REG_S6,
-TCG_REG_S7,
+TCG_REG_S7,   /* used for guest_base */
 TCG_REG_S8,   /* used for the global env (TCG_AREG0) */
 TCG_REG_RA,   /* should be last for ABI compliance */
 };
-- 
2.34.1




[PATCH 04/12] tcg/mips: Create and use TCG_REG_TB

2023-04-07 Thread Richard Henderson
This vastly reduces the size of code generated for 64-bit addresses.
The code for exit_tb, for instance, where we load a (tagged) pointer
to the current TB, goes from

0x400aa9725c:  li   v0,64
0x400aa97260:  dsll v0,v0,0x10
0x400aa97264:  ori  v0,v0,0xaa9
0x400aa97268:  dsll v0,v0,0x10
0x400aa9726c:  j0x400aa9703c
0x400aa97270:  ori  v0,v0,0x7083

to

0x400aa97240:  j0x400aa97040
0x400aa97244:  daddiu   v0,s6,-189

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 69 +--
 1 file changed, 59 insertions(+), 10 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 0ade890ade..c2f8d6550b 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -88,6 +88,11 @@ static const char * const 
tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG TCG_REG_S7
 #endif
+#if TCG_TARGET_REG_BITS == 64
+#define TCG_REG_TB TCG_REG_S6
+#else
+#define TCG_REG_TB (qemu_build_not_reached(), TCG_REG_ZERO)
+#endif
 
 /* check if we really need so many registers :P */
 static const int tcg_target_reg_alloc_order[] = {
@@ -1895,27 +1900,61 @@ static void tcg_out_clz(TCGContext *s, MIPSInsn opcv2, 
MIPSInsn opcv6,
 
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t a0)
 {
-TCGReg b0 = TCG_REG_ZERO;
+TCGReg base = TCG_REG_ZERO;
+int16_t lo = 0;
 
-if (a0 & ~0xffff) {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_V0, a0 & ~0xffff);
-b0 = TCG_REG_V0;
+if (a0) {
+intptr_t ofs;
+if (TCG_TARGET_REG_BITS == 64) {
+ofs = tcg_tbrel_diff(s, (void *)a0);
+lo = ofs;
+if (ofs == lo) {
+base = TCG_REG_TB;
+} else {
+base = TCG_REG_V0;
+tcg_out_movi(s, TCG_TYPE_PTR, base, ofs - lo);
+tcg_out_opc_reg(s, ALIAS_PADD, base, base, TCG_REG_TB);
+}
+} else {
+ofs = a0;
+lo = ofs;
+base = TCG_REG_V0;
+tcg_out_movi(s, TCG_TYPE_PTR, base, ofs - lo);
+}
 }
 if (!tcg_out_opc_jmp(s, OPC_J, tb_ret_addr)) {
 tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, (uintptr_t)tb_ret_addr);
 tcg_out_opc_reg(s, OPC_JR, 0, TCG_TMP0, 0);
 }
-tcg_out_opc_imm(s, OPC_ORI, TCG_REG_V0, b0, a0 & 0xffff);
+/* delay slot */
+tcg_out_opc_imm(s, ALIAS_PADDI, TCG_REG_V0, base, lo);
 }
 
 static void tcg_out_goto_tb(TCGContext *s, int which)
 {
+intptr_t ofs = get_jmp_target_addr(s, which);
+TCGReg base, dest;
+
 /* indirect jump method */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP0, TCG_REG_ZERO,
-   get_jmp_target_addr(s, which));
-tcg_out_opc_reg(s, OPC_JR, 0, TCG_TMP0, 0);
+if (TCG_TARGET_REG_BITS == 64) {
+dest = TCG_REG_TB;
+base = TCG_REG_TB;
+ofs = tcg_tbrel_diff(s, (void *)ofs);
+} else {
+dest = TCG_TMP0;
+base = TCG_REG_ZERO;
+}
+tcg_out_ld(s, TCG_TYPE_PTR, dest, base, ofs);
+tcg_out_opc_reg(s, OPC_JR, 0, dest, 0);
+/* delay slot */
 tcg_out_nop(s);
+
 set_jmp_reset_offset(s, which);
+if (TCG_TARGET_REG_BITS == 64) {
+/* For the unlinked case, need to reset TCG_REG_TB. */
+tcg_out_ldst(s, ALIAS_PADDI, TCG_REG_TB, TCG_REG_TB,
+ -tcg_current_code_size(s));
+}
 }
 
 void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
@@ -1946,7 +1985,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_goto_ptr:
 /* jmp to the given host address (could be epilogue) */
 tcg_out_opc_reg(s, OPC_JR, 0, a0, 0);
-tcg_out_nop(s);
+if (TCG_TARGET_REG_BITS == 64) {
+tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, a0);
+} else {
+tcg_out_nop(s);
+}
 break;
 case INDEX_op_br:
 tcg_out_brcond(s, TCG_COND_EQ, TCG_REG_ZERO, TCG_REG_ZERO,
@@ -2499,7 +2542,7 @@ static const int tcg_target_callee_save_regs[] = {
 TCG_REG_S3,
 TCG_REG_S4,
 TCG_REG_S5,
-TCG_REG_S6,
+TCG_REG_S6,   /* used for the tb base (TCG_REG_TB) */
 TCG_REG_S7,   /* used for guest_base */
 TCG_REG_S8,   /* used for the global env (TCG_AREG0) */
 TCG_REG_RA,   /* should be last for ABI compliance */
@@ -2627,6 +2670,9 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
 }
 #endif
+if (TCG_TARGET_REG_BITS == 64) {
+tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, tcg_target_call_iarg_regs[1]);
+}
 
 /* Call generated code */
 tcg_out_opc_reg(s, OPC_JR, 0, tcg_target_call_iarg_regs[1], 0);
@@ -2808,6 +2854,9 @@ static void tcg_target_init(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_RA);   /* return address */
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);   /* stack 

[PATCH for-8.1 00/12] tcg/mips: Backend improvements

2023-04-07 Thread Richard Henderson
I've posted some of these before, perhaps a year or more ago, but
then failed to follow up and get them merged.

I don't think there are any real dependencies, but it has been
rebased upon today's load/store helpers patch set, so there might
be minor conflicts, therefore:

Based-on: 20230408024314.3357414-1-richard.hender...@linaro.org
("[PATCH for-8.1 00/42] tcg: Simplify calls to load/store helpers")


Richard Henderson (12):
  tcg/mips: Move TCG_AREG0 to S8
  tcg/mips: Move TCG_GUEST_BASE_REG to S7
  tcg/mips: Unify TCG_GUEST_BASE_REG tests
  tcg/mips: Create and use TCG_REG_TB
  tcg/mips: Split out tcg_out_movi_one
  tcg/mips: Split out tcg_out_movi_two
  tcg/mips: Use the constant pool for 64-bit constants
  tcg/mips: Aggressively use the constant pool for n64 calls
  tcg/mips: Try tb-relative addresses in tcg_out_movi
  tcg/mips: Try three insns with shift and add in tcg_out_movi
  tcg/mips: Use qemu_build_not_reached for LO/HI_OFF
  tcg/mips: Replace MIPS_BE with HOST_BIG_ENDIAN

 tcg/mips/tcg-target.h |   3 +-
 tcg/mips/tcg-target.c.inc | 350 --
 2 files changed, 260 insertions(+), 93 deletions(-)

-- 
2.34.1




[PATCH 33/42] tcg/mips: Reorg tcg_out_tlb_load

2023-04-07 Thread Richard Henderson
Compare the address vs the tlb entry with sign-extended values.
This simplifies the page+alignment mask constant, and the
generation of the last byte address for the misaligned test.

Move the tlb addend load up, and the zero-extension down.

This frees up a register, which allows us to drop the 'base'
parameter, with which the caller was giving us a 5th temporary.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 90 ---
 1 file changed, 46 insertions(+), 44 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 1206bda502..16b9d09959 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -370,6 +370,8 @@ typedef enum {
 ALIAS_PADDI= sizeof(void *) == 4 ? OPC_ADDIU : OPC_DADDIU,
 ALIAS_TSRL = TARGET_LONG_BITS == 32 || TCG_TARGET_REG_BITS == 32
  ? OPC_SRL : OPC_DSRL,
+ALIAS_TADDI= TARGET_LONG_BITS == 32 || TCG_TARGET_REG_BITS == 32
+ ? OPC_ADDIU : OPC_DADDIU,
 } MIPSInsn;
 
 /*
@@ -1121,12 +1123,12 @@ QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -32768);
 
 /*
  * Perform the tlb comparison operation.
- * The complete host address is placed in BASE.
  * Clobbers TMP0, TMP1, TMP2, TMP3.
+ * Returns the register containing the complete host address.
  */
-static void tcg_out_tlb_load(TCGContext *s, TCGReg base, TCGReg addrl,
- TCGReg addrh, MemOpIdx oi,
- tcg_insn_unit *label_ptr[2], bool is_load)
+static TCGReg tcg_out_tlb_load(TCGContext *s, TCGReg addrl, TCGReg addrh,
+   MemOpIdx oi, bool is_load,
+   tcg_insn_unit *label_ptr[2])
 {
 MemOp opc = get_memop(oi);
 unsigned a_bits = get_alignment_bits(opc);
@@ -1140,7 +1142,6 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg base, 
TCGReg addrl,
 int add_off = offsetof(CPUTLBEntry, addend);
 int cmp_off = (is_load ? offsetof(CPUTLBEntry, addr_read)
: offsetof(CPUTLBEntry, addr_write));
-target_ulong tlb_mask;
 
 /* Load tlb_mask[mmu_idx] and tlb_table[mmu_idx].  */
 tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP0, TCG_AREG0, mask_off);
@@ -1158,15 +1159,12 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg 
base, TCGReg addrl,
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
 tcg_out_ldst(s, OPC_LW, TCG_TMP0, TCG_TMP3, cmp_off + LO_OFF);
 } else {
-tcg_out_ldst(s, (TARGET_LONG_BITS == 64 ? OPC_LD
- : TCG_TARGET_REG_BITS == 64 ? OPC_LWU : OPC_LW),
- TCG_TMP0, TCG_TMP3, cmp_off);
+tcg_out_ld(s, TCG_TYPE_TL, TCG_TMP0, TCG_TMP3, cmp_off);
 }
 
-/* Zero extend a 32-bit guest address for a 64-bit host. */
-if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
-tcg_out_ext32u(s, base, addrl);
-addrl = base;
+if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
+/* Load the tlb addend for the fast path.  */
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP3, TCG_TMP3, add_off);
 }
 
 /*
@@ -1174,18 +1172,18 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg 
base, TCGReg addrl,
  * For unaligned accesses, compare against the end of the access to
  * verify that it does not cross a page boundary.
  */
-tlb_mask = (target_ulong)TARGET_PAGE_MASK | a_mask;
-tcg_out_movi(s, TCG_TYPE_I32, TCG_TMP1, tlb_mask);
-if (a_mask >= s_mask) {
-tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrl);
-} else {
-tcg_out_opc_imm(s, ALIAS_PADDI, TCG_TMP2, addrl, s_mask - a_mask);
+tcg_out_movi(s, TCG_TYPE_TL, TCG_TMP1, TARGET_PAGE_MASK | a_mask);
+if (a_mask < s_mask) {
+tcg_out_opc_imm(s, ALIAS_TADDI, TCG_TMP2, addrl, s_mask - a_mask);
 tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, TCG_TMP2);
+} else {
+tcg_out_opc_reg(s, OPC_AND, TCG_TMP1, TCG_TMP1, addrl);
 }
 
-if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
-/* Load the tlb addend for the fast path.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
+/* Zero extend a 32-bit guest address for a 64-bit host. */
+if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
+tcg_out_ext32u(s, TCG_TMP2, addrl);
+addrl = TCG_TMP2;
 }
 
 label_ptr[0] = s->code_ptr;
@@ -1197,14 +1195,15 @@ static void tcg_out_tlb_load(TCGContext *s, TCGReg 
base, TCGReg addrl,
 tcg_out_ldst(s, OPC_LW, TCG_TMP0, TCG_TMP3, cmp_off + HI_OFF);
 
 /* Load the tlb addend for the fast path.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP2, TCG_TMP3, add_off);
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_TMP3, TCG_TMP3, add_off);
 
 label_ptr[1] = s->code_ptr;
 tcg_out_opc_br(s, OPC_BNE, addrh, TCG_TMP0);
 }
 
 /* delay slot */
-tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP2, addrl);
+tcg_out_opc_reg(s, ALIAS_PADD, TCG_TMP3, TCG_TMP3, addrl);
+return TCG_TMP3;
 }
 

[PATCH 39/42] tcg/s390x: Use ALGFR in constructing host address for qemu_ld/st

2023-04-07 Thread Richard Henderson
Rather than zero-extend the guest address into a register,
use an add instruction which zero-extends the second input.
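
ALGFR is the s390x add-logical (64 <- 32): the second operand is
zero-extended as part of the add. In pseudo-C:

    /* ALGFR r1,r2 */
    r1 = r1 + (uint64_t)(uint32_t)r2;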

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.c.inc | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 7d6cb30a06..b53eb70f24 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -149,6 +149,7 @@ typedef enum S390Opcode {
 RRE_ALGR= 0xb90a,
 RRE_ALCR= 0xb998,
 RRE_ALCGR   = 0xb988,
+RRE_ALGFR   = 0xb91a,
 RRE_CGR = 0xb920,
 RRE_CLGR= 0xb921,
 RRE_DLGR= 0xb987,
@@ -1716,8 +1717,10 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp 
opc, TCGReg data,
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -(1 << 19));
 
-/* Load and compare a TLB entry, leaving the flags set.  Loads the TLB
-   addend into R2.  Returns a register with the santitized guest address.  */
+/*
+ * Load and compare a TLB entry, leaving the flags set.
+ * Loads the TLB addend and returns the register.
+ */
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, MemOp opc,
int mem_index, bool is_ld)
 {
@@ -1761,12 +1764,7 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg 
addr_reg, MemOp opc,
 
 tcg_out_insn(s, RXY, LG, TCG_REG_R2, TCG_REG_R2, TCG_REG_NONE,
  offsetof(CPUTLBEntry, addend));
-
-if (TARGET_LONG_BITS == 32) {
-tcg_out_ext32u(s, TCG_REG_R3, addr_reg);
-return TCG_REG_R3;
-}
-return addr_reg;
+return TCG_REG_R2;
 }
 
 static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
@@ -1888,16 +1886,20 @@ static void tcg_out_qemu_ld(TCGContext* s, TCGReg 
data_reg, TCGReg addr_reg,
 #ifdef CONFIG_SOFTMMU
 unsigned mem_index = get_mmuidx(oi);
 tcg_insn_unit *label_ptr;
-TCGReg base_reg;
+TCGReg addend;
 
-base_reg = tcg_out_tlb_read(s, addr_reg, opc, mem_index, 1);
+addend = tcg_out_tlb_read(s, addr_reg, opc, mem_index, 1);
 
 tcg_out16(s, RI_BRC | (S390_CC_NE << 4));
 label_ptr = s->code_ptr;
 s->code_ptr += 1;
 
-tcg_out_qemu_ld_direct(s, opc, data_reg, base_reg, TCG_REG_R2, 0);
-
+if (TARGET_LONG_BITS == 32) {
+tcg_out_insn(s, RRE, ALGFR, addend, addr_reg);
+tcg_out_qemu_ld_direct(s, opc, data_reg, addend, TCG_REG_NONE, 0);
+} else {
+tcg_out_qemu_ld_direct(s, opc, data_reg, addend, addr_reg, 0);
+}
 add_qemu_ldst_label(s, 1, oi, d_type, data_reg, addr_reg,
 s->code_ptr, label_ptr);
 #else
@@ -1920,16 +1922,20 @@ static void tcg_out_qemu_st(TCGContext* s, TCGReg 
data_reg, TCGReg addr_reg,
 #ifdef CONFIG_SOFTMMU
 unsigned mem_index = get_mmuidx(oi);
 tcg_insn_unit *label_ptr;
-TCGReg base_reg;
+TCGReg addend;
 
-base_reg = tcg_out_tlb_read(s, addr_reg, opc, mem_index, 0);
+addend = tcg_out_tlb_read(s, addr_reg, opc, mem_index, 0);
 
 tcg_out16(s, RI_BRC | (S390_CC_NE << 4));
 label_ptr = s->code_ptr;
 s->code_ptr += 1;
 
-tcg_out_qemu_st_direct(s, opc, data_reg, base_reg, TCG_REG_R2, 0);
-
+if (TARGET_LONG_BITS == 32) {
+tcg_out_insn(s, RRE, ALGFR, addend, addr_reg);
+tcg_out_qemu_st_direct(s, opc, data_reg, addend, TCG_REG_NONE, 0);
+} else {
+tcg_out_qemu_st_direct(s, opc, data_reg, addend, addr_reg, 0);
+}
 add_qemu_ldst_label(s, 0, oi, d_type, data_reg, addr_reg,
 s->code_ptr, label_ptr);
 #else
-- 
2.34.1




[PATCH 16/42] tcg: Introduce tcg_out_movext

2023-04-07 Thread Richard Henderson
This is common code in most qemu_{ld,st} slow paths, extending the
input value for the store helper data argument or extending the
return value from the load helper.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 59 
 tcg/aarch64/tcg-target.c.inc |  8 ++---
 tcg/arm/tcg-target.c.inc | 16 +++--
 tcg/i386/tcg-target.c.inc| 30 +++-
 tcg/loongarch64/tcg-target.c.inc | 53 +---
 tcg/ppc/tcg-target.c.inc | 38 ++--
 tcg/riscv/tcg-target.c.inc   | 13 ++-
 tcg/s390x/tcg-target.c.inc   | 19 ++
 tcg/sparc64/tcg-target.c.inc | 32 -
 9 files changed, 100 insertions(+), 168 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0188152c37..6fe7dd6564 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -352,6 +352,65 @@ void tcg_raise_tb_overflow(TCGContext *s)
 siglongjmp(s->jmp_trans, -2);
 }
 
+/**
+ * tcg_out_movext -- move and extend
+ * @s: tcg context
+ * @dst_type: integral type for destination
+ * @dst: destination register
+ * @src_type: integral type for source
+ * @src_ext: extension to apply to source
+ * @src: source register
+ *
+ * Move or extend @src into @dst, depending on @src_ext and the types.
+ */
+static void __attribute__((unused))
+tcg_out_movext(TCGContext *s, TCGType dst_type, TCGReg dst,
+   TCGType src_type, MemOp src_ext, TCGReg src)
+{
+switch (src_ext) {
+case MO_UB:
+tcg_out_ext8u(s, dst, src);
+break;
+case MO_SB:
+tcg_out_ext8s(s, dst_type, dst, src);
+break;
+case MO_UW:
+tcg_out_ext16u(s, dst, src);
+break;
+case MO_SW:
+tcg_out_ext16s(s, dst_type, dst, src);
+break;
+case MO_UL:
+case MO_SL:
+if (dst_type == TCG_TYPE_I32) {
+if (src_type == TCG_TYPE_I32) {
+tcg_out_mov(s, TCG_TYPE_I32, dst, src);
+} else {
+tcg_out_extrl_i64_i32(s, dst, src);
+}
+} else if (src_type == TCG_TYPE_I32) {
+if (src_ext & MO_SIGN) {
+tcg_out_exts_i32_i64(s, dst, src);
+} else {
+tcg_out_extu_i32_i64(s, dst, src);
+}
+} else {
+if (src_ext & MO_SIGN) {
+tcg_out_ext32s(s, dst, src);
+} else {
+tcg_out_ext32u(s, dst, src);
+}
+}
+break;
+case MO_UQ:
+tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
+tcg_out_mov(s, TCG_TYPE_I64, dst, src);
+break;
+default:
+g_assert_not_reached();
+}
+}
+
 #define C_PFX1(P, A)P##A
 #define C_PFX2(P, A, B) P##A##_##B
 #define C_PFX3(P, A, B, C)  P##A##_##B##_##C
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index bd1fab193e..29bc97ed1c 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1620,7 +1620,6 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 {
 MemOpIdx oi = lb->oi;
 MemOp opc = get_memop(oi);
-MemOp size = opc & MO_SIZE;
 
 if (!reloc_pc19(lb->label_ptr[0], tcg_splitwx_to_rx(s->code_ptr))) {
 return false;
@@ -1631,12 +1630,9 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, oi);
 tcg_out_adr(s, TCG_REG_X3, lb->raddr);
 tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SIZE]);
-if (opc & MO_SIGN) {
-tcg_out_sxt(s, lb->type, size, lb->datalo_reg, TCG_REG_X0);
-} else {
-tcg_out_mov(s, size == MO_64, lb->datalo_reg, TCG_REG_X0);
-}
 
+tcg_out_movext(s, lb->type, lb->datalo_reg,
+   TCG_TYPE_REG, opc & MO_SSIZE, TCG_REG_X0);
 tcg_out_goto(s, lb->raddr);
 return true;
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 1820655ee3..f865294861 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1567,17 +1567,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 
 datalo = lb->datalo_reg;
 datahi = lb->datahi_reg;
-switch (opc & MO_SSIZE) {
-case MO_SB:
-tcg_out_ext8s(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
-break;
-case MO_SW:
-tcg_out_ext16s(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
-break;
-default:
-tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-break;
-case MO_UQ:
+if ((opc & MO_SIZE) == MO_64) {
 if (datalo != TCG_REG_R1) {
 tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
 tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
@@ -1589,7 +1579,9 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
 tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_TMP);
 

[PATCH 40/42] tcg/s390x: Simplify constraints on qemu_ld/st

2023-04-07 Thread Richard Henderson
Adjust the softmmu tlb to use R0+R1, not any of the normally available
registers.  Since we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target-con-set.h |  2 --
 tcg/s390x/tcg-target-con-str.h |  1 -
 tcg/s390x/tcg-target.c.inc | 36 --
 3 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 15f1c55103..ecc079bb6d 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -10,12 +10,10 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
-C_O0_I2(L, L)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
 C_O0_I2(r, rA)
 C_O0_I2(v, r)
-C_O1_I1(r, L)
 C_O1_I1(r, r)
 C_O1_I1(v, r)
 C_O1_I1(v, v)
diff --git a/tcg/s390x/tcg-target-con-str.h b/tcg/s390x/tcg-target-con-str.h
index 6fa64a1ed6..25675b449e 100644
--- a/tcg/s390x/tcg-target-con-str.h
+++ b/tcg/s390x/tcg-target-con-str.h
@@ -9,7 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 REGS('v', ALL_VECTOR_REGS)
REGS('o', 0xaaaa) /* odd numbered general regs */
 
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index b53eb70f24..64033fb957 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -44,18 +44,6 @@
 #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 16)
 #define ALL_VECTOR_REGS  MAKE_64BIT_MASK(32, 32)
 
-/*
- * For softmmu, we need to avoid conflicts with the first 3
- * argument registers to perform the tlb lookup, and to call
- * the helper function.
- */
-#ifdef CONFIG_SOFTMMU
-#define SOFTMMU_RESERVE_REGS MAKE_64BIT_MASK(TCG_REG_R2, 3)
-#else
-#define SOFTMMU_RESERVE_REGS 0
-#endif
-
-
 /* Several places within the instruction set 0 means "no register"
rather than TCG_REG_R0.  */
 #define TCG_REG_NONE0
@@ -1734,10 +1722,10 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg 
addr_reg, MemOp opc,
 int ofs, a_off;
 uint64_t tlb_mask;
 
-tcg_out_sh64(s, RSY_SRLG, TCG_REG_R2, addr_reg, TCG_REG_NONE,
+tcg_out_sh64(s, RSY_SRLG, TCG_TMP0, addr_reg, TCG_REG_NONE,
  TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
-tcg_out_insn(s, RXY, NG, TCG_REG_R2, TCG_AREG0, TCG_REG_NONE, mask_off);
-tcg_out_insn(s, RXY, AG, TCG_REG_R2, TCG_AREG0, TCG_REG_NONE, table_off);
+tcg_out_insn(s, RXY, NG, TCG_TMP0, TCG_AREG0, TCG_REG_NONE, mask_off);
+tcg_out_insn(s, RXY, AG, TCG_TMP0, TCG_AREG0, TCG_REG_NONE, table_off);
 
 /* For aligned accesses, we check the first byte and include the alignment
bits within the address.  For unaligned access, we check that we don't
@@ -1745,10 +1733,10 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg 
addr_reg, MemOp opc,
 a_off = (a_bits >= s_bits ? 0 : s_mask - a_mask);
 tlb_mask = (uint64_t)TARGET_PAGE_MASK | a_mask;
 if (a_off == 0) {
-tgen_andi_risbg(s, TCG_REG_R3, addr_reg, tlb_mask);
+tgen_andi_risbg(s, TCG_REG_R0, addr_reg, tlb_mask);
 } else {
-tcg_out_insn(s, RX, LA, TCG_REG_R3, addr_reg, TCG_REG_NONE, a_off);
-tgen_andi(s, TCG_TYPE_TL, TCG_REG_R3, tlb_mask);
+tcg_out_insn(s, RX, LA, TCG_REG_R0, addr_reg, TCG_REG_NONE, a_off);
+tgen_andi(s, TCG_TYPE_TL, TCG_REG_R0, tlb_mask);
 }
 
 if (is_ld) {
@@ -1757,14 +1745,14 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg 
addr_reg, MemOp opc,
 ofs = offsetof(CPUTLBEntry, addr_write);
 }
 if (TARGET_LONG_BITS == 32) {
-tcg_out_insn(s, RX, C, TCG_REG_R3, TCG_REG_R2, TCG_REG_NONE, ofs);
+tcg_out_insn(s, RX, C, TCG_REG_R0, TCG_TMP0, TCG_REG_NONE, ofs);
 } else {
-tcg_out_insn(s, RXY, CG, TCG_REG_R3, TCG_REG_R2, TCG_REG_NONE, ofs);
+tcg_out_insn(s, RXY, CG, TCG_REG_R0, TCG_TMP0, TCG_REG_NONE, ofs);
 }
 
-tcg_out_insn(s, RXY, LG, TCG_REG_R2, TCG_REG_R2, TCG_REG_NONE,
+tcg_out_insn(s, RXY, LG, TCG_TMP0, TCG_TMP0, TCG_REG_NONE,
  offsetof(CPUTLBEntry, addend));
-return TCG_REG_R2;
+return TCG_TMP0;
 }
 
 static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
@@ -3181,10 +3169,10 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_ld_i64:
-return C_O1_I1(r, L);
+return C_O1_I1(r, r);
 case INDEX_op_qemu_st_i64:
 case INDEX_op_qemu_st_i32:
-return C_O0_I2(L, L);
+return C_O0_I2(r, r);
 
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-- 
2.34.1




[PATCH 31/42] tcg: Introduce tcg_out_st_helper_args

2023-04-07 Thread Richard Henderson
Centralize the logic to call the helper_stN_mmu functions.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 189 ++-
 tcg/aarch64/tcg-target.c.inc |  24 ++--
 tcg/arm/tcg-target.c.inc | 106 ++---
 tcg/i386/tcg-target.c.inc|  51 +
 tcg/loongarch64/tcg-target.c.inc |  11 +-
 tcg/mips/tcg-target.c.inc| 109 ++
 tcg/ppc/tcg-target.c.inc |  40 ++-
 tcg/riscv/tcg-target.c.inc   |  18 +--
 tcg/s390x/tcg-target.c.inc   |  15 +--
 9 files changed, 229 insertions(+), 334 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index e67b80aeeb..bd6676be69 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -184,6 +184,11 @@ static int tcg_out_ld_helper_args(TCGContext *s, const 
TCGLabelQemuLdst *l,
   void (*ra_gen)(TCGContext *s, TCGReg r),
   int ra_reg, int scratch_reg)
 __attribute__((unused));
+static int tcg_out_st_helper_args(TCGContext *s, const TCGLabelQemuLdst *l,
+  void (*ra_gen)(TCGContext *s, TCGReg r),
+  int ra_reg, int t1_reg,
+  int t2_reg, int t3_reg)
+__attribute__((unused));
 
 TCGContext tcg_init_ctx;
 __thread TCGContext *tcg_ctx;
@@ -5073,8 +5078,8 @@ static int tcg_out_helper_arg_ra(TCGContext *s, unsigned 
d_arg,
 }
 
 /*
- * Poor man's topological sort on 2 source+destination register pairs.
- * This is a simplified version of tcg_out_movext2 for 32-bit hosts.
+ * Poor man's topological sort on up to 4 source+destination register pairs.
+ * This first is a simplified version of tcg_out_movext2 for 32-bit hosts.
  */
 static void tcg_out_mov_32x2(TCGContext *s, TCGReg d1, TCGReg s1,
  TCGReg d2, TCGReg s2, int t1)
@@ -5098,6 +5103,67 @@ static void tcg_out_mov_32x2(TCGContext *s, TCGReg d1, 
TCGReg s1,
 tcg_out_mov(s, TCG_TYPE_I32, d1, s1);
 }
 
+static void tcg_out_mov_32x3(TCGContext *s, TCGReg d1, TCGReg s1,
+ TCGReg d2, TCGReg s2,
+ TCGReg d3, TCGReg s3, int t1, int t2)
+{
+tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+tcg_debug_assert(t2 >= 0);
+
+if (d1 != s2 && d1 != s3) {
+tcg_out_mov(s, TCG_TYPE_I32, d1, s1);
+tcg_out_mov_32x2(s, d3, s3, d2, s2, t1);
+return;
+}
+if (d2 != s1 && d2 != s3) {
+tcg_out_mov(s, TCG_TYPE_I32, d2, s2);
+tcg_out_mov_32x2(s, d1, s1, d3, s3, t1);
+return;
+}
+if (d3 != s1 && d3 != s2) {
+tcg_out_mov(s, TCG_TYPE_I32, d3, s3);
+tcg_out_mov_32x2(s, d1, s1, d2, s2, t1);
+return;
+}
+tcg_out_mov(s, TCG_TYPE_I32, t2, s3);
+tcg_out_mov_32x2(s, d1, s1, d2, s2, t1);
+tcg_out_mov(s, TCG_TYPE_I32, d3, t2);
+}
+
+static void tcg_out_mov_32x4(TCGContext *s, TCGReg d1, TCGReg s1,
+ TCGReg d2, TCGReg s2,
+ TCGReg d3, TCGReg s3,
+ TCGReg d4, TCGReg s4,
+ int t1, int t2, int t3)
+{
+tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+tcg_debug_assert(t3 >= 0);
+
+if (d1 != s2 && d1 != s3 && d1 != s4) {
+tcg_out_mov(s, TCG_TYPE_I32, d1, s1);
+tcg_out_mov_32x3(s, d4, s4, d2, s2, d3, s3, t1, t2);
+return;
+}
+if (d2 != s1 && d2 != s3 && d2 != s4) {
+tcg_out_mov(s, TCG_TYPE_I32, d2, s2);
+tcg_out_mov_32x3(s, d1, s1, d4, s4, d3, s3, t1, t2);
+return;
+}
+if (d3 != s1 && d3 != s2 && d3 != s4) {
+tcg_out_mov(s, TCG_TYPE_I32, d3, s3);
+tcg_out_mov_32x3(s, d1, s1, d2, s2, d4, s4, t1, t2);
+return;
+}
+if (d4 != s1 && d4 != s2 && d4 != s3) {
+tcg_out_mov(s, TCG_TYPE_I32, d4, s4);
+tcg_out_mov_32x3(s, d1, s1, d2, s2, d3, s3, t1, t2);
+return;
+}
+tcg_out_mov(s, TCG_TYPE_I32, t3, s4);
+tcg_out_mov_32x3(s, d1, s1, d2, s2, d3, s3, t1, t2);
+tcg_out_mov(s, TCG_TYPE_I32, d4, t3);
+}
+
 static void tcg_out_helper_arg_32x2(TCGContext *s, unsigned d_arg,
 TCGReg lo_reg, TCGReg hi_reg,
 int scratch_reg)
@@ -5160,6 +5226,125 @@ static int tcg_out_ld_helper_args(TCGContext *s, const 
TCGLabelQemuLdst *l,
  (uintptr_t)l->raddr, scratch_reg);
 }
 
+static int tcg_out_st_helper_args(TCGContext *s, const TCGLabelQemuLdst *l,
+  void (*ra_gen)(TCGContext *s, TCGReg r),
+  int ra_reg, int t1_reg,
+  int t2_reg, int t3_reg)
+{
+MemOp size = get_memop(l->oi) & MO_SIZE;
+/* These are the types of the helper_stX_mmu 'addr' and 'val' arguments. */
+TCGType a_type = TARGET_LONG_BITS == 32 ? TCG_TYPE_I32 : TCG_TYPE_I64;
+TCGType d_type = size == MO_64 ? TCG_TYPE_I64 

[PATCH 09/42] tcg: Split out tcg_out_exts_i32_i64

2023-04-07 Thread Richard Henderson
We will need a backend interface for type extension with sign.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 4 
 tcg/aarch64/tcg-target.c.inc | 9 ++---
 tcg/arm/tcg-target.c.inc | 5 +
 tcg/i386/tcg-target.c.inc| 9 ++---
 tcg/loongarch64/tcg-target.c.inc | 7 ++-
 tcg/mips/tcg-target.c.inc| 7 ++-
 tcg/ppc/tcg-target.c.inc | 9 ++---
 tcg/riscv/tcg-target.c.inc   | 7 ++-
 tcg/s390x/tcg-target.c.inc   | 9 ++---
 tcg/sparc64/tcg-target.c.inc | 9 ++---
 tcg/tci/tcg-target.c.inc | 7 ++-
 11 files changed, 63 insertions(+), 19 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index a182771c01..b0498170ea 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -111,6 +111,7 @@ static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg 
arg);
 static void tcg_out_ext16u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg);
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4529,6 +4530,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 case INDEX_op_ext32u_i64:
 tcg_out_ext32u(s, new_args[0], new_args[1]);
 break;
+case INDEX_op_ext_i32_i64:
+tcg_out_exts_i32_i64(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index bca5f03dfb..58596eaa4b 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1434,6 +1434,11 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg rd, 
TCGReg rn)
 tcg_out_sxt(s, TCG_TYPE_I64, MO_32, rd, rn);
 }
 
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_ext32s(s, rd, rn);
+}
+
 static inline void tcg_out_uxt(TCGContext *s, MemOp s_bits,
TCGReg rd, TCGReg rn)
 {
@@ -2260,9 +2265,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_ext_i32_i64:
-tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
-break;
 case INDEX_op_extu_i32_i64:
 tcg_out_ext32u(s, a0, a1);
 break;
@@ -2332,6 +2334,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext16u_i32:
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext32u_i64:
+case INDEX_op_ext_i32_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 5c48b92f83..2ca25a3d81 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1003,6 +1003,11 @@ static void tcg_out_ext32u(TCGContext *s, TCGReg rd, 
TCGReg rn)
 g_assert_not_reached();
 }
 
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, ARMCond cond,
 TCGReg rd, TCGReg rn, int flags)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 1e9f61dbf3..df7c2409cd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1299,6 +1299,11 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg dest, 
TCGReg src)
 tcg_out_modrm(s, OPC_MOVSLQ, dest, src);
 }
 
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_ext32s(s, dest, src);
+}
+
 static inline void tcg_out_bswap64(TCGContext *s, int reg)
 {
 tcg_out_opc(s, OPC_BSWAP + P_REXW + LOWREGMASK(reg), 0, reg, 0);
@@ -2757,9 +2762,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_extrl_i64_i32:
 tcg_out_ext32u(s, a0, a1);
 break;
-case INDEX_op_ext_i32_i64:
-tcg_out_ext32s(s, a0, a1);
-break;
 case INDEX_op_extrh_i64_i32:
 tcg_out_shifti(s, SHIFT_SHR + P_REXW, a0, 32);
 break;
@@ -2838,6 +2840,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_ext16u_i64:
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext32u_i64:
+case INDEX_op_ext_i32_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index d2511eda7a..989632e08a 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -456,6 +456,11 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg ret, 
TCGReg arg)
 tcg_out_opc_addi_w(s, ret, arg, 0);
 }
 
+static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+

[PATCH 23/42] tcg/arm: Use TCGType not bool is_64 in tcg_out_qemu_{ld, st}

2023-04-07 Thread Richard Henderson
We need to set this in TCGLabelQemuLdst, so plumb this
all the way through from tcg_out_op.

Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.c.inc | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index bad1e6d399..9bf831223a 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1526,15 +1526,17 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg 
addrlo, TCGReg addrhi,
 /* Record the context of a call to the out of line helper code for the slow
path for a load or store, so that we can later generate the correct
helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
-TCGReg datalo, TCGReg datahi, TCGReg addrlo,
-TCGReg addrhi, tcg_insn_unit *raddr,
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGType type,
+MemOpIdx oi, TCGReg datalo, TCGReg datahi,
+TCGReg addrlo, TCGReg addrhi,
+tcg_insn_unit *raddr,
 tcg_insn_unit *label_ptr)
 {
 TCGLabelQemuLdst *label = new_ldst_label(s);
 
 label->is_ld = is_ld;
 label->oi = oi;
+label->type = type;
 label->datalo_reg = datalo;
 label->datahi_reg = datahi;
 label->addrlo_reg = addrlo;
@@ -1788,7 +1790,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, MemOp 
opc, TCGReg datalo,
 }
 #endif
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
 MemOpIdx oi;
@@ -1802,7 +1804,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is64)
 #endif
 
 datalo = *args++;
-datahi = (is64 ? *args++ : 0);
+datahi = (d_type == TCG_TYPE_I32 ? 0 : *args++);
 addrlo = *args++;
 addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
 oi = *args++;
@@ -1819,7 +1821,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is64)
 
 tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend, true);
 
-add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, true, oi, d_type, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
 a_bits = get_alignment_bits(opc);
@@ -1910,7 +1912,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp 
opc, TCGReg datalo,
 }
 #endif
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg addrlo, datalo, datahi, addrhi __attribute__((unused));
 MemOpIdx oi;
@@ -1924,7 +1926,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is64)
 #endif
 
 datalo = *args++;
-datahi = (is64 ? *args++ : 0);
+datahi = (d_type == TCG_TYPE_I32 ? 0 : *args++);
 addrlo = *args++;
 addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
 oi = *args++;
@@ -1941,7 +1943,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is64)
 label_ptr = s->code_ptr;
 tcg_out_bl_imm(s, COND_NE, 0);
 
-add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, false, oi, d_type, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
 a_bits = get_alignment_bits(opc);
@@ -2237,16 +2239,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_qemu_ld_i32:
-tcg_out_qemu_ld(s, args, 0);
+tcg_out_qemu_ld(s, args, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_ld_i64:
-tcg_out_qemu_ld(s, args, 1);
+tcg_out_qemu_ld(s, args, TCG_TYPE_I64);
 break;
 case INDEX_op_qemu_st_i32:
-tcg_out_qemu_st(s, args, 0);
+tcg_out_qemu_st(s, args, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_st_i64:
-tcg_out_qemu_st(s, args, 1);
+tcg_out_qemu_st(s, args, TCG_TYPE_I64);
 break;
 
 case INDEX_op_bswap16_i32:
-- 
2.34.1




[PATCH 08/42] tcg: Split out tcg_out_ext32u

2023-04-07 Thread Richard Henderson
We will need a backend interface for performing 32-bit zero-extend.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  4 
 tcg/aarch64/tcg-target.c.inc |  9 +++--
 tcg/arm/tcg-target.c.inc |  5 +
 tcg/i386/tcg-target.c.inc|  4 ++--
 tcg/loongarch64/tcg-target.c.inc |  2 +-
 tcg/mips/tcg-target.c.inc|  3 ++-
 tcg/ppc/tcg-target.c.inc |  4 +++-
 tcg/riscv/tcg-target.c.inc   |  2 +-
 tcg/s390x/tcg-target.c.inc   | 20 ++--
 tcg/sparc64/tcg-target.c.inc | 17 +++--
 tcg/tci/tcg-target.c.inc |  9 -
 11 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 84aa8d639e..a182771c01 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -110,6 +110,7 @@ static void tcg_out_ext16s(TCGContext *s, TCGType type, 
TCGReg ret, TCGReg arg);
 static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_ext16u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg);
+static void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4525,6 +4526,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 case INDEX_op_ext32s_i64:
 tcg_out_ext32s(s, new_args[0], new_args[1]);
 break;
+case INDEX_op_ext32u_i64:
+tcg_out_ext32u(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index d7964734c3..bca5f03dfb 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1452,6 +1452,11 @@ static void tcg_out_ext16u(TCGContext *s, TCGReg rd, 
TCGReg rn)
 tcg_out_uxt(s, MO_16, rd, rn);
 }
 
+static void tcg_out_ext32u(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_movr(s, TCG_TYPE_I32, rd, rn);
+}
+
 static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
 TCGReg rn, int64_t aimm)
 {
@@ -2259,8 +2264,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
 break;
 case INDEX_op_extu_i32_i64:
-case INDEX_op_ext32u_i64:
-tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
+tcg_out_ext32u(s, a0, a1);
 break;
 
 case INDEX_op_deposit_i64:
@@ -2327,6 +2331,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext16u_i64:
 case INDEX_op_ext16u_i32:
 case INDEX_op_ext32s_i64:
+case INDEX_op_ext32u_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 401769bdd6..5c48b92f83 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -998,6 +998,11 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg rd, 
TCGReg rn)
 g_assert_not_reached();
 }
 
+static void tcg_out_ext32u(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, ARMCond cond,
 TCGReg rd, TCGReg rn, int flags)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 8bb747b81d..1e9f61dbf3 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1287,7 +1287,7 @@ static void tcg_out_ext16s(TCGContext *s, TCGType type, 
TCGReg dest, TCGReg src)
 tcg_out_modrm(s, OPC_MOVSWL + rexw, dest, src);
 }
 
-static inline void tcg_out_ext32u(TCGContext *s, int dest, int src)
+static void tcg_out_ext32u(TCGContext *s, TCGReg dest, TCGReg src)
 {
 /* 32-bit mov zero extends.  */
 tcg_out_modrm(s, OPC_MOVL_GvEv, dest, src);
@@ -2754,7 +2754,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 tcg_out_bswap64(s, a0);
 break;
 case INDEX_op_extu_i32_i64:
-case INDEX_op_ext32u_i64:
 case INDEX_op_extrl_i64_i32:
 tcg_out_ext32u(s, a0, a1);
 break;
@@ -2838,6 +2837,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_ext16u_i32:
 case INDEX_op_ext16u_i64:
 case INDEX_op_ext32s_i64:
+case INDEX_op_ext32u_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 037474510c..d2511eda7a 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1246,7 +1246,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond(s, a2, a0, a1, arg_label(args[3]));
 break;
 
-case INDEX_op_ext32u_i64:
 case INDEX_op_extu_i32_i64:
 tcg_out_ext32u(s, 

[PATCH 18/42] tcg: Introduce tcg_out_movext2

2023-04-07 Thread Richard Henderson
This is common code in most qemu_{ld,st} slow paths, moving two
registers when there may be overlap between sources and destinations.
At present, this is only used by 32-bit hosts for 64-bit data,
but will shortly be used for more than that.
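
To make the overlap handling concrete, here is a standalone sketch in
plain C (illustrative only, not QEMU code; tcg_out_movext2 additionally
extends each value and will use an xchg insn when the backend has one):

    /* Emit first the move whose destination is not the other source;
     * with full overlap, bounce one value through a scratch register. */
    static void move_pair(int *reg, int d1, int s1,
                          int d2, int s2, int scratch)
    {
        if (d1 != s2) {
            reg[d1] = reg[s1];
            reg[d2] = reg[s2];
        } else if (d2 != s1) {
            reg[d2] = reg[s2];
            reg[d1] = reg[s1];
        } else {
            reg[scratch] = reg[s2];   /* d1 == s2 && d2 == s1: full swap */
            reg[d1] = reg[s1];
            reg[d2] = reg[scratch];
        }
    }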

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 50 +++
 tcg/arm/tcg-target.c.inc  | 34 +++---
 tcg/i386/tcg-target.c.inc | 16 -
 3 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index d82d99e1b0..1c11f15bce 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,8 +115,7 @@ static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, 
TCGReg arg);
 static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
-static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
-__attribute__((unused));
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
@@ -365,9 +364,8 @@ void tcg_raise_tb_overflow(TCGContext *s)
  *
  * Move or extend @src into @dst, depending on @src_ext and the types.
  */
-static void __attribute__((unused))
-tcg_out_movext(TCGContext *s, TCGType dst_type, TCGReg dst,
-   TCGType src_type, MemOp src_ext, TCGReg src)
+static void tcg_out_movext(TCGContext *s, TCGType dst_type, TCGReg dst,
+   TCGType src_type, MemOp src_ext, TCGReg src)
 {
 switch (src_ext) {
 case MO_UB:
@@ -413,6 +411,48 @@ tcg_out_movext(TCGContext *s, TCGType dst_type, TCGReg dst,
 }
 }
 
+/**
+ * tcg_out_movext2 -- move and extend two pairs
+ * @s: tcg context
+ * @d1_type: integral type for destination
+ * @d1: destination register
+ * @s1_type: integral type for source
+ * @s1_ext: extension to apply to source
+ * @s1: source register
+ * @d2_type: integral type for destination
+ * @d2: destination register
+ * @s2_type: integral type for source
+ * @s2_ext: extension to apply to source
+ * @s2: source register
+ * @scratch: temporary register, or -1 for none
+ *
+ * As tcg_out_movext, for both s1->d1 and s2->d2, caring for overlap
+ * between the sources and destinations.
+ */
+static void __attribute__((unused))
+tcg_out_movext2(TCGContext *s, TCGType d1_type, TCGReg d1, TCGType s1_type,
+MemOp s1_ext, TCGReg s1, TCGType d2_type, TCGReg d2,
+TCGType s2_type, MemOp s2_ext, TCGReg s2, int scratch)
+{
+if (d1 != s2) {
+tcg_out_movext(s, d1_type, d1, s1_type, s1_ext, s1);
+tcg_out_movext(s, d2_type, d2, s2_type, s2_ext, s2);
+return;
+}
+if (d2 == s1) {
+if (tcg_out_xchg(s, MAX(s1_type, s2_type), s1, s2)) {
+/* The data is now in the correct registers, now extend. */
+s1 = d1, s2 = d2;
+} else {
+tcg_debug_assert(scratch >= 0);
+tcg_out_mov(s, s1_type, scratch, s1);
+s1 = scratch;
+}
+}
+tcg_out_movext(s, d2_type, d2, s2_type, s2_ext, s2);
+tcg_out_movext(s, d1_type, d1, s1_type, s1_ext, s1);
+}
+
 #define C_PFX1(P, A)P##A
 #define C_PFX2(P, A, B) P##A##_##B
 #define C_PFX3(P, A, B, C)  P##A##_##B##_##C
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 4a5d57a41c..bad1e6d399 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1545,7 +1545,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool 
is_ld, MemOpIdx oi,
 
 static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-TCGReg argreg, datalo, datahi;
+TCGReg argreg;
 MemOpIdx oi = lb->oi;
 MemOp opc = get_memop(oi);
 
@@ -1565,20 +1565,11 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 /* Use the canonical unsigned helpers and minimize icache usage. */
 tcg_out_call_int(s, qemu_ld_helpers[opc & MO_SIZE]);
 
-datalo = lb->datalo_reg;
-datahi = lb->datahi_reg;
 if ((opc & MO_SIZE) == MO_64) {
-if (datalo != TCG_REG_R1) {
-tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-} else if (datahi != TCG_REG_R0) {
-tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
-} else {
-tcg_out_mov_reg(s, COND_AL, TCG_REG_TMP, TCG_REG_R0);
-tcg_out_mov_reg(s, COND_AL, datahi, TCG_REG_R1);
-tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_TMP);
-}
+tcg_out_movext2(s, TCG_TYPE_I32, lb->datalo_reg,
+TCG_TYPE_I32, MO_UL, TCG_REG_R0,
+   

[PATCH 14/42] tcg/i386: Conditionalize tcg_out_extu_i32_i64

2023-04-07 Thread Richard Henderson
Since TCG_TYPE_I32 values are kept zero-extended in registers, via
omission of the REXW bit, we need not extend if the register matches.
This is already relied upon by qemu_{ld,st}.
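
The underlying architectural fact, for illustration: any 32-bit operation
on x86-64 clears bits 63:32 of its destination, so the 32-bit move (no
REXW prefix) is itself the zero-extension:

    movl %esi, %edi     /* %rdi = zero-extend of %esi */

and when dest == src there is nothing left to emit.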

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 818e7cbc3d..71a2bff234 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1306,7 +1306,9 @@ static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg 
dest, TCGReg src)
 
 static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg dest, TCGReg src)
 {
-tcg_out_ext32u(s, dest, src);
+if (dest != src) {
+tcg_out_ext32u(s, dest, src);
+}
 }
 
 static inline void tcg_out_bswap64(TCGContext *s, int reg)
-- 
2.34.1




[PATCH 22/42] tcg/aarch64: Pass TCGType to tcg_out_qemu_st

2023-04-07 Thread Richard Henderson
This evens out the interface to match tcg_out_qemu_ld,
and makes the argument to add_qemu_ldst_label less obscure.
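
After this patch the two aarch64 entry points are symmetric, taking the
same argument list (signatures as they appear in the patched code):

    static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg,
                                TCGReg addr_reg, MemOpIdx oi, TCGType d_type);
    static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg,
                                TCGReg addr_reg, MemOpIdx oi, TCGType d_type);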

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 40122e1471..f8d3ef4714 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1884,7 +1884,7 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg 
data_reg, TCGReg addr_reg,
 }
 
 static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
-MemOpIdx oi)
+MemOpIdx oi, TCGType d_type)
 {
 MemOp memop = get_memop(oi);
 const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
@@ -1899,8 +1899,8 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg 
data_reg, TCGReg addr_reg,
 tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 0);
 tcg_out_qemu_st_direct(s, memop, data_reg,
TCG_REG_X1, otype, addr_reg);
-add_qemu_ldst_label(s, false, oi, (memop & MO_SIZE)== MO_64,
-data_reg, addr_reg, s->code_ptr, label_ptr);
+add_qemu_ldst_label(s, false, oi, d_type, data_reg, addr_reg,
+s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
 unsigned a_bits = get_alignment_bits(memop);
 if (a_bits) {
@@ -2249,7 +2249,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 case INDEX_op_qemu_st_i32:
 case INDEX_op_qemu_st_i64:
-tcg_out_qemu_st(s, REG0(0), a1, a2);
+tcg_out_qemu_st(s, REG0(0), a1, a2, ext);
 break;
 
 case INDEX_op_bswap64_i64:
-- 
2.34.1




[PATCH 25/42] tcg/ppc: Use TCGType not bool is_64 in tcg_out_qemu_{ld, st}

2023-04-07 Thread Richard Henderson
We need to set this in TCGLabelQemuLdst, so plumb this
all the way through from tcg_out_op.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index d3e547998f..7c33404bd6 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2117,7 +2117,8 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, MemOp opc,
 /* Record the context of a call to the out of line helper code for the slow
path for a load or store, so that we can later generate the correct
helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld,
+TCGType type, MemOpIdx oi,
 TCGReg datalo_reg, TCGReg datahi_reg,
 TCGReg addrlo_reg, TCGReg addrhi_reg,
 tcg_insn_unit *raddr, tcg_insn_unit *lptr)
@@ -2125,6 +2126,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool 
is_ld, MemOpIdx oi,
 TCGLabelQemuLdst *label = new_ldst_label(s);
 
 label->is_ld = is_ld;
+label->type = type;
 label->oi = oi;
 label->datalo_reg = datalo_reg;
 label->datahi_reg = datahi_reg;
@@ -2287,7 +2289,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 
 #endif /* SOFTMMU */
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg datalo, datahi, addrlo, rbase;
 TCGReg addrhi __attribute__((unused));
@@ -2301,7 +2303,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is_64)
 #endif
 
 datalo = *args++;
-datahi = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
+datahi = TCG_TARGET_REG_BITS == 64 || d_type == TCG_TYPE_I32 ? 0 : *args++;
 addrlo = *args++;
 addrhi = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
 oi = *args++;
@@ -2363,12 +2365,12 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is_64)
 }
 
 #ifdef CONFIG_SOFTMMU
-add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, true, d_type, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg datalo, datahi, addrlo, rbase;
 TCGReg addrhi __attribute__((unused));
@@ -2382,7 +2384,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is_64)
 #endif
 
 datalo = *args++;
-datahi = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
+datahi = TCG_TARGET_REG_BITS == 64 || d_type == TCG_TYPE_I32 ? 0 : *args++;
 addrlo = *args++;
 addrhi = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
 oi = *args++;
@@ -2436,7 +2438,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is_64)
 }
 
 #ifdef CONFIG_SOFTMMU
-add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, false, d_type, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #endif
 }
@@ -2971,16 +2973,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_qemu_ld_i32:
-tcg_out_qemu_ld(s, args, false);
+tcg_out_qemu_ld(s, args, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_ld_i64:
-tcg_out_qemu_ld(s, args, true);
+tcg_out_qemu_ld(s, args, TCG_TYPE_I64);
 break;
 case INDEX_op_qemu_st_i32:
-tcg_out_qemu_st(s, args, false);
+tcg_out_qemu_st(s, args, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_st_i64:
-tcg_out_qemu_st(s, args, true);
+tcg_out_qemu_st(s, args, TCG_TYPE_I64);
 break;
 
 case INDEX_op_setcond_i32:
-- 
2.34.1




[PATCH 13/42] tcg: Split out tcg_out_extu_i32_i64

2023-04-07 Thread Richard Henderson
We will need a backend interface for type extension with zero.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  4 
 tcg/aarch64/tcg-target.c.inc | 10 ++
 tcg/arm/tcg-target.c.inc |  5 +
 tcg/i386/tcg-target.c.inc|  7 ++-
 tcg/loongarch64/tcg-target.c.inc | 10 ++
 tcg/mips/tcg-target.c.inc|  9 ++---
 tcg/ppc/tcg-target.c.inc | 10 ++
 tcg/riscv/tcg-target.c.inc   | 10 ++
 tcg/s390x/tcg-target.c.inc   | 10 ++
 tcg/sparc64/tcg-target.c.inc |  9 ++---
 tcg/tci/tcg-target.c.inc |  7 ++-
 11 files changed, 63 insertions(+), 28 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b0498170ea..17bd6d4581 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -112,6 +112,7 @@ static void tcg_out_ext16u(TCGContext *s, TCGReg ret, 
TCGReg arg);
 static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4533,6 +4534,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 case INDEX_op_ext_i32_i64:
 tcg_out_exts_i32_i64(s, new_args[0], new_args[1]);
 break;
+case INDEX_op_extu_i32_i64:
+tcg_out_extu_i32_i64(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 58596eaa4b..ca8b25865b 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1462,6 +1462,11 @@ static void tcg_out_ext32u(TCGContext *s, TCGReg rd, 
TCGReg rn)
 tcg_out_movr(s, TCG_TYPE_I32, rd, rn);
 }
 
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_ext32u(s, rd, rn);
+}
+
 static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
 TCGReg rn, int64_t aimm)
 {
@@ -2265,10 +2270,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_extu_i32_i64:
-tcg_out_ext32u(s, a0, a1);
-break;
-
 case INDEX_op_deposit_i64:
 case INDEX_op_deposit_i32:
 tcg_out_dep(s, ext, a0, REG0(2), args[3], args[4]);
@@ -2335,6 +2336,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext32u_i64:
 case INDEX_op_ext_i32_i64:
+case INDEX_op_extu_i32_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 2ca25a3d81..2135616e12 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1008,6 +1008,11 @@ static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg 
rd, TCGReg rn)
 g_assert_not_reached();
 }
 
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, ARMCond cond,
 TCGReg rd, TCGReg rn, int flags)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index df7c2409cd..818e7cbc3d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1304,6 +1304,11 @@ static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg 
dest, TCGReg src)
 tcg_out_ext32s(s, dest, src);
 }
 
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_ext32u(s, dest, src);
+}
+
 static inline void tcg_out_bswap64(TCGContext *s, int reg)
 {
 tcg_out_opc(s, OPC_BSWAP + P_REXW + LOWREGMASK(reg), 0, reg, 0);
@@ -2758,7 +2763,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_bswap64_i64:
 tcg_out_bswap64(s, a0);
 break;
-case INDEX_op_extu_i32_i64:
 case INDEX_op_extrl_i64_i32:
 tcg_out_ext32u(s, a0, a1);
 break;
@@ -2841,6 +2845,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext32u_i64:
 case INDEX_op_ext_i32_i64:
+case INDEX_op_extu_i32_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index b2146988be..d83bd9de49 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -463,6 +463,11 @@ static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg 
ret, TCGReg arg)
 }
 }
 
+static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+tcg_out_ext32u(s, ret, arg);
+}
+
 

[PATCH 10/42] tcg/loongarch64: Conditionalize tcg_out_exts_i32_i64

2023-04-07 Thread Richard Henderson
Since TCG_TYPE_I32 values are kept sign-extended in registers,
via ".w" instructions, we need not extend if the register matches.
This is already relied upon by comparisons.
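
For illustration, the extension that tcg_out_ext32s emits is a ".w" add
of zero (register names arbitrary), whose 32-bit result the architecture
sign-extends to 64 bits:

    addi.w  $a0, $a1, 0     # a0 = sign-extend(a1[31:0])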

Signed-off-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target.c.inc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 989632e08a..b2146988be 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -458,7 +458,9 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg ret, 
TCGReg arg)
 
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg)
 {
-tcg_out_ext32s(s, ret, arg);
+if (ret != arg) {
+tcg_out_ext32s(s, ret, arg);
+}
 }
 
 static void tcg_out_clzctz(TCGContext *s, LoongArchInsn opc,
-- 
2.34.1




[PATCH 32/42] tcg/loongarch64: Simplify constraints on qemu_ld/st

2023-04-07 Thread Richard Henderson
The softmmu tlb uses TCG_REG_TMP[0-2], not any of the normally available
registers.  Now that we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/loongarch64/tcg-target-con-set.h |  2 --
 tcg/loongarch64/tcg-target-con-str.h |  1 -
 tcg/loongarch64/tcg-target.c.inc | 23 ---
 3 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h 
b/tcg/loongarch64/tcg-target-con-set.h
index 172c107289..c2bde44613 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -17,9 +17,7 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
-C_O0_I2(LZ, L)
 C_O1_I1(r, r)
-C_O1_I1(r, L)
 C_O1_I2(r, r, rC)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/loongarch64/tcg-target-con-str.h 
b/tcg/loongarch64/tcg-target-con-str.h
index 541ff47fa9..6e9ccca3ad 100644
--- a/tcg/loongarch64/tcg-target-con-str.h
+++ b/tcg/loongarch64/tcg-target-con-str.h
@@ -14,7 +14,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index fb092330d4..d5063b035d 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -133,18 +133,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind 
kind, int slot)
 #define TCG_CT_CONST_C12   0x1000
 #define TCG_CT_CONST_WSZ   0x2000
 
-#define ALL_GENERAL_REGS  MAKE_64BIT_MASK(0, 32)
-/*
- * For softmmu, we need to avoid conflicts with the first 5
- * argument registers to call the helper.  Some of these are
- * also used for the tlb lookup.
- */
-#ifdef CONFIG_SOFTMMU
-#define SOFTMMU_RESERVE_REGS  MAKE_64BIT_MASK(TCG_REG_A0, 5)
-#else
-#define SOFTMMU_RESERVE_REGS  0
-#endif
-
+#define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 
 static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
 {
@@ -1599,16 +1588,14 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 case INDEX_op_st32_i64:
 case INDEX_op_st_i32:
 case INDEX_op_st_i64:
+case INDEX_op_qemu_st_i32:
+case INDEX_op_qemu_st_i64:
 return C_O0_I2(rZ, r);
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
 return C_O0_I2(rZ, rZ);
 
-case INDEX_op_qemu_st_i32:
-case INDEX_op_qemu_st_i64:
-return C_O0_I2(LZ, L);
-
 case INDEX_op_ext8s_i32:
 case INDEX_op_ext8s_i64:
 case INDEX_op_ext8u_i32:
@@ -1644,11 +1631,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_ld32u_i64:
 case INDEX_op_ld_i32:
 case INDEX_op_ld_i64:
-return C_O1_I1(r, r);
-
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_ld_i64:
-return C_O1_I1(r, L);
+return C_O1_I1(r, r);
 
 case INDEX_op_andc_i32:
 case INDEX_op_andc_i64:
-- 
2.34.1




[PATCH 15/42] tcg: Split out tcg_out_extrl_i64_i32

2023-04-07 Thread Richard Henderson
We will need a backend interface for type truncation.  For those backends
that did not enable TCG_TARGET_HAS_extrl_i64_i32, use tcg_out_mov.
Use it in tcg_reg_alloc_op in the meantime.
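
For such backends the truncation degenerates to a plain 32-bit register
move; e.g. the aarch64 implementation added below is just:

    static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rn)
    {
        tcg_out_mov(s, TCG_TYPE_I32, rd, rn);
    }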

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  4 
 tcg/aarch64/tcg-target.c.inc |  6 ++
 tcg/arm/tcg-target.c.inc |  5 +
 tcg/i386/tcg-target.c.inc|  9 ++---
 tcg/loongarch64/tcg-target.c.inc | 10 ++
 tcg/mips/tcg-target.c.inc|  9 ++---
 tcg/ppc/tcg-target.c.inc |  7 +++
 tcg/riscv/tcg-target.c.inc   | 10 ++
 tcg/s390x/tcg-target.c.inc   |  6 ++
 tcg/sparc64/tcg-target.c.inc |  9 ++---
 tcg/tci/tcg-target.c.inc |  7 +++
 11 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 17bd6d4581..0188152c37 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -113,6 +113,7 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg ret, 
TCGReg arg);
 static void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4537,6 +4538,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp 
*op)
 case INDEX_op_extu_i32_i64:
 tcg_out_extu_i32_i64(s, new_args[0], new_args[1]);
 break;
+case INDEX_op_extrl_i64_i32:
+tcg_out_extrl_i64_i32(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index ca8b25865b..bd1fab193e 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1467,6 +1467,11 @@ static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg 
rd, TCGReg rn)
 tcg_out_ext32u(s, rd, rn);
 }
 
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_mov(s, TCG_TYPE_I32, rd, rn);
+}
+
 static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
 TCGReg rn, int64_t aimm)
 {
@@ -2337,6 +2342,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext32u_i64:
 case INDEX_op_ext_i32_i64:
 case INDEX_op_extu_i32_i64:
+case INDEX_op_extrl_i64_i32:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 2135616e12..1820655ee3 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1013,6 +1013,11 @@ static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg 
rd, TCGReg rn)
 g_assert_not_reached();
 }
 
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, ARMCond cond,
 TCGReg rd, TCGReg rn, int flags)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 71a2bff234..45b2054856 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1311,6 +1311,11 @@ static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg 
dest, TCGReg src)
 }
 }
 
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg dest, TCGReg src)
+{
+tcg_out_ext32u(s, dest, src);
+}
+
 static inline void tcg_out_bswap64(TCGContext *s, int reg)
 {
 tcg_out_opc(s, OPC_BSWAP + P_REXW + LOWREGMASK(reg), 0, reg, 0);
@@ -2765,9 +2770,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_bswap64_i64:
 tcg_out_bswap64(s, a0);
 break;
-case INDEX_op_extrl_i64_i32:
-tcg_out_ext32u(s, a0, a1);
-break;
 case INDEX_op_extrh_i64_i32:
 tcg_out_shifti(s, SHIFT_SHR + P_REXW, a0, 32);
 break;
@@ -2848,6 +2850,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case INDEX_op_ext32u_i64:
 case INDEX_op_ext_i32_i64:
 case INDEX_op_extu_i32_i64:
+case INDEX_op_extrl_i64_i32:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index d83bd9de49..b0e076c462 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -468,6 +468,11 @@ static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg 
ret, TCGReg arg)
 tcg_out_ext32u(s, ret, arg);
 }
 
+static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg)
+{
+tcg_out_ext32s(s, ret, arg);
+}
+
 static void tcg_out_clzctz(TCGContext *s, LoongArchInsn opc,
TCGReg a0, TCGReg a1, TCGReg a2,
bool 

[PATCH 12/42] tcg/riscv: Conditionalize tcg_out_exts_i32_i64

2023-04-07 Thread Richard Henderson
Since TCG_TYPE_I32 values are kept sign-extended in registers,
via "w" instructions, we need not extend if the register matches.
This is already relied upon by comparisons.
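
For illustration (register names arbitrary), the extension itself is

    sext.w  a0, a1          # alias for "addiw a0, a1, 0"

and every RV64 "w" instruction likewise sign-extends its 32-bit result.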

Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.c.inc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 7bd3b421ad..2b9aab29ec 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -604,7 +604,9 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg ret, 
TCGReg arg)
 
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg)
 {
-tcg_out_ext32s(s, ret, arg);
+if (ret != arg) {
+tcg_out_ext32s(s, ret, arg);
+}
 }
 
 static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
-- 
2.34.1




[PATCH 19/42] tcg: Clear TCGLabelQemuLdst on allocation

2023-04-07 Thread Richard Henderson
With the struct zeroed at allocation, any field that a particular
backend does not explicitly set is reliably zero rather than undefined.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-ldst.c.inc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/tcg-ldst.c.inc b/tcg/tcg-ldst.c.inc
index 6c6848d034..403cbb0f06 100644
--- a/tcg/tcg-ldst.c.inc
+++ b/tcg/tcg-ldst.c.inc
@@ -72,6 +72,7 @@ static inline TCGLabelQemuLdst *new_ldst_label(TCGContext *s)
 {
 TCGLabelQemuLdst *l = tcg_malloc(sizeof(*l));
 
+memset(l, 0, sizeof(*l));
 QSIMPLEQ_INSERT_TAIL(&s->ldst_labels, l, next);
 
 return l;
-- 
2.34.1




[PATCH 42/42] tcg/sparc64: Pass TCGType to tcg_out_qemu_{ld,st}

2023-04-07 Thread Richard Henderson
We need to set this in TCGLabelQemuLdst, so plumb this
all the way through from tcg_out_op.

Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.c.inc | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index f3e5e856d6..05fc65faac 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -1178,7 +1178,7 @@ static const int qemu_st_opc[(MO_SIZE | MO_BSWAP) + 1] = {
 };
 
 static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, TCGReg addr,
-MemOpIdx oi, bool is_64)
+MemOpIdx oi, TCGType d_type)
 {
 MemOp memop = get_memop(oi);
 tcg_insn_unit *label_ptr;
@@ -1324,7 +1324,7 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, 
TCGReg addr,
 }
 
 static void tcg_out_qemu_st(TCGContext *s, TCGReg data, TCGReg addr,
-MemOpIdx oi, bool is64)
+MemOpIdx oi, TCGType d_type)
 {
 MemOp memop = get_memop(oi);
 tcg_insn_unit *label_ptr;
@@ -1351,8 +1351,7 @@ static void tcg_out_qemu_st(TCGContext *s, TCGReg data, 
TCGReg addr,
 
 tcg_out_mov(s, TCG_TYPE_REG, TCG_REG_O1, addrz);
 tcg_out_movext(s, (memop & MO_SIZE) == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32,
-   TCG_REG_O2, is64 ? TCG_TYPE_I64 : TCG_TYPE_I32,
-   memop & MO_SIZE, data);
+   TCG_REG_O2, d_type, memop & MO_SIZE, data);
 
 func = qemu_st_trampoline[memop & (MO_BSWAP | MO_SIZE)];
 tcg_debug_assert(func != NULL);
@@ -1637,16 +1636,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_qemu_ld_i32:
-tcg_out_qemu_ld(s, a0, a1, a2, false);
+tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_ld_i64:
-tcg_out_qemu_ld(s, a0, a1, a2, true);
+tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I64);
 break;
 case INDEX_op_qemu_st_i32:
-tcg_out_qemu_st(s, a0, a1, a2, false);
+tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_st_i64:
-tcg_out_qemu_st(s, a0, a1, a2, true);
+tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I64);
 break;
 
 case INDEX_op_ld32s_i64:
-- 
2.34.1




[PATCH 38/42] tcg/riscv: Simplify constraints on qemu_ld/st

2023-04-07 Thread Richard Henderson
The softmmu tlb uses TCG_REG_TMP[0-2], not any of the normally available
registers.  Now that we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target-con-set.h |  2 --
 tcg/riscv/tcg-target-con-str.h |  1 -
 tcg/riscv/tcg-target.c.inc | 16 +++-
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index c11710d117..1a8b8e9f2b 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -10,11 +10,9 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I1(r)
-C_O0_I2(LZ, L)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
 C_O0_I4(rZ, rZ, rZ, rZ)
-C_O1_I1(r, L)
 C_O1_I1(r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index 8d8afaee53..6f1cfb976c 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -9,7 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index ab70aa71a8..45a4bc3714 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -125,17 +125,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind 
kind, int slot)
 #define TCG_CT_CONST_N12   0x400
 #define TCG_CT_CONST_M12   0x800
 
-#define ALL_GENERAL_REGS  MAKE_64BIT_MASK(0, 32)
-/*
- * For softmmu, we need to avoid conflicts with the first 5
- * argument registers to call the helper.  Some of these are
- * also used for the tlb lookup.
- */
-#ifdef CONFIG_SOFTMMU
-#define SOFTMMU_RESERVE_REGS  MAKE_64BIT_MASK(TCG_REG_A0, 5)
-#else
-#define SOFTMMU_RESERVE_REGS  0
-#endif
+#define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 
 #define sextreg  sextract64
 
@@ -1654,10 +1644,10 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_ld_i64:
-return C_O1_I1(r, L);
+return C_O1_I1(r, r);
 case INDEX_op_qemu_st_i32:
 case INDEX_op_qemu_st_i64:
-return C_O0_I2(LZ, L);
+return C_O0_I2(rZ, r);
 
 default:
 g_assert_not_reached();
-- 
2.34.1




[PATCH 24/42] tcg/mips: Use TCGType not bool is_64 in tcg_out_qemu_{ld, st}

2023-04-07 Thread Richard Henderson
There are several places where we already convert back from
bool to type.  Clean things up by using type throughout.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 56 +++
 1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index a83ebe8729..568cfe7728 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -1479,7 +1479,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 #endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg lo, TCGReg hi,
-   TCGReg base, MemOp opc, bool is_64)
+   TCGReg base, MemOp opc, TCGType type)
 {
 switch (opc & (MO_SSIZE | MO_BSWAP)) {
 case MO_UB:
@@ -1503,7 +1503,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg 
lo, TCGReg hi,
 tcg_out_opc_imm(s, OPC_LH, lo, base, 0);
 break;
 case MO_UL | MO_BSWAP:
-if (TCG_TARGET_REG_BITS == 64 && is_64) {
+if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64) {
 if (use_mips32r2_instructions) {
 tcg_out_opc_imm(s, OPC_LWU, lo, base, 0);
 tcg_out_bswap32(s, lo, lo, TCG_BSWAP_IZ | TCG_BSWAP_OZ);
@@ -1528,7 +1528,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg 
lo, TCGReg hi,
 }
 break;
 case MO_UL:
-if (TCG_TARGET_REG_BITS == 64 && is_64) {
+if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64) {
 tcg_out_opc_imm(s, OPC_LWU, lo, base, 0);
 break;
 }
@@ -1583,7 +1583,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg 
lo, TCGReg hi,
 }
 
 static void tcg_out_qemu_ld_unalign(TCGContext *s, TCGReg lo, TCGReg hi,
-TCGReg base, MemOp opc, bool is_64)
+TCGReg base, MemOp opc, TCGType type)
 {
 const MIPSInsn lw1 = MIPS_BE ? OPC_LWL : OPC_LWR;
 const MIPSInsn lw2 = MIPS_BE ? OPC_LWR : OPC_LWL;
@@ -1623,7 +1623,7 @@ static void tcg_out_qemu_ld_unalign(TCGContext *s, TCGReg 
lo, TCGReg hi,
 case MO_UL:
 tcg_out_opc_imm(s, lw1, lo, base, 0);
 tcg_out_opc_imm(s, lw2, lo, base, 3);
-if (TCG_TARGET_REG_BITS == 64 && is_64 && !sgn) {
+if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64 && !sgn) {
 tcg_out_ext32u(s, lo, lo);
 }
 break;
@@ -1634,18 +1634,18 @@ static void tcg_out_qemu_ld_unalign(TCGContext *s, 
TCGReg lo, TCGReg hi,
 tcg_out_opc_imm(s, lw1, lo, base, 0);
 tcg_out_opc_imm(s, lw2, lo, base, 3);
 tcg_out_bswap32(s, lo, lo,
-TCG_TARGET_REG_BITS == 64 && is_64
+TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64
 ? (sgn ? TCG_BSWAP_OS : TCG_BSWAP_OZ) : 0);
 } else {
 const tcg_insn_unit *subr =
-(TCG_TARGET_REG_BITS == 64 && is_64 && !sgn
+(TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64 && !sgn
  ? bswap32u_addr : bswap32_addr);
 
 tcg_out_opc_imm(s, lw1, TCG_TMP0, base, 0);
 tcg_out_bswap_subr(s, subr);
 /* delay slot */
 tcg_out_opc_imm(s, lw2, TCG_TMP0, base, 3);
-tcg_out_mov(s, is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32, lo, TCG_TMP3);
+tcg_out_mov(s, type, lo, TCG_TMP3);
 }
 break;
 
@@ -1702,7 +1702,7 @@ static void tcg_out_qemu_ld_unalign(TCGContext *s, TCGReg 
lo, TCGReg hi,
 }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg addr_regl, addr_regh __attribute__((unused));
 TCGReg data_regl, data_regh;
@@ -1716,7 +1716,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is_64)
 TCGReg base = TCG_REG_A0;
 
 data_regl = *args++;
-data_regh = (TCG_TARGET_REG_BITS == 32 && is_64 ? *args++ : 0);
+data_regh = (TCG_TARGET_REG_BITS == 64 || d_type == TCG_TYPE_I32
+ ? 0 : *args++);
 addr_regl = *args++;
 addr_regh = (TCG_TARGET_REG_BITS < TARGET_LONG_BITS ? *args++ : 0);
 oi = *args++;
@@ -1731,14 +1732,12 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is_64)
 #if defined(CONFIG_SOFTMMU)
 tcg_out_tlb_load(s, base, addr_regl, addr_regh, oi, label_ptr, 1);
 if (use_mips32r6_instructions || a_bits >= s_bits) {
-tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc, is_64);
+tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc, d_type);
 } else {
-tcg_out_qemu_ld_unalign(s, data_regl, data_regh, base, opc, is_64);
+tcg_out_qemu_ld_unalign(s, data_regl, data_regh, base, opc, d_type);
 }
-add_qemu_ldst_label(s, 1, oi,

[PATCH 41/42] tcg/sparc64: Drop is_64 test from tcg_out_qemu_ld data return

2023-04-07 Thread Richard Henderson
In tcg_canonicalize_memop, we remove MO_SIGN from MO_32 operations
with TCG_TYPE_I32.  Thus this is never set.  We already have an
identical test just above which does not include is_64.
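
The canonicalization in question reads roughly as follows (a paraphrase
of tcg_canonicalize_memop, not the verbatim code):

    case MO_32:
        if (!is64) {
            op &= ~MO_SIGN;   /* a 32-bit destination cannot observe it */
        }
        break;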

Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index 086981f097..f3e5e856d6 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -1220,7 +1220,7 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg data, 
TCGReg addr,
 tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_O2, oi);
 
 /* We let the helper sign-extend SB and SW, but leave SL for here.  */
-if (is_64 && (memop & MO_SSIZE) == MO_SL) {
+if ((memop & MO_SSIZE) == MO_SL) {
 tcg_out_ext32s(s, data, TCG_REG_O0);
 } else {
 tcg_out_mov(s, TCG_TYPE_REG, data, TCG_REG_O0);
-- 
2.34.1




[PATCH 21/42] tcg/aarch64: Rename ext to d_type in tcg_out_qemu_ld

2023-04-07 Thread Richard Henderson
The new name is slightly more descriptive as "data type",
where "extend", despite the C type, sounds like a bool.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 4ec3cf3172..40122e1471 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1851,7 +1851,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, MemOp 
memop,
 }
 
 static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
-MemOpIdx oi, TCGType ext)
+MemOpIdx oi, TCGType d_type)
 {
 MemOp memop = get_memop(oi);
 const TCGType otype = TARGET_LONG_BITS == 64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
@@ -1864,9 +1864,9 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg 
data_reg, TCGReg addr_reg,
 tcg_insn_unit *label_ptr;
 
 tcg_out_tlb_read(s, addr_reg, memop, &label_ptr, mem_index, 1);
-tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
+tcg_out_qemu_ld_direct(s, memop, d_type, data_reg,
TCG_REG_X1, otype, addr_reg);
-add_qemu_ldst_label(s, true, oi, ext, data_reg, addr_reg,
+add_qemu_ldst_label(s, true, oi, d_type, data_reg, addr_reg,
 s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
 unsigned a_bits = get_alignment_bits(memop);
@@ -1874,10 +1874,10 @@ static void tcg_out_qemu_ld(TCGContext *s, TCGReg 
data_reg, TCGReg addr_reg,
 tcg_out_test_alignment(s, true, addr_reg, a_bits);
 }
 if (USE_GUEST_BASE) {
-tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
+tcg_out_qemu_ld_direct(s, memop, d_type, data_reg,
TCG_REG_GUEST_BASE, otype, addr_reg);
 } else {
-tcg_out_qemu_ld_direct(s, memop, ext, data_reg,
+tcg_out_qemu_ld_direct(s, memop, d_type, data_reg,
addr_reg, TCG_TYPE_I64, TCG_REG_XZR);
 }
 #endif /* CONFIG_SOFTMMU */
-- 
2.34.1




[PATCH for-8.1 00/42] tcg: Simplify calls to load/store helpers

2023-04-07 Thread Richard Henderson
There are several changes to the load/store helpers coming, and
making sure that those changes are properly reflected across all
of the backends was harrowing.

I have gone back and restarted by hoisting the code out of the
backends and into tcg.c.  We already have all of the parameters
for the host function call ABI for "normal" helpers; we simply
need to apply that to the load/store slow path.

Unlike the normal helpers, we cannot use tcg_gen_foo(), so we start
by creating additional required backend primitives for extension.
This is followed by putting them together with knowledge of the types,
and some functions to handle register move/extend with overlap.
Finally, the top-level tcg_out_{ld,st}_helper_args routines contain
all knowledge of the helper function signatures.
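
Concretely, the per-backend extension primitives required by the first
half of the series are (prototypes as they land in tcg.c):

    static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
    static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg);
    static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
    static void tcg_out_ext16u(TCGContext *s, TCGReg ret, TCGReg arg);
    static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg);
    static void tcg_out_ext32u(TCGContext *s, TCGReg ret, TCGReg arg);
    static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
    static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
    static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg);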

There will be additional backend unification coming for user-only,
and for sparc64, but that needs to wait for some of the changes within
my atomicity patch set.  And this is quite large enough for now.

One final note about patch 27, dropping support for riscv32 as a host.
This is driven by the existing

/* We don't support oversize guests */
QEMU_BUILD_BUG_ON(TCG_TARGET_REG_BITS < TARGET_LONG_BITS);

which causes the build to fail for all 64-bit guests.

One of the upcoming changes is to build TCG once, which means that the
build would fail outright, so we might as well drop riscv32 support entirely.
Doing this first simplifies everything else.

I have not yet simplified top-level meson.build to match, because
I don't know if we should leave something to support riscv32 with
--enable-tcg-interpreter.  My first reaction is no, because there
is really no way to test it: no one ships such an OS.


r~


Richard Henderson (42):
  tcg: Replace if + tcg_abort with tcg_debug_assert
  tcg: Replace tcg_abort with g_assert_not_reached
  tcg: Split out tcg_out_ext8s
  tcg: Split out tcg_out_ext8u
  tcg: Split out tcg_out_ext16s
  tcg: Split out tcg_out_ext16u
  tcg: Split out tcg_out_ext32s
  tcg: Split out tcg_out_ext32u
  tcg: Split out tcg_out_exts_i32_i64
  tcg/loongarch64: Conditionalize tcg_out_exts_i32_i64
  tcg/mips: Conditionalize tcg_out_exts_i32_i64
  tcg/riscv: Conditionalize tcg_out_exts_i32_i64
  tcg: Split out tcg_out_extu_i32_i64
  tcg/i386: Conditionalize tcg_out_extu_i32_i64
  tcg: Split out tcg_out_extrl_i64_i32
  tcg: Introduce tcg_out_movext
  tcg: Introduce tcg_out_xchg
  tcg: Introduce tcg_out_movext2
  tcg: Clear TCGLabelQemuLdst on allocation
  tcg/i386: Use TCGType not bool is_64 in tcg_out_qemu_{ld,st}
  tcg/aarch64: Rename ext to d_type in tcg_out_qemu_ld
  tcg/aarch64: Pass TCGType to tcg_out_qemu_st
  tcg/arm: Use TCGType not bool is_64 in tcg_out_qemu_{ld,st}
  tcg/mips: Use TCGType not bool is_64 in tcg_out_qemu_{ld,st}
  tcg/ppc: Use TCGType not bool is_64 in tcg_out_qemu_{ld,st}
  tcg/s390x: Pass TCGType to tcg_out_qemu_{ld,st}
  tcg/riscv: Require TCG_TARGET_REG_BITS == 64
  tcg/riscv: Expand arguments to tcg_out_qemu_{ld,st}
  tcg: Move TCGLabelQemuLdst to tcg.c
  tcg: Introduce tcg_out_ld_helper_args
  tcg: Introduce tcg_out_st_helper_args
  tcg/loongarch64: Simplify constraints on qemu_ld/st
  tcg/mips: Reorg tcg_out_tlb_load
  tcg/mips: Simplify constraints on qemu_ld/st
  tcg/ppc: Reorg tcg_out_tlb_read
  tcg/ppc: Adjust constraints on qemu_ld/st
  tcg/ppc: Remove unused constraints A, B, C, D
  tcg/riscv: Simplify constraints on qemu_ld/st
  tcg/s390x: Use ALGFR in constructing host address for qemu_ld/st
  tcg/s390x: Simplify constraints on qemu_ld/st
  tcg/sparc64: Drop is_64 test from tcg_out_qemu_ld data return
  tcg/sparc64: Pass TCGType to tcg_out_qemu_{ld,st}

 include/tcg/tcg.h|   6 -
 tcg/loongarch64/tcg-target-con-set.h |   2 -
 tcg/loongarch64/tcg-target-con-str.h |   1 -
 tcg/mips/tcg-target-con-set.h|  13 +-
 tcg/mips/tcg-target-con-str.h|   2 -
 tcg/ppc/tcg-target-con-set.h |  11 +-
 tcg/ppc/tcg-target-con-str.h |   6 -
 tcg/riscv/tcg-target-con-set.h   |   8 -
 tcg/riscv/tcg-target-con-str.h   |   1 -
 tcg/riscv/tcg-target.h   |  22 +-
 tcg/s390x/tcg-target-con-set.h   |   2 -
 tcg/s390x/tcg-target-con-str.h   |   1 -
 target/i386/tcg/translate.c  |  20 +-
 target/s390x/tcg/translate.c |   4 +-
 tcg/optimize.c   |  10 +-
 tcg/tcg.c| 556 ++-
 tcg/aarch64/tcg-target.c.inc | 156 
 tcg/arm/tcg-target.c.inc | 242 
 tcg/i386/tcg-target.c.inc| 257 +
 tcg/loongarch64/tcg-target.c.inc | 167 +++-
 tcg/mips/tcg-target.c.inc| 392 ---
 tcg/ppc/tcg-target.c.inc | 319 +++
 tcg/riscv/tcg-target.c.inc   | 347 ++---
 tcg/s390x/tcg-target.c.inc   | 243 +---
 tcg/sparc64/tcg-target.c.inc | 125 +++---
 tcg/tcg-ldst.c.inc   |  15 +-
 tcg/tci/tcg-target.c.inc   

[PATCH 04/42] tcg: Split out tcg_out_ext8u

2023-04-07 Thread Richard Henderson
We will need a backend interface for performing 8-bit zero-extend.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  5 +
 tcg/aarch64/tcg-target.c.inc | 11 +++
 tcg/arm/tcg-target.c.inc | 12 +---
 tcg/i386/tcg-target.c.inc|  7 +++
 tcg/loongarch64/tcg-target.c.inc |  7 ++-
 tcg/mips/tcg-target.c.inc|  9 -
 tcg/ppc/tcg-target.c.inc |  7 +++
 tcg/riscv/tcg-target.c.inc   |  7 ++-
 tcg/s390x/tcg-target.c.inc   | 14 +-
 tcg/sparc64/tcg-target.c.inc |  9 -
 tcg/tci/tcg-target.c.inc | 14 +-
 11 files changed, 69 insertions(+), 33 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 76ba3e28cd..b02ffc5679 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -106,6 +106,7 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
+static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4504,6 +4505,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 case INDEX_op_ext8s_i64:
 tcg_out_ext8s(s, TCG_TYPE_I64, new_args[0], new_args[1]);
 break;
+case INDEX_op_ext8u_i32:
+case INDEX_op_ext8u_i64:
+tcg_out_ext8u(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 4f4f814293..cca91363ce 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1432,6 +1432,11 @@ static inline void tcg_out_uxt(TCGContext *s, MemOp s_bits,
 tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
 
+static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_uxt(s, MO_8, rd, rn);
+}
+
 static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
 TCGReg rn, int64_t aimm)
 {
@@ -2243,10 +2248,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext32s_i64:
 tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
 break;
-case INDEX_op_ext8u_i64:
-case INDEX_op_ext8u_i32:
-tcg_out_uxt(s, MO_8, a0, a1);
-break;
 case INDEX_op_ext16u_i64:
 case INDEX_op_ext16u_i32:
 tcg_out_uxt(s, MO_16, a0, a1);
@@ -2313,6 +2314,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_goto_tb:  /* Always emitted via tcg_out_goto_tb.  */
 case INDEX_op_ext8s_i32:  /* Always emitted via tcg_reg_alloc_op.  */
 case INDEX_op_ext8s_i64:
+case INDEX_op_ext8u_i32:
+case INDEX_op_ext8u_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 04a860897f..b99f08a54b 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -964,8 +964,13 @@ static void tcg_out_ext8s(TCGContext *s, TCGType t, TCGReg rd, TCGReg rn)
 tcg_out32(s, 0x06af0070 | (COND_AL << 28) | (rd << 12) | rn);
 }
 
+static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_dat_imm(s, COND_AL, ARITH_AND, rd, rn, 0xff);
+}
+
 static void __attribute__((unused))
-tcg_out_ext8u(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
+tcg_out_ext8u_cond(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
 {
 tcg_out_dat_imm(s, cond, ARITH_AND, rd, rn, 0xff);
 }
@@ -1365,8 +1370,8 @@ static TCGReg NAME(TCGContext *s, TCGReg argreg, ARGTYPE arg)  \
 
 DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t, tcg_out_movi32,
 (tcg_out_movi32(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg, tcg_out_ext8u,
-(tcg_out_ext8u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
+DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg, tcg_out_ext8u_cond,
+(tcg_out_ext8u_cond(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
 DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg, tcg_out_ext16u,
 (tcg_out_ext16u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
 DEFINE_TCG_OUT_ARG(tcg_out_arg_reg32, TCGReg, tcg_out_mov_reg, )
@@ -2299,6 +2304,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_exit_tb:  /* Always emitted via tcg_out_exit_tb.  */
 case INDEX_op_goto_tb:  /* Always emitted via tcg_out_goto_tb.  */
 case INDEX_op_ext8s_i32:  /* Always emitted via tcg_reg_alloc_op.  */
+case INDEX_op_ext8u_i32:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/i386/tcg-target.c.inc 

[PATCH 36/42] tcg/ppc: Adjust constraints on qemu_ld/st

2023-04-07 Thread Richard Henderson
The softmmu tlb uses TCG_REG_{TMP1,TMP2,R0}, not any of the normally
available registers.  Now that we handle overlap between inputs and
helper arguments, we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target-con-set.h | 11 ---
 tcg/ppc/tcg-target-con-str.h |  2 --
 tcg/ppc/tcg-target.c.inc | 32 ++--
 3 files changed, 14 insertions(+), 31 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-set.h b/tcg/ppc/tcg-target-con-set.h
index a1a345883d..f206b29205 100644
--- a/tcg/ppc/tcg-target-con-set.h
+++ b/tcg/ppc/tcg-target-con-set.h
@@ -12,18 +12,15 @@
 C_O0_I1(r)
 C_O0_I2(r, r)
 C_O0_I2(r, ri)
-C_O0_I2(S, S)
 C_O0_I2(v, r)
-C_O0_I3(S, S, S)
+C_O0_I3(r, r, r)
 C_O0_I4(r, r, ri, ri)
-C_O0_I4(S, S, S, S)
-C_O1_I1(r, L)
+C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
 C_O1_I1(v, r)
 C_O1_I1(v, v)
 C_O1_I1(v, vr)
 C_O1_I2(r, 0, rZ)
-C_O1_I2(r, L, L)
 C_O1_I2(r, rI, ri)
 C_O1_I2(r, rI, rT)
 C_O1_I2(r, r, r)
@@ -36,7 +33,7 @@ C_O1_I2(v, v, v)
 C_O1_I3(v, v, v, v)
 C_O1_I4(r, r, ri, rZ, rZ)
 C_O1_I4(r, r, r, ri, ri)
-C_O2_I1(L, L, L)
-C_O2_I2(L, L, L, L)
+C_O2_I1(r, r, r)
+C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, rI, rZM, r, r)
 C_O2_I4(r, r, r, r, rI, rZM)
diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index 298ca20d5b..f3bf030bc3 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -14,8 +14,6 @@ REGS('A', 1u << TCG_REG_R3)
 REGS('B', 1u << TCG_REG_R4)
 REGS('C', 1u << TCG_REG_R5)
 REGS('D', 1u << TCG_REG_R6)
-REGS('L', ALL_QLOAD_REGS)
-REGS('S', ALL_QSTORE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 7195c0b817..dc4e88db8e 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -93,18 +93,6 @@
 #define ALL_GENERAL_REGS  0xu
 #define ALL_VECTOR_REGS   0xull
 
-#ifdef CONFIG_SOFTMMU
-#define ALL_QLOAD_REGS \
-(ALL_GENERAL_REGS & \
- ~((1 << TCG_REG_R3) | (1 << TCG_REG_R4) | (1 << TCG_REG_R5)))
-#define ALL_QSTORE_REGS \
-(ALL_GENERAL_REGS & ~((1 << TCG_REG_R3) | (1 << TCG_REG_R4) | \
-  (1 << TCG_REG_R5) | (1 << TCG_REG_R6)))
-#else
-#define ALL_QLOAD_REGS  (ALL_GENERAL_REGS & ~(1 << TCG_REG_R3))
-#define ALL_QSTORE_REGS ALL_QLOAD_REGS
-#endif
-
 TCGPowerISA have_isa;
 static bool have_isel;
 bool have_altivec;
@@ -3780,23 +3768,23 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O1_I1(r, L)
-: C_O1_I2(r, L, L));
+? C_O1_I1(r, r)
+: C_O1_I2(r, r, r));
 
 case INDEX_op_qemu_st_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O0_I2(S, S)
-: C_O0_I3(S, S, S));
+? C_O0_I2(r, r)
+: C_O0_I3(r, r, r));
 
 case INDEX_op_qemu_ld_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, L)
-: TARGET_LONG_BITS == 32 ? C_O2_I1(L, L, L)
-: C_O2_I2(L, L, L, L));
+return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, r)
+: TARGET_LONG_BITS == 32 ? C_O2_I1(r, r, r)
+: C_O2_I2(r, r, r, r));
 
 case INDEX_op_qemu_st_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(S, S)
-: TARGET_LONG_BITS == 32 ? C_O0_I3(S, S, S)
-: C_O0_I4(S, S, S, S));
+return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(r, r)
+: TARGET_LONG_BITS == 32 ? C_O0_I3(r, r, r)
+: C_O0_I4(r, r, r, r));
 
 case INDEX_op_add_vec:
 case INDEX_op_sub_vec:
-- 
2.34.1




[PATCH 37/42] tcg/ppc: Remove unused constraints A, B, C, D

2023-04-07 Thread Richard Henderson
These constraints have not been used for quite some time.

Fixes: 77b73de67632 ("Use rem/div[u]_i32 drop div[u]2_i32")
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target-con-str.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/tcg/ppc/tcg-target-con-str.h b/tcg/ppc/tcg-target-con-str.h
index f3bf030bc3..9dcbc3df50 100644
--- a/tcg/ppc/tcg-target-con-str.h
+++ b/tcg/ppc/tcg-target-con-str.h
@@ -10,10 +10,6 @@
  */
 REGS('r', ALL_GENERAL_REGS)
 REGS('v', ALL_VECTOR_REGS)
-REGS('A', 1u << TCG_REG_R3)
-REGS('B', 1u << TCG_REG_R4)
-REGS('C', 1u << TCG_REG_R5)
-REGS('D', 1u << TCG_REG_R6)
 
 /*
  * Define constraint letters for constants:
-- 
2.34.1




[PATCH 34/42] tcg/mips: Simplify constraints on qemu_ld/st

2023-04-07 Thread Richard Henderson
The softmmu tlb uses TCG_REG_TMP[0-3], not any of the normally available
registers.  Now that we handle overlap between inputs and helper arguments,
we can allow any allocatable reg.

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target-con-set.h | 13 +
 tcg/mips/tcg-target-con-str.h |  2 --
 tcg/mips/tcg-target.c.inc | 30 --
 3 files changed, 13 insertions(+), 32 deletions(-)

diff --git a/tcg/mips/tcg-target-con-set.h b/tcg/mips/tcg-target-con-set.h
index fe3e868a2f..864034f468 100644
--- a/tcg/mips/tcg-target-con-set.h
+++ b/tcg/mips/tcg-target-con-set.h
@@ -12,15 +12,13 @@
 C_O0_I1(r)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
-C_O0_I2(SZ, S)
-C_O0_I3(SZ, S, S)
-C_O0_I3(SZ, SZ, S)
+C_O0_I3(rZ, r, r)
+C_O0_I3(rZ, rZ, r)
 C_O0_I4(rZ, rZ, rZ, rZ)
-C_O0_I4(SZ, SZ, S, S)
-C_O1_I1(r, L)
+C_O0_I4(rZ, rZ, r, r)
 C_O1_I1(r, r)
 C_O1_I2(r, 0, rZ)
-C_O1_I2(r, L, L)
+C_O1_I2(r, r, r)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, r, rIK)
@@ -30,7 +28,6 @@ C_O1_I2(r, rZ, rN)
 C_O1_I2(r, rZ, rZ)
 C_O1_I4(r, rZ, rZ, rZ, 0)
 C_O1_I4(r, rZ, rZ, rZ, rZ)
-C_O2_I1(r, r, L)
-C_O2_I2(r, r, L, L)
+C_O2_I1(r, r, r)
 C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, rZ, rZ, rN, rN)
diff --git a/tcg/mips/tcg-target-con-str.h b/tcg/mips/tcg-target-con-str.h
index e4b2965c72..413c280a7a 100644
--- a/tcg/mips/tcg-target-con-str.h
+++ b/tcg/mips/tcg-target-con-str.h
@@ -9,8 +9,6 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
-REGS('L', ALL_QLOAD_REGS)
-REGS('S', ALL_QSTORE_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 16b9d09959..34908c799a 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -176,20 +176,6 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_WSZ  0x2000   /* word size */
 
 #define ALL_GENERAL_REGS  0xu
-#define NOA0_REGS (ALL_GENERAL_REGS & ~(1 << TCG_REG_A0))
-
-#ifdef CONFIG_SOFTMMU
-#define ALL_QLOAD_REGS \
-(NOA0_REGS & ~((TCG_TARGET_REG_BITS < TARGET_LONG_BITS) << TCG_REG_A2))
-#define ALL_QSTORE_REGS \
-(NOA0_REGS & ~(TCG_TARGET_REG_BITS < TARGET_LONG_BITS   \
-   ? (1 << TCG_REG_A2) | (1 << TCG_REG_A3)  \
-   : (1 << TCG_REG_A1)))
-#else
-#define ALL_QLOAD_REGS   NOA0_REGS
-#define ALL_QSTORE_REGS  NOA0_REGS
-#endif
-
 
 static bool is_p2m1(tcg_target_long val)
 {
@@ -2488,18 +2474,18 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_qemu_ld_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O1_I1(r, L) : C_O1_I2(r, L, L));
+? C_O1_I1(r, r) : C_O1_I2(r, r, r));
 case INDEX_op_qemu_st_i32:
 return (TCG_TARGET_REG_BITS == 64 || TARGET_LONG_BITS == 32
-? C_O0_I2(SZ, S) : C_O0_I3(SZ, S, S));
+? C_O0_I2(rZ, r) : C_O0_I3(rZ, r, r));
 case INDEX_op_qemu_ld_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, L)
-: TARGET_LONG_BITS == 32 ? C_O2_I1(r, r, L)
-: C_O2_I2(r, r, L, L));
+return (TCG_TARGET_REG_BITS == 64 ? C_O1_I1(r, r)
+: TARGET_LONG_BITS == 32 ? C_O2_I1(r, r, r)
+: C_O2_I2(r, r, r, r));
 case INDEX_op_qemu_st_i64:
-return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(SZ, S)
-: TARGET_LONG_BITS == 32 ? C_O0_I3(SZ, SZ, S)
-: C_O0_I4(SZ, SZ, S, S));
+return (TCG_TARGET_REG_BITS == 64 ? C_O0_I2(rZ, r)
+: TARGET_LONG_BITS == 32 ? C_O0_I3(rZ, rZ, r)
+: C_O0_I4(rZ, rZ, r, r));
 
 default:
 g_assert_not_reached();
-- 
2.34.1




[PATCH 27/42] tcg/riscv: Require TCG_TARGET_REG_BITS == 64

2023-04-07 Thread Richard Henderson
The port currently does not support "oversize" guests, i.e. guests
whose registers are wider than the host's, which means riscv32 can
only target 32-bit guests.
building TCG once for all guests.  This implies that we can
only support riscv64.

Since all Linux distributions target riscv64 not riscv32,
this is not much of a restriction and simplifies the code.
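
(The diff below also collapses the sextreg wrapper to plain sextract64.
For reference, that helper from include/qemu/bitops.h extracts the bit
field at [start, start + length) and sign-extends it:

    int64_t sextract64(uint64_t value, int start, int length);

With the host fixed at 64 bits, the 32-bit variant is never needed.)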

Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target-con-set.h |   6 -
 tcg/riscv/tcg-target.h |  22 ++--
 tcg/riscv/tcg-target.c.inc | 206 ++---
 3 files changed, 72 insertions(+), 162 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index cf0ac4d751..c11710d117 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -13,18 +13,12 @@ C_O0_I1(r)
 C_O0_I2(LZ, L)
 C_O0_I2(rZ, r)
 C_O0_I2(rZ, rZ)
-C_O0_I3(LZ, L, L)
-C_O0_I3(LZ, LZ, L)
-C_O0_I4(LZ, LZ, L, L)
 C_O0_I4(rZ, rZ, rZ, rZ)
 C_O1_I1(r, L)
 C_O1_I1(r, r)
-C_O1_I2(r, L, L)
 C_O1_I2(r, r, ri)
 C_O1_I2(r, r, rI)
 C_O1_I2(r, rZ, rN)
 C_O1_I2(r, rZ, rZ)
 C_O1_I4(r, rZ, rZ, rZ, rZ)
-C_O2_I1(r, r, L)
-C_O2_I2(r, r, L, L)
 C_O2_I4(r, r, rZ, rZ, rM, rM)
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 0deb33701f..dddf2486c1 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -25,11 +25,14 @@
 #ifndef RISCV_TCG_TARGET_H
 #define RISCV_TCG_TARGET_H
 
-#if __riscv_xlen == 32
-# define TCG_TARGET_REG_BITS 32
-#elif __riscv_xlen == 64
-# define TCG_TARGET_REG_BITS 64
+/*
+ * We don't support oversize guests.
+ * Since we will only build tcg once, this in turn requires a 64-bit host.
+ */
+#if __riscv_xlen != 64
+#error "unsupported code generation mode"
 #endif
+#define TCG_TARGET_REG_BITS 64
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
@@ -83,13 +86,8 @@ typedef enum {
 #define TCG_TARGET_STACK_ALIGN  16
 #define TCG_TARGET_CALL_STACK_OFFSET0
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
-#if TCG_TARGET_REG_BITS == 32
-#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
-#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
-#else
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
-#endif
 #define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 /* optional instructions */
@@ -106,8 +104,8 @@ typedef enum {
 #define TCG_TARGET_HAS_sub2_i32 1
 #define TCG_TARGET_HAS_mulu2_i320
 #define TCG_TARGET_HAS_muls2_i320
-#define TCG_TARGET_HAS_muluh_i32(TCG_TARGET_REG_BITS == 32)
-#define TCG_TARGET_HAS_mulsh_i32(TCG_TARGET_REG_BITS == 32)
+#define TCG_TARGET_HAS_muluh_i320
+#define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_ext8s_i321
 #define TCG_TARGET_HAS_ext16s_i32   1
 #define TCG_TARGET_HAS_ext8u_i321
@@ -128,7 +126,6 @@ typedef enum {
 #define TCG_TARGET_HAS_setcond2 1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
-#if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_movcond_i64  0
 #define TCG_TARGET_HAS_div_i64  1
 #define TCG_TARGET_HAS_rem_i64  1
@@ -165,7 +162,6 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#endif
 
 #define TCG_TARGET_DEFAULT_MO (0)
 
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 266fe1433d..1edc3b1c4d 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -137,15 +137,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define SOFTMMU_RESERVE_REGS  0
 #endif
 
-
-static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
-{
-if (TCG_TARGET_REG_BITS == 32) {
-return sextract32(val, pos, len);
-} else {
-return sextract64(val, pos, len);
-}
-}
+#define sextreg  sextract64
 
 /* test if a constant matches the constraint */
 static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
@@ -235,7 +227,6 @@ typedef enum {
 OPC_XOR = 0x4033,
 OPC_XORI = 0x4013,
 
-#if TCG_TARGET_REG_BITS == 64
 OPC_ADDIW = 0x1b,
 OPC_ADDW = 0x3b,
 OPC_DIVUW = 0x200503b,
@@ -250,23 +241,6 @@ typedef enum {
 OPC_SRLIW = 0x501b,
 OPC_SRLW = 0x503b,
 OPC_SUBW = 0x403b,
-#else
-/* Simplify code throughout by defining aliases for RV32.  */
-OPC_ADDIW = OPC_ADDI,
-OPC_ADDW = OPC_ADD,
-OPC_DIVUW = OPC_DIVU,
-OPC_DIVW = OPC_DIV,
-OPC_MULW = OPC_MUL,
-OPC_REMUW = OPC_REMU,
-OPC_REMW = OPC_REM,
-OPC_SLLIW = OPC_SLLI,
-OPC_SLLW = OPC_SLL,
-OPC_SRAIW = OPC_SRAI,
-OPC_SRAW = OPC_SRA,
-OPC_SRLIW = OPC_SRLI,
-OPC_SRLW = OPC_SRL,
-OPC_SUBW = OPC_SUB,
-#endif
 
 OPC_FENCE = 0x000f,
 OPC_NOP   = OPC_ADDI,   /* nop = addi r0,r0,0 */
@@ -500,7 +474,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
 

[PATCH 35/42] tcg/ppc: Reorg tcg_out_tlb_read

2023-04-07 Thread Richard Henderson
Allocate TCG_REG_TMP2.  Use R0, TMP1, TMP2 instead of any of
the normally allocated registers for the tlb load.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 83 +++-
 1 file changed, 48 insertions(+), 35 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 383464b408..7195c0b817 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -68,6 +68,7 @@
 #else
 # define TCG_REG_TMP1   TCG_REG_R12
 #endif
+#define TCG_REG_TMP2TCG_REG_R11
 
 #define TCG_VEC_TMP1TCG_REG_V0
 #define TCG_VEC_TMP2TCG_REG_V1
@@ -2007,10 +2008,11 @@ static void * const qemu_st_helpers[(MO_SIZE | MO_BSWAP) + 1] = {
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) > 0);
 QEMU_BUILD_BUG_ON(TLB_MASK_TABLE_OFS(0) < -32768);
 
-/* Perform the TLB load and compare.  Places the result of the comparison
-   in CR7, loads the addend of the TLB into R3, and returns the register
-   containing the guest address (zero-extended into R4).  Clobbers R0 and R2. */
-
+/*
+ * Perform the TLB load and compare.  Places the result of the comparison
+ * in CR7, loads the addend of the TLB into TMP1, and returns the register
+ * containing the guest address (zero-extended into TMP2).  Clobbers R0.
+ */
 static TCGReg tcg_out_tlb_read(TCGContext *s, MemOp opc,
TCGReg addrlo, TCGReg addrhi,
int mem_index, bool is_read)
@@ -2026,40 +2028,44 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, MemOp opc,
 unsigned a_bits = get_alignment_bits(opc);
 
 /* Load tlb_mask[mmu_idx] and tlb_table[mmu_idx].  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_AREG0, mask_off);
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R4, TCG_AREG0, table_off);
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_AREG0, mask_off);
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP2, TCG_AREG0, table_off);
 
 /* Extract the page index, shifted into place for tlb index.  */
 if (TCG_TARGET_REG_BITS == 32) {
-tcg_out_shri32(s, TCG_REG_TMP1, addrlo,
+tcg_out_shri32(s, TCG_REG_R0, addrlo,
TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 } else {
-tcg_out_shri64(s, TCG_REG_TMP1, addrlo,
+tcg_out_shri64(s, TCG_REG_R0, addrlo,
TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
 }
-tcg_out32(s, AND | SAB(TCG_REG_R3, TCG_REG_R3, TCG_REG_TMP1));
+tcg_out32(s, AND | SAB(TCG_REG_TMP1, TCG_REG_TMP1, TCG_REG_R0));
 
-/* Load the TLB comparator.  */
+/* Load the (low part) TLB comparator into TMP2. */
 if (cmp_off == 0 && TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
 uint32_t lxu = (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32
 ? LWZUX : LDUX);
-tcg_out32(s, lxu | TAB(TCG_REG_TMP1, TCG_REG_R3, TCG_REG_R4));
+tcg_out32(s, lxu | TAB(TCG_REG_TMP2, TCG_REG_TMP1, TCG_REG_TMP2));
 } else {
-tcg_out32(s, ADD | TAB(TCG_REG_R3, TCG_REG_R3, TCG_REG_R4));
+tcg_out32(s, ADD | TAB(TCG_REG_TMP1, TCG_REG_TMP1, TCG_REG_TMP2));
 if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
-tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP1, TCG_REG_R3, cmp_off + 4);
-tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_R4, TCG_REG_R3, cmp_off);
+tcg_out_ld(s, TCG_TYPE_I32, TCG_REG_TMP2,
+   TCG_REG_TMP1, cmp_off + 4);
 } else {
-tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP1, TCG_REG_R3, cmp_off);
+tcg_out_ld(s, TCG_TYPE_TL, TCG_REG_TMP2, TCG_REG_TMP1, cmp_off);
 }
 }
 
-/* Load the TLB addend for use on the fast path.  Do this asap
-   to minimize any load use delay.  */
-tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_REG_R3,
-   offsetof(CPUTLBEntry, addend));
+/*
+ * Load the TLB addend for use on the fast path.
+ * Do this asap to minimize any load use delay.
+ */
+if (TCG_TARGET_REG_BITS >= TARGET_LONG_BITS) {
+tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_REG_TMP1,
+   offsetof(CPUTLBEntry, addend));
+}
 
-/* Clear the non-page, non-alignment bits from the address */
+/* Clear the non-page, non-alignment bits from the address into R0. */
 if (TCG_TARGET_REG_BITS == 32) {
 /* We don't support unaligned accesses on 32-bits.
  * Preserve the bottom bits and thus trigger a comparison
@@ -2090,9 +2096,6 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, MemOp opc,
 if (TARGET_LONG_BITS == 32) {
 tcg_out_rlw(s, RLWINM, TCG_REG_R0, t, 0,
 (32 - a_bits) & 31, 31 - TARGET_PAGE_BITS);
-/* Zero-extend the address for use in the final address.  */
-tcg_out_ext32u(s, TCG_REG_R4, addrlo);
-addrlo = TCG_REG_R4;
 } else if (a_bits == 0) {
 tcg_out_rld(s, RLDICR, TCG_REG_R0, t, 0, 63 - TARGET_PAGE_BITS);
 } else {
@@ -2102,16 +2105,27 @@ 

[PATCH 11/42] tcg/mips: Conditionalize tcg_out_exts_i32_i64

2023-04-07 Thread Richard Henderson
Since TCG_TYPE_I32 values are kept sign-extended in registers, we need not
extend if the register matches.  This is already relied upon by comparisons.
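
A standalone demonstration of the invariant (sketch only, not part of
the patch): with both operands canonically sign-extended, a 64-bit
comparison yields the 32-bit answer, which is why the extend may be
skipped when rd == rs.

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        int32_t a = -2, b = 1;
        /* Holds for all a, b once both are sign-extended to 64 bits. */
        assert(((int64_t)a < (int64_t)b) == (a < b));
        return 0;
    }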

Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.c.inc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index df36bec5c0..2bc885e00e 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -582,7 +582,9 @@ static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rs)
 
 static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg rd, TCGReg rs)
 {
-tcg_out_ext32s(s, rd, rs);
+if (rd != rs) {
+tcg_out_ext32s(s, rd, rs);
+}
 }
 
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
-- 
2.34.1




[PATCH 20/42] tcg/i386: Use TCGType not bool is_64 in tcg_out_qemu_{ld,st}

2023-04-07 Thread Richard Henderson
There are several places where we already convert back from
bool to type.  Clean things up by using type throughout.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 35 +--
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 54465c7f46..ff4062ef54 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1886,8 +1886,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 * Record the context of a call to the out of line helper code for the slow path
  * for a load or store, so that we can later generate the correct helper code
  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
-MemOpIdx oi,
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld,
+TCGType type, MemOpIdx oi,
 TCGReg datalo, TCGReg datahi,
 TCGReg addrlo, TCGReg addrhi,
 tcg_insn_unit *raddr,
@@ -1897,7 +1897,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
 
 label->is_ld = is_ld;
 label->oi = oi;
-label->type = is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
+label->type = type;
 label->datalo_reg = datalo;
 label->datahi_reg = datahi;
 label->addrlo_reg = addrlo;
@@ -2151,11 +2151,10 @@ static inline int setup_guest_base_seg(void)
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
TCGReg base, int index, intptr_t ofs,
-   int seg, bool is64, MemOp memop)
+   int seg, TCGType type, MemOp memop)
 {
-TCGType type = is64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
 bool use_movbe = false;
-int rexw = is64 * P_REXW;
+int rexw = (type == TCG_TYPE_I32 ? 0 : P_REXW);
 int movop = OPC_MOVL_GvEv;
 
 /* Do big-endian loads with movbe.  */
@@ -2248,7 +2247,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 /* XXX: qemu_ld and qemu_st could be modified to clobber only EDX and
EAX. It will be useful once fixed registers globals are less
common. */
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg datalo, datahi, addrlo;
 TCGReg addrhi __attribute__((unused));
@@ -2262,7 +2261,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 
 datalo = *args++;
-datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
+datahi = TCG_TARGET_REG_BITS == 64 || d_type == TCG_TYPE_I32 ? 0 : *args++;
 addrlo = *args++;
 addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
 oi = *args++;
@@ -2275,10 +2274,10 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
  label_ptr, offsetof(CPUTLBEntry, addr_read));
 
 /* TLB Hit.  */
-tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, is64, opc);
+tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, d_type, opc);
 
 /* Record the current context of a load into ldst label */
-add_qemu_ldst_label(s, true, is64, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, true, d_type, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else
 a_bits = get_alignment_bits(opc);
@@ -2288,7 +2287,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 
 tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
x86_guest_base_offset, x86_guest_base_seg,
-   is64, opc);
+   d_type, opc);
 #endif
 }
 
@@ -2344,7 +2343,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGType d_type)
 {
 TCGReg datalo, datahi, addrlo;
 TCGReg addrhi __attribute__((unused));
@@ -2358,7 +2357,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 
 datalo = *args++;
-datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
+datahi = TCG_TARGET_REG_BITS == 64 || d_type == TCG_TYPE_I32 ? 0 : *args++;
 addrlo = *args++;
 addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
 oi = *args++;
@@ -2374,7 +2373,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
 
 /* Record the current context of a store into ldst label */
-add_qemu_ldst_label(s, false, is64, oi, 

[PATCH 17/42] tcg: Introduce tcg_out_xchg

2023-04-07 Thread Richard Henderson
We will want a backend interface for register swapping.
This is only properly defined for x86; all others get a
stub version that always indicates failure.
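
The intended call pattern, sketched (this caller is not part of this
patch, and "scratch" stands for whatever temporary the caller has free):
try the single-instruction swap first, and fall back to three moves when
the backend declines.

    if (!tcg_out_xchg(s, type, r1, r2)) {
        tcg_out_mov(s, type, scratch, r1);
        tcg_out_mov(s, type, r1, r2);
        tcg_out_mov(s, type, r2, scratch);
    }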

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 2 ++
 tcg/aarch64/tcg-target.c.inc | 5 +
 tcg/arm/tcg-target.c.inc | 5 +
 tcg/i386/tcg-target.c.inc| 8 
 tcg/loongarch64/tcg-target.c.inc | 5 +
 tcg/mips/tcg-target.c.inc| 5 +
 tcg/ppc/tcg-target.c.inc | 5 +
 tcg/riscv/tcg-target.c.inc   | 5 +
 tcg/s390x/tcg-target.c.inc   | 5 +
 tcg/sparc64/tcg-target.c.inc | 5 +
 tcg/tci/tcg-target.c.inc | 5 +
 11 files changed, 55 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 6fe7dd6564..d82d99e1b0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static void tcg_out_exts_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_extu_i32_i64(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+__attribute__((unused));
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 29bc97ed1c..4ec3cf3172 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1106,6 +1106,11 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
 tcg_out_insn(s, 3305, LDR, 0, rd);
 }
 
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+return false;
+}
+
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index f865294861..4a5d57a41c 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -2607,6 +2607,11 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 tcg_out_movi32(s, COND_AL, ret, arg);
 }
 
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+return false;
+}
+
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 2d7c173a03..7d6bf30747 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -460,6 +460,7 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define OPC_VPTERNLOGQ  (0x25 | P_EXT3A | P_DATA16 | P_VEXW | P_EVEX)
 #define OPC_VZEROUPPER  (0x77 | P_EXT)
 #define OPC_XCHG_ax_r32(0x90)
+#define OPC_XCHG_EvGv   (0x87)
 
 #define OPC_GRP3_Eb (0xf6)
 #define OPC_GRP3_Ev (0xf7)
@@ -1078,6 +1079,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+int rexw = type == TCG_TYPE_I32 ? 0 : P_REXW;
+tcg_out_modrm(s, OPC_XCHG_EvGv + rexw, r1, r2);
+return true;
+}
+
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index fc98b9b31b..0940788c6f 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -419,6 +419,11 @@ static void tcg_out_addi(TCGContext *s, TCGType type, TCGReg rd,
 }
 }
 
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+return false;
+}
+
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index f103cdb4e6..a83ebe8729 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -597,6 +597,11 @@ static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg rd, TCGReg rs)
 tcg_out_ext32s(s, rd, rs);
 }
 
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+return false;
+}
+
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 612ad15bda..d3e547998f 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1153,6 +1153,11 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
 }
 }
 
+static bool tcg_out_xchg(TCGContext *s, TCGType type, TCGReg r1, TCGReg r2)
+{
+return false;
+}
+
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 

[PATCH 28/42] tcg/riscv: Expand arguments to tcg_out_qemu_{ld,st}

2023-04-07 Thread Richard Henderson
Now that the host is always 64-bit, the address and
data operands are always one operand each.  In addition,
change to using TCGType to describe the data operand.

Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.c.inc | 47 +++---
 1 file changed, 18 insertions(+), 29 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 1edc3b1c4d..6059802d9a 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -1101,7 +1101,7 @@ static bool tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 #endif /* CONFIG_SOFTMMU */
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg val,
-   TCGReg base, MemOp opc, bool is_64)
+   TCGReg base, MemOp opc, TCGType type)
 {
 /* Byte swapping is left to middle-end expansion. */
 tcg_debug_assert((opc & MO_BSWAP) == 0);
@@ -1120,7 +1120,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg val,
 tcg_out_opc_imm(s, OPC_LH, val, base, 0);
 break;
 case MO_UL:
-if (is_64) {
+if (type == TCG_TYPE_I64) {
 tcg_out_opc_imm(s, OPC_LWU, val, base, 0);
 break;
 }
@@ -1136,11 +1136,10 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg val,
 }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_ld(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
+MemOpIdx oi, TCGType d_type)
 {
-TCGReg addr_reg, data_reg;
-MemOpIdx oi;
-MemOp opc;
+MemOp opc = get_memop(oi);
 #if defined(CONFIG_SOFTMMU)
 tcg_insn_unit *label_ptr[1];
 #else
@@ -1148,16 +1147,11 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 #endif
 TCGReg base;
 
-data_reg = *args++;
-addr_reg = *args++;
-oi = *args++;
-opc = get_memop(oi);
-
 #if defined(CONFIG_SOFTMMU)
 base = tcg_out_tlb_load(s, addr_reg, oi, label_ptr, 1);
-tcg_out_qemu_ld_direct(s, data_reg, base, opc, is_64);
-add_qemu_ldst_label(s, 1, oi, (is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-data_reg, addr_reg, s->code_ptr, label_ptr);
+tcg_out_qemu_ld_direct(s, data_reg, base, opc, d_type);
+add_qemu_ldst_label(s, true, oi, d_type, data_reg, addr_reg,
+s->code_ptr, label_ptr);
 #else
 a_bits = get_alignment_bits(opc);
 if (a_bits) {
@@ -1172,7 +1166,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_GUEST_BASE_REG, base);
 base = TCG_REG_TMP0;
 }
-tcg_out_qemu_ld_direct(s, data_reg, base, opc, is_64);
+tcg_out_qemu_ld_direct(s, data_reg, base, opc, d_type);
 #endif
 }
 
@@ -1200,11 +1194,10 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg val,
 }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
+static void tcg_out_qemu_st(TCGContext *s, TCGReg data_reg, TCGReg addr_reg,
+MemOpIdx oi, TCGType d_type)
 {
-TCGReg addr_reg, data_reg;
-MemOpIdx oi;
-MemOp opc;
+MemOp opc = get_memop(oi);
 #if defined(CONFIG_SOFTMMU)
 tcg_insn_unit *label_ptr[1];
 #else
@@ -1212,16 +1205,12 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 #endif
 TCGReg base;
 
-data_reg = *args++;
-addr_reg = *args++;
-oi = *args++;
-opc = get_memop(oi);
 
 #if defined(CONFIG_SOFTMMU)
 base = tcg_out_tlb_load(s, addr_reg, oi, label_ptr, 0);
 tcg_out_qemu_st_direct(s, data_reg, base, opc);
-add_qemu_ldst_label(s, 0, oi, (is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-data_reg, addr_reg, s->code_ptr, label_ptr);
+add_qemu_ldst_label(s, false, oi, d_type, data_reg, addr_reg,
+s->code_ptr, label_ptr);
 #else
 a_bits = get_alignment_bits(opc);
 if (a_bits) {
@@ -1528,16 +1517,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_qemu_ld_i32:
-tcg_out_qemu_ld(s, args, false);
+tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_ld_i64:
-tcg_out_qemu_ld(s, args, true);
+tcg_out_qemu_ld(s, a0, a1, a2, TCG_TYPE_I64);
 break;
 case INDEX_op_qemu_st_i32:
-tcg_out_qemu_st(s, args, false);
+tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I32);
 break;
 case INDEX_op_qemu_st_i64:
-tcg_out_qemu_st(s, args, true);
+tcg_out_qemu_st(s, a0, a1, a2, TCG_TYPE_I64);
 break;
 
 case INDEX_op_extrh_i64_i32:
-- 
2.34.1




[PATCH 01/42] tcg: Replace if + tcg_abort with tcg_debug_assert

2023-04-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 4 +---
 tcg/i386/tcg-target.c.inc | 8 +++-
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bb52bc060b..100f81edb2 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1174,9 +1174,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext *s, TCGType type,
 {
 TCGTemp *ts;
 
-if (TCG_TARGET_REG_BITS == 32 && type != TCG_TYPE_I32) {
-tcg_abort();
-}
+tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
 
 ts = tcg_global_alloc(s);
 ts->base_type = type;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index eb9234..aa7ee16b25 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1369,8 +1369,8 @@ static void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
 }
 }
 
-/* Use SMALL != 0 to force a short forward branch.  */
-static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel *l, int small)
+/* Set SMALL to force a short forward branch.  */
+static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel *l, bool small)
 {
 int32_t val, val1;
 
@@ -1385,9 +1385,7 @@ static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel *l, int small)
 }
 tcg_out8(s, val1);
 } else {
-if (small) {
-tcg_abort();
-}
+tcg_debug_assert(!small);
 if (opc == -1) {
 tcg_out8(s, OPC_JMP_long);
 tcg_out32(s, val - 5);
-- 
2.34.1




[PATCH 07/42] tcg: Split out tcg_out_ext32s

2023-04-07 Thread Richard Henderson
We will need a backend interface for performing 32-bit sign-extend.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  4 
 tcg/aarch64/tcg-target.c.inc |  9 +++--
 tcg/arm/tcg-target.c.inc |  5 +
 tcg/i386/tcg-target.c.inc|  5 +++--
 tcg/loongarch64/tcg-target.c.inc |  2 +-
 tcg/mips/tcg-target.c.inc| 12 +---
 tcg/ppc/tcg-target.c.inc |  5 +++--
 tcg/riscv/tcg-target.c.inc   |  2 +-
 tcg/s390x/tcg-target.c.inc   | 10 +-
 tcg/sparc64/tcg-target.c.inc | 11 ---
 tcg/tci/tcg-target.c.inc |  9 -
 11 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5b0db747e8..84aa8d639e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -109,6 +109,7 @@ static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_ext16u(TCGContext *s, TCGReg ret, TCGReg arg);
+static void tcg_out_ext32s(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4521,6 +4522,9 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 case INDEX_op_ext16u_i64:
 tcg_out_ext16u(s, new_args[0], new_args[1]);
 break;
+case INDEX_op_ext32s_i64:
+tcg_out_ext32s(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index f55829e9ce..d7964734c3 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1429,6 +1429,11 @@ static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rn)
 tcg_out_sxt(s, type, MO_16, rd, rn);
 }
 
+static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_sxt(s, TCG_TYPE_I64, MO_32, rd, rn);
+}
+
 static inline void tcg_out_uxt(TCGContext *s, MemOp s_bits,
TCGReg rd, TCGReg rn)
 {
@@ -2232,7 +2237,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_bswap32_i64:
 tcg_out_rev(s, TCG_TYPE_I32, MO_32, a0, a1);
 if (a2 & TCG_BSWAP_OS) {
-tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a0);
+tcg_out_ext32s(s, a0, a0);
 }
 break;
 case INDEX_op_bswap32_i32:
@@ -2251,7 +2256,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_ext_i32_i64:
-case INDEX_op_ext32s_i64:
 tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
 break;
 case INDEX_op_extu_i32_i64:
@@ -2322,6 +2326,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext16s_i32:
 case INDEX_op_ext16u_i64:
 case INDEX_op_ext16u_i32:
+case INDEX_op_ext32s_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 8fa0c6cbc0..401769bdd6 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -993,6 +993,11 @@ static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rn)
 tcg_out_ext16u_cond(s, COND_AL, rd, rn);
 }
 
+static void tcg_out_ext32s(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, ARMCond cond,
 TCGReg rd, TCGReg rn, int flags)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 920524589d..8bb747b81d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1293,8 +1293,9 @@ static inline void tcg_out_ext32u(TCGContext *s, int dest, int src)
 tcg_out_modrm(s, OPC_MOVL_GvEv, dest, src);
 }
 
-static inline void tcg_out_ext32s(TCGContext *s, int dest, int src)
+static void tcg_out_ext32s(TCGContext *s, TCGReg dest, TCGReg src)
 {
+tcg_debug_assert(TCG_TARGET_REG_BITS == 64);
 tcg_out_modrm(s, OPC_MOVSLQ, dest, src);
 }
 
@@ -2758,7 +2759,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_ext32u(s, a0, a1);
 break;
 case INDEX_op_ext_i32_i64:
-case INDEX_op_ext32s_i64:
 tcg_out_ext32s(s, a0, a1);
 break;
 case INDEX_op_extrh_i64_i32:
@@ -2837,6 +2837,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext16s_i64:
 case INDEX_op_ext16u_i32:
 case INDEX_op_ext16u_i64:
+case INDEX_op_ext32s_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 08c2b65b19..037474510c 100644
--- 

[PATCH 06/42] tcg: Split out tcg_out_ext16u

2023-04-07 Thread Richard Henderson
We will need a backend interface for performing 16-bit zero-extend.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  5 +
 tcg/aarch64/tcg-target.c.inc | 13 -
 tcg/arm/tcg-target.c.inc | 17 ++---
 tcg/i386/tcg-target.c.inc|  8 +++-
 tcg/loongarch64/tcg-target.c.inc |  7 ++-
 tcg/mips/tcg-target.c.inc|  5 +
 tcg/ppc/tcg-target.c.inc |  4 +++-
 tcg/riscv/tcg-target.c.inc   |  7 ++-
 tcg/s390x/tcg-target.c.inc   | 17 ++---
 tcg/sparc64/tcg-target.c.inc | 11 +--
 tcg/tci/tcg-target.c.inc | 14 +-
 11 files changed, 66 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 739f92c2ee..5b0db747e8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -108,6 +108,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg);
+static void tcg_out_ext16u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4516,6 +4517,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 case INDEX_op_ext16s_i64:
 tcg_out_ext16s(s, TCG_TYPE_I64, new_args[0], new_args[1]);
 break;
+case INDEX_op_ext16u_i32:
+case INDEX_op_ext16u_i64:
+tcg_out_ext16u(s, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 3527c14d04..f55829e9ce 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1442,6 +1442,11 @@ static void tcg_out_ext8u(TCGContext *s, TCGReg rd, TCGReg rn)
 tcg_out_uxt(s, MO_8, rd, rn);
 }
 
+static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_uxt(s, MO_16, rd, rn);
+}
+
 static void tcg_out_addsubi(TCGContext *s, int ext, TCGReg rd,
 TCGReg rn, int64_t aimm)
 {
@@ -2241,7 +2246,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_ext16s(s, ext, a0, a0);
 } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
 /* Output must be zero-extended, but input isn't. */
-tcg_out_uxt(s, MO_16, a0, a0);
+tcg_out_ext16u(s, a0, a0);
 }
 break;
 
@@ -2249,10 +2254,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext32s_i64:
 tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
 break;
-case INDEX_op_ext16u_i64:
-case INDEX_op_ext16u_i32:
-tcg_out_uxt(s, MO_16, a0, a1);
-break;
 case INDEX_op_extu_i32_i64:
 case INDEX_op_ext32u_i64:
 tcg_out_movr(s, TCG_TYPE_I32, a0, a1);
@@ -2319,6 +2320,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext8u_i64:
 case INDEX_op_ext16s_i64:
 case INDEX_op_ext16s_i32:
+case INDEX_op_ext16u_i64:
+case INDEX_op_ext16u_i32:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index cddf977a58..8fa0c6cbc0 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -981,12 +981,18 @@ static void tcg_out_ext16s(TCGContext *s, TCGType t, TCGReg rd, TCGReg rn)
 tcg_out32(s, 0x06bf0070 | (COND_AL << 28) | (rd << 12) | rn);
 }
 
-static void tcg_out_ext16u(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
+static void tcg_out_ext16u_cond(TCGContext *s, ARMCond cond,
+TCGReg rd, TCGReg rn)
 {
 /* uxth */
 tcg_out32(s, 0x06ff0070 | (cond << 28) | (rd << 12) | rn);
 }
 
+static void tcg_out_ext16u(TCGContext *s, TCGReg rd, TCGReg rn)
+{
+tcg_out_ext16u_cond(s, COND_AL, rd, rn);
+}
+
 static void tcg_out_bswap16(TCGContext *s, ARMCond cond,
 TCGReg rd, TCGReg rn, int flags)
 {
@@ -1372,8 +1378,8 @@ DEFINE_TCG_OUT_ARG(tcg_out_arg_imm32, uint32_t, tcg_out_movi32,
 (tcg_out_movi32(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
 DEFINE_TCG_OUT_ARG(tcg_out_arg_reg8, TCGReg, tcg_out_ext8u_cond,
 (tcg_out_ext8u_cond(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
-DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg, tcg_out_ext16u,
-(tcg_out_ext16u(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
+DEFINE_TCG_OUT_ARG(tcg_out_arg_reg16, TCGReg, tcg_out_ext16u_cond,
+(tcg_out_ext16u_cond(s, COND_AL, TCG_REG_TMP, arg), arg = TCG_REG_TMP))
 DEFINE_TCG_OUT_ARG(tcg_out_arg_reg32, TCGReg, tcg_out_mov_reg, )
 
 

[PATCH 29/42] tcg: Move TCGLabelQemuLdst to tcg.c

2023-04-07 Thread Richard Henderson
This will shortly be used by sparc64 without also using
TCG_TARGET_NEED_LDST_LABELS.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c  | 13 +
 tcg/tcg-ldst.c.inc | 14 --
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 1c11f15bce..647af6c210 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -94,6 +94,19 @@ typedef struct QEMU_PACKED {
 DebugFrameFDEHeader fde;
 } DebugFrameHeader;
 
+typedef struct TCGLabelQemuLdst {
+bool is_ld; /* qemu_ld: true, qemu_st: false */
+MemOpIdx oi;
+TCGType type;   /* result type of a load */
+TCGReg addrlo_reg;  /* reg index for low word of guest virtual addr */
+TCGReg addrhi_reg;  /* reg index for high word of guest virtual addr */
+TCGReg datalo_reg;  /* reg index for low word to be loaded or stored */
+TCGReg datahi_reg;  /* reg index for high word to be loaded or stored */
+const tcg_insn_unit *raddr;   /* addr of the next IR of qemu_ld/st IR */
+tcg_insn_unit *label_ptr[2]; /* label pointers to be updated */
+QSIMPLEQ_ENTRY(TCGLabelQemuLdst) next;
+} TCGLabelQemuLdst;
+
 static void tcg_register_jit_int(const void *buf, size_t size,
  const void *debug_frame,
  size_t debug_frame_size)
diff --git a/tcg/tcg-ldst.c.inc b/tcg/tcg-ldst.c.inc
index 403cbb0f06..ffada04af0 100644
--- a/tcg/tcg-ldst.c.inc
+++ b/tcg/tcg-ldst.c.inc
@@ -20,20 +20,6 @@
  * THE SOFTWARE.
  */
 
-typedef struct TCGLabelQemuLdst {
-bool is_ld; /* qemu_ld: true, qemu_st: false */
-MemOpIdx oi;
-TCGType type;   /* result type of a load */
-TCGReg addrlo_reg;  /* reg index for low word of guest virtual addr */
-TCGReg addrhi_reg;  /* reg index for high word of guest virtual addr */
-TCGReg datalo_reg;  /* reg index for low word to be loaded or stored */
-TCGReg datahi_reg;  /* reg index for high word to be loaded or stored */
-const tcg_insn_unit *raddr;   /* addr of the next IR of qemu_ld/st IR */
-tcg_insn_unit *label_ptr[2]; /* label pointers to be updated */
-QSIMPLEQ_ENTRY(TCGLabelQemuLdst) next;
-} TCGLabelQemuLdst;
-
-
 /*
  * Generate TB finalization at the end of block
  */
-- 
2.34.1




[PATCH 05/42] tcg: Split out tcg_out_ext16s

2023-04-07 Thread Richard Henderson
We will need a backend interface for performing 16-bit sign-extend.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  7 +++
 tcg/aarch64/tcg-target.c.inc | 13 -
 tcg/arm/tcg-target.c.inc | 10 --
 tcg/i386/tcg-target.c.inc| 16 
 tcg/loongarch64/tcg-target.c.inc | 13 +
 tcg/mips/tcg-target.c.inc| 11 ---
 tcg/ppc/tcg-target.c.inc | 12 +---
 tcg/riscv/tcg-target.c.inc   |  9 +++--
 tcg/s390x/tcg-target.c.inc   | 12 
 tcg/sparc64/tcg-target.c.inc |  7 +++
 tcg/tci/tcg-target.c.inc | 21 -
 11 files changed, 79 insertions(+), 52 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index b02ffc5679..739f92c2ee 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -106,6 +106,7 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
 static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
+static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
@@ -4509,6 +4510,12 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 case INDEX_op_ext8u_i64:
 tcg_out_ext8u(s, new_args[0], new_args[1]);
 break;
+case INDEX_op_ext16s_i32:
+tcg_out_ext16s(s, TCG_TYPE_I32, new_args[0], new_args[1]);
+break;
+case INDEX_op_ext16s_i64:
+tcg_out_ext16s(s, TCG_TYPE_I64, new_args[0], new_args[1]);
+break;
 default:
 if (def->flags & TCG_OPF_VECTOR) {
 tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index cca91363ce..3527c14d04 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1424,6 +1424,11 @@ static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rn)
 tcg_out_sxt(s, type, MO_8, rd, rn);
 }
 
+static void tcg_out_ext16s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rn)
+{
+tcg_out_sxt(s, type, MO_16, rd, rn);
+}
+
 static inline void tcg_out_uxt(TCGContext *s, MemOp s_bits,
TCGReg rd, TCGReg rn)
 {
@@ -2233,17 +2238,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_rev(s, TCG_TYPE_I32, MO_16, a0, a1);
 if (a2 & TCG_BSWAP_OS) {
 /* Output must be sign-extended. */
-tcg_out_sxt(s, ext, MO_16, a0, a0);
+tcg_out_ext16s(s, ext, a0, a0);
 } else if ((a2 & (TCG_BSWAP_IZ | TCG_BSWAP_OZ)) == TCG_BSWAP_OZ) {
 /* Output must be zero-extended, but input isn't. */
 tcg_out_uxt(s, MO_16, a0, a0);
 }
 break;
 
-case INDEX_op_ext16s_i64:
-case INDEX_op_ext16s_i32:
-tcg_out_sxt(s, ext, MO_16, a0, a1);
-break;
 case INDEX_op_ext_i32_i64:
 case INDEX_op_ext32s_i64:
 tcg_out_sxt(s, TCG_TYPE_I64, MO_32, a0, a1);
@@ -2316,6 +2317,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_ext8s_i64:
 case INDEX_op_ext8u_i32:
 case INDEX_op_ext8u_i64:
+case INDEX_op_ext16s_i64:
+case INDEX_op_ext16s_i32:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index b99f08a54b..cddf977a58 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -975,10 +975,10 @@ tcg_out_ext8u_cond(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
 tcg_out_dat_imm(s, cond, ARITH_AND, rd, rn, 0xff);
 }
 
-static void tcg_out_ext16s(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
+static void tcg_out_ext16s(TCGContext *s, TCGType t, TCGReg rd, TCGReg rn)
 {
 /* sxth */
-tcg_out32(s, 0x06bf0070 | (cond << 28) | (rd << 12) | rn);
+tcg_out32(s, 0x06bf0070 | (COND_AL << 28) | (rd << 12) | rn);
 }
 
 static void tcg_out_ext16u(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
@@ -1541,7 +1541,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 tcg_out_ext8s(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
 break;
 case MO_SW:
-tcg_out_ext16s(s, COND_AL, datalo, TCG_REG_R0);
+tcg_out_ext16s(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
 break;
 default:
 tcg_out_mov_reg(s, COND_AL, datalo, TCG_REG_R0);
@@ -2249,9 +2249,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_bswap32(s, COND_AL, args[0], args[1]);
 break;
 
-case INDEX_op_ext16s_i32:
-tcg_out_ext16s(s, COND_AL, args[0], args[1]);
-break;
 case 

[PATCH 02/42] tcg: Replace tcg_abort with g_assert_not_reached

2023-04-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h|  6 --
 target/i386/tcg/translate.c  | 20 ++--
 target/s390x/tcg/translate.c |  4 ++--
 tcg/optimize.c   | 10 --
 tcg/tcg.c|  8 
 tcg/aarch64/tcg-target.c.inc |  4 ++--
 tcg/arm/tcg-target.c.inc |  2 +-
 tcg/i386/tcg-target.c.inc| 14 +++---
 tcg/mips/tcg-target.c.inc| 14 +++---
 tcg/ppc/tcg-target.c.inc |  8 
 tcg/s390x/tcg-target.c.inc   |  8 
 tcg/sparc64/tcg-target.c.inc |  2 +-
 tcg/tci/tcg-target.c.inc |  2 +-
 13 files changed, 47 insertions(+), 55 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 5cfaa53938..b19e167e1d 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -967,12 +967,6 @@ typedef struct TCGTargetOpDef {
 const char *args_ct_str[TCG_MAX_OP_ARGS];
 } TCGTargetOpDef;
 
-#define tcg_abort() \
-do {\
-fprintf(stderr, "%s:%d: tcg fatal error\n", __FILE__, __LINE__);\
-abort();\
-} while (0)
-
 bool tcg_op_supported(TCGOpcode op);
 
 void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 9dfad2f7bc..91c9c0c478 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -476,7 +476,7 @@ static TCGv gen_op_deposit_reg_v(DisasContext *s, MemOp ot, int reg, TCGv dest,
 break;
 #endif
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 return cpu_regs[reg];
 }
@@ -660,7 +660,7 @@ static void gen_lea_v_seg(DisasContext *s, MemOp aflag, TCGv a0,
 }
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 
 if (ovr_seg >= 0) {
@@ -765,7 +765,7 @@ static void gen_helper_in_func(MemOp ot, TCGv v, TCGv_i32 n)
 gen_helper_inl(v, cpu_env, n);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 }
 
@@ -782,7 +782,7 @@ static void gen_helper_out_func(MemOp ot, TCGv_i32 v, TCGv_i32 n)
 gen_helper_outl(cpu_env, v, n);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 }
 
@@ -1932,7 +1932,7 @@ static void gen_rotc_rm_T1(DisasContext *s, MemOp ot, int op1,
 break;
 #endif
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 } else {
 switch (ot) {
@@ -1951,7 +1951,7 @@ static void gen_rotc_rm_T1(DisasContext *s, MemOp ot, int op1,
 break;
 #endif
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 }
 /* store */
@@ -2282,7 +2282,7 @@ static AddressParts gen_lea_modrm_0(CPUX86State *env, DisasContext *s,
 break;
 
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 
  done:
@@ -2434,7 +2434,7 @@ static inline uint32_t insn_get(CPUX86State *env, DisasContext *s, MemOp ot)
 ret = x86_ldl_code(env, s);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 return ret;
 }
@@ -3723,7 +3723,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_op_mov_reg_v(s, MO_16, R_EAX, s->T0);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 break;
 case 0x99: /* CDQ/CWD */
@@ -3748,7 +3748,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_op_mov_reg_v(s, MO_16, R_EDX, s->T0);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 break;
 case 0x1af: /* imul Gv, Ev */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 2d9b4bbb1f..46b874e94d 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -418,7 +418,7 @@ static int get_mem_index(DisasContext *s)
 case PSW_ASC_HOME >> FLAG_MASK_PSW_SHIFT:
 return MMU_HOME_IDX;
 default:
-tcg_abort();
+g_assert_not_reached();
 break;
 }
 #endif
@@ -652,7 +652,7 @@ static void gen_op_calc_cc(DisasContext *s)
 gen_helper_calc_cc(cc_op, cpu_env, cc_op, cc_src, cc_dst, cc_vr);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 
 /* We now have cc in cc_op as constant */
diff --git a/tcg/optimize.c b/tcg/optimize.c
index ce05989c39..9614fa3638 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -453,9 +453,7 @@ static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
 return (uint64_t)x % ((uint64_t)y ? : 1);
 
 default:
-fprintf(stderr,
-"Unrecognized operation %d in do_constant_folding.\n", op);
-tcg_abort();
+g_assert_not_reached();
 }
 }
 
@@ -493,7 +491,7 @@ static bool do_constant_folding_cond_32(uint32_t x, uint32_t y, TCGCond c)
 case 

[PATCH 30/42] tcg: Introduce tcg_out_ld_helper_args

2023-04-07 Thread Richard Henderson
Centralize the logic to call the helper_ldN_mmu functions.
This loses out slightly on mips by not filling the delay slot,
but the result is more maintainable.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 187 +++
 tcg/aarch64/tcg-target.c.inc |   8 +-
 tcg/arm/tcg-target.c.inc |  13 +--
 tcg/i386/tcg-target.c.inc|  30 +
 tcg/loongarch64/tcg-target.c.inc |  12 +-
 tcg/mips/tcg-target.c.inc|  15 +--
 tcg/ppc/tcg-target.c.inc |  41 +++
 tcg/riscv/tcg-target.c.inc   |  15 +--
 tcg/s390x/tcg-target.c.inc   |  14 +--
 9 files changed, 220 insertions(+), 115 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 647af6c210..e67b80aeeb 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -180,6 +180,10 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
 static int tcg_out_ldst_finalize(TCGContext *s);
 #endif
+static int tcg_out_ld_helper_args(TCGContext *s, const TCGLabelQemuLdst *l,
+  void (*ra_gen)(TCGContext *s, TCGReg r),
+  int ra_reg, int scratch_reg)
+__attribute__((unused));
 
 TCGContext tcg_init_ctx;
 __thread TCGContext *tcg_ctx;
@@ -4973,6 +4977,189 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 }
 }
 
+/* Wrapper to prevent -Wtype-limits errors for i386, where ARRAY_SIZE == 0. */
+static inline bool in_iarg_reg(unsigned arg)
+{
+unsigned max = ARRAY_SIZE(tcg_target_call_iarg_regs);
+return max != 0 && arg < max;
+}
+
+static void tcg_out_helper_arg(TCGContext *s, TCGType d_type, unsigned d_arg,
+   TCGType s_type, MemOp s_mo, TCGReg s_reg,
+   int scratch_reg)
+{
+if (TCG_TARGET_CALL_ARG_I32 == TCG_CALL_ARG_EXTEND) {
+d_type = TCG_TYPE_REG;
+}
+
+if (in_iarg_reg(d_arg)) {
+tcg_out_movext(s, d_type, tcg_target_call_iarg_regs[d_arg],
+   s_type, s_mo, s_reg);
+return;
+}
+
+/* The argument is going onto the stack; extend into scratch. */
+if ((s_mo & MO_SIZE) < (d_type == TCG_TYPE_I32 ? MO_32 : MO_64)) {
+tcg_debug_assert(scratch_reg >= 0);
+tcg_out_movext(s, d_type, scratch_reg, s_type, s_mo, s_reg);
+s_reg = scratch_reg;
+}
+tcg_out_st(s, TCG_TYPE_REG, s_reg, TCG_REG_CALL_STACK,
+   TCG_TARGET_CALL_STACK_OFFSET +
+   d_arg * sizeof(tcg_target_long));
+}
+
+static void tcg_out_helper_arg_im(TCGContext *s, TCGType d_type,
+  unsigned d_arg, tcg_target_long imm,
+  int scratch_reg)
+{
+intptr_t ofs;
+
+if (TCG_TARGET_CALL_ARG_I32 == TCG_CALL_ARG_EXTEND) {
+d_type = TCG_TYPE_REG;
+}
+if (in_iarg_reg(d_arg)) {
+tcg_out_movi(s, d_type, tcg_target_call_iarg_regs[d_arg], imm);
+return;
+}
+
+ofs = TCG_TARGET_CALL_STACK_OFFSET + d_arg * sizeof(tcg_target_long);
+if (tcg_out_sti(s, TCG_TYPE_REG, imm, TCG_REG_CALL_STACK, ofs)) {
+return;
+}
+
+tcg_debug_assert(scratch_reg >= 0);
+tcg_out_movi(s, d_type, scratch_reg, imm);
+tcg_out_st(s, TCG_TYPE_REG, scratch_reg, TCG_REG_CALL_STACK, ofs);
+}
+
+static int tcg_out_helper_arg_ra(TCGContext *s, unsigned d_arg,
+ void (*ra_gen)(TCGContext *s, TCGReg r),
+ int ra_reg, uintptr_t ra_imm,
+ int scratch_reg)
+{
+intptr_t ofs;
+
+if (in_iarg_reg(d_arg)) {
+TCGReg d_reg = tcg_target_call_iarg_regs[d_arg];
+
+if (ra_reg >= 0) {
+tcg_out_mov(s, TCG_TYPE_PTR, d_reg, ra_reg);
+} else if (ra_gen) {
+ra_gen(s, d_reg);
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, d_reg, ra_imm);
+}
+return d_reg;
+}
+
+ofs = TCG_TARGET_CALL_STACK_OFFSET + d_arg * sizeof(tcg_target_long);
+if (ra_reg < 0) {
+if (ra_gen) {
+tcg_debug_assert(scratch_reg >= 0);
+ra_gen(s, scratch_reg);
+} else if (scratch_reg >= 0) {
+tcg_out_movi(s, TCG_TYPE_PTR, scratch_reg, ra_imm);
+} else {
+bool ok = tcg_out_sti(s, TCG_TYPE_REG, ra_imm,
+  TCG_REG_CALL_STACK, ofs);
+tcg_debug_assert(ok);
+return -1;
+}
+ra_reg = scratch_reg;
+}
+tcg_out_st(s, TCG_TYPE_REG, ra_reg, TCG_REG_CALL_STACK, ofs);
+return ra_reg;
+}
+
+/*
+ * Poor man's topological sort on 2 source+destination register pairs.
+ * This is a simplified version of tcg_out_movext2 for 32-bit hosts.
+ */
+static void tcg_out_mov_32x2(TCGContext *s, TCGReg d1, TCGReg s1,
+ TCGReg d2, TCGReg s2, int t1)
+{
+tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+
+if (d1 != s2) {
+tcg_out_mov(s, TCG_TYPE_I32, d1, 

[PATCH 26/42] tcg/s390x: Pass TCGType to tcg_out_qemu_{ld,st}

2023-04-07 Thread Richard Henderson
We need to set this in TCGLabelQemuLdst, so plumb this
all the way through from tcg_out_op.

Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.c.inc | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index b399798664..77dcdd7c0f 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1770,13 +1770,14 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, MemOp opc,
 }
 
 static void add_qemu_ldst_label(TCGContext *s, bool is_ld, MemOpIdx oi,
-TCGReg data, TCGReg addr,
+TCGType type, TCGReg data, TCGReg addr,
 tcg_insn_unit *raddr, tcg_insn_unit *label_ptr)
 {
 TCGLabelQemuLdst *label = new_ldst_label(s);
 
 label->is_ld = is_ld;
 label->oi = oi;
+label->type = type;
 label->datalo_reg = data;
 label->addrlo_reg = addr;
 label->raddr = tcg_splitwx_to_rx(raddr);
@@ -1900,7 +1901,7 @@ static void tcg_prepare_user_ldst(TCGContext *s, TCGReg *addr_reg,
 #endif /* CONFIG_SOFTMMU */
 
 static void tcg_out_qemu_ld(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
-MemOpIdx oi)
+MemOpIdx oi, TCGType d_type)
 {
 MemOp opc = get_memop(oi);
 #ifdef CONFIG_SOFTMMU
@@ -1916,7 +1917,8 @@ static void tcg_out_qemu_ld(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
 
 tcg_out_qemu_ld_direct(s, opc, data_reg, base_reg, TCG_REG_R2, 0);
 
-add_qemu_ldst_label(s, 1, oi, data_reg, addr_reg, s->code_ptr, label_ptr);
+add_qemu_ldst_label(s, 1, oi, d_type, data_reg, addr_reg,
+s->code_ptr, label_ptr);
 #else
 TCGReg index_reg;
 tcg_target_long disp;
@@ -1931,7 +1933,7 @@ static void tcg_out_qemu_ld(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
 }
 
 static void tcg_out_qemu_st(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
-MemOpIdx oi)
+MemOpIdx oi, TCGType d_type)
 {
 MemOp opc = get_memop(oi);
 #ifdef CONFIG_SOFTMMU
@@ -1947,7 +1949,8 @@ static void tcg_out_qemu_st(TCGContext* s, TCGReg data_reg, TCGReg addr_reg,
 
 tcg_out_qemu_st_direct(s, opc, data_reg, base_reg, TCG_REG_R2, 0);
 
-add_qemu_ldst_label(s, 0, oi, data_reg, addr_reg, s->code_ptr, label_ptr);
+add_qemu_ldst_label(s, 0, oi, d_type, data_reg, addr_reg,
+s->code_ptr, label_ptr);
 #else
 TCGReg index_reg;
 tcg_target_long disp;
@@ -2307,13 +2310,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_qemu_ld_i32:
-/* ??? Technically we can use a non-extending instruction.  */
+tcg_out_qemu_ld(s, args[0], args[1], args[2], TCG_TYPE_I32);
+break;
 case INDEX_op_qemu_ld_i64:
-tcg_out_qemu_ld(s, args[0], args[1], args[2]);
+tcg_out_qemu_ld(s, args[0], args[1], args[2], TCG_TYPE_I64);
 break;
 case INDEX_op_qemu_st_i32:
+tcg_out_qemu_st(s, args[0], args[1], args[2], TCG_TYPE_I32);
+break;
 case INDEX_op_qemu_st_i64:
-tcg_out_qemu_st(s, args[0], args[1], args[2]);
+tcg_out_qemu_st(s, args[0], args[1], args[2], TCG_TYPE_I64);
 break;
 
 case INDEX_op_ld16s_i64:
-- 
2.34.1




[PATCH for-8.0] tcg/i386: Adjust assert in tcg_out_addi_ptr

2023-04-07 Thread Richard Henderson
We can arrive here on _WIN64 because Int128 is passed by reference.
Change the assert to check that the immediate is in range,
instead of attempting to check the host ABI.

Fixes: 6a6d772e30d ("tcg: Introduce tcg_out_addi_ptr")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1581
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index eb9234..5a151fe64a 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1082,7 +1082,7 @@ static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
  tcg_target_long imm)
 {
 /* This function is only used for passing structs by reference. */
-tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+tcg_debug_assert(imm == (int32_t)imm);
 tcg_out_modrm_offset(s, OPC_LEA, rd, rs, imm);
 }
 
-- 
2.34.1
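
For reference, a minimal standalone sketch (hypothetical, not part of the
patch) of the range the new assertion accepts: exactly the 64-bit immediates
that survive a round trip through int32_t, i.e. those encodable as x86's
sign-extended disp32 field:

    #include <assert.h>
    #include <stdint.h>

    /* Mirrors the assert's test: truncate to 32 bits, sign-extend back,
     * and require the value to be unchanged. */
    static int fits_in_disp32(int64_t imm)
    {
        return imm == (int32_t)imm;
    }

    int main(void)
    {
        assert(fits_in_disp32(0x7fffffff));     /* INT32_MAX fits */
        assert(fits_in_disp32(-0x80000000LL));  /* INT32_MIN fits */
        assert(!fits_in_disp32(0x80000000LL));  /* needs 33 signed bits */
        return 0;
    }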




[PATCH 03/42] tcg: Split out tcg_out_ext8s

2023-04-07 Thread Richard Henderson
We will need a backend interface for performing 8-bit sign-extend.
Use it in tcg_reg_alloc_op in the meantime.

Signed-off-by: Richard Henderson 
---
 tcg/tcg.c| 21 -
 tcg/aarch64/tcg-target.c.inc | 11 +++
 tcg/arm/tcg-target.c.inc | 10 --
 tcg/i386/tcg-target.c.inc| 10 +-
 tcg/loongarch64/tcg-target.c.inc | 11 ---
 tcg/mips/tcg-target.c.inc| 12 
 tcg/ppc/tcg-target.c.inc | 10 --
 tcg/riscv/tcg-target.c.inc   |  9 +++--
 tcg/s390x/tcg-target.c.inc   | 10 +++---
 tcg/sparc64/tcg-target.c.inc |  7 +++
 tcg/tci/tcg-target.c.inc | 21 -
 11 files changed, 81 insertions(+), 51 deletions(-)
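
As a reminder of the operation the new hook performs, 8-bit sign extension
in plain C (illustrative sketch only, not part of the patch):

    static int64_t ext8s(int64_t x)
    {
        return (int8_t)x;   /* keep bits 0..7, replicate bit 7 upward */
    }

For example, ext8s(0x80) == -128 and ext8s(0x7f) == 127.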

diff --git a/tcg/tcg.c b/tcg/tcg.c
index c3a8578951..76ba3e28cd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -105,6 +105,7 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
+static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
@@ -4496,11 +4497,21 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 }
 
 /* emit instruction */
-if (def->flags & TCG_OPF_VECTOR) {
-tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
-   new_args, const_args);
-} else {
-tcg_out_op(s, op->opc, new_args, const_args);
+switch (op->opc) {
+case INDEX_op_ext8s_i32:
+tcg_out_ext8s(s, TCG_TYPE_I32, new_args[0], new_args[1]);
+break;
+case INDEX_op_ext8s_i64:
+tcg_out_ext8s(s, TCG_TYPE_I64, new_args[0], new_args[1]);
+break;
+default:
+if (def->flags & TCG_OPF_VECTOR) {
+tcg_out_vec_op(s, op->opc, TCGOP_VECL(op), TCGOP_VECE(op),
+   new_args, const_args);
+} else {
+tcg_out_op(s, op->opc, new_args, const_args);
+}
+break;
 }
 
 /* move the outputs in the correct register if needed */
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 1315cb92ab..4f4f814293 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1419,6 +1419,11 @@ static inline void tcg_out_sxt(TCGContext *s, TCGType ext, MemOp s_bits,
 tcg_out_sbfm(s, ext, rd, rn, 0, bits);
 }
 
+static void tcg_out_ext8s(TCGContext *s, TCGType type, TCGReg rd, TCGReg rn)
+{
+tcg_out_sxt(s, type, MO_8, rd, rn);
+}
+
 static inline void tcg_out_uxt(TCGContext *s, MemOp s_bits,
TCGReg rd, TCGReg rn)
 {
@@ -2230,10 +2235,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_ext8s_i64:
-case INDEX_op_ext8s_i32:
-tcg_out_sxt(s, ext, MO_8, a0, a1);
-break;
 case INDEX_op_ext16s_i64:
 case INDEX_op_ext16s_i32:
 tcg_out_sxt(s, ext, MO_16, a0, a1);
@@ -2310,6 +2311,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_call: /* Always emitted via tcg_out_call.  */
 case INDEX_op_exit_tb:  /* Always emitted via tcg_out_exit_tb.  */
 case INDEX_op_goto_tb:  /* Always emitted via tcg_out_goto_tb.  */
+case INDEX_op_ext8s_i32:  /* Always emitted via tcg_reg_alloc_op.  */
+case INDEX_op_ext8s_i64:
 default:
 g_assert_not_reached();
 }
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index b4daa97e7a..04a860897f 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -958,10 +958,10 @@ static void tcg_out_udiv(TCGContext *s, ARMCond cond,
 tcg_out32(s, 0x0730f010 | (cond << 28) | (rd << 16) | rn | (rm << 8));
 }
 
-static void tcg_out_ext8s(TCGContext *s, ARMCond cond, TCGReg rd, TCGReg rn)
+static void tcg_out_ext8s(TCGContext *s, TCGType t, TCGReg rd, TCGReg rn)
 {
 /* sxtb */
-tcg_out32(s, 0x06af0070 | (cond << 28) | (rd << 12) | rn);
+tcg_out32(s, 0x06af0070 | (COND_AL << 28) | (rd << 12) | rn);
 }
 
 static void __attribute__((unused))
@@ -1533,7 +1533,7 @@ static bool tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 datahi = lb->datahi_reg;
 switch (opc & MO_SSIZE) {
 case MO_SB:
-tcg_out_ext8s(s, COND_AL, datalo, TCG_REG_R0);
+tcg_out_ext8s(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
 break;
 case MO_SW:
 tcg_out_ext16s(s, COND_AL, datalo, TCG_REG_R0);
@@ -2244,9 +2244,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_bswap32(s, COND_AL, args[0], args[1]);
 break;
 
-case INDEX_op_ext8s_i32:
-

Re: [PATCH v12 02/10] target/riscv: add support for Zca extension

2023-04-07 Thread liweiwei



On 2023/4/8 03:25, Daniel Henrique Barboza wrote:



On 4/7/23 00:34, liweiwei wrote:


On 2023/4/7 09:14, liweiwei wrote:


On 2023/4/7 04:22, Daniel Henrique Barboza wrote:

Hi,

This patch is going to break the sifive_u boot if I rebase

"[PATCH v6 0/9] target/riscv: rework CPU extensions validation"

on top of it, as is the case today with the current riscv-to-apply.next.


The reason is that the priv spec version for Zca is marked as 1_12_0, and the
priv spec version for both sifive CPUs is 1_10_0, and both are enabling RVC.

This patch from that series above:

"[PATCH v6 5/9] target/riscv/cpu.c: add priv_spec validate/disable_exts helpers"

makes the disabling of extensions based on priv version happen *after* we do
all the validations, instead of before as we're doing today. Zca (and Zcd)
will be manually enabled just to be disabled shortly after by the priv spec
code. And this will happen:


Yeah, I didn't take priv_version into consideration before.

This is a new problem if we disable them at the end; it was not triggered in
my previous tests.


Not only Zca and Zcd, Zcf also has the same problem.



qemu-system-riscv64: warning: disabling zca extension for hart 0x because privilege spec version does not match
qemu-system-riscv64: warning: disabling zca extension for hart 0x0001 because privilege spec version does not match
qemu-system-riscv64: warning: disabling zcd extension for hart 0x0001 because privilege spec version does not match

(--- hangs ---)

This means that the assumption made in this patch - that Zca implies RVC - is
no longer valid, and all these translations won't work.

As specified in the Zc* spec, Zca is a subset of RVC. C & F include Zcf in
RV32, and C & D include Zcd.


Some possible solutions:

- Do not use Zca as a synonym for RVC, i.e. drop this patch. We would need to
convert all Zca checks to RVC checks in all translation code.


We should check both Zca and RVC in this way.

Similarly, we also should check both C and Zcf for Zcf 
instructions, C and Zcd for Zcd instructions.


I can update this patchset or add a new patch for it if needed.



- Do not apply patch 5/9 from that series that moves the 
disable_ext code to the end
of validation. Also a possibility, but we would be sweeping the 
problem under the rug.
Zca still can't be used as a RVC replacement due to priv spec 
version constraints, but
we just won't disable Zca because we'll keep validating exts too 
early (which is the problem that the patch addresses).

- change the priv spec of the sifive CPUs - and everyone that uses RVC - to
1_12_0. Not sure if this makes sense.

- do not disable any extensions due to privilege spec version mismatch. This
would make all the priv_version related artifacts more "educational" than an
actual configuration we want to enforce. Not sure if that would do any good
in the end, but it's also a possibility.


I prefer this way. For vendor-specific cpu types, the implicitly implied
extensions will have no effect on their function, and they can be invisible
to the user if we mask them in the isa_string exposed to the kernel.


The question is whether we need to constrain the configuration for general
cpu types.


An extension being a subset of another extension is not a unique case in
RISC-V: for example, zaamo is a subset of A, and Zfhmin is a subset of Zfh.


Maybe some of them don't have this problem. However, I think it's better to
take the related work away from the developer.


I think we can combine the two methods if we want to constrain the
configuration for general cpu types:

- keep disabling extensions due to privilege spec version mismatch before
validation, to disable the extensions manually set by users

- mask the implicitly enabled extensions in isa_string to make them invisible
to users (I have sent a new patch to do this, you can pick it into your
patchset if this way is acceptable).


I tested that patch with my series. If we keep the disable-extension code
executed before the validation, filtering only the extensions that were
user-enabled, it fixes the problem I reported here.

It's worth noticing though that we would be making the intended, conscious
decision of hiding extensions from the isa_string that are actually enabled
in the hart. And CPUs such as SIFIVE_CPU will start working with Z extensions
that are beyond their declared priv spec. This wouldn't be a problem if we
could guarantee that userland would always read 'isa_string' before using an
extension, but in reality we can't guarantee that. Firing an instruction for
a given extension and capturing SIGILL to see if the hart supports it or not
isn't forbidden by the ISA.


The implicitly enabled extensions are mostly subsets of their super
extensions, except zfinx (I think it's visible to the user, and we can change
it to check that zdinx/zhinx{min} require it). So enabling them when their
super extensions are enabled will



[PATCH V4] tracing: install trace events file only if necessary

2023-04-07 Thread casantos
From: Carlos Santos 

It is not useful when configuring with --enable-trace-backends=nop.

Signed-off-by: Carlos Santos 
---
Changes v1->v2:
  Install based on chosen trace backend, not on chosen emulators.
Changes v2->v3:
  Add missing comma
Changes v3->v4:
  Fix array comparison:
get_option('trace_backends') != [ 'nop' ]
  not
get_option('trace_backends') != 'nop'
---
 trace/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/trace/meson.build b/trace/meson.build
index 8e80be895c..30b1d942eb 100644
--- a/trace/meson.build
+++ b/trace/meson.build
@@ -64,7 +64,7 @@ trace_events_all = custom_target('trace-events-all',
  input: trace_events_files,
  command: [ 'cat', '@INPUT@' ],
  capture: true,
- install: true,
+ install: get_option('trace_backends') != [ 'nop' ],
  install_dir: qemu_datadir)
 
 if 'ust' in get_option('trace_backends')
-- 
2.31.1




[PATCH] block/vhdx: fix dynamic VHDX BAT corruption

2023-04-07 Thread Lukas Tschoke
The corruption occurs when a BAT entry aligned to 4096 bytes is changed.

Specifically, the corruption occurs during the creation of the LOG Data
Descriptor. The incorrect behavior involves copying 4088 bytes from the
original 4096 bytes aligned offset to `tmp[8..4096]` and then copying
the new value for the first BAT entry to the beginning `tmp[0..8]`.
This results in all existing BAT entries inside the 4K region being
incorrectly moved by 8 bytes and the last entry being lost.

This bug did not cause noticeable corruption when only sequentially
writing once to an empty dynamic VHDX (e.g.
using `qemu-img convert -O vhdx -o subformat=dynamic ...`), but it
still resulted in invalid values for the (unused) Sector Bitmap BAT
entries.

Importantly, this corruption would only become noticeable after the
corrupted BAT is re-read from the file.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/727
Signed-off-by: Lukas Tschoke 
---
 block/vhdx-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/vhdx-log.c b/block/vhdx-log.c
index c48cf65d62..38148f107a 100644
--- a/block/vhdx-log.c
+++ b/block/vhdx-log.c
@@ -981,7 +981,7 @@ static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
 sector_write = merged_sector;
 } else if (i == sectors - 1 && trailing_length) {
 /* partial sector at the end of the buffer */
-ret = bdrv_pread(bs->file, file_offset,
+ret = bdrv_pread(bs->file, file_offset + trailing_length,
  VHDX_LOG_SECTOR_SIZE - trailing_length,
  merged_sector + trailing_length, 0);
 if (ret < 0) {
-- 
2.40.0
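
To visualize the corruption described in the commit message, here is a toy
sketch (hypothetical buffer names, not QEMU code) of the correct descriptor
construction versus the shifted one:

    #include <stdint.h>
    #include <string.h>

    #define SECTOR 4096

    /* tmp receives the LOG data descriptor payload for one 4K-aligned
     * region of the BAT; new_entry8 is the updated 8-byte BAT entry. */
    static void build_descriptor(uint8_t tmp[SECTOR],
                                 const uint8_t orig[SECTOR],
                                 const uint8_t new_entry8[8])
    {
        /* Correct: keep all 4096 bytes in place, overwrite bytes 0..7. */
        memcpy(tmp, orig, SECTOR);
        memcpy(tmp, new_entry8, 8);

        /* The buggy behaviour was equivalent to:
         *   memcpy(tmp + 8, orig, SECTOR - 8);  // every entry shifts by 8
         *   memcpy(tmp, new_entry8, 8);         // ...and the last is lost
         */
    }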





[PATCH] target/arm: Fix debugging of ARMv8M Secure code

2023-04-07 Thread pbartell
From: Paul Bartell 

Revert changes to arm_cpu_get_phys_page_attrs_debug made in commit
4a35855682cebb89f9630b07aa9fd37c4e8c733b.

Commit 4a35855682 modifies the arm_cpu_get_phys_page_attrs_debug function
so that it calls get_phys_addr_with_struct rather than get_phys_addr, which
leads to a variety of memory access errors when debugging secure state
code on qemu ARMv8M targets with gdb.

This commit fixes a variety of gdb memory access errors including:
"error reading variable" and "Cannot access memory at address" when
attempting to read any memory address via gdb.

Signed-off-by: Paul Bartell 
---
 target/arm/ptw.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index ec3f51782a..5a1339d38f 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2999,16 +2999,12 @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
 {
 ARMCPU *cpu = ARM_CPU(cs);
 CPUARMState *env = &cpu->env;
-S1Translate ptw = {
-.in_mmu_idx = arm_mmu_idx(env),
-.in_secure = arm_is_secure(env),
-.in_debug = true,
-};
 GetPhysAddrResult res = {};
 ARMMMUFaultInfo fi = {};
+ARMMMUIdx mmu_idx = arm_mmu_idx(env);
 bool ret;
 
-ret = get_phys_addr_with_struct(env, &ptw, addr, MMU_DATA_LOAD, &res, &fi);
+ret = get_phys_addr(env, addr, MMU_DATA_LOAD, mmu_idx, &res, &fi);
 *attrs = res.f.attrs;
 
 if (ret) {
-- 
2.37.3




Re: [PATCH 02/10] accel/kvm: Declare kvm_direct_msi_allowed in stubs

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

Avoid the following link error when the next commit calls
kvm_direct_msi_enabled() from arm_gicv3_its_common.c:

   Undefined symbols for architecture arm64:
 "_kvm_direct_msi_allowed", referenced from:
 _its_class_name in hw_intc_arm_gicv3_its_common.c.o
   ld: symbol(s) not found for architecture arm64

Signed-off-by: Philippe Mathieu-Daudé
---
  accel/stubs/kvm-stub.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Richard Henderson 

r~
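
For context, the stub is presumably a one-line definition so that the symbol
resolves when KVM is compiled out (sketch inferred from the diffstat; the
hunk itself is not quoted here):

    /* accel/stubs/kvm-stub.c (sketch) */
    bool kvm_direct_msi_allowed;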



Re: [PATCH 01/10] sysemu/kvm: Remove unused headers

2023-04-07 Thread Richard Henderson

On 4/5/23 09:04, Philippe Mathieu-Daudé wrote:

All types used are forward-declared in "qemu/typedefs.h".

Signed-off-by: Philippe Mathieu-Daudé
---
  include/sysemu/kvm.h | 3 ---
  1 file changed, 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 2/2] accel/stubs: Build HAX/KVM/XEN stubs once

2023-04-07 Thread Richard Henderson

On 4/5/23 09:13, Philippe Mathieu-Daudé wrote:

+softmmu_ss.add_all(when: ['CONFIG_SOFTMMU'], if_true: sysemu_stubs_ss)


This when is redundant.
You can drop sysemu_stubs_ss and add each stub file directly to softmmu_ss.


r~



Re: [PATCH 1/2] accel/stubs: Remove kvm_flush_coalesced_mmio_buffer() stub

2023-04-07 Thread Richard Henderson

On 4/5/23 09:13, Philippe Mathieu-Daudé wrote:

kvm_flush_coalesced_mmio_buffer() is only called from
qemu_flush_coalesced_mmio_buffer() where it is protected
by a kvm_enabled() check. When KVM is not available, the
call is elided, there is no need for a stub definition.


Reviewed-by: Richard Henderson 


r~
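
The call-site pattern being relied on looks roughly like this (sketch of the
kvm_enabled() guard described in the commit message):

    void qemu_flush_coalesced_mmio_buffer(void)
    {
        if (kvm_enabled()) {
            /* Compiled out along with KVM, so no stub is needed. */
            kvm_flush_coalesced_mmio_buffer();
        }
    }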



[PATCH V3] tracing: install trace events file only if necessary

2023-04-07 Thread casantos
From: Carlos Santos 

It is not useful when configuring with --enable-trace-backends=nop.

Signed-off-by: Carlos Santos 
---
Changes v1->v2:
  Install based on chosen trace backend, not on chosen emulators.
Changes v2->v3:
  Add missing comma
---
 trace/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/trace/meson.build b/trace/meson.build
index 8e80be895c..833bb57113 100644
--- a/trace/meson.build
+++ b/trace/meson.build
@@ -64,7 +64,7 @@ trace_events_all = custom_target('trace-events-all',
  input: trace_events_files,
  command: [ 'cat', '@INPUT@' ],
  capture: true,
- install: true,
+ install: get_option('trace_backends') != 'nop',
  install_dir: qemu_datadir)
 
 if 'ust' in get_option('trace_backends')
-- 
2.31.1




Re: [PATCH 14/14] accel: Rename HVF struct hvf_vcpu_state -> struct AccelvCPUState

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

We want all accelerators to share the same opaque pointer in
CPUState.

Rename the 'hvf_vcpu_state' structure as 'AccelvCPUState'.

Use the generic 'accel' field of CPUState instead of 'hvf'.

Replace g_malloc0() by g_new0() for readability.

Signed-off-by: Philippe Mathieu-Daudé
---
  include/hw/core/cpu.h |  3 --
  include/sysemu/hvf_int.h  |  2 +-
  accel/hvf/hvf-accel-ops.c | 16 -
  target/arm/hvf/hvf.c  | 70 +++
  4 files changed, 44 insertions(+), 47 deletions(-)


Reviewed-by: Richard Henderson 

r~
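
On the g_malloc0()/g_new0() point, the readability and type-safety win,
illustrated (GLib sketch, assuming the AccelvCPUState typedef discussed
elsewhere in this series):

    /* Before: element size spelled out by hand. */
    vcpu = g_malloc0(sizeof(*vcpu));

    /* After: type and count are explicit, the multiplication is
     * overflow-checked, and the result is cast to the named type. */
    vcpu = g_new0(AccelvCPUState, 1);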



Re: [PATCH 13/14] accel: Inline WHPX get_whpx_vcpu()

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

No need for this helper to access the CPUState::accel field.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/i386/whpx/whpx-all.c | 29 ++---
  1 file changed, 10 insertions(+), 19 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 12/14] accel: Rename WHPX struct whpx_vcpu -> struct AccelvCPUState

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

We want all accelerators to share the same opaque pointer in
CPUState. Rename WHPX 'whpx_vcpu' as 'AccelvCPUState'.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/i386/whpx/whpx-all.c | 30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index 70eadb7f05..2372c4227a 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -229,7 +229,7 @@ typedef enum WhpxStepMode {
  WHPX_STEP_EXCLUSIVE,
  } WhpxStepMode;
  
-struct whpx_vcpu {
+struct AccelvCPUState {
  WHV_EMULATOR_HANDLE emulator;
  bool window_registered;
  bool interruptable;
@@ -260,9 +260,9 @@ static bool whpx_has_xsave(void)
   * VP support
   */
  
-static struct whpx_vcpu *get_whpx_vcpu(CPUState *cpu)
+static struct AccelvCPUState *get_whpx_vcpu(CPUState *cpu)
  {
-return (struct whpx_vcpu *)cpu->accel;
+return (struct AccelvCPUState *)cpu->accel;


Same comment about removing 'struct'.

Reviewed-by: Richard Henderson 



-vcpu = g_new0(struct whpx_vcpu, 1);
+vcpu = g_new0(struct AccelvCPUState, 1);
  
  if (!vcpu) {

  error_report("WHPX: Failed to allocte VCPU context.");


Hah.  And a "can't happen" error_report, since we're not actually using try 
here.  :-P


r~




Re: [PATCH 11/14] accel: Inline NVMM get_qemu_vcpu()

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

No need for this helper to access the CPUState::accel field.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/i386/nvmm/nvmm-all.c | 28 +++-
  1 file changed, 11 insertions(+), 17 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 10/14] accel: Rename NVMM struct qemu_vcpu -> struct AccelvCPUState

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

-struct qemu_vcpu *qcpu = get_qemu_vcpu(cpu);
+struct AccelvCPUState *qcpu = get_qemu_vcpu(cpu);


With the typedef in hw/core/cpu.h, you can drop the 'struct' at the same time.

Otherwise,
Reviewed-by: Richard Henderson 


-qcpu = g_try_malloc0(sizeof(*qcpu));
+qcpu = g_try_new0(struct AccelvCPUState, 1);


Another 'try' to clean up.  :-)


r~
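
A sketch of the suggestion (assumed placement in include/hw/core/cpu.h):

    /* Forward declaration plus typedef, so users can drop 'struct'. */
    typedef struct AccelvCPUState AccelvCPUState;

    /* Call sites can then read: */
    AccelvCPUState *qcpu = g_try_new0(AccelvCPUState, 1);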



Re: [PATCH 07/14] accel: Rename struct hax_vcpu_state -> struct AccelvCPUState

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

+struct AccelvCPUState;


Missing typedef?


r~



Re: [PATCH 08/14] accel: Move HAX hThread to accelerator context

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

hThread variable is only used by the HAX accelerator,
so move it to the accelerator specific context.

Signed-off-by: Philippe Mathieu-Daudé
---
  include/hw/core/cpu.h   | 1 -
  target/i386/hax/hax-i386.h  | 3 +++
  target/i386/hax/hax-accel-ops.c | 2 +-
  target/i386/hax/hax-all.c   | 2 +-
  target/i386/hax/hax-windows.c   | 2 +-
  5 files changed, 6 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 07/14] accel: Rename struct hax_vcpu_state -> struct AccelvCPUState

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

We want all accelerators to share the same opaque pointer in
CPUState. Start with the HAX context, renaming its forward
declarated structure 'hax_vcpu_state' as 'AccelvCPUState'.

Signed-off-by: Philippe Mathieu-Daudé
---
  include/hw/core/cpu.h   | 7 +++
  target/i386/hax/hax-i386.h  | 3 ++-
  target/i386/nvmm/nvmm-all.c | 2 +-
  target/i386/whpx/whpx-all.c | 2 +-
  4 files changed, 7 insertions(+), 7 deletions(-)


Can this be squashed with previous?  It seems odd to change the name twice in a 
row.
Is the "v" in AccelvCPUState helpful?


+struct AccelvCPUState *accel;
 /* shared by kvm, hax and hvf */
 bool vcpu_dirty;


Move below the comment?  Or is that later?


r~



Re: [PATCH 05/14] accel: Rename 'hax_vcpu' as 'accel' in CPUState

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

All accelerators will share a single opaque context
in CPUState. Start by renaming 'hax_vcpu' as 'accelCPUState'.


Pasto in 'accel' here.

Reviewed-by: Richard Henderson 


r~



Re: [PATCH 02/14] accel: Remove unused hThread variable on TCG/WHPX

2023-04-07 Thread Richard Henderson

On 4/5/23 03:17, Philippe Mathieu-Daudé wrote:

On Windows hosts, cpu->hThread is assigned but never accessed:
remove it.

Signed-off-by: Philippe Mathieu-Daudé
---
  accel/tcg/tcg-accel-ops-mttcg.c   | 4 
  accel/tcg/tcg-accel-ops-rr.c  | 3 ---
  target/i386/whpx/whpx-accel-ops.c | 3 ---
  3 files changed, 10 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 01/14] accel: Document generic accelerator headers

2023-04-07 Thread Richard Henderson

On 4/5/23 03:17, Philippe Mathieu-Daudé wrote:

These headers are meant to be included by any file to check
the availability of accelerators, thus they are not accelerator
specific.

Signed-off-by: Philippe Mathieu-Daudé
---
  include/sysemu/hax.h  | 2 ++
  include/sysemu/kvm.h  | 2 ++
  include/sysemu/nvmm.h | 2 ++
  include/sysemu/tcg.h  | 2 ++
  include/sysemu/whpx.h | 2 ++
  include/sysemu/xen.h  | 2 ++
  6 files changed, 12 insertions(+)


Acked-by: Richard Henderson 

r~



Re: [PATCH 03/14] accel: Fix a leak on Windows HAX

2023-04-07 Thread Richard Henderson

On 4/5/23 03:18, Philippe Mathieu-Daudé wrote:

hThread is only used on the error path in hax_kick_vcpu_thread().

Fixes: b0cb0a66d6 ("Plumb the HAXM-based hardware acceleration support")
Signed-off-by: Philippe Mathieu-Daudé
---
  target/i386/hax/hax-all.c | 3 +++
  1 file changed, 3 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PULL 5/6] edk2: replace build scripts

2023-04-07 Thread Olaf Hering
Mon, 20 Mar 2023 10:38:46 +0100 Gerd Hoffmann :

> Remove Makefile.edk2 and the edk2*.sh scripts and replace them
> with a python script (which already handles fedora rpm builds)
> and a config file for it.

This breaks 'make roms efirom' (in case this happens to be a valid make target).

Olaf




[PATCH] Hexagon (target/hexagon) Additional instructions handled by idef-parser

2023-04-07 Thread Taylor Simpson
Currently, idef-parser skips all floating point instructions.  However,
there are some floating point instructions that can be handled.

The following instructions are now parsed
F2_sfimm_p
F2_sfimm_n
F2_dfimm_p
F2_dfimm_n
F2_dfmpyll
F2_dfmpylh

To make these instructions work, we fix some bugs in parser-helpers.c
gen_rvalue_extend
gen_cast_op

Test cases added to tests/tcg/hexagon/fpstuff.c

Signed-off-by: Taylor Simpson 
---
 target/hexagon/idef-parser/parser-helpers.c | 16 +++---
 tests/tcg/hexagon/fpstuff.c | 54 +
 target/hexagon/gen_idef_parser_funcs.py | 10 +++-
 3 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/target/hexagon/idef-parser/parser-helpers.c b/target/hexagon/idef-parser/parser-helpers.c
index 18cde6a1be..0b160e6f58 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -386,13 +386,10 @@ HexValue gen_rvalue_extend(Context *c, YYLTYPE *locp, HexValue *rvalue)
 
 if (rvalue->type == IMMEDIATE) {
 HexValue res = gen_imm_qemu_tmp(c, locp, 64, rvalue->signedness);
-bool is_unsigned = (rvalue->signedness == UNSIGNED);
-const char *sign_suffix = is_unsigned ? "u" : "";
 gen_c_int_type(c, locp, 64, rvalue->signedness);
-OUT(c, locp, " ", &res, " = ");
-OUT(c, locp, "(", sign_suffix, "int64_t) ");
-OUT(c, locp, "(", sign_suffix, "int32_t) ");
-OUT(c, locp, rvalue, ";\n");
+OUT(c, locp, " ", &res, " = (");
+gen_c_int_type(c, locp, 64, rvalue->signedness);
+OUT(c, locp, ")", rvalue, ";\n");
 return res;
 } else {
 HexValue res = gen_tmp(c, locp, 64, rvalue->signedness);
@@ -963,7 +960,12 @@ HexValue gen_cast_op(Context *c,
 if (src->bit_width == target_width) {
 return *src;
 } else if (src->type == IMMEDIATE) {
-HexValue res = *src;
+HexValue res;
+if (src->bit_width < target_width) {
+res = gen_rvalue_extend(c, locp, src);
+} else {
+res = *src;
+}
 res.bit_width = target_width;
 res.signedness = signedness;
 return res;
diff --git a/tests/tcg/hexagon/fpstuff.c b/tests/tcg/hexagon/fpstuff.c
index 90ce9a6ef3..28f9397155 100644
--- a/tests/tcg/hexagon/fpstuff.c
+++ b/tests/tcg/hexagon/fpstuff.c
@@ -20,6 +20,7 @@
  */
 
 #include <stdio.h>
+#include <float.h>
 
 const int FPINVF_BIT = 1; /* Invalid */
 const int FPINVF = 1 << FPINVF_BIT;
@@ -706,6 +707,57 @@ static void check_float2int_convs()
 check_fpstatus(usr, FPINVF);
 }
 
+static void check_float_consts(void)
+{
+int res32;
+unsigned long long res64;
+
+asm("%0 = sfmake(#%1):neg\n\t" : "=r"(res32) : "i"(0xf));
+check32(res32, 0xbc9e);
+
+asm("%0 = sfmake(#%1):pos\n\t" : "=r"(res32) : "i"(0xf));
+check32(res32, 0x3c9e);
+
+asm("%0 = dfmake(#%1):neg\n\t" : "=r"(res64) : "i"(0xf));
+check64(res64, 0xbf93c000ULL);
+
+asm("%0 = dfmake(#%1):pos\n\t" : "=r"(res64) : "i"(0xf));
+check64(res64, 0x3f93c000ULL);
+}
+
+static inline unsigned long long dfmpyll(double x, double y)
+{
+unsigned long long res64;
+asm("%0 = dfmpyll(%1, %2)" : "=r"(res64) : "r"(x), "r"(y));
+return res64;
+}
+
+static inline unsigned long long dfmpylh(double acc, double x, double y)
+{
+unsigned long long res64 = *(unsigned long long *)&acc;
+asm("%0 += dfmpylh(%1, %2)" : "+r"(res64) : "r"(x), "r"(y));
+return res64;
+}
+
+static void check_dfmpyxx(void)
+{
+unsigned long long res64;
+
+res64 = dfmpyll(DBL_MIN, DBL_MIN);
+check64(res64, 0ULL);
+res64 = dfmpyll(-1.0, DBL_MIN);
+check64(res64, 0ULL);
+res64 = dfmpyll(DBL_MAX, DBL_MAX);
+check64(res64, 0x1fffdULL);
+
+res64 = dfmpylh(DBL_MIN, DBL_MIN, DBL_MIN);
+check64(res64, 0x10ULL);
+res64 = dfmpylh(-1.0, DBL_MAX, DBL_MIN);
+check64(res64, 0xc00fffe0ULL);
+res64 = dfmpylh(DBL_MAX, 0.0, -1.0);
+check64(res64, 0x7fefULL);
+}
+
 int main()
 {
 check_compare_exception();
@@ -718,6 +770,8 @@ int main()
 check_sffixupd();
 check_sffms();
 check_float2int_convs();
+check_float_consts();
+check_dfmpyxx();
 
 puts(err ? "FAIL" : "PASS");
 return err ? 1 : 0;
diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
index 917753d6d8..9bed7ee55e 100644
--- a/target/hexagon/gen_idef_parser_funcs.py
+++ b/target/hexagon/gen_idef_parser_funcs.py
@@ -89,7 +89,15 @@ def main():
 continue
 if ( tag.startswith('V6_') ) :
 continue
-if ( tag.startswith('F') ) :
+if ( tag.startswith('F') and
+ tag not in {
+ 'F2_sfimm_p',
+ 'F2_sfimm_n',
+ 'F2_dfimm_p',
+ 'F2_dfimm_n',
+ 

[PATCH] do not lockdown github PRs submitted to forks of official mirror

2023-04-07 Thread lauren
From: "Lauren N. Liberda" 

qemu forks on github are typically the way of work on changes
to be upstreamed later, such as support for new devices. currently,
the workflow prevents any external contributors from submitting
code changes, and blindly points them to upstream instead.

Signed-off-by: Lauren N. Liberda 
---
 .github/workflows/lockdown.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/lockdown.yml b/.github/workflows/lockdown.yml
index d5e1265cff..370f1c8f7e 100644
--- a/.github/workflows/lockdown.yml
+++ b/.github/workflows/lockdown.yml
@@ -12,6 +12,7 @@ permissions:
 jobs:
   action:
 runs-on: ubuntu-latest
+if: github.repository == 'qemu/qemu'
 steps:
   - uses: dessant/repo-lockdown@v2
 with:
-- 
2.40.0




[PATCH] Hexagon (target/hexagon) Remove unused slot variable in helpers

2023-04-07 Thread Taylor Simpson
The slot variable in helpers was only passed to log_reg_write function
where the argument is unused.
- Remove declaration from generated helper functions
- Remove slot argument from log_reg_write

Signed-off-by: Taylor Simpson 
---
 target/hexagon/macros.h| 2 +-
 target/hexagon/op_helper.h | 2 +-
 target/hexagon/op_helper.c | 2 +-
 target/hexagon/gen_helper_funcs.py | 2 --
 4 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 482a9c787f..b978fd1840 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -28,7 +28,7 @@
 #define READ_REG(NUM)(env->gpr[(NUM)])
 #define READ_PREG(NUM)   (env->pred[NUM])
 
-#define WRITE_RREG(NUM, VAL) log_reg_write(env, NUM, VAL, slot)
+#define WRITE_RREG(NUM, VAL) log_reg_write(env, NUM, VAL)
 #define WRITE_PREG(NUM, VAL) log_pred_write(env, NUM, VAL)
 #endif
 
diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
index 34b3a53975..db22b54401 100644
--- a/target/hexagon/op_helper.h
+++ b/target/hexagon/op_helper.h
@@ -27,7 +27,7 @@ uint32_t mem_load4(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint64_t mem_load8(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 
 void log_reg_write(CPUHexagonState *env, int rnum,
-   target_ulong val, uint32_t slot);
+   target_ulong val);
 void log_store64(CPUHexagonState *env, target_ulong addr,
  int64_t val, int width, int slot);
 void log_store32(CPUHexagonState *env, target_ulong addr,
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index c9a156030e..63a5b9b202 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -53,7 +53,7 @@ G_NORETURN void HELPER(raise_exception)(CPUHexagonState *env, uint32_t excp)
 }
 
 void log_reg_write(CPUHexagonState *env, int rnum,
-   target_ulong val, uint32_t slot)
+   target_ulong val)
 {
 HEX_DEBUG_LOG("log_reg_write[%d] = " TARGET_FMT_ld " (0x" TARGET_FMT_lx ")",
   rnum, val, val);
diff --git a/target/hexagon/gen_helper_funcs.py b/target/hexagon/gen_helper_funcs.py
index 7a224b66e6..6edd82c423 100755
--- a/target/hexagon/gen_helper_funcs.py
+++ b/target/hexagon/gen_helper_funcs.py
@@ -265,8 +265,6 @@ def gen_helper_function(f, tag, tagregs, tagimms):
 if i > 0: f.write(", ")
 f.write("uint32_t part1")
 f.write(")\n{\n")
-if (not hex_common.need_slot(tag)):
-f.write("uint32_t slot __attribute__((unused)) = 4;\n" )
 if hex_common.need_ea(tag): gen_decl_ea(f)
 ## Declare the return variable
 i=0
-- 
2.25.1



Re: [PATCH 8/8] block, block-backend: write some hot coroutine wrappers by hand

2023-04-07 Thread Paolo Bonzini
On Fri, Apr 7, 2023, 22:04 Eric Blake wrote:

> On Fri, Apr 07, 2023 at 05:33:03PM +0200, Paolo Bonzini wrote:
> > The introduction of the graph lock is causing blk_get_geometry, a hot
> function
> > used in the I/O path, to create a coroutine.  However, the only part
> that really
> > needs to run in coroutine context is the call to
> bdrv_co_refresh_total_sectors,
> > which in turn only happens in the rare case of host CD-ROM devices.
> >
> > So, write by hand the three wrappers on the path from
> blk_co_get_geometry to
> > bdrv_co_refresh_total_sectors, so that the coroutine wrapper is only
> created
> > if bdrv_nb_sectors actually calls bdrv_refresh_total_sectors.
> >
> > Reported-by: Stefan Hajnoczi 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  block.c   | 22 ++
> >  block/block-backend.c | 27 +++
> >
> >  include/sysemu/block-backend-io.h |  5 ++---
> >  4 files changed, 52 insertions(+), 4 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index dbbc8de30c24..3390efd18cf6 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -5859,6 +5859,28 @@ int64_t coroutine_fn
> bdrv_co_nb_sectors(BlockDriverState *bs)
> >  return bs->total_sectors;
> >  }
> >
> > +/*
> > + * This wrapper is written by hand because this function is in the hot
> I/O path,
> > + * via blk_get_geometry.
> > + */
> > +int64_t coroutine_mixed_fn bdrv_nb_sectors(BlockDriverState *bs)
> > +{
> > +BlockDriver *drv = bs->drv;
> > +IO_CODE();
> > +
> > +if (!drv)
> > +return -ENOMEDIUM;
> > +
> > +if (!bs->bl.has_variable_length) {
> > +int ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);
>
> Is this logic backwards?  Why are we only refreshing the total sectors
> when we don't have variable length?
>

Yes, it is backwards.

Paolo


> > +if (ret < 0) {
> > +return ret;
> > +}
> > +}
> > +
> > +return bs->total_sectors;
> > +}
> > +
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
>
>


Re: [PATCH 8/8] block, block-backend: write some hot coroutine wrappers by hand

2023-04-07 Thread Eric Blake
On Fri, Apr 07, 2023 at 05:33:03PM +0200, Paolo Bonzini wrote:
> The introduction of the graph lock is causing blk_get_geometry, a hot function
> used in the I/O path, to create a coroutine.  However, the only part that 
> really
> needs to run in coroutine context is the call to 
> bdrv_co_refresh_total_sectors,
> which in turn only happens in the rare case of host CD-ROM devices.
> 
> So, write by hand the three wrappers on the path from blk_co_get_geometry to
> bdrv_co_refresh_total_sectors, so that the coroutine wrapper is only created
> if bdrv_nb_sectors actually calls bdrv_refresh_total_sectors.
> 
> Reported-by: Stefan Hajnoczi 
> Signed-off-by: Paolo Bonzini 
> ---
>  block.c   | 22 ++
>  block/block-backend.c | 27 +++
>  
>  include/sysemu/block-backend-io.h |  5 ++---
>  4 files changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index dbbc8de30c24..3390efd18cf6 100644
> --- a/block.c
> +++ b/block.c
> @@ -5859,6 +5859,28 @@ int64_t coroutine_fn 
> bdrv_co_nb_sectors(BlockDriverState *bs)
>  return bs->total_sectors;
>  }
>  
> +/*
> + * This wrapper is written by hand because this function is in the hot I/O 
> path,
> + * via blk_get_geometry.
> + */
> +int64_t coroutine_mixed_fn bdrv_nb_sectors(BlockDriverState *bs)
> +{
> +BlockDriver *drv = bs->drv;
> +IO_CODE();
> +
> +if (!drv)
> +return -ENOMEDIUM;
> +
> +if (!bs->bl.has_variable_length) {
> +int ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);

Is this logic backwards?  Why are we only refreshing the total sectors
when we don't have variable length?

> +if (ret < 0) {
> +return ret;
> +}
> +}
> +
> +return bs->total_sectors;
> +}
> +

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
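
For clarity, the fast path with the condition un-inverted, as both reviewers
agree it should be, would read (sketch; only the if-condition flips relative
to the quoted hunk, with braces added per QEMU style):

    int64_t coroutine_mixed_fn bdrv_nb_sectors(BlockDriverState *bs)
    {
        BlockDriver *drv = bs->drv;
        IO_CODE();

        if (!drv) {
            return -ENOMEDIUM;
        }

        /* Only variable-length nodes (e.g. host CD-ROM) need the refresh,
         * which is the one step that may create a coroutine. */
        if (bs->bl.has_variable_length) {
            int ret = bdrv_refresh_total_sectors(bs, bs->total_sectors);
            if (ret < 0) {
                return ret;
            }
        }

        return bs->total_sectors;
    }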




Re: [PATCH 7/8] block-backend: ignore inserted state in blk_co_nb_sectors

2023-04-07 Thread Eric Blake
On Fri, Apr 07, 2023 at 05:33:02PM +0200, Paolo Bonzini wrote:
> All callers of blk_co_nb_sectors (and blk_nb_sectors) are able to
> handle a non-inserted CD-ROM as a zero-length file, they do not need
> to raise an error.
> 
> Not using blk_co_is_available() aligns the function with
> blk_co_get_geometry(), which becomes a simple wrapper for
> blk_co_nb_sectors().  It will also make it possible to skip the creation
> of a coroutine in the (common) case where bs->bl.has_variable_length
> is false.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/block-backend.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 6/8] block-backend: inline bdrv_co_get_geometry

2023-04-07 Thread Eric Blake
On Fri, Apr 07, 2023 at 05:33:01PM +0200, Paolo Bonzini wrote:
> bdrv_co_get_geometry is only used in blk_co_get_geometry.  Inline it in
> there, to reduce the number of wrappers for bs->total_sectors.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block.c  | 10 --
>  block/block-backend.c|  8 ++--
>  include/block/block-io.h |  3 ---
>  3 files changed, 6 insertions(+), 15 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/block.c b/block.c
> index 9de50ac7c811..dbbc8de30c24 100644
> --- a/block.c
> +++ b/block.c
> @@ -5879,16 +5879,6 @@ int64_t coroutine_fn 
> bdrv_co_getlength(BlockDriverState *bs)
>  return ret * BDRV_SECTOR_SIZE;
>  }
>  
> -/* return 0 as number of sectors if no device present or error */
> -void coroutine_fn bdrv_co_get_geometry(BlockDriverState *bs,
> -   uint64_t *nb_sectors_ptr)
> -{
> -int64_t nb_sectors = bdrv_co_nb_sectors(bs);
> -IO_CODE();

Pre-patch, we called bdrv_co_nb_sectors() before the IO_CODE guard...

> +/* return 0 as number of sectors if no device present or error */
>  void coroutine_fn blk_co_get_geometry(BlockBackend *blk,
>uint64_t *nb_sectors_ptr)
>  {
> +BlockDriverState *bs = blk_bs(blk);
> +
>  IO_CODE();
>  GRAPH_RDLOCK_GUARD();
>  
> -if (!blk_bs(blk)) {
> +if (!bs) {
>  *nb_sectors_ptr = 0;
>  } else {
> -bdrv_co_get_geometry(blk_bs(blk), nb_sectors_ptr);
> +int64_t nb_sectors = bdrv_co_nb_sectors(bs);

...post-patch the order swaps.  That actually feels better to me, (the
guard is supposed to do sanity checks to detect coding bugs at the
soonest possible moment; if we have a bug, doing the work and only
later failing the check is not as safe as failing fast) - but probably
no impact to correctly written code.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
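
Spelling out the fail-fast ordering praised above (sketch matching the
post-patch code quoted in this review; the tail assignment is assumed from
the pre-patch helper):

    void coroutine_fn blk_co_get_geometry(BlockBackend *blk,
                                          uint64_t *nb_sectors_ptr)
    {
        BlockDriverState *bs = blk_bs(blk);

        IO_CODE();            /* sanity-check the calling context first... */
        GRAPH_RDLOCK_GUARD();

        if (!bs) {
            *nb_sectors_ptr = 0;
        } else {
            int64_t nb_sectors = bdrv_co_nb_sectors(bs);  /* ...then work */
            *nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors;
        }
    }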




Re: [PATCH 5/8] migration/block: replace uses of blk_nb_sectors that do not check result

2023-04-07 Thread Eric Blake
On Fri, Apr 07, 2023 at 05:33:00PM +0200, Paolo Bonzini wrote:
> Uses of blk_nb_sectors must check whether the result is negative.
> Otherwise, underflow can happen.  Fortunately, alloc_aio_bitmap()
> and bmds_aio_inflight() both have an alternative way to retrieve the
> number of sectors in the file.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  migration/block.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 4/8] block: remove has_variable_length from BlockDriver

2023-04-07 Thread Eric Blake
On Fri, Apr 07, 2023 at 05:32:59PM +0200, Paolo Bonzini wrote:
> Fill in the field in BlockLimits directly for host devices, and
> copy it from there for the raw format.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/file-posix.c   | 12 
>  block/file-win32.c   |  2 +-
>  block/io.c   |  2 --
>  block/raw-format.c   |  3 ++-
>  include/block/block_int-common.h |  2 --
>  5 files changed, 11 insertions(+), 10 deletions(-)

The change makes sense to me.  I'm having a slight doubt on whether it
might cause any regression in the bigger picture where a regular host
file exposed as a guest image grew outside of qemu's control but where
qemu used to see the new size automatically but now won't see it until
an explicit QMP action.  But I suspect that since we already do have a
QMP command for telling qemu to do resizes itself, or to at least refresh
its notion of size (for that very case of a third-party adding more
storage and telling qemu it is now safe to use that extra space), the
explicit QMP interaction is probably sufficient, and that any such
corner-case regression I'm worrying about is not a problem in reality.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



