Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers

2021-01-15 Thread Cupertino Miranda
Ok, enter and leave will officially get to be TCG code.
To be honest initially we thought that helper code would be preferable 
to TCG one. Apparently we were wrong. :-)

Thanks for your quick feedback.

On 1/15/21 9:53 PM, Richard Henderson wrote:
> On 1/15/21 11:48 AM, Cupertino Miranda wrote:
>>> In the case of enter or leave, this is one load/store plus one addition,
>>> followed by a branch.  All of which is encoded as fields in the instruction.
>>> Extremely simple.
>>
>> So your recommendation is leave the conditional exception triggering of
>> enter and leave in a helper and move the loads/stores to tcg ?
> 
> What?  No.
> 
> 
> r~
> 
___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers

2021-01-15 Thread Cupertino Miranda



On 1/15/21 8:31 PM, Richard Henderson wrote:
> On 1/15/21 7:11 AM, Cupertino Miranda wrote:
>>> Similarly.  I think that both of these could be implemented entirely in
>>> translate, which is what
>>>
 +bool restore_fp= u7 & 0x10; /* u[4] indicates if fp must be saved 
  */
 +bool restore_blink = u7 & 0x20; /* u[5] indicates saving of blink 
  */
 +bool jump_to_blink = u7 & 0x40; /* u[6] should we jump to blink?  
  */
>>>
>>> these bits strongly imply.
>>>
>>
>> For lack of knowing better, it is unclear to me where to draw the line
>> when choosing between a translate time (tcg) or helper implementation.
>> Your suggestions for carry/overflow computation are sharp and we should
>> have never used an helper, however I wonder what would be the benefit of
>> implementing enter and leave through TCG.
>>
>> We have dealt with those exception issues by just changing SP in the end
>> of the instruction implementation, when no exceptions can happen.
> 
> 5-10 tcg opcodes is the rule of thumb.  A conditional exception (requiring a
> branch) is a good reason to put the whole thing out of line.
> 
> In the case of enter or leave, this is one load/store plus one addition,
> followed by a branch.  All of which is encoded as fields in the instruction.
> Extremely simple.

So your recommendation is leave the conditional exception triggering of 
enter and leave in a helper and move the loads/stores to tcg ?

> 
>> As far as I understand when an exception happens in the middle of the
>> helper or even on a TCG implementation, it jumps out of that TB
>> execution to deal with the exception. On rtie instead of it returning to
>> the same tcg_ld or tcg_st where it actually triggered the exception it
>> will re-decode the same instruction which triggered the exception, and
>> re-attempts to execute it.
>> Is that the case in current TCG implementation, or did it improved and
>> it is now able to return to previous execution flow (i.e translation
>> block) ?
> 
> I think I don't understand your question.
> 
> An exception leaves the TB, via longjmp.  Before the longjmp, there is 
> normally
> an "unwind" or "restore" operation, to sync the cpu state with the middle of
> the TB.  This happens in restore_state_to_opc().
> 
> When processing of the exception is complete, execution will continue with the
> appropriate cpu state.  Which will probably be a new TB that (logically)
> partially overlaps the previous TB.
> 
> I.e. everything will work as you'd expect.
> 
> So... what's the question?
> 
You answered the question. That is exactly how I understand it works.

> 
> r~
> 
___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers

2021-01-15 Thread Richard Henderson
On 1/15/21 7:11 AM, Cupertino Miranda wrote:
>> Similarly.  I think that both of these could be implemented entirely in
>> translate, which is what
>>
>>> +bool restore_fp= u7 & 0x10; /* u[4] indicates if fp must be saved  
>>> */
>>> +bool restore_blink = u7 & 0x20; /* u[5] indicates saving of blink  
>>> */
>>> +bool jump_to_blink = u7 & 0x40; /* u[6] should we jump to blink?   
>>> */
>>
>> these bits strongly imply.
>>
> 
> For lack of knowing better, it is unclear to me where to draw the line 
> when choosing between a translate time (tcg) or helper implementation.
> Your suggestions for carry/overflow computation are sharp and we should 
> have never used an helper, however I wonder what would be the benefit of 
> implementing enter and leave through TCG.
> 
> We have dealt with those exception issues by just changing SP in the end 
> of the instruction implementation, when no exceptions can happen.

5-10 tcg opcodes is the rule of thumb.  A conditional exception (requiring a
branch) is a good reason to put the whole thing out of line.

In the case of enter or leave, this is one load/store plus one addition,
followed by a branch.  All of which is encoded as fields in the instruction.
Extremely simple.

> As far as I understand when an exception happens in the middle of the 
> helper or even on a TCG implementation, it jumps out of that TB 
> execution to deal with the exception. On rtie instead of it returning to 
> the same tcg_ld or tcg_st where it actually triggered the exception it 
> will re-decode the same instruction which triggered the exception, and 
> re-attempts to execute it.
> Is that the case in current TCG implementation, or did it improved and 
> it is now able to return to previous execution flow (i.e translation 
> block) ?

I think I don't understand your question.

An exception leaves the TB, via longjmp.  Before the longjmp, there is normally
an "unwind" or "restore" operation, to sync the cpu state with the middle of
the TB.  This happens in restore_state_to_opc().

When processing of the exception is complete, execution will continue with the
appropriate cpu state.  Which will probably be a new TB that (logically)
partially overlaps the previous TB.

I.e. everything will work as you'd expect.

So... what's the question?


r~


___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers

2021-01-15 Thread Cupertino Miranda
>> +void QEMU_NORETURN helper_halt(CPUARCState *env, uint32_t npc)
>> +{
>> +CPUState *cs = env_cpu(env);
>> +if (env->stat.Uf) {
>> +cs->exception_index = EXCP_PRIVILEGEV;
>> +env->causecode = 0;
>> +env->param = 0;
>> + /* Restore PC such that we point at the faulty instruction.  */
>> +env->eret = env->pc;
> 
> Any reason not to handle Uf at translate time?  Or at least create a single
> helper function for that here.  But it seems like translate will have to do a
> lot of priv checking anyway and will already have that handy.

Since we needed a helper anyway to deal with causecode and param, we 
thought it would be reasonable to do all in the helper.
We did not made a TCG access for causecode and param enviroment values.

> 
>> +void helper_enter(CPUARCState *env, uint32_t u6)
>> +{
>> +/* nothing to do? then bye-bye! */
>> +if (!u6) {
>> +return;
>> +}
>> +
>> +uint8_t regs   = u6 & 0x0f; /* u[3:0] determines registers to save 
>> */
>> +boolsave_fp= u6 & 0x10; /* u[4] indicates if fp must be saved  
>> */
>> +boolsave_blink = u6 & 0x20; /* u[5] indicates saving of blink  
>> */
>> +uint8_t stack_size = 4 * (regs + save_fp + save_blink);
>> +
>> +/* number of regs to be saved must be sane */
>> +check_enter_leave_nr_regs(env, regs, GETPC());
> 
> Both of these checks could be translate time.
> 
>> +/* this cannot be executed in a delay/execution slot */
>> +check_delay_or_execution_slot(env, GETPC());
> 
> As could this.
> 
>> +/* stack must be a multiple of 4 (32 bit aligned) */
>> +check_addr_is_word_aligned(env, CPU_SP(env) - stack_size, GETPC());
>> +
>> +uint32_t tmp_sp = CPU_SP(env);
>> +
>> +if (save_fp) {
>> +tmp_sp -= 4;
>> +cpu_stl_data(env, tmp_sp, CPU_FP(env));
>> +}
> 
> And what if these stores raise an exception?  I doubt you're going to get an
> exception at the correct pc.
> 
>> +void helper_leave(CPUARCState *env, uint32_t u7)
> 
> Similarly.  I think that both of these could be implemented entirely in
> translate, which is what
> 
>> +bool restore_fp= u7 & 0x10; /* u[4] indicates if fp must be saved  
>> */
>> +bool restore_blink = u7 & 0x20; /* u[5] indicates saving of blink  
>> */
>> +bool jump_to_blink = u7 & 0x40; /* u[6] should we jump to blink?   
>> */
> 
> these bits strongly imply.
> 

For lack of knowing better, it is unclear to me where to draw the line 
when choosing between a translate time (tcg) or helper implementation.
Your suggestions for carry/overflow computation are sharp and we should 
have never used an helper, however I wonder what would be the benefit of 
implementing enter and leave through TCG.

We have dealt with those exception issues by just changing SP in the end 
of the instruction implementation, when no exceptions can happen.

As far as I understand when an exception happens in the middle of the 
helper or even on a TCG implementation, it jumps out of that TB 
execution to deal with the exception. On rtie instead of it returning to 
the same tcg_ld or tcg_st where it actually triggered the exception it 
will re-decode the same instruction which triggered the exception, and 
re-attempts to execute it.
Is that the case in current TCG implementation, or did it improved and 
it is now able to return to previous execution flow (i.e translation 
block) ?
___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 04/15] arc: TCG and decoder glue code and helpers

2020-12-01 Thread Richard Henderson
On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote:
> From: Cupertino Miranda 
> 
> Signed-off-by: Cupertino Miranda 
> ---
>  target/arc/extra_mapping.def   |  40 ++
>  target/arc/helper.c| 293 +
>  target/arc/helper.h|  46 ++
>  target/arc/op_helper.c | 749 +
>  target/arc/semfunc_mapping.def | 329 +++
>  5 files changed, 1457 insertions(+)
>  create mode 100644 target/arc/extra_mapping.def
>  create mode 100644 target/arc/helper.c
>  create mode 100644 target/arc/helper.h
>  create mode 100644 target/arc/op_helper.c
>  create mode 100644 target/arc/semfunc_mapping.def

Not an ideal patch split, since nothing uses the def files at this point.  But
looking forward to the end product, they seem to be exactly what I was talking
about re string manipulation vs patch 3.

In particular, arc_map_opcode should be done at qemu build/compile time.  And
not for nothing, a linear search through strings during translation is really
beyond the pale.

> +#if defined(CONFIG_USER_ONLY)
> +
> +void arc_cpu_do_interrupt(CPUState *cs)
> +{
> +ARCCPU *cpu = ARC_CPU(cs);
> +CPUARCState *env = >env;
> +
> +cs->exception_index = -1;
> +CPU_ILINK(env) = env->pc;
> +}

There are no user-only interrupts.

> +/*
> + * NOTE: Special LP_END exception. Immediatelly return code execution to

Immediately.

> +/* 15. The PC is set with the appropriate exception vector. */
> +env->pc = cpu_ldl_code(env, env->intvec + offset);
> +CPU_PCL(env) = env->pc & 0xfffe;

You should be using address_space_ldl, and handling any error.  As it is, if
this load fails you'll get another interrupt, bringing you right back here, etc.

> +DEF_HELPER_1(debug, void, env)
> +DEF_HELPER_2(norm, i32, env, i32)
> +DEF_HELPER_2(normh, i32, env, i32)
> +DEF_HELPER_2(ffs, i32, env, i32)
> +DEF_HELPER_2(fls, i32, env, i32)
> +DEF_HELPER_2(lr, tl, env, i32)
> +DEF_HELPER_3(sr, void, env, i32, i32)
> +DEF_HELPER_2(halt, noreturn, env, i32)
> +DEF_HELPER_1(rtie, void, env)
> +DEF_HELPER_1(flush, void, env)
> +DEF_HELPER_4(raise_exception, noreturn, env, i32, i32, i32)
> +DEF_HELPER_2(zol_verify, void, env, i32)
> +DEF_HELPER_2(fake_exception, void, env, i32)
> +DEF_HELPER_2(set_status32, void, env, i32)
> +DEF_HELPER_1(get_status32, i32, env)
> +DEF_HELPER_3(carry_add_flag, i32, i32, i32, i32)
> +DEF_HELPER_3(overflow_add_flag, i32, i32, i32, i32)
> +DEF_HELPER_3(overflow_sub_flag, i32, i32, i32, i32)
> +
> +DEF_HELPER_2(enter, void, env, i32)
> +DEF_HELPER_2(leave, void, env, i32)
> +
> +DEF_HELPER_3(mpymu, i32, env, i32, i32)
> +DEF_HELPER_3(mpym, i32, env, i32, i32)
> +
> +DEF_HELPER_3(repl_mask, i32, i32, i32, i32)

Use DEF_HELPER_FLAGS_* when possible.

> +target_ulong helper_norm(CPUARCState *env, uint32_t src1)
> +{
> +int i;
> +int32_t tmp = (int32_t) src1;
> +if (tmp == 0 || tmp == -1) {
> +return 0;
> +}
> +for (i = 0; i <= 31; i++) {
> +if ((tmp >> i) == 0) {
> +break;
> +}
> +if ((tmp >> i) == -1) {
> +break;
> +}
> +}
> +return i;
> +}

The spec says 0/-1 -> 0x1f, not 0.

That said, this appears to be clrsb32(src1), which could also be expanded
inline with two uses of tcg_gen_clz_i32.

> +target_ulong helper_normh(CPUARCState *env, uint32_t src1)
> +{
> +int i;
> +for (i = 0; i <= 15; i++) {
> +if (src1 >> i == 0) {
> +break;
> +}
> +if (src1 >> i == -1) {
> +break;
> +}
> +}
> +return i;
> +}

Similarly.


> +
> +target_ulong helper_ffs(CPUARCState *env, uint32_t src)
> +{
> +int i;
> +if (src == 0) {
> +return 31;
> +}
> +for (i = 0; i <= 31; i++) {
> +if (((src >> i) & 1) != 0) {
> +break;
> +}
> +}
> +return i;
> +}

This should use tcg_gen_ctz_i32.

> +target_ulong helper_fls(CPUARCState *env, uint32_t src)
> +{
> +int i;
> +if (src == 0) {
> +return 0;
> +}
> +
> +for (i = 31; i >= 0; i--) {
> +if (((src >> i) & 1) != 0) {
> +break;
> +}
> +}
> +return i;
> +}

This should use tcg_gen_clz_i32.

> +
> +static void report_aux_reg_error(uint32_t aux)
> +{
> +if (((aux >= ARC_BCR1_START) && (aux <= ARC_BCR1_END)) ||
> +((aux >= ARC_BCR2_START) && (aux <= ARC_BCR2_END))) {
> +qemu_log_mask(LOG_UNIMP, "Undefined BCR 0x%03x\n", aux);
> +}
> +
> +error_report("Undefined AUX register 0x%03x, aborting", aux);
> +exit(EXIT_FAILURE);
> +}

Do not exit on failure, or error_report for that matter -- raise an exception.
 Or...

> +void helper_sr(CPUARCState *env, uint32_t val, uint32_t aux)
> +{
> +struct arc_aux_reg_detail *aux_reg_detail =
> +arc_aux_reg_struct_for_address(aux, ARC_OPCODE_ARCv2HS);
> +
> +if (aux_reg_detail == NULL) {
> +report_aux_reg_error(aux);
> +}

... on the off-chance the 

[PATCH 04/15] arc: TCG and decoder glue code and helpers

2020-11-11 Thread cupertinomiranda
From: Cupertino Miranda 

Signed-off-by: Cupertino Miranda 
---
 target/arc/extra_mapping.def   |  40 ++
 target/arc/helper.c| 293 +
 target/arc/helper.h|  46 ++
 target/arc/op_helper.c | 749 +
 target/arc/semfunc_mapping.def | 329 +++
 5 files changed, 1457 insertions(+)
 create mode 100644 target/arc/extra_mapping.def
 create mode 100644 target/arc/helper.c
 create mode 100644 target/arc/helper.h
 create mode 100644 target/arc/op_helper.c
 create mode 100644 target/arc/semfunc_mapping.def

diff --git a/target/arc/extra_mapping.def b/target/arc/extra_mapping.def
new file mode 100644
index 00..1387d7d483
--- /dev/null
+++ b/target/arc/extra_mapping.def
@@ -0,0 +1,40 @@
+/*
+ * QEMU ARC EXTRA MAPPING
+ *
+ * Copyright (c) 2020 Synopsys Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * http://www.gnu.org/licenses/lgpl-2.1.html
+ */
+
+SEMANTIC_FUNCTION(SWI, 1)
+SEMANTIC_FUNCTION(SWI, 1)
+SEMANTIC_FUNCTION(UNIMP, 0)
+SEMANTIC_FUNCTION(RTIE, 0)
+SEMANTIC_FUNCTION(SLEEP, 1)
+
+MAPPING(swi, SWI, 0)
+CONSTANT(SWI, swi_s, 0, 0)
+MAPPING(swi_s, SWI, 1, 0)
+MAPPING(trap_s, TRAP, 1, 0)
+MAPPING(rtie, RTIE, 0)
+MAPPING(sleep, SLEEP, 1, 0)
+MAPPING(vadd2, VADD, 3, 0, 1, 2)
+MAPPING(vadd2h, VADD, 3, 0, 1, 2)
+MAPPING(vadd4h, VADD, 3, 0, 1, 2)
+MAPPING(vsub2, VSUB, 3, 0, 1, 2)
+MAPPING(vsub2h, VSUB, 3, 0, 1, 2)
+MAPPING(vsub4h, VSUB, 3, 0, 1, 2)
+MAPPING(mpyd, MPYD, 3, 0, 1, 2)
+MAPPING(mpydu, MPYD, 3, 0, 1, 2)
diff --git a/target/arc/helper.c b/target/arc/helper.c
new file mode 100644
index 00..aca7152ef8
--- /dev/null
+++ b/target/arc/helper.c
@@ -0,0 +1,293 @@
+/*
+ * QEMU ARC CPU
+ *
+ * Copyright (c) 2020 Synppsys Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * http://www.gnu.org/licenses/lgpl-2.1.html
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "hw/irq.h"
+#include "include/hw/sysbus.h"
+#include "include/sysemu/sysemu.h"
+#include "qemu/qemu-print.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "qemu/host-utils.h"
+#include "exec/helper-proto.h"
+#include "irq.h"
+
+#if defined(CONFIG_USER_ONLY)
+
+void arc_cpu_do_interrupt(CPUState *cs)
+{
+ARCCPU *cpu = ARC_CPU(cs);
+CPUARCState *env = >env;
+
+cs->exception_index = -1;
+CPU_ILINK(env) = env->pc;
+}
+
+#else /* !CONFIG_USER_ONLY */
+
+void arc_cpu_do_interrupt(CPUState *cs)
+{
+ARCCPU  *cpu= ARC_CPU(cs);
+CPUARCState *env= >env;
+uint32_t offset = 0;
+uint32_t vectno;
+const char  *name;
+
+/*
+ * NOTE: Special LP_END exception. Immediatelly return code execution to
+ * lp_start.
+ * Now also used for delayslot MissI cases.
+ * This special exception should not execute any of the exception
+ * handling code. Instead it returns immediately after setting PC to the
+ * address passed as exception parameter.
+ */
+if (cs->exception_index == EXCP_LPEND_REACHED
+|| cs->exception_index == EXCP_FAKE) {
+env->pc = env->param;
+CPU_PCL(env) = env->pc & 0xfffe;
+return;
+}
+
+/* If we take an exception within an exception => fatal Machine Check. */
+if (env->stat.AEf == 1) {
+cs->exception_index = EXCP_MACHINE_CHECK;
+env->causecode = 0;
+env->param = 0;
+env->mmu.enabled = false; /* no more MMU */
+env->mpu.enabled = false; /* no more MPU */
+}
+vectno = cs->exception_index & 0x0F;
+offset = vectno << 2;
+
+/* Generic computation for exceptions. */
+switch (cs->exception_index) {
+case EXCP_RESET:
+name = "Reset";
+break;
+case EXCP_MEMORY_ERROR:
+name = "Memory Error";
+break;
+case