Re: [PATCH 05/15] arc: TCG instruction generator and hand-definitions

2021-01-15 Thread Richard Henderson
On 1/15/21 7:11 AM, Cupertino Miranda wrote:
>> On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote:
>>> +/*
>>> + * The macro to add boiler plate code for conditional execution.
>>> + * It will add tcg_gen codes only if there is a condition to
>>> + * be checked (ctx->insn.cc != 0). This macro assumes that there
>>> + * is a "ctx" variable of type "DisasCtxt *" in context. Remember
>>> + * to pair it with CC_EPILOGUE macro.
>>> + */
>>> +#define CC_PROLOGUE   \
>>> +  TCGv cc = tcg_temp_local_new(); \
>>> +  TCGLabel *done = gen_new_label();   \
>>> +  do {\
>>> +if (ctx->insn.cc) {   \
>>> +arc_gen_verifyCCFlag(ctx, cc);\
>>> +tcg_gen_brcondi_tl(TCG_COND_NE, cc, 1, done); \
>>> +} \
>>> +  } while (0)
>>> +
>>> +/*
>>> + * The finishing counter part of CC_PROLUGE. This is supposed
>>> + * to be put at the end of the function using it.
>>> + */
>>> +#define CC_EPILOGUE  \
>>> +if (ctx->insn.cc) {  \
>>> +gen_set_label(done); \
>>> +}\
>>> +tcg_temp_free(cc)
>>
>> Why would this need to be boiler-plate?  Why would this not simply exist in
>> exactly one location?
>>
>> You don't need a tcg_temp_local_new, because the cc value is not used past 
>> the
>> branch.  You should free the temp immediately after the branch.
>>
> 
> I wonder if what you thought was to move those macros to functions and 
> call it when CC_PROLOGUE and CC_EPILOGUE are used.
> I think the macros were choosen due to the sharing of the 'cc' and 
> 'done' variables in both CC_PROLOGUE AND CC_EPILOGUE.

I meant that the checking of ctx->insn.cc could be done at a higher level, so
that this code existed in exactly one place, not scattered between all of the
different instructions.

But if that isn't possible for some reason, you can certainly put "done" into
the DisasContext so that you can have to functions cc_prologue(ctx) and
cc_epilogue(ctx).


>>> +void gen_goto_tb(DisasContext *ctx, int n, TCGv dest)
>>> +{
>>> +tcg_gen_mov_tl(cpu_pc, dest);
>>> +tcg_gen_andi_tl(cpu_pcl, dest, 0xfffc);
>>> +if (ctx->base.singlestep_enabled) {
>>> +gen_helper_debug(cpu_env);
>>> +}
>>> +tcg_gen_exit_tb(NULL, 0);
>>
>> Missing else.  This is dead code for single-step.
> Goes a little above my knowledge of QEMU internals to be honest.
> Do you have a recommendation what we should be doing here ?

Both of these actions end the TB, so:

  if (ctx->base.singlestep_enabled) {
gen_helper_debug(cpu_env);
  } else {
tcg_gen_exit_tb(NULL, 0);
  }

>> You could put all of these into a const static table.
> What do you mean, can we make the effect of tcg_global_mem_new_i32 as 
> constant ?

No, I mean all of the data that is passed to tcg_global_mem_new.  See for
instance target/sparc/translate.c, sparc_tcg_init().


>>> +static void init_constants(void)
>>> +{
>>> +#define SEMANTIC_FUNCTION(...)
>>> +#define MAPPING(...)
>>> +#define CONSTANT(NAME, MNEMONIC, OP_NUM, VALUE) \
>>> +  add_constant_operand(MAP_##MNEMONIC##_##NAME, OP_NUM, VALUE);
>>> +#include "target/arc/semfunc_mapping.def"
>>> +#include "target/arc/extra_mapping.def"
>>> +#undef MAPPING
>>> +#undef CONSTANT
>>> +#undef SEMANTIC_FUNCTION
>>> +}
>>
>> Ew.  Yet another thing that can be done at build time.
> As far as I remember it, there was no way I could generate this table 
> using the C pre-processor. Do you suggest to make this table using an 
> external tool ?

I assumed that you could just as easily generate a table using the c
preprocessor as this function.  I guess I'd like to know more about why you
can't...

>>> +int32_t limm = operand.value;
>>> +if (operand.type & ARC_OPERAND_LIMM) {
>>> +limm = ctx->insn.limm;
>>> +tcg_gen_movi_tl(cpu_limm, limm);
>>> +ret = cpu_r[62];
>>> +} else {
>>> +ret = tcg_const_local_i32(limm);
>>> +}
>>> +}
>>> +}
>>> +
>>> +  return ret;
>>
>> Why are you using locals for everything?  Is it because you have no proper
>> control over your use of branching?
> 
> Initially we though locals the good way to define temporaries. :-(
> What should be the best ? We will need to change a lot of code for this.

TCG is a poor optimizer.  If you can at all avoid branches, while implementing
a single instruction, do so.  Because this means that you can use
tcg_temp_new_i32 (et al) which are "normal" temps, and are not spilled to the
stack at the end of the BB.

This does not necessarily apply to conditional execution (cc_prologue, et al)
because you can think of those as "outside" of the instruction, merely wrapping
them.  The actual live temps will be "inside" and not live past the done label.

>>> +/* Return from 

Re: [PATCH 05/15] arc: TCG instruction generator and hand-definitions

2021-01-15 Thread Cupertino Miranda
> On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote:
>> +/*
>> + * The macro to add boiler plate code for conditional execution.
>> + * It will add tcg_gen codes only if there is a condition to
>> + * be checked (ctx->insn.cc != 0). This macro assumes that there
>> + * is a "ctx" variable of type "DisasCtxt *" in context. Remember
>> + * to pair it with CC_EPILOGUE macro.
>> + */
>> +#define CC_PROLOGUE   \
>> +  TCGv cc = tcg_temp_local_new(); \
>> +  TCGLabel *done = gen_new_label();   \
>> +  do {\
>> +if (ctx->insn.cc) {   \
>> +arc_gen_verifyCCFlag(ctx, cc);\
>> +tcg_gen_brcondi_tl(TCG_COND_NE, cc, 1, done); \
>> +} \
>> +  } while (0)
>> +
>> +/*
>> + * The finishing counter part of CC_PROLUGE. This is supposed
>> + * to be put at the end of the function using it.
>> + */
>> +#define CC_EPILOGUE  \
>> +if (ctx->insn.cc) {  \
>> +gen_set_label(done); \
>> +}\
>> +tcg_temp_free(cc)
> 
> Why would this need to be boiler-plate?  Why would this not simply exist in
> exactly one location?
> 
> You don't need a tcg_temp_local_new, because the cc value is not used past the
> branch.  You should free the temp immediately after the branch.
> 

I wonder if what you thought was to move those macros to functions and 
call it when CC_PROLOGUE and CC_EPILOGUE are used.
I think the macros were choosen due to the sharing of the 'cc' and 
'done' variables in both CC_PROLOGUE AND CC_EPILOGUE.

>> +void gen_goto_tb(DisasContext *ctx, int n, TCGv dest)
>> +{
>> +tcg_gen_mov_tl(cpu_pc, dest);
>> +tcg_gen_andi_tl(cpu_pcl, dest, 0xfffc);
>> +if (ctx->base.singlestep_enabled) {
>> +gen_helper_debug(cpu_env);
>> +}
>> +tcg_gen_exit_tb(NULL, 0);
> 
> Missing else.  This is dead code for single-step.
Goes a little above my knowledge of QEMU internals to be honest.
Do you have a recommendation what we should be doing here ?

> 
>> +void arc_translate_init(void)
>> +{
>> +int i;
>> +static int init_not_done = 1;
>> +
>> +if (init_not_done == 0) {
>> +return;
>> +}
> 
> Useless.  This will only be called once.
> 
>> +#define ARC_REG_OFFS(x) offsetof(CPUARCState, x)
>> +
>> +#define NEW_ARC_REG(x) \
>> +tcg_global_mem_new_i32(cpu_env, offsetof(CPUARCState, x), #x)
>> +
>> +cpu_S1f = NEW_ARC_REG(macmod.S1);
>> +cpu_S2f = NEW_ARC_REG(macmod.S2);
>> +cpu_CSf = NEW_ARC_REG(macmod.CS);
>> +
>> +cpu_Zf  = NEW_ARC_REG(stat.Zf);
>> +cpu_Lf  = NEW_ARC_REG(stat.Lf);
>> +cpu_Nf  = NEW_ARC_REG(stat.Nf);
>> +cpu_Cf  = NEW_ARC_REG(stat.Cf);
>> +cpu_Vf  = NEW_ARC_REG(stat.Vf);
>> +cpu_Uf  = NEW_ARC_REG(stat.Uf);
>> +cpu_DEf = NEW_ARC_REG(stat.DEf);
>> +cpu_ESf = NEW_ARC_REG(stat.ESf);
>> +cpu_AEf = NEW_ARC_REG(stat.AEf);
>> +cpu_Hf  = NEW_ARC_REG(stat.Hf);
>> +cpu_IEf = NEW_ARC_REG(stat.IEf);
>> +cpu_Ef  = NEW_ARC_REG(stat.Ef);
>> +
>> +cpu_is_delay_slot_instruction = 
>> NEW_ARC_REG(stat.is_delay_slot_instruction);
>> +
>> +cpu_l1_Zf = NEW_ARC_REG(stat_l1.Zf);
>> +cpu_l1_Lf = NEW_ARC_REG(stat_l1.Lf);
>> +cpu_l1_Nf = NEW_ARC_REG(stat_l1.Nf);
>> +cpu_l1_Cf = NEW_ARC_REG(stat_l1.Cf);
>> +cpu_l1_Vf = NEW_ARC_REG(stat_l1.Vf);
>> +cpu_l1_Uf = NEW_ARC_REG(stat_l1.Uf);
>> +cpu_l1_DEf = NEW_ARC_REG(stat_l1.DEf);
>> +cpu_l1_AEf = NEW_ARC_REG(stat_l1.AEf);
>> +cpu_l1_Hf = NEW_ARC_REG(stat_l1.Hf);
>> +
>> +cpu_er_Zf = NEW_ARC_REG(stat_er.Zf);
>> +cpu_er_Lf = NEW_ARC_REG(stat_er.Lf);
>> +cpu_er_Nf = NEW_ARC_REG(stat_er.Nf);
>> +cpu_er_Cf = NEW_ARC_REG(stat_er.Cf);
>> +cpu_er_Vf = NEW_ARC_REG(stat_er.Vf);
>> +cpu_er_Uf = NEW_ARC_REG(stat_er.Uf);
>> +cpu_er_DEf = NEW_ARC_REG(stat_er.DEf);
>> +cpu_er_AEf = NEW_ARC_REG(stat_er.AEf);
>> +cpu_er_Hf = NEW_ARC_REG(stat_er.Hf);
>> +
>> +cpu_eret = NEW_ARC_REG(eret);
>> +cpu_erbta = NEW_ARC_REG(erbta);
>> +cpu_ecr = NEW_ARC_REG(ecr);
>> +cpu_efa = NEW_ARC_REG(efa);
>> +cpu_bta = NEW_ARC_REG(bta);
>> +cpu_lps = NEW_ARC_REG(lps);
>> +cpu_lpe = NEW_ARC_REG(lpe);
>> +cpu_pc = NEW_ARC_REG(pc);
>> +cpu_npc = NEW_ARC_REG(npc);
>> +
>> +cpu_bta_l1 = NEW_ARC_REG(bta_l1);
>> +cpu_bta_l2 = NEW_ARC_REG(bta_l2);
>> +
>> +cpu_intvec = NEW_ARC_REG(intvec);
>> +
>> +for (i = 0; i < 64; i++) {
>> +char name[16];
>> +
>> +sprintf(name, "r[%d]", i);
>> +
>> +cpu_r[i] = tcg_global_mem_new_i32(cpu_env,
>> +  ARC_REG_OFFS(r[i]),
>> +  strdup(name));
>> +}
>> +
>> +cpu_gp = cpu_r[26];
>> +cpu_fp = cpu_r[27];
>> +cpu_sp = cpu_r[28];
>> +cpu_ilink1 = cpu_r[29];
>> +

Re: [PATCH 05/15] arc: TCG instruction generator and hand-definitions

2020-12-01 Thread Richard Henderson
On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote:
> +/*
> + * The macro to add boiler plate code for conditional execution.
> + * It will add tcg_gen codes only if there is a condition to
> + * be checked (ctx->insn.cc != 0). This macro assumes that there
> + * is a "ctx" variable of type "DisasCtxt *" in context. Remember
> + * to pair it with CC_EPILOGUE macro.
> + */
> +#define CC_PROLOGUE   \
> +  TCGv cc = tcg_temp_local_new(); \
> +  TCGLabel *done = gen_new_label();   \
> +  do {\
> +if (ctx->insn.cc) {   \
> +arc_gen_verifyCCFlag(ctx, cc);\
> +tcg_gen_brcondi_tl(TCG_COND_NE, cc, 1, done); \
> +} \
> +  } while (0)
> +
> +/*
> + * The finishing counter part of CC_PROLUGE. This is supposed
> + * to be put at the end of the function using it.
> + */
> +#define CC_EPILOGUE  \
> +if (ctx->insn.cc) {  \
> +gen_set_label(done); \
> +}\
> +tcg_temp_free(cc)

Why would this need to be boiler-plate?  Why would this not simply exist in
exactly one location?

You don't need a tcg_temp_local_new, because the cc value is not used past the
branch.  You should free the temp immediately after the branch.

> +void gen_goto_tb(DisasContext *ctx, int n, TCGv dest)
> +{
> +tcg_gen_mov_tl(cpu_pc, dest);
> +tcg_gen_andi_tl(cpu_pcl, dest, 0xfffc);
> +if (ctx->base.singlestep_enabled) {
> +gen_helper_debug(cpu_env);
> +}
> +tcg_gen_exit_tb(NULL, 0);

Missing else.  This is dead code for single-step.

> +void arc_translate_init(void)
> +{
> +int i;
> +static int init_not_done = 1;
> +
> +if (init_not_done == 0) {
> +return;
> +}

Useless.  This will only be called once.

> +#define ARC_REG_OFFS(x) offsetof(CPUARCState, x)
> +
> +#define NEW_ARC_REG(x) \
> +tcg_global_mem_new_i32(cpu_env, offsetof(CPUARCState, x), #x)
> +
> +cpu_S1f = NEW_ARC_REG(macmod.S1);
> +cpu_S2f = NEW_ARC_REG(macmod.S2);
> +cpu_CSf = NEW_ARC_REG(macmod.CS);
> +
> +cpu_Zf  = NEW_ARC_REG(stat.Zf);
> +cpu_Lf  = NEW_ARC_REG(stat.Lf);
> +cpu_Nf  = NEW_ARC_REG(stat.Nf);
> +cpu_Cf  = NEW_ARC_REG(stat.Cf);
> +cpu_Vf  = NEW_ARC_REG(stat.Vf);
> +cpu_Uf  = NEW_ARC_REG(stat.Uf);
> +cpu_DEf = NEW_ARC_REG(stat.DEf);
> +cpu_ESf = NEW_ARC_REG(stat.ESf);
> +cpu_AEf = NEW_ARC_REG(stat.AEf);
> +cpu_Hf  = NEW_ARC_REG(stat.Hf);
> +cpu_IEf = NEW_ARC_REG(stat.IEf);
> +cpu_Ef  = NEW_ARC_REG(stat.Ef);
> +
> +cpu_is_delay_slot_instruction = 
> NEW_ARC_REG(stat.is_delay_slot_instruction);
> +
> +cpu_l1_Zf = NEW_ARC_REG(stat_l1.Zf);
> +cpu_l1_Lf = NEW_ARC_REG(stat_l1.Lf);
> +cpu_l1_Nf = NEW_ARC_REG(stat_l1.Nf);
> +cpu_l1_Cf = NEW_ARC_REG(stat_l1.Cf);
> +cpu_l1_Vf = NEW_ARC_REG(stat_l1.Vf);
> +cpu_l1_Uf = NEW_ARC_REG(stat_l1.Uf);
> +cpu_l1_DEf = NEW_ARC_REG(stat_l1.DEf);
> +cpu_l1_AEf = NEW_ARC_REG(stat_l1.AEf);
> +cpu_l1_Hf = NEW_ARC_REG(stat_l1.Hf);
> +
> +cpu_er_Zf = NEW_ARC_REG(stat_er.Zf);
> +cpu_er_Lf = NEW_ARC_REG(stat_er.Lf);
> +cpu_er_Nf = NEW_ARC_REG(stat_er.Nf);
> +cpu_er_Cf = NEW_ARC_REG(stat_er.Cf);
> +cpu_er_Vf = NEW_ARC_REG(stat_er.Vf);
> +cpu_er_Uf = NEW_ARC_REG(stat_er.Uf);
> +cpu_er_DEf = NEW_ARC_REG(stat_er.DEf);
> +cpu_er_AEf = NEW_ARC_REG(stat_er.AEf);
> +cpu_er_Hf = NEW_ARC_REG(stat_er.Hf);
> +
> +cpu_eret = NEW_ARC_REG(eret);
> +cpu_erbta = NEW_ARC_REG(erbta);
> +cpu_ecr = NEW_ARC_REG(ecr);
> +cpu_efa = NEW_ARC_REG(efa);
> +cpu_bta = NEW_ARC_REG(bta);
> +cpu_lps = NEW_ARC_REG(lps);
> +cpu_lpe = NEW_ARC_REG(lpe);
> +cpu_pc = NEW_ARC_REG(pc);
> +cpu_npc = NEW_ARC_REG(npc);
> +
> +cpu_bta_l1 = NEW_ARC_REG(bta_l1);
> +cpu_bta_l2 = NEW_ARC_REG(bta_l2);
> +
> +cpu_intvec = NEW_ARC_REG(intvec);
> +
> +for (i = 0; i < 64; i++) {
> +char name[16];
> +
> +sprintf(name, "r[%d]", i);
> +
> +cpu_r[i] = tcg_global_mem_new_i32(cpu_env,
> +  ARC_REG_OFFS(r[i]),
> +  strdup(name));
> +}
> +
> +cpu_gp = cpu_r[26];
> +cpu_fp = cpu_r[27];
> +cpu_sp = cpu_r[28];
> +cpu_ilink1 = cpu_r[29];
> +cpu_ilink2 = cpu_r[30];
> +cpu_blink  = cpu_r[31];
> +cpu_acclo  = cpu_r[58];
> +cpu_acchi  = cpu_r[59];
> +cpu_lpc= cpu_r[60];
> +cpu_limm   = cpu_r[62];
> +cpu_pcl= cpu_r[63];

You shouldn't need two pointers to these.  Either use cpu_r[PCL] (preferred) or
#define cpu_pcl cpu_r[63].

You could put all of these into a const static table.

> +static int arc_gen_INVALID(const DisasContext *ctx)
> +{
> +fprintf(stderr, "invalid inst @:%08x\n", ctx->cpc);

Never fprintf.  Raise an