On 3/9/22 04:02, Amir Gonnen wrote:
> How does "cpu_crs_R" work?
> ...
> Otherwise, each gpr access would be indirect. I'm probably missing
> something here.

They are indirect, but with some optimization.

+    TCGv_ptr crs = tcg_global_mem_new_ptr(cpu_env,
+                                          offsetof(CPUNios2State, crs), "crs");
+
+    for (int i = 0; i < NUM_GP_REGS; i++) {
+        cpu_crs_R[i] = tcg_global_mem_new(crs, 4 * i, gr_regnames[i]);
+    }

Note that the crs variable is relative to env, and then the cpu_crs_R registers are relative to crs.
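To make the two-level addressing concrete, here is a minimal stand-alone C sketch of the same idea. It is only a toy model, not the nios2 code: ToyEnv, toy_sets, TOY_NUM_REG_SETS and the toy_* helpers are hypothetical names invented for illustration, and only the "crs relative to env, register relative to crs at 4 * i" shape is taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GP_REGS 32
#define TOY_NUM_REG_SETS 4      /* hypothetical count, for illustration only */

/* Toy stand-in for the cpu state: only the pieces needed here. */
typedef struct ToyEnv {
    uint32_t toy_sets[TOY_NUM_REG_SETS][NUM_GP_REGS];
    uint32_t *crs;              /* points at the current register set */
} ToyEnv;

/* Switch register sets by repointing crs; no register data is copied. */
static void toy_select_set(ToyEnv *env, unsigned set)
{
    assert(set < TOY_NUM_REG_SETS);
    env->crs = env->toy_sets[set];
}

/*
 * First level: load crs out of env.
 * Second level: access the register at byte offset 4 * i from crs,
 * mirroring tcg_global_mem_new(crs, 4 * i, ...).
 */
static uint32_t toy_read_gpr(const ToyEnv *env, unsigned i)
{
    assert(i < NUM_GP_REGS);
    const unsigned char *base = (const unsigned char *)env->crs;
    return *(const uint32_t *)(base + 4 * i);
}

static void toy_write_gpr(ToyEnv *env, unsigned i, uint32_t val)
{
    assert(i < NUM_GP_REGS);
    unsigned char *base = (unsigned char *)env->crs;
    *(uint32_t *)(base + 4 * i) = val;
}
```

In this model, changing register sets costs one pointer store, and every register access costs one extra load of crs unless the value is already cached.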

Without an EIC-enabled kernel for testing, it's hard for me to show the nios2 code at work, but it is identical to what we do over in target/sparc:

    for (i = 8; i < 32; ++i) {
        cpu_regs[i] = tcg_global_mem_new(cpu_regwptr,
                                         (i - 8) * sizeof(target_ulong),
                                         gregnames[i]);
    }

A small example of what this looks like, taken from the output of -d in_asm,op_ind,op_opt:

IN: __libc_start_main
0x00000000001032e8:  save  %sp, -720, %sp
0x00000000001032ec:  stx  %i0, [ %fp + 0x87f ]
0x00000000001032f0:  stx  %i1, [ %fp + 0x887 ]
0x00000000001032f4:  stx  %i2, [ %fp + 0x88f ]

OP before indirect lowering:
 ld_i32 tmp0,env,$0xfffffffffffffff8
 brcond_i32 tmp0,$0x0,lt,$L0              dead: 0 1

 ---- 00000000001032e8 00000000001032ec
 add_i64 tmp3,o6,$0xfffffffffffffd30      dead: 1 2
 call save,$0x0,$0,env                    dead: 0
 mov_i64 o6,tmp3                          sync: 0  dead: 0 1

 ---- 00000000001032ec 00000000001032f0
 add_i64 tmp2,i6,$0x87f                   dead: 2
 qemu_st_i64 i0,tmp2,beq,0                dead: 0 1

 ---- 00000000001032f0 00000000001032f4
 add_i64 tmp2,i6,$0x887                   dead: 2
 qemu_st_i64 i1,tmp2,beq,0                dead: 0 1

 ---- 00000000001032f4 00000000001032f8
 add_i64 tmp2,i6,$0x88f                   dead: 1 2
 qemu_st_i64 i2,tmp2,beq,0                dead: 0 1


You can see that early on, we optimize with the windowed registers themselves (o[0-7] and i[0-7] here). But then we lower that to explicit load/store operations:


OP after optimization and liveness analysis:
 ld_i32 tmp0,env,$0xfffffffffffffff8      pref=0xffff
 brcond_i32 tmp0,$0x0,lt,$L0              dead: 0 1

 ---- 00000000001032e8 00000000001032ec
 ld_i64 tmp20,regwptr,$0x30               dead: 1  pref=0xffff
 add_i64 tmp3,tmp20,$0xfffffffffffffd30   dead: 1 2  pref=0xf038
 call save,$0x0,$0,env                    dead: 0
 st_i64 tmp3,regwptr,$0x30                dead: 0

 ---- 00000000001032ec 00000000001032f0
 ld_i64 tmp36,regwptr,$0xb0               pref=0xf038
 add_i64 tmp2,tmp36,$0x87f                dead: 2  pref=0xffff
 ld_i64 tmp30,regwptr,$0x80               pref=0xffff
 qemu_st_i64 tmp30,tmp2,beq,0             dead: 0 1

 ---- 00000000001032f0 00000000001032f4
 add_i64 tmp2,tmp36,$0x887                dead: 2  pref=0xffff
 ld_i64 tmp31,regwptr,$0x88               pref=0xffff
 qemu_st_i64 tmp31,tmp2,beq,0             dead: 0 1

 ---- 00000000001032f4 00000000001032f8
 add_i64 tmp2,tmp36,$0x88f                dead: 1 2  pref=0xffff
 ld_i64 tmp32,regwptr,$0x90               dead: 1  pref=0xffff
 qemu_st_i64 tmp32,tmp2,beq,0             dead: 0 1


You can now see the new tmpN variables, and the uses of regwptr in the loads
and stores. Note that i6 is loaded only once, into tmp36, and that value is
reused for all three stores.
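As a cross-check, the regwptr offsets in the dump can be reproduced from the (i - 8) * sizeof(target_ulong) expression in the sparc loop. The sketch below assumes a 64-bit guest (8-byte target_ulong) and the usual SPARC numbering where %o0-%o7 are registers 8-15 and %i0-%i7 are registers 24-31; toy_target_ulong, the TOY_REG_* constants and toy_regwptr_offset are hypothetical names, not QEMU identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_target_ulong;      /* assumption: 64-bit SPARC guest */

/* First register number of each windowed group. */
enum { TOY_REG_O0 = 8, TOY_REG_L0 = 16, TOY_REG_I0 = 24 };

/* Byte offset of windowed register regno (8..31) from regwptr. */
static size_t toy_regwptr_offset(int regno)
{
    assert(regno >= 8 && regno < 32);
    return (size_t)(regno - 8) * sizeof(toy_target_ulong);
}
```

Under those assumptions, o6 (%sp) lands at 0x30 and i6 (%fp) at 0xb0, while i0, i1 and i2 land at 0x80, 0x88 and 0x90, which are the offsets used by the ld_i64 and st_i64 ops above.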


r~
