Please see my answers inline.

On Sun, Jun 5, 2016 at 6:27 AM, Richard Henderson <r...@twiddle.net> wrote:
> On 06/02/2016 01:07 PM, Michael Rolnik wrote:
>> Signed-off-by: Michael Rolnik <mrol...@gmail.com>
>> ---
>>  target-avr/translate-inst.c | 2443 ++++++++++++++++++++++++++++++++++++++++++
>
> Is there a reason this code isn't going into translate.c?
> You wouldn't need the declarations in translate-inst.h or translate.h.

I see two levels of logic here:
a. instruction translation
b. the general flow of program translation

>> +/*
>> +    NOTE: all registers are assumed to hold 8 bit values.
>> +    so all operations done on registers should preserve this property
>> +*/
>> +
>> +/*
>> +    NOTE: the flags C,H,V,N,V have either 0 or 1 values
>> +    NOTE: the flag Z has inverse logic, when the value of Zf is 0 the flag is assumed to be set, non zero - not set
>> +*/
>
> This documentation belongs with the declarations in cpu.h.

Moved.

>> +void gen_add_CHf(TCGv R, TCGv Rd, TCGv Rr);
>> +void gen_add_Vf(TCGv R, TCGv Rd, TCGv Rr);
>> +void gen_sub_CHf(TCGv R, TCGv Rd, TCGv Rr);
>> +void gen_sub_Vf(TCGv R, TCGv Rd, TCGv Rr);
>> +void gen_ZNSf(TCGv R);
>> +void gen_push_ret(CPUAVRState *env, int ret);
>> +void gen_pop_ret(CPUAVRState *env, TCGv ret);
>> +void gen_jmp_ez(void);
>> +void gen_jmp_z(void);
>> +
>> +void gen_set_addr(TCGv addr, TCGv H, TCGv M, TCGv l);  /* H:M:L = addr */
>> +void gen_set_xaddr(TCGv addr);
>> +void gen_set_yaddr(TCGv addr);
>> +void gen_set_zaddr(TCGv addr);
>> +
>> +TCGv gen_get_addr(TCGv H, TCGv M, TCGv L);             /* addr = H:M:L */
>> +TCGv gen_get_xaddr(void);
>> +TCGv gen_get_yaddr(void);
>> +TCGv gen_get_zaddr(void);
>> +int sex(int Imm, unsigned bits);
>
> Order these functions properly and you don't need forward declarations.

Is that a requirement? This way it looks cleaner.
>> +void gen_add_CHf(TCGv R, TCGv Rd, TCGv Rr)
>> +{
>> +    TCGv    t1 = tcg_temp_new_i32();
>> +    TCGv    t2 = tcg_temp_new_i32();
>> +    TCGv    t3 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_and_tl(t1, Rd, Rr);    /* t1 = Rd & Rr */
>> +    tcg_gen_not_tl(t2, R);         /* t2 = Rd & ~R */
>> +    tcg_gen_and_tl(t2, Rd, t2);
>
> tcg_gen_andc_tl.

Thanks.

>> +    tcg_gen_not_tl(t3, R);         /* t3 = Rr & ~R */
>> +    tcg_gen_and_tl(t3, Rr, t3);
>
> Likewise.

Thanks.

>> +    tcg_gen_or_tl(t1, t1, t2);     /* t1 = t1 | t2 | t3 */
>> +    tcg_gen_or_tl(t1, t1, t3);
>
> While this is exactly the formula in the manual, it's also equal to
>
>   ((Rd ^ Rr) ^ R) & 16
>
> where we examine the difference between the non-carry addition (Rd ^ Rr)
> and the carry addition (R) to find the carry out of bit 3.  This reduces
> the operation count to 2 from 5.

Please explain, I don't see it.
http://www.wolframalpha.com/input/?i=A+and+B+or+A+and+not+C+or+B+and+not+C,+A+xor+B+xor+C

>> +    tcg_gen_shri_tl(cpu_Cf, t1, 7);    /* Cf = t1(7) */
>
> Of course, Cf is also bit 8 of R, at least before you masked that off.
>
>> +    tcg_gen_shri_tl(cpu_Hf, t1, 3);    /* Hf = t1(3) */
>> +    tcg_gen_andi_tl(cpu_Hf, cpu_Hf, 1);
>
> Consider leaving Hf in bit 3 (or 4, as above) instead of shifting and
> masking to bit 0.  This makes it (slightly) more complicated to read the
> full SREG value, but it saves two instructions per arithmetic instruction.
> This will add up quickly.

Yes, you are right. Next version.

> One can make the same argument for most of the other flags.
>
> Compare how this is handled in target-arm for cpu_[NZCV]F.
>> +void gen_add_Vf(TCGv R, TCGv Rd, TCGv Rr)
>> +{
>> +    TCGv    t1 = tcg_temp_new_i32();
>> +    TCGv    t2 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_not_tl(t1, Rd);        /* t1 = ~Rd & ~Rr & R */
>> +    tcg_gen_not_tl(t2, Rr);
>> +    tcg_gen_and_tl(t1, t1, t2);
>> +    tcg_gen_and_tl(t1, t1, R);
>> +
>> +    tcg_gen_not_tl(t2, R);         /* t2 = Rd & Rr & ~R */
>> +    tcg_gen_and_tl(t2, t2, Rd);
>> +    tcg_gen_and_tl(t2, t2, Rr);
>> +
>> +    tcg_gen_or_tl(t1, t1, t2);     /* t1 = Rd & Rr & ~R | ~Rd & ~Rr & R */
>
> While this is indeed the formula in the manual, a better expression is
>
>   (R ^ Rd) & ~(Rd ^ Rr)
>
> which is 3 operations instead of 5.  As alluded above, it might be best to
> keep Vf in bit 7 instead of bit 0.

Right, thanks.

> Also note that if you use bit 4 to compute Hf, as above, then one can
> share a sub-expression with the computation of Vf.
>
>> +void gen_sub_CHf(TCGv R, TCGv Rd, TCGv Rr)
>> +{
>> +    TCGv    t1 = tcg_temp_new_i32();
>> +    TCGv    t2 = tcg_temp_new_i32();
>> +    TCGv    t3 = tcg_temp_new_i32();
>> +
>> +    /* Cf & Hf */
>> +    tcg_gen_not_tl(t1, Rd);          /* t1 = ~Rd */
>> +    tcg_gen_and_tl(t2, t1, Rr);      /* t2 = ~Rd & Rr */
>> +    tcg_gen_or_tl(t3, t1, Rr);       /* t3 = (~Rd | Rr) & R */
>> +    tcg_gen_and_tl(t3, t3, R);
>> +    tcg_gen_or_tl(t2, t2, t3);       /* t2 = ~Rd & Rr | ~Rd & R | R & Rr */
>> +    tcg_gen_shri_tl(cpu_Cf, t2, 7);  /* Cf = t2(7) */
>> +    tcg_gen_shri_tl(cpu_Hf, t2, 3);  /* Hf = t2(3) */
>> +    tcg_gen_andi_tl(cpu_Hf, cpu_Hf, 1);
>
> Note that carry and borrow are related, and thus this is *also* computable
> via ((Rd ^ Rr) ^ R) on bit 4.
Please explain, I don't see it.
http://www.wolframalpha.com/input/?i=not+A+and+B+or+not+A+and+C+or++C+and+B,+A+xor+B+xor+C

>> +void gen_sub_Vf(TCGv R, TCGv Rd, TCGv Rr)
>> +{
>> +    TCGv    t1 = tcg_temp_new_i32();
>> +    TCGv    t2 = tcg_temp_new_i32();
>> +
>> +    /* Vf */
>> +    tcg_gen_and_tl(t1, Rr, R);       /* t1 = Rd & ~Rr & ~R */
>> +    tcg_gen_not_tl(t1, t1);
>> +    tcg_gen_and_tl(t1, t1, Rd);
>> +    tcg_gen_not_tl(t2, Rd);          /* t2 = ~Rd & Rr & R */
>> +    tcg_gen_and_tl(t2, t2, Rr);
>> +    tcg_gen_and_tl(t2, t2, R);
>> +    tcg_gen_or_tl(t1, t1, t2);       /* t1 = Rd & ~Rr & ~R | ~Rd & Rr & R */
>> +    tcg_gen_shri_tl(cpu_Vf, t1, 7);  /* Vf = t1(7) */
>
> This is equal to
>
>   (R ^ Rd) & (Rd ^ Rr)
>
>> +void gen_ZNSf(TCGv R)
>> +{
>> +    tcg_gen_mov_tl(cpu_Zf, R);             /* Zf = R */
>> +    tcg_gen_shri_tl(cpu_Nf, R, 7);         /* Nf = R(7) */
>> +    tcg_gen_and_tl(cpu_Sf, cpu_Nf, cpu_Vf);/* Sf = Nf & Vf */
>
> This should be xor.

Thanks, fixed.

>> +void gen_push_ret(CPUAVRState *env, int ret)
>> +{
>> +    tcg_gen_qemu_st8(tcg_const_local_i32((ret & 0x0000ff)), cpu_sp, DATA_INDEX);
>
> You don't need a local constant, and you should be freeing these 3 temps.

Done.

> I'll also note that the piece-wise store is big-endian, so you could
> perform this in 1 store for 2_BYTE and 2 stores for 3_BYTE.

I got the impression that the platform is little endian.
>> +void gen_jmp_ez()
>> +{
>> +    tcg_gen_mov_tl(cpu_pc, cpu_eind);
>> +    tcg_gen_shli_tl(cpu_pc, cpu_pc, 8);
>> +    tcg_gen_or_tl(cpu_pc, cpu_pc, cpu_r[31]);
>> +    tcg_gen_shli_tl(cpu_pc, cpu_pc, 8);
>> +    tcg_gen_or_tl(cpu_pc, cpu_pc, cpu_r[30]);
>> +    tcg_gen_andi_tl(cpu_pc, cpu_pc, 0xffffff);
>> +    tcg_gen_exit_tb(0);
>> +}
>> +
>> +void gen_jmp_z()
>> +{
>> +    tcg_gen_mov_tl(cpu_pc, cpu_r[31]);
>> +    tcg_gen_shli_tl(cpu_pc, cpu_pc, 8);
>> +    tcg_gen_or_tl(cpu_pc, cpu_pc, cpu_r[30]);
>> +    tcg_gen_andi_tl(cpu_pc, cpu_pc, 0xffff);
>
> Note this is
>
>   tcg_gen_deposit_tl(cpu_pc, cpu_r[30], cpu_r[31], 8, 8);
>
> Also consider storing EIND, perhaps already shifted, so that you can
> just OR it in.
>
>> +void gen_set_addr(TCGv addr, TCGv H, TCGv M, TCGv L)
>> +{
>> +    tcg_gen_andi_tl(L, addr, 0xff);
>> +
>> +    tcg_gen_shri_tl(addr, addr, 8);
>> +    tcg_gen_andi_tl(M, addr, 0xff);
>> +
>> +    tcg_gen_shri_tl(addr, addr, 8);
>> +    tcg_gen_andi_tl(H, addr, 0xff);
>> +}
>> +
>> +void gen_set_xaddr(TCGv addr)
>> +{
>> +    gen_set_addr(addr, cpu_rampX, cpu_r[27], cpu_r[26]);
>> +}
>> +
>> +void gen_set_yaddr(TCGv addr)
>> +{
>> +    gen_set_addr(addr, cpu_rampY, cpu_r[29], cpu_r[28]);
>> +}
>> +
>> +void gen_set_zaddr(TCGv addr)
>> +{
>> +    gen_set_addr(addr, cpu_rampZ, cpu_r[31], cpu_r[30]);
>> +}
>> +
>> +TCGv gen_get_addr(TCGv H, TCGv M, TCGv L)
>> +{
>> +    TCGv    addr = tcg_temp_new_i32();
>> +
>> +    tcg_gen_mov_tl(addr, H);       /* addr = H:M:L */
>> +    tcg_gen_shli_tl(addr, addr, 8);
>> +    tcg_gen_or_tl(addr, addr, M);
>> +    tcg_gen_shli_tl(addr, addr, 8);
>> +    tcg_gen_or_tl(addr, addr, L);
>> +
>> +    return addr;
>> +}
>> +
>> +TCGv gen_get_xaddr()
>> +{
>> +    return gen_get_addr(cpu_rampX, cpu_r[27], cpu_r[26]);
>> +}
>> +
>> +TCGv gen_get_yaddr()
>> +{
>> +    return gen_get_addr(cpu_rampY, cpu_r[29], cpu_r[28]);
>> +}
>> +
>> +TCGv gen_get_zaddr()
>> +{
>> +    return gen_get_addr(cpu_rampZ, cpu_r[31], cpu_r[30]);
>> +}
>
> Wow.  Um...
> Surely it would be better to store X and Y internally as whole
> 24-bit quantities, and Z as a 16-bit quantity (to be extended with rampz,
> rampd, or eind as needed).

rampX/Y/Z are represented now as 0x00ff0000. X/Y/Z could be represented as
16-bit registers; however, I do not know if and when r26-r31 are used as
8-bit registers, so if X/Y/Z were represented as 16 bits it would be hard
to use r26-r31 in arithmetic.

> This would allow normal loads, stores, and autoincrement to operate
> efficiently.  When you see byte accesses to R[26-31], you extract/deposit
> the bytes as required.  I assume that happens (significantly) less often
> than operating on X/Y/Z as a unit...
>
> One might make the same argument for R[24-25] as it pertains to adiw et al.

I don't think so; it would make simple arithmetic complex.

>> +int sex(int Imm, unsigned bits)
>> +{
>> +    Imm <<= 32 - bits;
>> +    Imm >>= 32 - bits;
>> +
>> +    return Imm;
>> +}
>
> This is sextract32(imm, 0, bits).

Done, thanks.

>> +int avr_translate_ADIW(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    if (avr_feature(env, AVR_FEATURE_ADIW_SBIW) == false) {
>> +        gen_helper_unsupported(cpu_env);
>> +
>> +        return BS_EXCP;
>> +    }
>> +
>> +    TCGv    RdL = cpu_r[24 + 2 * ADIW_Rd(opcode)];
>> +    TCGv    RdH = cpu_r[25 + 2 * ADIW_Rd(opcode)];
>> +    int     Imm = (ADIW_hImm(opcode) << 4) | (ADIW_lImm(opcode));
>> +    TCGv    R   = tcg_temp_new_i32();
>> +    TCGv    Rd  = tcg_temp_new_i32();
>> +    TCGv    t0  = tcg_temp_new_i32();
>> +
>> +    /* op */
>> +    tcg_gen_shli_tl(Rd, RdH, 8);     /* R = ((H << 8) | L) + Imm */
>> +    tcg_gen_or_tl(Rd, RdL, Rd);
>> +    tcg_gen_addi_tl(R, Rd, Imm);
>> +    tcg_gen_andi_tl(R, R, 0xffff);   /* make it 16 bits */
>> +
>> +    /* Cf */
>> +    tcg_gen_not_tl(t0, R);           /* t0 = Rd & ~R */
>> +    tcg_gen_and_tl(t0, Rd, t0);
>> +    tcg_gen_shri_tl(cpu_Cf, t0, 15); /* Cf = t0(15) */
>
> Or simply bit 16.
>
>> +    /* Sf */
>> +    tcg_gen_and_tl(cpu_Sf, cpu_Nf, cpu_Vf);/* Sf = Nf & Vf */
>
> xor.

Fixed.
>> +int avr_translate_ASR(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv    Rd = cpu_r[ASR_Rd(opcode)];
>> +    TCGv    t1 = tcg_temp_new_i32();
>> +    TCGv    t2 = tcg_temp_new_i32();
>> +
>> +    /* op */
>> +    tcg_gen_andi_tl(t1, Rd, 0x80);   /* t1 = (Rd & 0x80) | (Rd >> 1) */
>> +    tcg_gen_shri_tl(t2, Rd, 1);
>> +    tcg_gen_or_tl(t1, t1, t2);
>> +
>> +    /* Cf */
>> +    tcg_gen_andi_tl(cpu_Cf, Rd, 1);  /* Cf = Rd(0) */
>> +
>> +    /* Vf */
>> +    tcg_gen_and_tl(cpu_Vf, cpu_Nf, cpu_Cf);/* Vf = Nf & Cf */
>
> xor.
>
>> +/*
>> +    Copies the T Flag in the SREG (Status Register) to bit b in register Rd.
>> +*/
>> +int avr_translate_BLD(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv    Rd = cpu_r[BLD_Rd(opcode)];
>> +    TCGv    t1 = tcg_temp_new_i32();
>> +
>> +    tcg_gen_shli_tl(t1, cpu_Tf, BLD_Bit(opcode));
>> +    tcg_gen_or_tl(Rd, Rd, t1);
>> +    tcg_gen_and_tl(Rd, Rd, t1);
>
> Err... extra and?  I'm pretty sure this insn simply sets bit B, and
> doesn't clear all of the others.

It was a bug. Fixed.

>> +/*
>> +    Clears the specified bits in register Rd. Performs the logical AND between the contents of register Rd and the
>> +    complement of the constant mask K. The result will be placed in register Rd.
>> +*/
>> +int avr_translate_COM(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv    Rd = cpu_r[COM_Rd(opcode)];
>> +    TCGv    R  = tcg_temp_new_i32();
>> +
>> +    tcg_gen_movi_tl(R, 0xff);
>> +    tcg_gen_sub_tl(Rd, R, Rd);
>
> Better as xor with 0xff.

Done.

>> +/*
>> +    This instruction performs a compare between two registers Rd and Rr, and skips the next instruction if Rd = Rr.
>> +*/
>> +int avr_translate_CPSE(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv        Rd   = cpu_r[CPSE_Rd(opcode)];
>> +    TCGv        Rr   = cpu_r[(CPSE_hRr(opcode) << 4) | (CPSE_lRr(opcode))];
>> +    TCGLabel   *skip = gen_new_label();
>> +
>> +    tcg_gen_movi_tl(cpu_pc, ctx->inst[1].npc);   /* PC if next inst is skipped */
>> +    tcg_gen_brcond_i32(TCG_COND_EQ, Rd, Rr, skip);
>> +    tcg_gen_movi_tl(cpu_pc, ctx->inst[0].npc);   /* PC if next inst is not skipped */
>> +    gen_set_label(skip);
>
> Any reason you're not using goto_tb?

I think this place can be optimized by not exiting the tb on skip. I will
do it in the next version.

>> +/*
>> +    Subtracts one -1- from the contents of register Rd and places the result in the destination register Rd.
>> +    The C Flag in SREG is not affected by the operation, thus allowing the DEC instruction to be used on a loop
>> +    counter in multiple-precision computations.
>> +    When operating on unsigned values, only BREQ and BRNE branches can be expected to perform consistently.
>> +    When operating on two's complement values, all signed branches are available.
>> +*/
>> +int avr_translate_DEC(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv    Rd = cpu_r[DEC_Rd(opcode)];
>> +
>> +    tcg_gen_subi_tl(Rd, Rd, 1);      /* Rd = Rd - 1 */
>> +    tcg_gen_andi_tl(Rd, Rd, 0xff);   /* make it 8 bits */
>> +
>> +    tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_Vf, Rd, 0x7f);  /* cpu_Vf = Rd == 0x7f */
>
> This is INC overflow.

Please explain, I don't see a problem here.

>> +int avr_translate_INC(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv    Rd = cpu_r[INC_Rd(opcode)];
>> +
>> +    tcg_gen_addi_tl(Rd, Rd, 1);
>> +    tcg_gen_andi_tl(Rd, Rd, 0xff);
>> +
>> +    tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_Vf, Rd, 0x80);  /* cpu_Vf = Rd == 0x80 */
>
> This is DEC overflow.
Please explain, I don't see a problem here.

>> +/*
>> +    Load one byte indirect from data space to register and stores and clears the bits in data space specified by the
>> +    register. The instruction can only be used towards internal SRAM.
>> +    The data location is pointed to by the Z (16 bits) Pointer Register in the Register File. Memory access is limited
>> +    to the current data segment of 64KB. To access another data segment in devices with more than 64KB data
>> +    space, the RAMPZ Register in the I/O area has to be changed.
>> +    The Z-pointer Register is left unchanged by the operation. This instruction is especially suited for clearing status
>> +    bits stored in SRAM.
>> +*/
>> +int avr_translate_LAC(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    if (avr_feature(env, AVR_FEATURE_RMW) == false) {
>> +        gen_helper_unsupported(cpu_env);
>> +
>> +        return BS_EXCP;
>> +    }
>> +
>> +    TCGv    Rr   = cpu_r[LAC_Rr(opcode)];
>> +    TCGv    addr = gen_get_zaddr();
>> +    TCGv    t0   = tcg_temp_new_i32();
>> +    TCGv    t1   = tcg_temp_new_i32();
>> +
>> +    tcg_gen_qemu_ld8u(t0, addr, DATA_INDEX);   /* t0 = mem[addr] */
>> +    tcg_gen_movi_tl(t1, 0xff);                 /* t1 = t0 & (0xff - Rr) */
>> +    tcg_gen_sub_tl(t1, t1, Rr);
>> +    tcg_gen_and_tl(t1, t0, t1);
>
> These last 3 are
>
>   tcg_gen_andc_tl(t1, t0, Rr);

Fixed.

>> +/*
>> +    This instruction performs 8-bit x 8-bit -> 16-bit signed multiplication.
>> +*/
>> +int avr_translate_MULS(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    if (avr_feature(env, AVR_FEATURE_MUL) == false) {
>> +        gen_helper_unsupported(cpu_env);
>> +
>> +        return BS_EXCP;
>> +    }
>> +
>> +    TCGv    R0 = cpu_r[0];
>> +    TCGv    R1 = cpu_r[1];
>> +    TCGv    Rd = cpu_r[16 + MULS_Rd(opcode)];
>> +    TCGv    Rr = cpu_r[16 + MULS_Rr(opcode)];
>> +    TCGv    R  = tcg_temp_new_i32();
>> +
>> +    tcg_gen_ext8s_tl(Rd, Rd);   /* make Rd full 32 bit signed */
>> +    tcg_gen_ext8s_tl(Rr, Rr);   /* make Rr full 32 bit signed */
>
> You need to either zero-extend these again afterward, or you need to
> perform the extensions into a temporary.

Oops.

>> +/*
>> +    Replaces the contents of register Rd with its two's complement; the value $80 is left unchanged.
>> +*/
>> +int avr_translate_NEG(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    TCGv    Rd = cpu_r[NEG_Rd(opcode)];
>> +    TCGv    R  = tcg_temp_new_i32();
>> +
>> +    tcg_gen_neg_tl(R, Rd);
>> +
>> +    tcg_gen_setcondi_tl(TCG_COND_NE, cpu_Cf, R, 0x00);  /* Cf = R != 0x00 */
>> +    tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_Vf, R, 0x80);  /* Vf = R == 0x80 */
>
> Correct, but surely it's better to re-use the normal subtraction
> infrastructure.  In particular, setcond is at minimum two operations.  The
> tcg optimizer should do approximately as well folding away the zero from
> the subtract.
>
>> +    tcg_gen_not_tl(cpu_Hf, Rd);        /* Hf = (~Rd | R)(3) */
>> +    tcg_gen_or_tl(cpu_Hf, cpu_Hf, R);
>> +    tcg_gen_shri_tl(cpu_Hf, cpu_Hf, 3);
>> +    tcg_gen_andi_tl(cpu_Hf, cpu_Hf, 1);
>> +    gen_ZNSf(R);
>> +
>> +    tcg_temp_free_i32(R);
>
> You failed to write back the result.

Done.
>> +int avr_translate_XCH(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
>> +{
>> +    if (avr_feature(env, AVR_FEATURE_RMW) == false) {
>> +        gen_helper_unsupported(cpu_env);
>> +
>> +        return BS_EXCP;
>> +    }
>> +
>> +    TCGv    Rd   = cpu_r[XCH_Rd(opcode)];
>> +    TCGv    t0   = tcg_temp_new_i32();
>> +    TCGv    addr = gen_get_zaddr();
>> +
>> +    tcg_gen_qemu_ld8u(addr, t0, DATA_INDEX);
>> +    tcg_gen_qemu_st8(addr, Rd, DATA_INDEX);
>
> The address and data temporaries are swapped.

Fixed.

> r~

--
Best Regards,
Michael Rolnik