Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port.

Richard Henderson Thu, 08 Apr 2010 09:36:04 -0700

On 04/08/2010 02:56 AM, Aurelien Jarno wrote:
> I have applied the patch. I have some comments though, it would be nice
> if you can address them with additional patches.


Sure.

>> +static void tcg_out_ori(TCGContext *s, int ret, int arg, tcg_target_ulong m)
>> +{
>> +    if (m == 0) {
>> +        tcg_out_mov(s, ret, arg);
>> +    } else if (m == -1) {
>> +        tcg_out_movi(s, TCG_TYPE_I32, ret, -1);
> 
> Those cases are already eliminated in tcg/tcg-op.h. This code looks
> redundant.

The cases eliminated in tcg-op.h are with immediate constants.
There is no generic code in tcg.c to eliminate these cases 
after constant propagation.  However, I can remove them with...

>> +    } else {
>> +        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
>> +        tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_OR);
> 
> Do we really want a movi here? It would be better to leave the tcg code
> load the constant itself, so that if the same constant is used twice, it
> is only loaded once.

I've never caught TCG properly re-using constants, but I take your
point -- there's no reason why tcg.c can't be improved, and this 
port would miss out on that improvement.  I'll invent a constraint
that matches or_mask_p.

>> +static void tcg_out_andi(TCGContext *s, int ret, int arg, tcg_target_ulong 
>> m)
...
>> +        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
>> +        tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_AND);
> 
> Same.

ANDI slightly different case.  This function is used by tcg_out_tlb_read
with constants that may or may not satisfy and_mask_p.  I think it's cleaner
to handle the arbitrary case here, rather than open code the same test in
the tlb read function.

I will of course add a constraint to match and_mask_p, for ANDs that 
originate within the opcode stream.

>> +        tcg_out_reloc(s, s->code_ptr, R_PARISC_PCREL17F, label_index, 0);
>> +        tcg_out32(s, op);
> 
> This breaks partial retranslation. The bits corresponding to the offset
> should be preserved.

I don't recall ever hearing about re-translation.  Can you point me
at the bits that do it, so I can figure out what's going on?  This
sounds like something that ought to be documented properly...

I rather assumed that the "addend" parameter to patch_reloc would
hold whatever is really needed to be preserved.  What else is that
field for, anyway? 

>> -    if (opc == 3)
>> -        data_reg2 = *args++;
>> -    else
>> -        data_reg2 = 0; /* suppress warning */
>> +    data_reg2 = (opc == 3 ? *args++ : TCG_REG_R0);
> 
> I am not sure TCG_REG_R0 is really correct here, and I find it confusing.
> While it's value is zero, the assignment there is just to make GCC
> happy, it won't be used after

Correct.  I don't see what else I can really do though.  I think it's
worse to mix types: integer-as-register-number (i.e. *args) and 
integer-as-filler (i.e. 0).  Better to at least have them be the same
type as it clarifies that *args must be a register number.

Perhaps just a comment here?

>>  #if defined(CONFIG_SOFTMMU)
>> -    tcg_out_mov(s, r1, addr_reg);
>> +    lab1 = gen_new_label();
>> +    lab2 = gen_new_label();
> 
> Do you really want to use label here? load/store are the most common
> instructions, I am not really sure of the resulting performance.

I think the code is *so* much more readable re-using the usual branch
and relocate code.  I'd almost rather spend the time speeding up the
use of temporary labels than uglifying the code here.

>> +    /* These three correspond exactly to the fallback implementation.
>> +       But by including them we reduce the number of TCG ops that 
>> +       need to be generated, and these opcodes are fairly common.  */
> 
> Are you sure it really makes a difference?

Not quantifiably, but the reasoning is sound.  I can remove them if you insist.

>> +    { INDEX_op_add_i32, { "r", "rZ", "ri" } },
>> +    { INDEX_op_sub_i32, { "r", "rI", "ri" } },
>> +    { INDEX_op_and_i32, { "r", "rZ", "ri" } },
>> +    { INDEX_op_or_i32, { "r", "rZ", "ri" } },
> 
> Already commented for "and" and "or", but the same apply for add and 
> sub. Do we really need a "i" contraints here if the constant is going 
> to be loaded with a movi.

ADD and SUB are not going to use movi.  They will use one or both of
ADDIL (21-bit constant << 11) and LDO (14-bit constant).  As a pair
these insns can perform a full 32-bit constant addition.

I suppose technically there's a subset of 32-bit constants that could
benefit from generic code loading constants into registers.  The only
valid output register for ADDIL is R1.  So at the moment for

        R3 = R4 + 0x10000;

we generate

        addil   0x10000, r4, r1
        copy    r1, r3

where we could equivalently generate

        ldil    0x10000, r5
        add     r4, r5, r3

However I don't think this is worth worrying about in the short term.

>> +    if (GUEST_BASE != 0) {
>> +        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
>> +    }
> 
> The final GUEST_BASE value is computed after the prologue has been
> generated. The value is modified in two cases:
> - The user specify a non-aligned base address.
> - /proc/sys/vm/mmap_min_addr is different than 0, which is now the
>   in default configuration for more than one year.
> 
> When it happens, the guest crashes almost immediately.

To be fair, mmap_min_addr only affects GUEST_BASE if the executable
image we've loaded overlaps.  Which is uncommon, but certainly possible.

Hmm.  I wonder which is better: one extra instruction needed per qemu_ld
vs having one more call-saved register available.  At the moment we don't
even come close to using all of the call-saved registers, and it would be
easy enough to have the prologue read the actual guest_base variable rather
than embed the constant.


r~

Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port.

Reply via email to