On 04/08/2010 02:56 AM, Aurelien Jarno wrote: > I have applied the patch. I have some comments though, it would be nice > if you can address them with additional patches.
Sure. >> +static void tcg_out_ori(TCGContext *s, int ret, int arg, tcg_target_ulong m) >> +{ >> + if (m == 0) { >> + tcg_out_mov(s, ret, arg); >> + } else if (m == -1) { >> + tcg_out_movi(s, TCG_TYPE_I32, ret, -1); > > Those cases are already eliminated in tcg/tcg-op.h. This code looks > redundant. The cases eliminated in tcg-op.h are with immediate constants. There is no generic code in tcg.c to eliminate these cases after constant propagation. However, I can remove them with... >> + } else { >> + tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m); >> + tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_OR); > > Do we really want a movi here? It would be better to leave the tcg code > load the constant itself, so that if the same constant is used twice, it > is only loaded once. I've never caught TCG properly re-using constants, but I take your point -- there's no reason why tcg.c can't be improved, and this port would miss out on that improvement. I'll invent a constraint that matches or_mask_p. >> +static void tcg_out_andi(TCGContext *s, int ret, int arg, tcg_target_ulong >> m) ... >> + tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m); >> + tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_AND); > > Same. ANDI slightly different case. This function is used by tcg_out_tlb_read with constants that may or may not satisfy and_mask_p. I think it's cleaner to handle the arbitrary case here, rather than open code the same test in the tlb read function. I will of course add a constraint to match and_mask_p, for ANDs that originate within the opcode stream. >> + tcg_out_reloc(s, s->code_ptr, R_PARISC_PCREL17F, label_index, 0); >> + tcg_out32(s, op); > > This breaks partial retranslation. The bits corresponding to the offset > should be preserved. I don't recall ever hearing about re-translation. Can you point me at the bits that do it, so I can figure out what's going on? This sounds like something that ought to be documented properly... I rather assumed that the "addend" parameter to patch_reloc would hold whatever is really needed to be preserved. What else is that field for, anyway? >> - if (opc == 3) >> - data_reg2 = *args++; >> - else >> - data_reg2 = 0; /* suppress warning */ >> + data_reg2 = (opc == 3 ? *args++ : TCG_REG_R0); > > I am not sure TCG_REG_R0 is really correct here, and I find it confusing. > While it's value is zero, the assignment there is just to make GCC > happy, it won't be used after Correct. I don't see what else I can really do though. I think it's worse to mix types: integer-as-register-number (i.e. *args) and integer-as-filler (i.e. 0). Better to at least have them be the same type as it clarifies that *args must be a register number. Perhaps just a comment here? >> #if defined(CONFIG_SOFTMMU) >> - tcg_out_mov(s, r1, addr_reg); >> + lab1 = gen_new_label(); >> + lab2 = gen_new_label(); > > Do you really want to use label here? load/store are the most common > instructions, I am not really sure of the resulting performance. I think the code is *so* much more readable re-using the usual branch and relocate code. I'd almost rather spend the time speeding up the use of temporary labels than uglifying the code here. >> + /* These three correspond exactly to the fallback implementation. >> + But by including them we reduce the number of TCG ops that >> + need to be generated, and these opcodes are fairly common. */ > > Are you sure it really makes a difference? Not quantifiably, but the reasoning is sound. I can remove them if you insist. >> + { INDEX_op_add_i32, { "r", "rZ", "ri" } }, >> + { INDEX_op_sub_i32, { "r", "rI", "ri" } }, >> + { INDEX_op_and_i32, { "r", "rZ", "ri" } }, >> + { INDEX_op_or_i32, { "r", "rZ", "ri" } }, > > Already commented for "and" and "or", but the same apply for add and > sub. Do we really need a "i" contraints here if the constant is going > to be loaded with a movi. ADD and SUB are not going to use movi. They will use one or both of ADDIL (21-bit constant << 11) and LDO (14-bit constant). As a pair these insns can perform a full 32-bit constant addition. I suppose technically there's a subset of 32-bit constants that could benefit from generic code loading constants into registers. The only valid output register for ADDIL is R1. So at the moment for R3 = R4 + 0x10000; we generate addil 0x10000, r4, r1 copy r1, r3 where we could equivalently generate ldil 0x10000, r5 add r4, r5, r3 However I don't think this is worth worrying about in the short term. >> + if (GUEST_BASE != 0) { >> + tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE); >> + } > > The final GUEST_BASE value is computed after the prologue has been > generated. The value is modified in two cases: > - The user specify a non-aligned base address. > - /proc/sys/vm/mmap_min_addr is different than 0, which is now the > in default configuration for more than one year. > > When it happens, the guest crashes almost immediately. To be fair, mmap_min_addr only affects GUEST_BASE if the executable image we've loaded overlaps. Which is uncommon, but certainly possible. Hmm. I wonder which is better: one extra instruction needed per qemu_ld vs having one more call-saved register available. At the moment we don't even come close to using all of the call-saved registers, and it would be easy enough to have the prologue read the actual guest_base variable rather than embed the constant. r~