On 24.05.2013 19:02, Richard Henderson wrote:
> On 05/24/2013 01:53 AM, Claudio Fontana wrote:
>>> No real need to special case zero; it's just an extra test slowing down the
>>> compiler.
>>
>> Yes, we need to handle the special case zero.
>> Otherwise no instruction at all would be emitted for value 0.
>
> Hmm, true.  Although I'd been thinking more along the lines of
> arranging the code such that we'd use movz to set the zero.
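
For illustration, the MOVZ/MOVK sequence under discussion can be sketched in standalone C. This is not the code from the patch: movi_encode and its out[] buffer are hypothetical names, and the zero case is handled here with a MOVZ of #0, along the lines suggested above, rather than with a move from the zero register. Half-words equal to zero are skipped, which is why a zero value needs some explicit handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): encode the A64 MOVZ/MOVK
 * instructions that load `value` into register number `rd`.
 * Writes at most 4 instruction words to out[] and returns the count.
 */
static size_t movi_encode(uint64_t value, unsigned rd, uint32_t out[4])
{
    /* Use the 32-bit MOVZ unless bits above 31 are set; a write to a
       W register zero-extends into the full X register. */
    uint32_t base = (value > 0xffffffffu) ? 0xd2800000u : 0x52800000u;
    size_t n = 0;
    unsigned shift;

    assert(rd < 32);

    if (value == 0) {
        /* No half-word is non-zero, so the loop below would emit
           nothing: emit MOVZ rd, #0 explicitly. */
        out[n++] = base | rd;
        return n;
    }

    for (shift = 0; shift < 64; shift += 16) {
        uint32_t half = (uint32_t)(value >> shift) & 0xffffu;

        if (half == 0) {
            continue;               /* skip needless MOVK of 0000h */
        }
        /* hw field is bits 22:21, imm16 is bits 20:5, Rd is bits 4:0 */
        out[n++] = base | ((shift / 16) << 21) | (half << 5) | rd;
        base |= 0x20000000u;        /* first insn is MOVZ, the rest MOVK */
    }
    return n;
}
```

With this arrangement a value such as 0x12340000 costs a single MOVZ with a 16-bit shift, and zero costs exactly one MOVZ.
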
I think we need to keep treating zero specially if we want to keep the
optimization where we don't emit needless MOVK instructions for half-words
of value 0000h.

I can, however, merge movi32 and movi64 into a single function; it could
look like this:

    if (!value) {
        tcg_out_movr(s, 0, rd, TCG_REG_XZR);
        return;
    }

    base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;

    while (value) {
        /* etc etc */
    }

>> I actually don't know whether to prefer ext=0 or ext=1,
>> in the sense that it would be useful to know whether using the extended
>> registers with a small constant is performance-wise preferable to using
>> the 32bit operation, and relying on 0-extension.
>> See also the rotation comment below.
>
> From the armv8 isa overview:
>
> # Rationale: [...] By maintaining this semantic information in the instruction
> # set, implementations can exploit this information to avoid expending energy
> # or cycles to compute, forward and store the unused upper 32 bits of such
> # data types. Implementations are free to exploit this freedom in whatever way
> # they choose to save energy.

I had not noticed that; that solves the issue.

>>> addr_reg almost certainly needs to be zero-extended for 32-bit guests,
>>> easily done by setting ext = 0 here.
>>
>> I can easily put an #ifdef just to be sure.
>
> No ifdef, just the TARGET_LONG_BITS == 64 comparison works.
>
>>> You initialize FP, but you don't reserve the register, so it's going to get
>>> clobbered.  We don't actually use the frame pointer in the translated code,
>>> so I don't think there's any call to actually initialize it either.
>>
>> The FP is not going to be clobbered, not by code here and not by called code.
>>
>> It is not going to be clobbered between our use before the jump and after the
>> jump, because all the called functions need to preserve FP as mandated by the
>> calling conventions.
>>
>> It is not going to be clobbered from the point of view of our caller,
>> because we save (FP, LR) along with (X19, X20) .. (X27, X28) and restore them
>> before returning.
>
> Ah, well, I didn't see it mentioned here,
>
>> +    tcg_regset_clear(s->reserved_regs);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
>
> but hadn't noticed that it's not listed in the reg_alloc_order.
>
>> We use FP to point to the callee_saved registers, and to move to/from them
>> in the tcg_out_store_pair and tcg_out_load_pair functions.
>
> I hadn't noticed you'd hard-coded FP into the load/store_pair functions.
> Let's *really* not do that.  Even if we decide to continue using it, let's
> pass it in explicitly.
>
> But I don't see that you're really gaining anything in the prologue from
> using FP instead of SP.  It seems like a waste of a register to me.
>
>
> r~
>

-- 
Claudio Fontana
Server OS Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158