gen_lowpart called where 'truncate' needed?
I am porting gcc-4.4.3 to TILE-Gx, a 64-bit VLIW RISC. It's gone pretty well so far; most tests work, Linux builds, etc. Thanks for writing such good documentation. This chip, like MIPS64, maintains the invariant that SImode values in DImode registers are always sign-extended. So I have implemented PROMOTE_MODE, TARGET_MODE_REP_EXTENDED and LOAD_EXTEND_OP accordingly, and made sure my .md file always preserves this invariant. This almost works, but testing has shown two cases that break this invariant, and I want to know if it's my fault. The problem is that 'widen_bswap' and 'expand_divmod' each produce a DImode value whose high 32 bits are known to be zero, and then simply 'gen_lowpart' the DImode value to convert it to SImode. This produces a SUBREG, not a 'truncate' rtx, so I end up with a register that is incorrectly zero-extended. Am I doing something wrong, or are these functions incorrect? As far as I can tell, gen_lowpart is not expected to force a sign-extension operation. But SUBREG rules are a bit subtle so I could be missing something. Everything works fine if I modify 'widen_bswap' and 'expand_divmod' to do this instead of 'return gen_lowpart (mode, x)': return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x)); Since the high bits are already zero, that would be less efficient on most platforms, so guarding it with something like this would probably be smarter: if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND) return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x)); I'm happy to believe I'm doing something wrong in my back end, but I'm not sure what that would be. I could also believe these are obscure edge cases no one cared about before. Any tips would be appreciated. Thanks! -Mat
Re: gen_lowpart called where 'truncate' needed?
Adam Nemet writes: > Ian Lance Taylor writes: > > Mat Hostetter writes: > > > >> Since the high bits are already zero, that would be less efficient on > >> most platforms, so guarding it with something like this would probably > >> be smarter: > >> > >> if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND) > >> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x)); > >> > >> I'm happy to believe I'm doing something wrong in my back end, but I'm > >> not sure what that would be. I could also believe these are obscure > >> edge cases no one cared about before. Any tips would be appreciated. > > > > Interesting. I think you are in obscure edge case territory. Your > > suggestion makes sense to me, and in fact it should probably be put > > into gen_lowpart_common. > > FWIW, I disagree. Firstly, mode_rep_extended is a special case of > !TRULY_NOOP_TRUNCATION so the above check should use that. Secondly, in > MIPS we call gen_lowpart to convert DI to SI when we know it's safe. In > this case we always want a subreg not a truncate for better code. So I > don't think gen_lowpart_common is the right place to fix this. > > I think the right fix is to call convert_to_mode or convert_move in the > expansion code which ensure the proper truncation. That would yield correct code, but wouldn't it throw away the fact that the high bits are already known to be zero, and yield redundant zero-extension on some platforms? I'm guessing that's why the code was originally written to call convert_lowpart rather than convert_to_mode. And just to add a minor wrinkle: for the widen_bswap case, which produces (bswap64 (x) >> 32), the optimal thing is actually to use ashiftrt instead of lshiftrt when targetm.mode_rep_extended says SIGN_EXTEND, and then call convert_lowpart as now. -Mat
Re: gen_lowpart called where 'truncate' needed?
Adam Nemet writes: > > > I think the right fix is to call convert_to_mode or convert_move in the > > > expansion code which ensure the proper truncation. > > > > That would yield correct code, but wouldn't it throw away the fact > > that the high bits are already known to be zero, and yield redundant > > zero-extension on some platforms? I'm guessing that's why the code was > > originally written to call convert_lowpart rather than convert_to_mode. > > convert_to_mode uses gen_lowpart for truncation if TRULY_NOOP_TRUNCATION. I was concerned about the !TRULY_NOOP_TRUNCATION case; I didn't want to harm someone else's performance to benefit my chip. But, looking around, it looks like (almost?) every chip either #defines TRULY_NOOP_TRUNCATION or wants this fix. So your suggestion sounds good to me, and in any case using straightforward type conversions should surely be the default choice. -Mat
how do I achieve a weaker UNSPEC_VOLATILE?
I've implemented some special insns that access hardware resources. These insns have side effects so they cannot be deleted or reordered with respect to each other. I made them UNSPEC_VOLATILE, which generates correct code. Unfortunately, performance is poor. The problem is that UNSPEC_VOLATILE is a scheduling barrier, so the scheduler does not issue any other insn in the same cycle. Since my chip is a VLIW, I rely on the scheduler annotations to determine which insns go in a bundle (same cycle == same bundle). Due to the scheduler barrier, none of these special insns ever get bundled with anything else, which wastes valuable VLIW slots. How should I achieve the effect I need (preserve these insns and their relative ordering), while still allowing other insns to be bundled with them? One hack that occurs to me is to annotate the special insns to pretend each one reads and writes a phony hardware register. This would preserve ordering and prevent them from being deleted, at least if a phony hardware register would be considered live on exit from a function, etc. (would it?) But even if this works, I worry the phony dependencies and more complex insn patterns might prevent 'combine' from ever combining two of these special insns together, which is valuable and works now. But perhaps there is a cleaner way. Any advice? Thanks! -Mat
Is nonoverlapping_memrefs_p wrong for unknown offsets?
We recently tracked down an aliasing bug in gcc-4.4.3 (found by our TILE-Gx back end, not yet ready to contribute), and we wanted to make sure the fix is right. try_crossjump_bb identifies some common insns in SPEC2000.eon and uses merge_memattrs to merge them. To do so, it has to unify their aliasing data such that any insn that aliased either of the original insns aliases the merged insn. In our case we have two identical-looking insns that are actually referencing different stack spill slots, so their MEM_OFFSETs are different. merge_memattrs correctly NULLs out the MEM_OFFSET of the merged insn to indicate it's not sure what the offset is, although it leaves a non-NULL MEM_SIZE: else if (MEM_OFFSET (x) != MEM_OFFSET (y)) { set_mem_offset (x, 0); set_mem_offset (y, 0); } Later, nonoverlapping_memrefs_p decides that this merged insn does not alias another spill slot insn (one which has a valid MEM_OFFSET), but in fact they do alias. The scheduler then creates an incorrect schedule. The bug seems to be that nonoverlapping_memrefs_p does not conservatively assume that a NULL MEM_OFFSET can alias any offset, despite comments to the contrary. Instead it essentially treats it like a known offset of zero. We don't understand why it doesn't just give up if the references have the same base but one of the offsets is unknown. A minimal patch to do so would look like this, but if this is correct, then code after this short-circuit can be simplified, yielding a larger patch (which I could produce): --- //tilera/branch/gcc/tools/gcc/gcc/alias.c 2010/04/12 10:29:48 +++ //tilera/branch/gcc/tools/gcc/gcc/alias.c 2010/04/12 10:36:35 @@ -2150,6 +2150,11 @@ : MEM_SIZE (rtly) ? INTVAL (MEM_SIZE (rtly)) : -1); + /* If we don't have an offset for both memrefs, we have to assume they can + be anywhere in the DECL's MEM. */ + if (!moffsetx || !moffsety) +return 0; + /* If we have an offset for either memref, it can update the values computed above. */ if (moffsetx) -Mat
Re: Is nonoverlapping_memrefs_p wrong for unknown offsets?
Great, thanks -- I should have re-checked bugzilla after we tracked this down. We also noticed a few minor performance issues along the way. It would be better if merge_memattrs did a min/max thing with offset/size to choose an offset and size that encompass both original references, rather than giving up on the offset altogether when the offsets don't match exactly. Also, flow_find_cross_jump coarsens the aliasing information as it scans, so even if it gives up because it doesn't find enough insns to merge, the aliasing data is lost. We implemented a split of the scan from the actual destructive merging. -Mat Steven Bosscher writes: > On Mon, Apr 12, 2010 at 6:57 PM, Mat Hostetter wrote: > > try_crossjump_bb identifies some common insns in SPEC2000.eon and uses > > merge_memattrs to merge them. To do so, it has to unify their > > aliasing data such that any insn that aliased either of the original > > insns aliases the merged insn. In our case we have two > > identical-looking insns that are actually referencing different stack > > spill slots, so their MEM_OFFSETs are different. merge_memattrs > > correctly NULLs out the MEM_OFFSET of the merged insn to indicate it's > > not sure what the offset is, although it leaves a non-NULL MEM_SIZE: > > > > else if (MEM_OFFSET (x) != MEM_OFFSET (y)) > > { > > set_mem_offset (x, 0); > > set_mem_offset (y, 0); > > } > > > > Later, nonoverlapping_memrefs_p decides that this merged insn does not > > alias another spill slot insn (one which has a valid MEM_OFFSET), but > > in fact they do alias. The scheduler then creates an incorrect schedule. > > Sounds like http://gcc.gnu.org/PR42509 -- does that help? > > Ciao! > Steven