gen_lowpart called where 'truncate' needed?

2010-02-03 Thread Mat Hostetter
I am porting gcc-4.4.3 to TILE-Gx, a 64-bit VLIW RISC.  It's gone
pretty well so far; most tests work, Linux builds, etc.  Thanks for
writing such good documentation.

This chip, like MIPS64, maintains the invariant that SImode values in
DImode registers are always sign-extended.  So I have implemented
PROMOTE_MODE, TARGET_MODE_REP_EXTENDED and LOAD_EXTEND_OP accordingly,
and made sure my .md file always preserves this invariant.

This almost works, but testing has shown two cases that break this
invariant, and I want to know if it's my fault.

The problem is that 'widen_bswap' and 'expand_divmod' each produce a
DImode value whose high 32 bits are known to be zero, and then simply
'gen_lowpart' the DImode value to convert it to SImode.  This produces
a SUBREG, not a 'truncate' rtx, so I end up with a register that is
incorrectly zero-extended.

Am I doing something wrong, or are these functions incorrect?  As far
as I can tell, gen_lowpart is not expected to force a sign-extension
operation.  But SUBREG rules are a bit subtle so I could be missing
something.

Everything works fine if I modify 'widen_bswap' and 'expand_divmod'
to do this instead of 'return gen_lowpart (mode, x)':

  return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));

Since the high bits are already zero, that would be less efficient on
most platforms, so guarding it with something like this would probably
be smarter:

  if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));

I'm happy to believe I'm doing something wrong in my back end, but I'm
not sure what that would be.  I could also believe these are obscure
edge cases no one cared about before.  Any tips would be appreciated.
Thanks!

-Mat


Re: gen_lowpart called where 'truncate' needed?

2010-02-04 Thread Mat Hostetter
Adam Nemet  writes:

> Ian Lance Taylor  writes:
> > Mat Hostetter  writes:
> >
> >> Since the high bits are already zero, that would be less efficient on
> >> most platforms, so guarding it with something like this would probably
> >> be smarter:
> >>
> >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> >> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> >>
> >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> >> not sure what that would be.  I could also believe these are obscure
> >> edge cases no one cared about before.  Any tips would be appreciated.
> >
> > Interesting.  I think you are in obscure edge case territory.  Your
> > suggestion makes sense to me, and in fact it should probably be put
> > into gen_lowpart_common.
> 
> FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
> !TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
> MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
> this case we always want a subreg not a truncate for better code.  So I
> don't think gen_lowpart_common is the right place to fix this.
>
> I think the right fix is to call convert_to_mode or convert_move in the
> expansion code which ensure the proper truncation.

That would yield correct code, but wouldn't it throw away the fact
that the high bits are already known to be zero, and yield redundant
zero-extension on some platforms?  I'm guessing that's why the code was
originally written to call convert_lowpart rather than convert_to_mode.

And just to add a minor wrinkle: for the widen_bswap case, which
produces (bswap64 (x) >> 32), the optimal thing is actually to use
ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
SIGN_EXTEND, and then call convert_lowpart as now.

-Mat


Re: gen_lowpart called where 'truncate' needed?

2010-02-05 Thread Mat Hostetter
Adam Nemet  writes:

> > > I think the right fix is to call convert_to_mode or convert_move in the
> > > expansion code which ensure the proper truncation.
> > 
> > That would yield correct code, but wouldn't it throw away the fact
> > that the high bits are already known to be zero, and yield redundant
> > zero-extension on some platforms?  I'm guessing that's why the code was
> > originally written to call convert_lowpart rather than convert_to_mode.
> 
> convert_to_mode uses gen_lowpart for truncation if TRULY_NOOP_TRUNCATION.

I was concerned about the !TRULY_NOOP_TRUNCATION case; I didn't want
to harm someone else's performance to benefit my chip.  But, looking
around, it looks like (almost?) every chip either #defines
TRULY_NOOP_TRUNCATION or wants this fix.  So your suggestion sounds
good to me, and in any case using straightforward type conversions
should surely be the default choice.

-Mat


how do I achieve a weaker UNSPEC_VOLATILE?

2010-03-12 Thread Mat Hostetter
I've implemented some special insns that access hardware resources.
These insns have side effects so they cannot be deleted or reordered
with respect to each other.  I made them UNSPEC_VOLATILE, which
generates correct code.  Unfortunately, performance is poor.

The problem is that UNSPEC_VOLATILE is a scheduling barrier, so the
scheduler does not issue any other insn in the same cycle.  Since my
chip is a VLIW, I rely on the scheduler annotations to determine which
insns go in a bundle (same cycle == same bundle).  Due to the
scheduler barrier, none of these special insns ever get bundled with
anything else, which wastes valuable VLIW slots.

How should I achieve the effect I need (preserve these insns and their
relative ordering), while still allowing other insns to be bundled
with them?

One hack that occurs to me is to annotate the special insns to pretend
each one reads and writes a phony hardware register.  This would
preserve ordering and prevent them from being deleted, at least if a
phony hardware register would be considered live on exit from a
function, etc. (would it?)

But even if this works, I worry the phony dependencies and more
complex insn patterns might prevent 'combine' from ever combining two
of these special insns together, which is valuable and works now.

But perhaps there is a cleaner way.  Any advice?  Thanks!

-Mat


Is nonoverlapping_memrefs_p wrong for unknown offsets?

2010-04-12 Thread Mat Hostetter
We recently tracked down an aliasing bug in gcc-4.4.3 (found by our
TILE-Gx back end, not yet ready to contribute), and we wanted to make
sure the fix is right.

try_crossjump_bb identifies some common insns in SPEC2000.eon and uses
merge_memattrs to merge them.  To do so, it has to unify their
aliasing data such that any insn that aliased either of the original
insns aliases the merged insn.  In our case we have two
identical-looking insns that are actually referencing different stack
spill slots, so their MEM_OFFSETs are different.  merge_memattrs
correctly NULLs out the MEM_OFFSET of the merged insn to indicate it's
not sure what the offset is, although it leaves a non-NULL MEM_SIZE:

  else if (MEM_OFFSET (x) != MEM_OFFSET (y))
{
  set_mem_offset (x, 0);
  set_mem_offset (y, 0);
}

Later, nonoverlapping_memrefs_p decides that this merged insn does not
alias another spill slot insn (one which has a valid MEM_OFFSET), but
in fact they do alias. The scheduler then creates an incorrect schedule.

The bug seems to be that nonoverlapping_memrefs_p does not
conservatively assume that a NULL MEM_OFFSET can alias any offset,
despite comments to the contrary. Instead it essentially treats it
like a known offset of zero.  We don't understand why it doesn't just
give up if the references have the same base but one of the offsets is
unknown.  A minimal patch to do so would look like this, but if this
is correct, then code after this short-circuit can be simplified,
yielding a larger patch (which I could produce):

--- //tilera/branch/gcc/tools/gcc/gcc/alias.c   2010/04/12 10:29:48
+++ //tilera/branch/gcc/tools/gcc/gcc/alias.c   2010/04/12 10:36:35
@@ -2150,6 +2150,11 @@
   : MEM_SIZE (rtly) ? INTVAL (MEM_SIZE (rtly)) :
   -1);
 
+  /* If we don't have an offset for both memrefs, we have to assume they can
+ be anywhere in the DECL's MEM. */
+  if (!moffsetx || !moffsety)
+return 0;
+
   /* If we have an offset for either memref, it can update the values computed
  above.  */
   if (moffsetx)

-Mat


Re: Is nonoverlapping_memrefs_p wrong for unknown offsets?

2010-04-12 Thread Mat Hostetter
Great, thanks -- I should have re-checked bugzilla after we tracked
this down.

We also noticed a few minor performance issues along the way.
It would be better if merge_memattrs did a min/max thing with
offset/size to choose an offset and size that encompass both original
references, rather than giving up on the offset altogether when the
offsets don't match exactly.

Also, flow_find_cross_jump coarsens the aliasing information as it
scans, so even if it gives up because it doesn't find enough insns to
merge, the aliasing data is lost.  We implemented a split of the scan
from the actual destructive merging.

-Mat


Steven Bosscher  writes:

> On Mon, Apr 12, 2010 at 6:57 PM, Mat Hostetter  wrote:
> > try_crossjump_bb identifies some common insns in SPEC2000.eon and uses
> > merge_memattrs to merge them.  To do so, it has to unify their
> > aliasing data such that any insn that aliased either of the original
> > insns aliases the merged insn.  In our case we have two
> > identical-looking insns that are actually referencing different stack
> > spill slots, so their MEM_OFFSETs are different.  merge_memattrs
> > correctly NULLs out the MEM_OFFSET of the merged insn to indicate it's
> > not sure what the offset is, although it leaves a non-NULL MEM_SIZE:
> >
> >          else if (MEM_OFFSET (x) != MEM_OFFSET (y))
> >            {
> >              set_mem_offset (x, 0);
> >              set_mem_offset (y, 0);
> >            }
> >
> > Later, nonoverlapping_memrefs_p decides that this merged insn does not
> > alias another spill slot insn (one which has a valid MEM_OFFSET), but
> > in fact they do alias. The scheduler then creates an incorrect schedule.
> 
> Sounds like http://gcc.gnu.org/PR42509 -- does that help?
> 
> Ciao!
> Steven