Re: Make regcprop to eliminate noop moves better

2016-09-23 Thread Jan Hubicka
> On 09/22/2016 01:48 PM, Jan Hubicka wrote:
> >
> > * postreload.c (reload_cse_simplify): Also accept USE in noop move
> > patterns.
> >
> >diff --git a/gcc/postreload.c b/gcc/postreload.c
> >index 61c1ce8..4f3a526 100644
> >--- a/gcc/postreload.c
> >+++ b/gcc/postreload.c
> >@@ -153,7 +153,8 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg)
> >   value = SET_DEST (part);
> > }
> > }
> >-  else if (GET_CODE (part) != CLOBBER)
> >+  else if (GET_CODE (part) != CLOBBER
> >+   && GET_CODE (part) != USE)
> > break;
> > }
> 
> Hmm. This is probably ok, but what's the USE for in your pattern?

Becuase moves in vector unit at GCN require exec reigster to be set to -1 and
this is modelled by explicit use.

Honza
> 
> 
> Bernd


Re: Make regcprop to eliminate noop moves better

2016-09-22 Thread Bernd Schmidt

On 09/22/2016 01:48 PM, Jan Hubicka wrote:


* postreload.c (reload_cse_simplify): Also accept USE in noop move
patterns.

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 61c1ce8..4f3a526 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -153,7 +153,8 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg)
  value = SET_DEST (part);
}
}
- else if (GET_CODE (part) != CLOBBER)
+ else if (GET_CODE (part) != CLOBBER
+  && GET_CODE (part) != USE)
break;
}


Hmm. This is probably ok, but what's the USE for in your pattern?


Bernd


Re: Make regcprop to eliminate noop moves better

2016-09-22 Thread Jan Hubicka
> > while working on the GCN port I ended up with many redundant register copies
> > of the form
> >  mov reg, exec
> >  do something
> >  mov reg, exec
> >  do something
> >  ...
> > these copies are generated by LRA because exec is small register class and
> > needs a lot of reloading (it could be improved too, but I do not care
> > because I want to handle exec specially later anyway).
> > 
> > I was however suprised this garbage survives postreload optimizations.  It
> > is easy to fix in regcprop which already does some noop copy elimination,
> > but only of the for mov reg, reg after substituting.
> 
> Right, this ought to be dealt with during postreload CSE, there is roughly 
> the 
> same code as yours:
> 
> /* See whether a single set SET is a noop.  */
> static int
> reload_cse_noop_set_p (rtx set)
> {
>   if (cselib_reg_set_mode (SET_DEST (set)) != GET_MODE (SET_DEST (set)))
> return 0;
> 
>   return rtx_equal_for_cselib_p (SET_DEST (set), SET_SRC (set));
> }
> 
> Any idea about why this doesn't work in your case?

Thanks for pointing that code out.  I looked for noop_set in postreload
passes and found one in regcprop first.

My case is not optimized because my IRA move pattern contains an use that is
not handled by postreload cse.  I am testing the attached patch and plan
commit it as obvious.  

regcprop uses single_set which may be bit more natural, but it won't be stronger
here because REG_DEAD notes are not computed, yet, and the current way noop
moves are discovered runs in parallel with simplification which is probably
a bit cheaper.

I think elimination at both places makes sense:  postreload cse is run before
splitting and regcprop afterwards. It seems that at least for x86 we get quite
few noop moves by splitting.  I will get statistics from x86_64 bootstrap for
regcprop part of elimination.

* postreload.c (reload_cse_simplify): Also accept USE in noop move
patterns.

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 61c1ce8..4f3a526 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -153,7 +153,8 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg)
  value = SET_DEST (part);
}
}
- else if (GET_CODE (part) != CLOBBER)
+ else if (GET_CODE (part) != CLOBBER
+  && GET_CODE (part) != USE)
break;
}
 


Re: Make regcprop to eliminate noop moves better

2016-09-21 Thread Eric Botcazou
> while working on the GCN port I ended up with many redundant register copies
> of the form
>  mov reg, exec
>  do something
>  mov reg, exec
>  do something
>  ...
> these copies are generated by LRA because exec is small register class and
> needs a lot of reloading (it could be improved too, but I do not care
> because I want to handle exec specially later anyway).
> 
> I was however suprised this garbage survives postreload optimizations.  It
> is easy to fix in regcprop which already does some noop copy elimination,
> but only of the for mov reg, reg after substituting.

Right, this ought to be dealt with during postreload CSE, there is roughly the 
same code as yours:

/* See whether a single set SET is a noop.  */
static int
reload_cse_noop_set_p (rtx set)
{
  if (cselib_reg_set_mode (SET_DEST (set)) != GET_MODE (SET_DEST (set)))
return 0;

  return rtx_equal_for_cselib_p (SET_DEST (set), SET_SRC (set));
}

Any idea about why this doesn't work in your case?

-- 
Eric Botcazou


Re: Make regcprop to eliminate noop moves better

2016-09-19 Thread Jeff Law

On 09/18/2016 06:42 AM, Jan Hubicka wrote:

Hi,
while working on the GCN port I ended up with many redundant register copies
of the form
 mov reg, exec
 do something
 mov reg, exec
 do something
 ...
these copies are generated by LRA because exec is small register class and needs
a lot of reloading (it could be improved too, but I do not care because I want
to handle exec specially later anyway).

I was however suprised this garbage survives postreload optimizations.  It is 
easy
to fix in regcprop which already does some noop copy elimination, but only
of the for mov reg, reg after substituting.

This patch implements it and eliminates many movs while running testsuite.
Bootstrapped/regtested x86_64-linux, OK?
Needs a ChangeLog.  Ideally we'd have tests to verify we're catching 
these cases that survive the postreload optimizations.


With those OK.  Possibly OK with just a ChangeLog if it's too damn 
painful to construct a consistent end-to-end testcase (this may be a 
good example where we could probably use David's framework to test this 
directly rather than via an end-to-end test).


Jeff


Make regcprop to eliminate noop moves better

2016-09-18 Thread Jan Hubicka
Hi,
while working on the GCN port I ended up with many redundant register copies
of the form
 mov reg, exec
 do something
 mov reg, exec
 do something
 ...
these copies are generated by LRA because exec is small register class and needs
a lot of reloading (it could be improved too, but I do not care because I want
to handle exec specially later anyway).

I was however suprised this garbage survives postreload optimizations.  It is 
easy
to fix in regcprop which already does some noop copy elimination, but only
of the for mov reg, reg after substituting.

This patch implements it and eliminates many movs while running testsuite.
Bootstrapped/regtested x86_64-linux, OK?

Honza

Index: regcprop.c
===
--- regcprop.c  (revision 240109)
+++ regcprop.c  (working copy)
@@ -771,6 +771,25 @@ copyprop_hardreg_forward_1 (basic_block
}
 
   set = single_set (insn);
+
+  /* Detect noop sets and remove them before processing side effects.  */
+  if (set && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set)))
+   {
+ unsigned int regno = REGNO (SET_SRC (set));
+ rtx r1 = find_oldest_value_reg (REGNO_REG_CLASS (regno),
+ SET_DEST (set), vd);
+ rtx r2 = find_oldest_value_reg (REGNO_REG_CLASS (regno),
+ SET_SRC (set), vd);
+ if (rtx_equal_p (r1 ? r1 : SET_DEST (set), r2 ? r2 : SET_SRC (set)))
+   {
+ bool last = insn == BB_END (bb);
+ delete_insn (insn);
+ if (last)
+   break;
+ continue;
+   }
+   }
+
   extract_constrain_insn (insn);
   preprocess_constraints (insn);
   const operand_alternative *op_alt = which_op_alt ();
@@ -860,7 +879,9 @@ copyprop_hardreg_forward_1 (basic_block
 register in the same class.  */
  if (REG_P (SET_DEST (set)))
{
- new_rtx = find_oldest_value_reg (REGNO_REG_CLASS (regno), src, 
vd);
+ new_rtx = find_oldest_value_reg (REGNO_REG_CLASS (regno),
+  src, vd);
+
  if (new_rtx && validate_change (insn, &SET_SRC (set), new_rtx, 0))
{
  if (dump_file)