Re: Make regcprop to eliminate noop moves better
> On 09/22/2016 01:48 PM, Jan Hubicka wrote: > > > > * postreload.c (reload_cse_simplify): Also accept USE in noop move > > patterns. > > > >diff --git a/gcc/postreload.c b/gcc/postreload.c > >index 61c1ce8..4f3a526 100644 > >--- a/gcc/postreload.c > >+++ b/gcc/postreload.c > >@@ -153,7 +153,8 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg) > > value = SET_DEST (part); > > } > > } > >- else if (GET_CODE (part) != CLOBBER) > >+ else if (GET_CODE (part) != CLOBBER > >+ && GET_CODE (part) != USE) > > break; > > } > > Hmm. This is probably ok, but what's the USE for in your pattern? Becuase moves in vector unit at GCN require exec reigster to be set to -1 and this is modelled by explicit use. Honza > > > Bernd
Re: Make regcprop to eliminate noop moves better
On 09/22/2016 01:48 PM, Jan Hubicka wrote: * postreload.c (reload_cse_simplify): Also accept USE in noop move patterns. diff --git a/gcc/postreload.c b/gcc/postreload.c index 61c1ce8..4f3a526 100644 --- a/gcc/postreload.c +++ b/gcc/postreload.c @@ -153,7 +153,8 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg) value = SET_DEST (part); } } - else if (GET_CODE (part) != CLOBBER) + else if (GET_CODE (part) != CLOBBER + && GET_CODE (part) != USE) break; } Hmm. This is probably ok, but what's the USE for in your pattern? Bernd
Re: Make regcprop to eliminate noop moves better
> > while working on the GCN port I ended up with many redundant register copies > > of the form > > mov reg, exec > > do something > > mov reg, exec > > do something > > ... > > these copies are generated by LRA because exec is small register class and > > needs a lot of reloading (it could be improved too, but I do not care > > because I want to handle exec specially later anyway). > > > > I was however suprised this garbage survives postreload optimizations. It > > is easy to fix in regcprop which already does some noop copy elimination, > > but only of the for mov reg, reg after substituting. > > Right, this ought to be dealt with during postreload CSE, there is roughly > the > same code as yours: > > /* See whether a single set SET is a noop. */ > static int > reload_cse_noop_set_p (rtx set) > { > if (cselib_reg_set_mode (SET_DEST (set)) != GET_MODE (SET_DEST (set))) > return 0; > > return rtx_equal_for_cselib_p (SET_DEST (set), SET_SRC (set)); > } > > Any idea about why this doesn't work in your case? Thanks for pointing that code out. I looked for noop_set in postreload passes and found one in regcprop first. My case is not optimized because my IRA move pattern contains an use that is not handled by postreload cse. I am testing the attached patch and plan commit it as obvious. regcprop uses single_set which may be bit more natural, but it won't be stronger here because REG_DEAD notes are not computed, yet, and the current way noop moves are discovered runs in parallel with simplification which is probably a bit cheaper. I think elimination at both places makes sense: postreload cse is run before splitting and regcprop afterwards. It seems that at least for x86 we get quite few noop moves by splitting. I will get statistics from x86_64 bootstrap for regcprop part of elimination. * postreload.c (reload_cse_simplify): Also accept USE in noop move patterns. diff --git a/gcc/postreload.c b/gcc/postreload.c index 61c1ce8..4f3a526 100644 --- a/gcc/postreload.c +++ b/gcc/postreload.c @@ -153,7 +153,8 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg) value = SET_DEST (part); } } - else if (GET_CODE (part) != CLOBBER) + else if (GET_CODE (part) != CLOBBER + && GET_CODE (part) != USE) break; }
Re: Make regcprop to eliminate noop moves better
> while working on the GCN port I ended up with many redundant register copies > of the form > mov reg, exec > do something > mov reg, exec > do something > ... > these copies are generated by LRA because exec is small register class and > needs a lot of reloading (it could be improved too, but I do not care > because I want to handle exec specially later anyway). > > I was however suprised this garbage survives postreload optimizations. It > is easy to fix in regcprop which already does some noop copy elimination, > but only of the for mov reg, reg after substituting. Right, this ought to be dealt with during postreload CSE, there is roughly the same code as yours: /* See whether a single set SET is a noop. */ static int reload_cse_noop_set_p (rtx set) { if (cselib_reg_set_mode (SET_DEST (set)) != GET_MODE (SET_DEST (set))) return 0; return rtx_equal_for_cselib_p (SET_DEST (set), SET_SRC (set)); } Any idea about why this doesn't work in your case? -- Eric Botcazou
Re: Make regcprop to eliminate noop moves better
On 09/18/2016 06:42 AM, Jan Hubicka wrote: Hi, while working on the GCN port I ended up with many redundant register copies of the form mov reg, exec do something mov reg, exec do something ... these copies are generated by LRA because exec is small register class and needs a lot of reloading (it could be improved too, but I do not care because I want to handle exec specially later anyway). I was however suprised this garbage survives postreload optimizations. It is easy to fix in regcprop which already does some noop copy elimination, but only of the for mov reg, reg after substituting. This patch implements it and eliminates many movs while running testsuite. Bootstrapped/regtested x86_64-linux, OK? Needs a ChangeLog. Ideally we'd have tests to verify we're catching these cases that survive the postreload optimizations. With those OK. Possibly OK with just a ChangeLog if it's too damn painful to construct a consistent end-to-end testcase (this may be a good example where we could probably use David's framework to test this directly rather than via an end-to-end test). Jeff
Make regcprop to eliminate noop moves better
Hi, while working on the GCN port I ended up with many redundant register copies of the form mov reg, exec do something mov reg, exec do something ... these copies are generated by LRA because exec is small register class and needs a lot of reloading (it could be improved too, but I do not care because I want to handle exec specially later anyway). I was however suprised this garbage survives postreload optimizations. It is easy to fix in regcprop which already does some noop copy elimination, but only of the for mov reg, reg after substituting. This patch implements it and eliminates many movs while running testsuite. Bootstrapped/regtested x86_64-linux, OK? Honza Index: regcprop.c === --- regcprop.c (revision 240109) +++ regcprop.c (working copy) @@ -771,6 +771,25 @@ copyprop_hardreg_forward_1 (basic_block } set = single_set (insn); + + /* Detect noop sets and remove them before processing side effects. */ + if (set && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))) + { + unsigned int regno = REGNO (SET_SRC (set)); + rtx r1 = find_oldest_value_reg (REGNO_REG_CLASS (regno), + SET_DEST (set), vd); + rtx r2 = find_oldest_value_reg (REGNO_REG_CLASS (regno), + SET_SRC (set), vd); + if (rtx_equal_p (r1 ? r1 : SET_DEST (set), r2 ? r2 : SET_SRC (set))) + { + bool last = insn == BB_END (bb); + delete_insn (insn); + if (last) + break; + continue; + } + } + extract_constrain_insn (insn); preprocess_constraints (insn); const operand_alternative *op_alt = which_op_alt (); @@ -860,7 +879,9 @@ copyprop_hardreg_forward_1 (basic_block register in the same class. */ if (REG_P (SET_DEST (set))) { - new_rtx = find_oldest_value_reg (REGNO_REG_CLASS (regno), src, vd); + new_rtx = find_oldest_value_reg (REGNO_REG_CLASS (regno), + src, vd); + if (new_rtx && validate_change (insn, &SET_SRC (set), new_rtx, 0)) { if (dump_file)