Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Richard Henderson
On 10/08/2014 08:31 AM, Jiong Wang wrote:
 
 On 30/09/14 19:36, Jiong Wang wrote:
 2014-09-30 17:30 GMT+01:00 Jeff Law l...@redhat.com:
 On 09/30/14 08:37, Jiong Wang wrote:

 On 30/09/14 05:21, Jeff Law wrote:

 I do agree with Richard that it would be useful to see the insns that
 are incorrectly sunk and the surrounding context.
 So I must be missing something.  I thought the shrink-wrapping code wouldn't
 sink arithmetic/logical insns like we see with insn 14 and insn 182.  I
 thought it was limited to reg-reg copies and constant initializations.
 yes, it was limited to reg-reg copies, and my previous sink improvement
 aimed to sink any rtx that:

A: is a single_set
B: has a src operand that is any combination of no more than one register
 and no non-constant objects.

 while some operators like shift may have side effects. IMHO, all side
 effects are reflected in the RTX, so together with this fail_on_clobber_use
 modification, the rtx returned by single_set_no_clobber_use is
 safe to sink if it meets limit B above and passes the later register
 use/def check in move_insn_for_shrink_wrap ?
 
 Ping ~
 
 And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap
 is invoked, we could avoid creating a new wrapper function by invoking
 single_set_2 directly.

I'm committing the following to fix this.

(1) Don't bother modifying single_set; just look for a bare SET.
(2) Tighten the set of expressions we're willing to move.
(3) Use direct return false in the failure case, rather than
counting a non-zero number of non-constants, etc.

Tested on x86_64, and against Andi's test case (unfortunately unreduced).


r~

2014-10-10  Richard Henderson  r...@redhat.com

PR target/63404
* shrink-wrap.c (move_insn_for_shrink_wrap): Don't use single_set.
Restrict the set of expressions we're willing to move.


diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..257812c 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -176,17 +176,40 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   basic_block next_block;
   edge live_edge;
 
-  /* Look for a simple register copy.  */
-  set = single_set (insn);
-  if (!set)
+  /* Look for a simple register assignment.  We don't use single_set here
+     because we can't deal with any CLOBBERs, USEs, or REG_UNUSED secondary
+     destinations.  */
+  if (!INSN_P (insn))
+    return false;
+  set = PATTERN (insn);
+  if (GET_CODE (set) != SET)
     return false;
   src = SET_SRC (set);
   dest = SET_DEST (set);
 
-  if (!REG_P (src))
+  /* For the destination, we want only a register.  Also disallow STACK
+     or FRAME related adjustments.  They are likely part of the prologue,
+     so keep them in the entry block.  */
+  if (!REG_P (dest)
+      || dest == stack_pointer_rtx
+      || dest == frame_pointer_rtx
+      || dest == hard_frame_pointer_rtx)
+    return false;
+
+  /* For the source, we want one of:
+      (1) A (non-overlapping) register
+      (2) A constant,
+      (3) An expression involving no more than one register.
+
+     That last point comes from the code following, which was originally
+     written to handle only register move operations, and still only handles
+     a single source register when checking for overlaps.  Happily, the
+     same checks can be applied to expressions like (plus reg const).  */
+
+  if (CONSTANT_P (src))
+    ;
+  else if (!REG_P (src))
     {
-      unsigned int reg_num = 0;
-      unsigned int nonconstobj_num = 0;
       rtx src_inner = NULL_RTX;
 
       if (can_throw_internal (insn))
@@ -196,30 +219,50 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
       FOR_EACH_SUBRTX_VAR (iter, array, src, ALL)
 	{
 	  rtx x = *iter;
-	  if (REG_P (x))
+	  switch (GET_RTX_CLASS (GET_CODE (x)))
 	    {
-	      reg_num++;
-	      src_inner = x;
+	    case RTX_CONST_OBJ:
+	    case RTX_COMPARE:
+	    case RTX_COMM_COMPARE:
+	    case RTX_BIN_ARITH:
+	    case RTX_COMM_ARITH:
+	    case RTX_UNARY:
+	    case RTX_TERNARY:
+	      /* Constant or expression.  Continue.  */
+	      break;
+
+	    case RTX_OBJ:
+	    case RTX_EXTRA:
+	      switch (GET_CODE (x))
+		{
+		case UNSPEC:
+		case SUBREG:
+		case STRICT_LOW_PART:
+		case PC:
+		  /* Ok.  Continue.  */
+		  break;
+
+		case REG:
+		  /* Fail if we see a second inner register.  */
+		  if (src_inner != NULL)
+		    return false;
+		  src_inner = x;
+		  break;
+
+		default:
+		  return false;
+		}
+	      break;
+
+	    default:
+	      return false;
 	    }
-	  else if (!CONSTANT_P (x) && OBJECT_P (x))
-	    nonconstobj_num++;
 	}
 
-      if (nonconstobj_num > 0

Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Jiong Wang


On 10/10/14 16:59, Richard Henderson wrote:

On 10/08/2014 08:31 AM, Jiong Wang wrote:


Ping ~

And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap is
invoked, we could avoid creating a new wrapper function by invoking
single_set_2 directly.

I'm committing the following to fix this.

(1) Don't bother modifying single_set; just look for a bare SET.
(2) Tighten the set of expressions we're willing to move.
(3) Use direct return false in the failure case, rather than
 counting a non-zero number of non-constants, etc.

Tested on x86_64, and against Andi's test case (unfortunately unreduced).


minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is
still not shrink-wrapped under -mabi=ilp32.
the reason is, as Pinski reported, that LO_SUM is in the RTX_OBJ class, while
it could be treated as an expression.
in your fix, if it's RTX_OBJ and LO_SUM, the move is aborted by the early
return.

anyway, thanks for the fix. I verified that the shrink-wrapping of some
hotspot functions in benchmarks on aarch64 is not affected.

Regards,
Jiong




r~





Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Richard Henderson
On 10/10/2014 09:39 AM, Jiong Wang wrote:
 (1) Don't bother modifying single_set; just look for a bare SET.
 (2) Tighten the set of expressions we're willing to move.
 (3) Use direct return false in the failure case, rather than
  counting a non-zero number of non-constants, etc.

 Tested on x86_64, and against Andi's test case (unfortunately unreduced).
 
 minor nit: after this patch, gcc.target/aarch64/shrink_wrap_symbol_ref_1.c is
 still not shrink-wrapped under -mabi=ilp32.
 the reason is, as Pinski reported, that LO_SUM is in the RTX_OBJ class, while
 it could be treated as an expression.
 in your fix, if it's RTX_OBJ and LO_SUM, the move is aborted by the early
 return.

I missed that message.  Interesting.

I wonder what kind of fallout there would be from changing LO_SUM to
RTX_BIN_ARITH, which is what it should have been all along.

My guess is that someone looked at HIGH being RTX_CONST_OBJ and thought that
LO_SUM should also be a kind of object too.  But it's really a lot more like a
kind of PLUS.  If instead we had a LOW to match HIGH it would have been

  (plus reg (low symbol))

and thus more obvious.
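
For comparison, the idiom as a typical RISC target emits it today
(illustrative RTL only; modes and details vary by target):

  (set (reg:DI 100) (high:DI (symbol_ref:DI "sym")))
  (set (reg:DI 101) (lo_sum:DI (reg:DI 100)
                               (symbol_ref:DI "sym")))

where the lo_sum computes register-plus-small-constant, i.e. arithmetic,
even though its rtx class claims it's an object.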

I'll see if I can bootstrap such a change on e.g. sparc or ppc32, which uses
lo_sum heavily.

 anyway, thanks for the fix. I verified that the shrink-wrapping of some
 hotspot functions in benchmarks on aarch64 is not affected.

Thanks for the additional testing.


r~



Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Jeff Law

On 10/10/14 10:52, Richard Henderson wrote:


I missed that message.  Interesting.

I wonder what kind of fallout there would be from changing LO_SUM to
RTX_BIN_ARITH, which is what it should have been all along.

My guess is that someone looked at HIGH being RTX_CONST_OBJ and thought that
LO_SUM should also be a kind of object too.
Most likely.  One could even make an argument that LO_SUM should be a 
constant, but that's a more intrusive change.




 But it's really a lot more like a

kind of PLUS.  If instead we had a LOW to match HIGH it would have been
Right.  In fact, I believe at the hardware level it's typically 
implemented as addition.


Jeff


Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:
  But it's really a lot more like a
 kind of PLUS.  If instead we had a LOW to match HIGH it would have been
 Right.  In fact, I believe at the hardware level it's typically implemented
 as addition.

It can be addition or bitwise OR; as HIGH should have some low bits zero,
both usually work the same.
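
To make that concrete, a minimal C sketch (assuming a 12-bit low part as on
aarch64; the field widths are target-specific):

  #include <assert.h>
  #include <stdint.h>

  int main (void)
  {
    uint32_t addr = 0x12345678;
    uint32_t hi = addr & ~0xfffu;  /* HIGH: low bits forced to zero */
    uint32_t lo = addr & 0xfffu;   /* the LO_SUM immediate */

    /* With no overlapping set bits, addition and OR reassemble
       the same address.  */
    assert ((hi + lo) == (hi | lo));
    return 0;
  }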

Jakub


Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Jeff Law

On 10/10/14 11:04, Jakub Jelinek wrote:

On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:

  But it's really a lot more like a

kind of PLUS.  If instead we had a LOW to match HIGH it would have been

Right.  In fact, I believe at the hardware level it's typically implemented
as addition.


Can be addition or bitwise or, as high should have some low bits zero, both
usually work the same.
It can be bitwise-or on some architectures, but I believe it's typically 
implemented as addition.  There's also typically an overlap between the 
high and low parts when building up addresses.  GCC actually takes 
advantage of that overlap to reduce unnecessary HIGH expressions.
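
For instance, with a MIPS-style sign-extended 16-bit low part, the high
part overlaps the low part and must compensate for its sign bit.  A C
sketch of the arithmetic, not any particular backend's code:

  #include <assert.h>
  #include <stdint.h>

  int main (void)
  {
    uint32_t addr = 0x1234abcd;
    /* The low part is sign-extended, so it can be negative...  */
    int32_t lo = (int16_t) (addr & 0xffff);  /* 0xabcd -> negative */
    /* ...so the high part is not simply the top 16 bits, but whatever
       value makes the sum come out right (here 0x12350000).  */
    uint32_t hi = addr - (uint32_t) lo;
    assert ((hi & 0xffff) == 0);
    assert (hi + (uint32_t) lo == addr);
    return 0;
  }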



And you can always have an oddball architecture like the PA where the 
LO_SUM does something utterly braindead.   It looks like this


addil %r27,symbolic nonsense

You might think %r27 holds the high bits of the address...  Umm, no it 
doesn't.  There's an implicit %r1 source/destination operand (which 
holds the high bits).  So what this really does is add %r27, %r1 and the 
symbolic constant.  %r27 is a data pointer.  Obviously the implicit
operand is used to get more bits of the symbolic constant into the
instruction.


If that's not bad enough, if the object is in readonly memory, then the 
linker will modify the instruction to change %r27 to %r0 (hardwired 0 
constant).  But I'm getting off-topic here :-)


jeff


Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Richard Henderson
On 10/10/2014 10:21 AM, Jeff Law wrote:
 On 10/10/14 11:04, Jakub Jelinek wrote:
 On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:
   But it's really a lot more like a
 kind of PLUS.  If instead we had a LOW to match HIGH it would have been
 Right.  In fact, I believe at the hardware level it's typically implemented
 as addition.

 Can be addition or bitwise or, as high should have some low bits zero, both
 usually work the same.
 It can be bitwise-or on some architectures...

Just to be clear, the LOW rtx code I mentioned upthread was for illustration
purposes only.  I only intend to change the rtx class of the existing LO_SUM.

:-)


r~



Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Jeff Law

On 10/10/14 11:25, Richard Henderson wrote:

On 10/10/2014 10:21 AM, Jeff Law wrote:

On 10/10/14 11:04, Jakub Jelinek wrote:

On Fri, Oct 10, 2014 at 11:00:54AM -0600, Jeff Law wrote:

   But it's really a lot more like a

kind of PLUS.  If instead we had a LOW to match HIGH it would have been

Right.  In fact, I believe at the hardware level it's typically implemented
as addition.


Can be addition or bitwise or, as high should have some low bits zero, both
usually work the same.

It can be bitwise-or on some architectures...


Just to be clear, the LOW rtx code I mentioned upthread was for illustration
purposes only.  I only intend to change the rtx class of the existing LO_SUM.

That was my expectation :-)
jeff



Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-10 Thread Richard Henderson
On 10/10/2014 09:52 AM, Richard Henderson wrote:
 I wonder what kind of fallout there would be from changing LO_SUM to
 RTX_BIN_ARITH, which is what it should have been all along.

The answer is a lot.  Mostly throughout simplify-rtx.c, wherein we'd have to
move all sorts of code around to accommodate the class change.

I've abandoned that line and I'm just testing a one line change to
shrink-wrap.c to accept LO_SUM.
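
Presumably that amounts to letting the inner switch in
move_insn_for_shrink_wrap accept LO_SUM, along these lines (a sketch, not
the committed patch):

     case UNSPEC:
     case SUBREG:
     case STRICT_LOW_PART:
     case PC:
+    case LO_SUM:
       /* Ok.  Continue.  */
       break;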


r~



Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-10-08 Thread Jiong Wang


On 30/09/14 19:36, Jiong Wang wrote:

2014-09-30 17:30 GMT+01:00 Jeff Law l...@redhat.com:

On 09/30/14 08:37, Jiong Wang wrote:


On 30/09/14 05:21, Jeff Law wrote:


I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.

So I must be missing something.  I thought the shrink-wrapping code wouldn't
sink arithmetic/logical insns like we see with insn 14 and insn 182.  I
thought it was limited to reg-reg copies and constant initializations.

yes, it was limited to reg-reg copies, and my previous sink improvement aimed
to sink any rtx that:

   A: is a single_set
   B: has a src operand that is any combination of no more than one register
and no non-constant objects.

while some operators like shift may have side effects. IMHO, all side
effects are reflected in the RTX, so together with this fail_on_clobber_use
modification, the rtx returned by single_set_no_clobber_use is
safe to sink if it meets limit B above and passes the later register
use/def check in move_insn_for_shrink_wrap ?


Ping ~

And as there is a NONDEBUG_INSN_P check before move_insn_for_shrink_wrap is
invoked, we could avoid creating a new wrapper function by invoking
single_set_2 directly.

comments?

bootstrap ok on x86-64, and no regression on check-gcc/g++.
aarch64 bootstrap/regression testing is ongoing.

2014-10-08  Jiong Wang  jiong.w...@arm.com

* rtl.h (single_set_2): New parameter fail_on_clobber_use.
(single_set): Likewise.
* config/ia64/ia64.c (ia64_single_set): Likewise.
* rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use is
true.
* shrink-wrap.c (move_insn_for_shrink_wrap): Use single_set_2.

Regards,
Jiong



Regards,
Jiong


Jeff


diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 9337be1..09d3c4a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7172,7 +7172,7 @@ ia64_single_set (rtx_insn *insn)
   break;
 
 default:
-  ret = single_set_2 (insn, x);
+  ret = single_set_2 (insn, x, false);
   break;
 }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..c0b5bf5 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2797,7 +2797,7 @@ extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx single_set_2 (const rtx_insn *, const_rtx, bool fail_on_clobber_use);
 
 /* Handle the cheap and common cases inline for performance.  */
 
@@ -2810,7 +2810,7 @@ inline rtx single_set (const rtx_insn *insn)
 return PATTERN (insn);
 
   /* Defer to the more expensive case.  */
-  return single_set_2 (insn, PATTERN (insn));
+  return single_set_2 (insn, PATTERN (insn), false);
 }
 
 extern enum machine_mode get_address_mode (rtx mem);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..7d6ed27 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1182,7 +1182,7 @@ record_hard_reg_uses (rtx *px, void *data)
will not be used, which we ignore.  */
 
 rtx
-single_set_2 (const rtx_insn *insn, const_rtx pat)
+single_set_2 (const rtx_insn *insn, const_rtx pat, bool fail_on_clobber_use)
 {
   rtx set = NULL;
   int set_verified = 1;
@@ -1197,6 +1197,8 @@ single_set_2 (const rtx_insn *insn, const_rtx pat)
 	{
 	case USE:
 	case CLOBBER:
+	  if (fail_on_clobber_use)
+		return NULL_RTX;
 	  break;
 
 	case SET:
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..a3b57b6 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -177,7 +177,8 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   edge live_edge;
 
   /* Look for a simple register copy.  */
-  set = single_set (insn);
+  set = (GET_CODE (PATTERN (insn)) == SET ? PATTERN (insn)
+	 : single_set_2 (insn, PATTERN (insn), true));
   if (!set)
 return false;
   src = SET_SRC (set);

Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Richard Earnshaw
On 29/09/14 19:32, Richard Henderson wrote:
 On 09/29/2014 11:12 AM, Jiong Wang wrote:
 +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
 +{
 +  if (!INSN_P (insn))
 +    return NULL_RTX;
 +
 +  if (GET_CODE (PATTERN (insn)) == SET)
 +    return PATTERN (insn);
 +
 +  /* Defer to the more expensive case, and return NULL_RTX if there is
 +     USE or CLOBBER.  */
 +  return single_set_2 (insn, PATTERN (insn), true);
 +}
 
 What more expensive case?
 
 If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == 
 SET.
 
 I think this function is somewhat useless, and should not be added.
 
 An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I 
 haven't
 tried to understand the miscompilation yet.  I can imagine that this would
 disable quite a bit of shrink wrapping for x86 though.  Can we do better in
 understanding when the clobbered register is live at the location to which
 we'd like to move the insns?
 
 
 r~
 

I think part of the problem is in the naming of single_set().  From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point.  I've previously had to fix a bug where this assumption was
made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)

Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.
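
The classic trap looks something like this (an illustrative pattern, not a
dump from the PR):

  (parallel [(set (reg:QI r8) (lshiftrt:QI (reg:QI r8) (const_int 2)))
             (clobber (reg:CC flags))])

single_set happily returns the inner SET, so a pass that "moves the set"
also silently moves the flags clobber; that is harmless before register
allocation, but wrong if the flags are live at the new location.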

R.



Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Jiong Wang


On 30/09/14 05:21, Jeff Law wrote:

On 09/29/14 13:24, Jiong Wang wrote:

I don't think so. from the x86-64 bootstrap, there is no regression
in the number of functions shrink-wrapped. actually speaking,
previously only a single "mov dest, src" was handled, so disallowing
USE/CLOBBER will not reject any shrink-wrap opportunity which was
allowed previously.

This is the key, of course.  shrink-wrapping is very restrictive in its
ability to sink insns.  The only forms it'll currently shrink are simple
moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.


yes, and we get +1.22% more functions shrink-wrapped (2567 compared with
2536) once we sink insns beyond simple "mov dest, src" on the x86-64
bootstrap, and I remember a similar percentage on a glibc build.

while on aarch64, the overall number of functions shrink-wrapped increased
+25% on some programs. maybe we could gain the same on other RISC backends.



I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.


insns 14 and 182 are sunk incorrectly. below are the details; note that both
are PARALLELs whose second element clobbers the CC register, which single_set
ignores.

(insn 14 173 174 2 (parallel [
            (set (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                (lshiftrt:QI (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 547 {*lshrqi3_1}
     (expr_list:REG_EQUAL (lshiftrt:QI (mem:QI (plus:DI (reg/v/f:DI 43 r14 [orig:85 q ] [85])
                    (const_int 1612 [0x64c])) [20 *q_7+1612 S1 A32])
            (const_int 2 [0x2]))
        (nil)))

(insn 174 14 182 2 (set (reg:QI 44 r15 [orig:86 D.32480 ] [86])
        (reg:QI 37 r8 [orig:86 D.32480 ] [86])) /home/andi/lsrc/linux/block/blk-flush2.c:50 93 {*movqi_internal}
     (nil))

(insn 182 174 16 2 (parallel [
            (set (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                (and:SI (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 376 {*andsi_1}
     (nil))


Jeff








Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Jeff Law

On 09/30/14 08:37, Jiong Wang wrote:


On 30/09/14 05:21, Jeff Law wrote:

On 09/29/14 13:24, Jiong Wang wrote:

I don't think so. from the x86-64 bootstrap, there is no regression
in the number of functions shrink-wrapped. actually speaking,
previously only a single "mov dest, src" was handled, so disallowing
USE/CLOBBER will not reject any shrink-wrap opportunity which was
allowed previously.

This is the key, of course.  shrink-wrapping is very restrictive in its
ability to sink insns.  The only forms it'll currently shrink are simple
moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.


yes, and we get +1.22% more functions shrink-wrapped (2567 compared with
2536) once we sink insns beyond simple "mov dest, src" on the x86-64
bootstrap, and I remember a similar percentage on a glibc build.

while on aarch64, the overall number of functions shrink-wrapped increased
+25% on some programs. maybe we could gain the same on other RISC backends.



I do agree with Richard that it would be useful to see the insns that
are incorrectly sunk and the surrounding context.
So I must be missing something.  I thought the shrink-wrapping code 
wouldn't sink arithmetic/logical insns like we see with insn 14 and insn 
182.  I thought it was limited to reg-reg copies and constant 
initializations.


Jeff




Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Jeff Law

On 09/30/14 08:15, Richard Earnshaw wrote:


I think part of the problem is in the naming of single_set().  From the
name it's not entirely obvious to users that this includes insns that
clobber registers or which write other registers that are unused after
that point.  I've previously had to fix a bug where this assumption was
made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)

Most uses of single_set prior to register allocation are probably safe;
but later uses are fraught with potential problems of this nature and
may well be bugs waiting to happen.
Very possibly.  There's a bit of a natural tension here in that often we 
don't much care about the additional CLOBBERS, but when we get it wrong, 
obviously it's bad.


I haven't done any research, but I suspect the change to ignore clobbers
in single_set came in as part of exposing the CC register and avoiding
regressions all over the place as a result.


I wonder what would happen if we ignored them prior to register allocation,
then rejected insns with those CLOBBERs once register allocation started.


Jeff


Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Richard Earnshaw
On 30/09/14 17:45, Jeff Law wrote:
 On 09/30/14 08:15, Richard Earnshaw wrote:

 I think part of the problem is in the naming of single_set().  From the
 name it's not entirely obvious to users that this includes insns that
 clobber registers or which write other registers that are unused after
 that point.  I've previously had to fix a bug where this assumption was
 made (eg https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300)

 Most uses of single_set prior to register allocation are probably safe;
 but later uses are fraught with potential problems of this nature and
 may well be bugs waiting to happen.
 Very possibly.  There's a bit of a natural tension here in that often we 
 don't much care about the additional CLOBBERS, but when we get it wrong, 
 obviously it's bad.
 
 I haven't done any research, but I suspect the change to ignore clobbers
 in single_set came in as part of exposing the CC register and avoiding
 regressions all over the place as a result.

It's not just clobbers; it ignores patterns like

(parallel
 [(set (a) (...))
  (set (b) (...))])
[(reg_note (reg_unused (b)))]

Which is probably fine before register allocation but definitely
something you have to think about afterwards.

 
 I wonder what would happen if we ignored them prior to register allocation,
 then rejected insns with those CLOBBERs once register allocation started.
 

Might work; but it might miss some useful cases as well...

R.




Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Jiong Wang
2014-09-30 17:30 GMT+01:00 Jeff Law l...@redhat.com:
 On 09/30/14 08:37, Jiong Wang wrote:


 On 30/09/14 05:21, Jeff Law wrote:

 On 09/29/14 13:24, Jiong Wang wrote:

 I don't think so. from the x86-64 bootstrap, there is no regression
 in the number of functions shrink-wrapped. actually speaking,
 previously only a single "mov dest, src" was handled, so disallowing
 USE/CLOBBER will not reject any shrink-wrap opportunity which was
 allowed previously.

 This is the key, of course.  shrink-wrapping is very restrictive in its
 ability to sink insns.  The only forms it'll currently shrink are simple
 moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing
 USE/CLOBBER does not impact the x86 port in any significant way.


 yes, and we get +1.22% more functions shrink-wrapped (2567 compared with
 2536) once we sink insns beyond simple "mov dest, src" on the x86-64
 bootstrap, and I remember a similar percentage on a glibc build.

 while on aarch64, the overall number of functions shrink-wrapped increased
 +25% on some programs. maybe we could gain the same on other RISC backends.


 I do agree with Richard that it would be useful to see the insns that
 are incorrectly sunk and the surrounding context.

 So I must be missing something.  I thought the shrink-wrapping code wouldn't
 sink arithmetic/logical insns like we see with insn 14 and insn 182.  I
 thought it was limited to reg-reg copies and constant initializations.

yes, it was limited to reg-reg copies, and my previous sink improvement aimed
to sink any rtx that:

  A: is a single_set
  B: has a src operand that is any combination of no more than one register
and no non-constant objects.

while some operators like shift may have side effects. IMHO, all side
effects are reflected in the RTX, so together with this fail_on_clobber_use
modification, the rtx returned by single_set_no_clobber_use is
safe to sink if it meets limit B above and passes the later register
use/def check in move_insn_for_shrink_wrap ?

Regards,
Jiong


 Jeff




Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-30 Thread Steven Bosscher
On Tue, Sep 30, 2014 at 6:51 PM, Richard Earnshaw wrote:
 It's not just clobbers; it ignores patterns like

 (parallel
  [(set (a) (...))
   (set (b) (...))])
 [(reg_note (reg_unused (b)))]

 Which is probably fine before register allocation but definitely
 something you have to think about afterwards.

Even before RA this isn't always fine. We have checks for
!multiple_sets for this.
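
i.e., roughly this shape of guard, using the existing rtlanal.c helper (a
sketch, not a quote from any particular pass):

  rtx set = single_set (insn);
  if (set != NULL_RTX && !multiple_sets (insn))
    {
      /* Safe: INSN performs exactly this one assignment.  A PARALLEL
         with a second SET -- even one whose result is flagged
         REG_UNUSED -- makes multiple_sets return nonzero.  */
    }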

Ciao!
Steven


[PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-29 Thread Jiong Wang

it's exposed by the linux kernel for x86.

the root cause is that the current single_set will ignore CLOBBER & USE,
while we need to take them into account when handling shrink-wrap.

this patch adds one parameter to single_set_2 to support returning
NULL_RTX if we want to reject any side-effect, and a new helper function
single_set_no_clobber_use is added.

passed x86-64 bootstrap & check-gcc/g++; also manually checked that the issue
reported at 63404 has gone away.

also no regression on the aarch64-none-elf regression test.

comments?

thanks.

2014-09-26  Jiong Wang  jiong.w...@arm.com

* rtl.h (single_set_no_clobber_use): New function.
(single_set_2): New parameter fail_on_clobber_use.
(single_set): Likewise.
* config/ia64/ia64.c (ia64_single_set): Likewise.
* rtlanal.c (single_set_2): Return NULL_RTX if fail_on_clobber_use is
true.
* shrink-wrap.c (move_insn_for_shrink_wrap): Use 
single_set_no_clobber_use.

diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 9337be1..09d3c4a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7172,7 +7172,7 @@ ia64_single_set (rtx_insn *insn)
   break;
 
 default:
-  ret = single_set_2 (insn, x);
+  ret = single_set_2 (insn, x, false);
   break;
 }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..7c40d5a 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2797,7 +2797,7 @@ extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx single_set_2 (const rtx_insn *, const_rtx, bool fail_on_clobber_use);
 
 /* Handle the cheap and common cases inline for performance.  */
 
@@ -2810,7 +2810,20 @@ inline rtx single_set (const rtx_insn *insn)
 return PATTERN (insn);
 
   /* Defer to the more expensive case.  */
-  return single_set_2 (insn, PATTERN (insn));
+  return single_set_2 (insn, PATTERN (insn), false);
+}
+
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
+}
 
 extern enum machine_mode get_address_mode (rtx mem);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..cb5e36a 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1182,7 +1182,7 @@ record_hard_reg_uses (rtx *px, void *data)
will not be used, which we ignore.  */
 
 rtx
-single_set_2 (const rtx_insn *insn, const_rtx pat)
+single_set_2 (const rtx_insn *insn, const_rtx pat, bool fail_on_clobber_use)
 {
   rtx set = NULL;
   int set_verified = 1;
@@ -1197,6 +1197,8 @@ single_set_2 (const rtx_insn *insn, const_rtx pat)
 	{
 	case USE:
 	case CLOBBER:
+	  if (fail_on_clobber_use)
+		return NULL_RTX;
 	  break;
 
 	case SET:
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..5624ef7 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -177,7 +177,7 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   edge live_edge;
 
   /* Look for a simple register copy.  */
-  set = single_set (insn);
+  set = single_set_no_clobber_use (insn);
   if (!set)
 return false;
   src = SET_SRC (set);

Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-29 Thread Richard Henderson
On 09/29/2014 11:12 AM, Jiong Wang wrote:
 +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
 +{
 +  if (!INSN_P (insn))
 +    return NULL_RTX;
 +
 +  if (GET_CODE (PATTERN (insn)) == SET)
 +    return PATTERN (insn);
 +
 +  /* Defer to the more expensive case, and return NULL_RTX if there is
 +     USE or CLOBBER.  */
 +  return single_set_2 (insn, PATTERN (insn), true);
 +}

What more expensive case?

If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.

I think this function is somewhat useless, and should not be added.

An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I haven't
tried to understand the miscompilation yet.  I can imagine that this would
disable quite a bit of shrink wrapping for x86 though.  Can we do better in
understanding when the clobbered register is live at the location to which we'd
like to move the insns?


r~


Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-29 Thread Jiong Wang


On 29/09/14 19:32, Richard Henderson wrote:

On 09/29/2014 11:12 AM, Jiong Wang wrote:

+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
+}


Richard,

  thanks for review.


What more expensive case?


single_set_no_clobber_use is just a clone of single_set, I copied the comments 
with
only minor modifications.

I think the more expensive case here means the case where there is a PARALLEL
whose inner rtxes we need to check.



If you're disallowing USE and CLOBBER, then single_set is just GET_CODE == SET.

I think this function is somewhat useless, and should not be added.

An adjustment to move_insn_for_shrink_wrap may be reasonable though. I haven't
tried to understand the miscompilation yet.  I can imagine that this would
disable quite a bit of shrink wrapping for x86 though.


I don't think so. from the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped. actually speaking, previously only a
single "mov dest, src" was handled, so disallowing USE/CLOBBER will not
reject any shrink-wrap opportunity which was allowed previously.

and I am afraid that if we don't reuse single_set_2, there will have to be
another loop to check all those inner rtxes, which single_set_2 already does.

so, IMHO, just modifying single_set_2 is more efficient.


Can we do better in
understanding when the clobbered register is live at the location to which we'd
like to move the insns?


currently, the generic code in move_insn_for_shrink_wrap only handles dest/src
being a single register, so if there is a clobber or use we might need to
check multiple regs, which would require a few more modifications. and I think
that's better done after all the single dest/src issues are fixed.

--
Jiong





r~








Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-29 Thread David Malcolm
On Mon, 2014-09-29 at 20:24 +0100, Jiong Wang wrote:
 On 29/09/14 19:32, Richard Henderson wrote:
  On 09/29/2014 11:12 AM, Jiong Wang wrote:
  +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
  +{
  +  if (!INSN_P (insn))
  +    return NULL_RTX;
  +
  +  if (GET_CODE (PATTERN (insn)) == SET)
  +    return PATTERN (insn);
  +
  +  /* Defer to the more expensive case, and return NULL_RTX if there is
  +     USE or CLOBBER.  */
  +  return single_set_2 (insn, PATTERN (insn), true);
  +}
 
 Richard,
 
thanks for review.
 
  What more expensive case?
 
 single_set_no_clobber_use is just a clone of single_set, I copied the 
 comments with
 only minor modifications.

I introduced that comment to single_set, in r215089, when making
single_set into an inline function (so that it could check that it
received an rtx_insn *, rather than an rtx).

 I think the more expensive case here means the case where there is a
 PARALLEL whose inner rtxes we need to check.

My comment may have been misleading, sorry.  IIRC, what I was thinking of
was that the old implementation had split single_set into a macro and a
function.  This was by Honza (CCed), 14 years ago to the day back in
r36664 (on 2000-09-29):
https://gcc.gnu.org/ml/gcc-patches/2000-09/msg00893.html


/* Single set is implemented as macro for performance reasons.  */
#define single_set(I) (INSN_P (I) \
  ? (GET_CODE (PATTERN (I)) == SET \
 ? PATTERN (I) : single_set_1 (I)) \
  : NULL_RTX)

I think by the more expensive case I meant having to make a function
call to handle the less-common cases (which indeed covers the PARALLEL
case), rather than having logic inline; preserving that inlined vs
not-inlined split was one of my aims for r215089.

Perhaps it should be rewritten to "Defer to a function call to handle
the less common cases", or somesuch?

[...snip rest of post...]

Dave



Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-29 Thread Jeff Law

On 09/29/14 13:24, Jiong Wang wrote:


I don't think so. from the x86-64 bootstrap, there is no regression
in the number of functions shrink-wrapped. actually speaking,
previously only a single "mov dest, src" was handled, so disallowing
USE/CLOBBER will not reject any shrink-wrap opportunity which was
allowed previously.
This is the key, of course.  shrink-wrapping is very restrictive in its 
ability to sink insns.  The only forms it'll currently shrink are simple 
moves.  Arithmetic, logicals, etc are left alone.  Thus disallowing 
USE/CLOBBER does not impact the x86 port in any significant way.
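
For context, the kind of function shrink-wrapping targets, as a hypothetical
sketch (heavy_work is made up):

  extern int heavy_work (int *p);

  int
  f (int *p)
  {
    if (!p)
      return -1;            /* fast path: no stack frame needed */
    return heavy_work (p);  /* only this path needs saved registers */
  }

When the entry block also contains a few setup insns, sinking them (today,
only the simple moves) below the early exit is what lets the prologue drop
into the heavy_work path only.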


I do agree with Richard that it would be useful to see the insns that 
are incorrectly sunk and the surrounding context.

Jeff