Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-28 Thread Bernd Schmidt
On 11/28/19 8:53 PM, Gunther Nikl wrote:> Bernd Schmidt
:
>> On 11/23/19 9:53 PM, Bernd Schmidt wrote:

>> move.w %a4,%d0
>> -   tst.b %d0
>> -   jeq .L352
>> +   jeq .L353
>>
>> And the reason - that's a movqi using move.w.
>
> Can this problem also happen on older (pre-ccmode) GCC versions? Or was
> this only an issue of the ccmode conversion?

This was an error in the conversion (I think).

> What about the compare constraint errors? Are those also present on
> older GCC versions but never surfaced?

Those were present, but presumably nothing ever tried to rerecognize a
compare insn after reload (unlike jumps) so you wouldn't get a crash. I
haven't checked whether it could have produced invalid assembly - I'm
guessing probably not.


Bernd


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-25 Thread Bernd Schmidt
On 11/26/19 3:21 AM, Joseph Myers wrote:
> 
> The soft-float ColdFire build (--with-arch=cf --with-cpu=54455 
> --disable-multilib) successfully built libgcc and glibc, but ran into an 
> ICE building the glibc tests.  Again, I've not bisected but this commit 
> seems likely to be responsible.  Compile the attached preprocessed source 
> with -O2.

Try the following. This seems to be the same (preexisting) problem which
got fixed for the normal 68k compare patterns, where an operand with
nonimmediate_operand predicate allows constants in its constraints.


Bernd
diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 25e0b73741f..d6df385787a 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -496,7 +496,7 @@
 ;; needs to be reloaded.
 
 (define_mode_attr scc0_cf_constraints [(QI "=d") (HI "=d") (SI "=d,d,d")])
-(define_mode_attr cmp1_cf_constraints [(QI "dm") (HI "dm") (SI "mrKs,r,rm")])
+(define_mode_attr cmp1_cf_constraints [(QI "dm") (HI "dm") (SI "mr,r,rm")])
 (define_mode_attr cmp2_cf_constraints [(QI "C0") (HI "C0") (SI "r,mrKs,C0")])
 (define_mode_attr cmp2_cf_predicate [(QI "const0_operand") (HI "const0_operand") (SI "general_operand")])
 


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-25 Thread Bernd Schmidt
On 11/26/19 1:36 AM, Joseph Myers wrote:
> I'm seeing a libgcc build failure for coldfire in my build-many-glibcs.py 
> bot (m68k-linux-gnu configured --with-arch=cf --disable-multilib).  That's 
> building _mulsc3.o; I get assembler errors:

I overlooked a difference in the 68881 vs coldfire patterns when I
combined them. They use different suffixes for register compares (I only
spotted the different constraints).

The following seems to fix the assembler failures. I cannot properly
test Coldfire however due to lack of hardware.


Bernd
	* config/m68k/,68k.c (m68k_output_compare_fp): Use .d instead of .x
	for register compares on Coldfire.

diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 8d010ebe6e9..4b30c401e80 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -4501,7 +4501,12 @@ m68k_output_compare_fp (rtx op0, rtx op1, rtx_code code)
   if (op1 == CONST0_RTX (GET_MODE (op0)))
 {
   if (FP_REG_P (op0))
-	output_asm_insn ("ftst%.x %0", ops);
+	{
+	  if (TARGET_COLDFIRE_FPU)
+	output_asm_insn ("ftst%.d %0", ops);
+	  else
+	output_asm_insn ("ftst%.x %0", ops);
+	}
   else
 	output_asm_insn (("ftst%." + prec + " %0").c_str (), ops);
   return code;
@@ -4510,7 +4515,10 @@ m68k_output_compare_fp (rtx op0, rtx op1, rtx_code code)
   switch (which_alternative)
 {
 case 0:
-  output_asm_insn ("fcmp%.x %1,%0", ops);
+  if (TARGET_COLDFIRE_FPU)
+	output_asm_insn ("fcmp%.d %1,%0", ops);
+  else
+	output_asm_insn ("fcmp%.x %1,%0", ops);
   break;
 case 1:
   output_asm_insn (("fcmp%." + prec + " %f1,%0").c_str (), ops);


Autoinc vs reload and LRA

2019-11-25 Thread Bernd Schmidt
So I was curious what would happen if I turned on LRA for m68k. It turns
out my autoinc patches from the cc0 patch set expose a bug in how LRA
handles autoincrement. While it copies the logic from reload's
inc_for_reload, it appears to be missing the find_reloads_address code
to ensure an autoinc address is reloaded entirely if it is part of a
jump. LRA can reload just the register inside a POST_INC, which leads to
creating an output reload.

Hence, a new version of the autoinc changes, below. Since reload is
known to work, we allow autoinc in jumps unless targetm.lra_p. One part
of the patch is a fix for the earlier combine patch which was already
checked in, the other part is a new version of the auto-inc-dec patch.
Bootstrapped and tested on the gcc135 machine
(powerpc64le-unknown-linux-gnu). OK?


Bernd
	* auto-inc-dec.c (merge_in_block): Allow autoinc in jumps unless
	LRA is enabled.
	* combine.c (can_combine_p): Disallow autoinc in jumps unless LRA is
	disabled.

diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index bdb6efab520..1b224cc9777 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -1441,10 +1441,9 @@ merge_in_block (int max_reg, basic_block bb)
 	  continue;
 	}
 
-  /* This continue is deliberate.  We do not want the uses of the
-	 jump put into reg_next_use because it is not considered safe to
-	 combine a preincrement with a jump.  */
-  if (JUMP_P (insn))
+  /* Reload should handle auto-inc within a jump correctly, while LRA
+	 is known to have issues with autoinc.  */
+  if (JUMP_P (insn) && targetm.lra_p ())
 	continue;
 
   if (dump_file)
diff --git a/gcc/combine.c b/gcc/combine.c
index 2e21459f504..3fbd84fcb80 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2117,12 +2117,16 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
 
   /* If INSN contains an autoincrement or autodecrement, make sure that
  register is not used between there and I3, and not already used in
- I3 either.  Neither must it be used in PRED or SUCC, if they exist.  */
+ I3 either.  Neither must it be used in PRED or SUCC, if they exist.
+ Also insist that I3 not be a jump if using LRA; if it were one
+ and the incremented register were spilled, we would lose.
+ Reload handles this correctly.  */
 
   if (AUTO_INC_DEC)
 for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
   if (REG_NOTE_KIND (link) == REG_INC
-	  && (reg_used_between_p (XEXP (link, 0), insn, i3)
+	  && ((JUMP_P (i3) && targetm.lra_p ())
+	  || reg_used_between_p (XEXP (link, 0), insn, i3)
 	  || (pred != NULL_RTX
 		  && reg_overlap_mentioned_p (XEXP (link, 0), PATTERN (pred)))
 	  || (pred2 != NULL_RTX


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-25 Thread Bernd Schmidt
On 11/25/19 1:38 PM, Tobias Burnus wrote:
> Thanks for the m68k work! Can you also update
> https://gcc.gnu.org/backends.html ?

Committed as obvious.


Bernd

commit f42834ad5e77c05cb6bc0908b8fc9282fec7fc19
Author: Bernd Schmidt 
Date:   Mon Nov 25 13:48:08 2019 +0100

Change backends table to show m68k does not use cc0

diff --git a/htdocs/backends.html b/htdocs/backends.html
index c9449065..c4b916d3 100644
--- a/htdocs/backends.html
+++ b/htdocs/backends.html
@@ -89,7 +89,7 @@ iq2000 | ???   FICB   b  g  t
 lm32   |   F g
 m32c   |L  FIlb  gs
 m32r   |   FI b   s
-m68k   |   ?cpb   i
+m68k   |   ? pb   i
 mcore  |  ?FIpb mgs
 mep|   F Cb  g  t s
 microblaze | CB   i   s


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-25 Thread Bernd Schmidt
On 11/25/19 1:34 PM, John Paul Adrian Glaubitz wrote:
> Are all 4 + 2 patches in now? Thus, can we close the bug?

We're missing one piece for better autoinc generation, but that's a
small optimization issue. The cc0 conversion is complete.


Bernd



Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-25 Thread Bernd Schmidt
On 11/23/19 6:36 PM, Jeff Law wrote:

> Not really.  I've already indicated to Bernd that he should go ahead and
> commit the changes and we can iterate on any problems that arise.

After the last fix, I did some more testing and since I feel confident
that it really is in good shape now, I committed it. Thanks!


Bernd


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-25 Thread Bernd Schmidt
On 11/25/19 12:26 PM, Andreas Schwab wrote:
> On Nov 24 2019, Bernd Schmidt wrote:
> 
>> Whew, I think I have it. One tst instruction eliminated when it
>> shouldn't have been:
>>
>> move.w %a4,%d0
>> -   tst.b %d0
>> -   jeq .L352
>> +   jeq .L353
>>
>> And the reason - that's a movqi using move.w. The following should fix
>> it.
> 
> Apparently that also fixed the testsuite regressions.

I still wish I knew how you managed to reproduce those. That would have
been easier to debug than messing around with a three-stage
cross-to-native setup.


Bernd


Re: [PATCH 4/4] Fix autoinc cbranch

2019-11-24 Thread Bernd Schmidt
On 11/24/19 8:43 PM, Segher Boessenkool wrote:
> But.  Allowing autoinc into jump insns means those jump insns may then
> eventually need an output reload; it may just have been because of that?

That's almost certainly the reasoning, but as I pointed out in my
original mail - reload is careful around autoincs and emits the reload
sequence before the reloaded insn.

For example, see inc_for_reload:
  /* Postincrement.
 Because this might be a jump insn or a compare, and because RELOADREG
 may not be available after the insn in an input reload, we must do
 the incrementation before the insn being reloaded for.

On m68k with cc0 this is already necessary, because even a cmp insn
cannot have output reloads (they would overwrite the flags). This is why
I think it should be safe to allow them in jumps too.


Bernd


Re: [PATCH 4/4] Fix autoinc cbranch

2019-11-24 Thread Bernd Schmidt
On 11/19/19 1:27 AM, Segher Boessenkool wrote:
> The combine parts are okay for trunk, if you keep an eye out :-)

Thanks, now committed. That leaves the auto-inc-dec part. Since we're
being adventurous, I've also bootstrapped and tested the following in
the meantime (on the gcc135 machine). This just deletes the tests for
jumps entirely rather than hedging bets.


Bernd
Index: gcc/auto-inc-dec.c
===
--- gcc/auto-inc-dec.c	(revision 278653)
+++ gcc/auto-inc-dec.c	(working copy)
@@ -1441,12 +1441,6 @@ merge_in_block (int max_reg, basic_block
 	  continue;
 	}
 
-  /* This continue is deliberate.  We do not want the uses of the
-	 jump put into reg_next_use because it is not considered safe to
-	 combine a preincrement with a jump.  */
-  if (JUMP_P (insn))
-	continue;
-
   if (dump_file)
 	dump_insn_slim (dump_file, insn);
 


Re: [PATCH ix86] Fix rtx_costs for flag-setting adds

2019-11-24 Thread Bernd Schmidt
On 11/22/19 6:05 PM, Uros Bizjak wrote:
> Indeed, this is a different case, an overflow test that results in one
> CMP insn. I think, we should check if the second operand is either 0
> (then proceed as it is now), or if the second operand equals first
> operand of PLUS insn, then we actually emit CMP insn (please see
> PR30315).

Here's what I committed - preapproved by Uros off-list (I forgot
Reply-All again).


Bernd
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 278653)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2019-11-24  Bernd Schmidt  
+
+	* config/i386/i386.c (ix86_rtx_costs): Handle care of a PLUS in a
+	COMPARE, representing an overflow detection.
+
 2019-11-23  Jan Hubicka  
 
 	* cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT): Remove.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 278653)
+++ gcc/config/i386/i386.c	(working copy)
@@ -19501,6 +19501,15 @@ ix86_rtx_costs (rtx x, machine_mode mode
 	  return true;
 	}
 
+  if (GET_CODE (XEXP (x, 0)) == PLUS
+	  && rtx_equal_p (XEXP (XEXP (x, 0), 0), XEXP (x, 1)))
+	{
+	  /* This is an overflow detection, count it as a normal compare.  */
+	  *total = rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
+			 COMPARE, 0, speed);
+	  return true;
+	}
+
   /* The embedded comparison operand is completely free.  */
   if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
 	  && XEXP (x, 1) == const0_rtx)


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-23 Thread Bernd Schmidt
On 11/23/19 9:53 PM, Bernd Schmidt wrote:
> I'll spend a few more days trying to see if I can do something about the
> bootstrap failure Mikael saw (currently trying to do a two-stage cross
> build rather than a really slow bootstrap).

Whew, I think I have it. One tst instruction eliminated when it
shouldn't have been:

move.w %a4,%d0
-   tst.b %d0
-   jeq .L352
+   jeq .L353

And the reason - that's a movqi using move.w. The following should fix
it. I'll run a few more tests, and then check it all in as Jeff
suggested if things looks OK.


Bernd
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index c72325fa4ab..8d010ebe6e9 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -3314,7 +3314,11 @@ output_move_qimode (rtx *operands)
   /* 68k family (including the 5200 ColdFire) does not support byte moves to
  from address registers.  */
   if (ADDRESS_REG_P (operands[0]) || ADDRESS_REG_P (operands[1]))
-return "move%.w %1,%0";
+{
+  if (ADDRESS_REG_P (operands[1]))
+	CC_STATUS_INIT;
+  return "move%.w %1,%0";
+}
   return "move%.b %1,%0";
 }
 


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-23 Thread Bernd Schmidt
On 11/23/19 6:36 PM, Jeff Law wrote:
> Not really.  I've already indicated to Bernd that he should go ahead and 
> commit the changes and we can iterate on any problems that arise.

In the meantime I've made an aranym setup in addition to the qemu setup
I had, and I've not been able to reproduce failures like the ones
Andreas reported.
I'll spend a few more days trying to see if I can do something about the
bootstrap failure Mikael saw (currently trying to do a two-stage cross
build rather than a really slow bootstrap).


Bernd


Re: [PATCH ix86] Fix rtx_costs for flag-setting adds

2019-11-22 Thread Bernd Schmidt
On 11/22/19 3:04 PM, Uros Bizjak wrote:
> On Fri, Nov 22, 2019 at 1:58 PM Bernd Schmidt  wrote:
>>
>> A patch I posted recently fixes combine to take costs of JUMP_INSNs into
>> account. That causes the pr30315 test to fail with -m32, since the cost
>> of an add that sets the flags is estimated too high.
>>
>> The following seems to fix it.  Bootstrapped and tested on x86_64-linux, ok?
> 
> I think that the intention of the code below  "The embedded comparison
> operand is completely free." comment is to handle this case. It looks
> that it should return the cost of the inside operation of COMPARE rtx.

There seem to be two problems with that. We're dealing with patterns
such as:

(set (reg:CCC 17 flags)
(compare:CCC (plus:SI (mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4
A32])
(reg/v:SI 87 [ b ]))
(mem/c:SI (reg/f:SI 16 argp) [2 a+0 S4 A32])))

If I remove the test for const0_rtx, it still doesn't work - I think
setting *total to zero is ineffective, since we'll still count the MEM
twice.

So, how about the following?


Bernd

@@ -19502,9 +19502,12 @@
}

   /* The embedded comparison operand is completely free.  */
-  if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
- && XEXP (x, 1) == const0_rtx)
-   *total = 0;
+  if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0
+   {
+ *total = rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
+COMPARE, 0, speed);
+ return true;
+   }

   return false;



[PATCH ix86] Fix rtx_costs for flag-setting adds

2019-11-22 Thread Bernd Schmidt
A patch I posted recently fixes combine to take costs of JUMP_INSNs into
account. That causes the pr30315 test to fail with -m32, since the cost
of an add that sets the flags is estimated too high.

The following seems to fix it.  Bootstrapped and tested on x86_64-linux, ok?


Bernd

	* config/i386/i386.c (ix86_rtx_costs): For a PLUS inside a COMPARE,
	representing an add that sets the flags, count just the PLUS.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7115ec44c2a..6e48f5ccbde 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19500,6 +19500,11 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 		+ rtx_cost (const1_rtx, mode, outer_code, opno, speed));
 	  return true;
 	}
+  if (GET_CODE (XEXP (x, 0)) == PLUS)
+	{
+	  *total = rtx_cost (XEXP (x, 0), mode, COMPARE, 0, speed);
+	  return true;
+	}
 
   /* The embedded comparison operand is completely free.  */
   if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0)))


Re: [PATCH 3/4] Set costs for jumps in combine

2019-11-21 Thread Bernd Schmidt
On 11/22/19 1:42 AM, Segher Boessenkool wrote:
> On Thu, Nov 21, 2019 at 02:36:53PM +0100, Bernd Schmidt wrote:
>> Thanks. Just FYI, this is held up a little. I decided I'd also test on
>> x86, and there it shows a case where ix86_rtx_cost misses something: the
>> i386/pr30315.c testcase wants to combine compares into addition+jump on
>> carry, but the rtx_costs show too high a cost for (compare (plus)). I'm
>> testing a fix for that in i386.c.
> 
> Maybe i386 should implement the insn_cost hook as well?  For most targets
> that is a lot simpler to get right than rtx_cost, but allowing memory in
> many insns and all the different insn lengths complicates matters.  At
> least insn_cost isn't inside-out, that should make it easier to deal with
> already.

That kind of thing is up to the x86 maintainers. I think the problem at
hand can be fixed quite simply by detecting PLUS inside COMPARE and just
counting it like we would a normal PLUS. Patch will follow once testing
is complete.


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-21 Thread Bernd Schmidt
On 11/21/19 1:30 PM, Matthias Klose wrote:
> 
> that would be apt build-dep gcc-9. The former would only install the build
> dependencies of the gcc-defaults package.

That gets me

E: You must put some 'source' URIs in your sources.list

where /etc/apt/sources.list looks like

  deb http://ftp.ports.debian.org/debian-ports sid main
  deb-src http://ftp.ports.debian.org/debian-ports sid main

which googling suggests is what I want?


Bernd


Re: [PATCH 3/4] Set costs for jumps in combine

2019-11-21 Thread Bernd Schmidt
On 11/13/19 5:16 PM, Segher Boessenkool wrote:
> On Wed, Nov 13, 2019 at 02:13:48PM +0100, Bernd Schmidt wrote:
>> Also, it does not compute costs for jump
>> insns, so they are always set to zero. As a consequence, any possible
>> substitution is performed if a combination into a jump is possible,
>> which turns out isn't really desirable on m68k with cbranch patterns.
>>
>> This patch simply removes a test for NONJUMP_INSN_P. Bootstrapped and
>> tested on the gcc135 machine (powerpc64le-unknown-linux-gnu).
> 
> I wonder why that test was there.  It was added in r84513, which is where
> insn_rtx_cost was made from combine_insn_cost, which didn't have that
> non-jump thing yet.
> 
> It is still stage 1, so we'll find out if any target breaks I guess.
> Okay for trunk.  Thanks!

Thanks. Just FYI, this is held up a little. I decided I'd also test on
x86, and there it shows a case where ix86_rtx_cost misses something: the
i386/pr30315.c testcase wants to combine compares into addition+jump on
carry, but the rtx_costs show too high a cost for (compare (plus)). I'm
testing a fix for that in i386.c.


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-20 Thread Bernd Schmidt
On 11/20/19 8:27 PM, Mikael Pettersson wrote:
> On Wed, Nov 20, 2019 at 3:16 PM Bernd Schmidt  wrote:
>> Probably best to just run tests on stage1 and hope something shows up.
> 
> Ok, how do I did that?  I've always just done 'make -k check' after
> full bootstraps.
> I assume the stage 1 artifacts are the ones in the prev-* directories.

There's a --disable-bootstrap configure option.

>> What distro are you using for native builds? The m68k debian I'm using
>> does not have an installable gcc package.
> 
> I run a bespoke distro on m68k and sparc64, derived from Fedora but
> massively cut down in size, with target patches done by myself or
> ported from Debian and Gentoo as necessary.
> 
> Debian/m68k not having a gcc package?  That sounds odd; I see e.g. a
> gcc-9 in http://ftp.ports.debian.org/debian-ports/pool-m68k/main/g/

Turns out I wasn't sufficiently familiar with how debian works. I was
missing an "apt update" step. I kind of assumed a package like gcc would
install out of the box (others did, things like emacs).


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-20 Thread Bernd Schmidt
On 11/20/19 2:50 PM, Mikael Pettersson wrote:
> On Mon, Nov 18, 2019 at 9:57 PM Mikael Pettersson  
> wrote:
>>
>> On Mon, Nov 18, 2019 at 8:31 PM Bernd Schmidt  wrote:
>>>
>>> Hi Mikael,
>>>
>>>> This fixed the problem, thanks.
>>>
>>> Could you also run the testsuite to see if you can reproduce the
>>> g++.old-deja failures Andreas reported?
>>
>> Yes, but it will probably take another week before the native
>> bootstrap (on aranym) and test suite run is finished.  It's currently
>> in stage 2.

Ugh, that suggests the stage2 compiler was miscompiled. That would be
nasty to track down.

Probably best to just run tests on stage1 and hope something shows up.

What distro are you using for native builds? The m68k debian I'm using
does not have an installable gcc package.


Bernd



Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-18 Thread Bernd Schmidt
Hi Mikael,

> This fixed the problem, thanks.

Could you also run the testsuite to see if you can reproduce the
g++.old-deja failures Andreas reported?


Bernd


Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-18 Thread Bernd Schmidt
(Apologies to Jeff who's getting this twice because I didn't hit
reply-all the first time.)

On 11/17/19 6:56 PM, Jeff Law wrote:

> While scanning this patch I did notice the introduction of
> CC_STATUS_INIT in output_{and,ior,xor}si.  You might want to check that.

That is intentional. CC_STATUS_INIT can be used without cc0, and there's
precedent in (IIRC) ARM. We need it to get final to tell us when it
encounters a label - those don't make it to the postscan_insn hook.

> So unless there's objections over the next say 48-72 hrs, let's get the
> kit in and we can iterate if there's further issues that need resolving.

Cool. I'll wait for a final confirmation.


Bernd



Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-16 Thread Bernd Schmidt
On 11/16/19 9:18 AM, Andreas Schwab wrote:
> On Nov 16 2019, Bernd Schmidt wrote:
> 
>> Well, there has to be some difference between what you are doing and
>> what I am doing, because:
>>
>> Running /local/src/egcs/git/gcc/testsuite/g++.old-deja/old-deja.exp ...
>>
>>  === g++ Summary ===
>>
>> # of expected passes 26826
>> # of expected failures   82
>> # of unsupported tests   157
>> /local/src/egcs/bm68k-test/gcc/xg++  version 10.0.0 20191101
>> (experimental) (GCC)
> 
>   === g++ Summary ===
> 
> # of expected passes  170041
> # of unexpected failures  74
> # of expected failures708
> # of unresolved testcases 2
> # of unsupported tests7419
> /daten/aranym/gcc/gcc-20191115/Build/gcc/xg++  version 10.0.0 20191114 
> (experimental) [trunk revision 278266] (GCC) 

Once again, that doesn't help me track things down. A bug report without
instructions to reproduce is useless. Could you please provide me the
generated assembly files with -fverbose-asm that I asked for?


Bernd



Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-15 Thread Bernd Schmidt
On 11/15/19 11:50 PM, Andreas Schwab wrote:
> On Nov 15 2019, Bernd Schmidt wrote:
> 
>> I meant the compiler command line of course... for any -mcpu flags that
>> might differ from my test run.
> 
> There are none.

Well, there has to be some difference between what you are doing and
what I am doing, because:

Running /local/src/egcs/git/gcc/testsuite/g++.old-deja/old-deja.exp ...

=== g++ Summary ===

# of expected passes26826
# of expected failures  82
# of unsupported tests  157
/local/src/egcs/bm68k-test/gcc/xg++  version 10.0.0 20191101
(experimental) (GCC)

Is there anything you think you can give me to help reproduce this?
Before/after assembly files, generated with -fverbose-asm?


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-15 Thread Bernd Schmidt
On 11/15/19 10:58 PM, Andreas Schwab wrote:
> On Nov 15 2019, Bernd Schmidt wrote:
> 
>> Any chance you could show the command lines from the log files or some
>> other way of reproducing the issue?
> 
> Executing on aranym: OMP_NUM_THREADS=2 
> LD_LIBRARY_PATH=.:/daten/aranym/gcc/gcc-20191115/Build/m68k-linux/./libstdc++-v3/src/.libs:/daten/aranym/gcc/gcc-20191115/Build/m68k-linux/./libstdc++-v3/src/.libs:/daten/aranym/gcc/gcc-20191115/Build/gcc:/daten/aranym/gcc/gcc-20191115/Build/gcc
>  timeout 1200 ./dyncast1.exe (timeout = 300)
> Executed ./dyncast1.exe, status 1
> Output: Error 25
> child process exited abnormally
> FAIL: g++.old-deja/g++.other/dyncast1.C  -std=c++17 execution test

I meant the compiler command line of course... for any -mcpu flags that
might differ from my test run.


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-15 Thread Bernd Schmidt
On 11/15/19 5:34 PM, Andreas Schwab wrote:
> On Nov 15 2019, Bernd Schmidt wrote:
> 
>> Are these with the patch?
> 
> Yes.
> 
>> Are you on real hardware
> 
> No, I'm using aranym.

Any chance you could show the command lines from the log files or some
other way of reproducing the issue?


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-15 Thread Bernd Schmidt
On 11/15/19 2:48 PM, Andreas Schwab wrote:
> Here are the results of running the testsuite on m68k-linux:
> 
> http://gcc.gnu.org/ml/gcc-testresults/2019-11/msg00908.html
> 
> This is a list of regressions:

Are these with the patch? I'm not seeing any of these in my testing with
qemu. Are you on real hardware (and on which hardware?), and can you do
anything to help narrow down what's going wrong?


Bernd


Re: [PATCH 1/4] Preliminary m68k patches

2019-11-14 Thread Bernd Schmidt
On 11/13/19 9:03 PM, Jeff Law wrote:
> OK.  I'd actually recommend this go ahead and get installed.  My tester
> will bootstrap it overnight.

Alright, let me know how that turns out. What kind of machine do you
have for that?


Bernd


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-13 Thread Bernd Schmidt
On 11/13/19 7:16 PM, Segher Boessenkool wrote:
> I tried this out with a kernel build (just the defconfig).

> during RTL pass: jump2
> /home/segher/src/kernel/fs/binfmt_elf.c: In function 'elf_core_dump':
> /home/segher/src/kernel/fs/binfmt_elf.c:2409:1: internal compiler error: in 
> patch_jump_insn, at cfgrtl.c:1290

> Can you reproduce that?

Yes. It's actually an issue I spotted at one point, but I thought to
myself "I'll just leave it, I want to make the minimum amount of
changes". Should have thought harder.

The constraints for comparison patterns in m68k.md allow constants as
the first operand of a comparison. They also use nonimmediate_operand.
Hence, the internal error you saw when something tries to rerecognize
the instruction.

The following should fix it, but it's only very lightly tested so far.
I'll merge it into patch 2.


Bernd
	* config/m68k/m68k.md (cmp1_constraints): Don't allow constants.

diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index bb46e5880e2..56685db0c72 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -488,7 +488,7 @@
 ;; and operand predicates.  So to be safe, just don't allow the PC-rel
 
 (define_mode_attr scc0_constraints [(QI "=d,d,d") (HI "=d,d,d,d,d") (SI "=d,d,d,d,d,d")])
-(define_mode_attr cmp1_constraints [(QI "dn,dm,>") (HI "rnm,d,n,m,>") (SI "r,rKT,rKs,mr,ma,>")])
+(define_mode_attr cmp1_constraints [(QI "dn,dm,>") (HI "rnm,d,n,m,>") (SI "r,r,r,mr,ma,>")])
 (define_mode_attr cmp2_constraints [(QI "dm,nd,>") (HI "d,rnm,m,n,>") (SI "mrC0,mr,ma,KTrC0,Ksr,>")])
 
 ;; Note that operand 0 of an SCC insn is supported in the hardware as


[PATCH 4/4] Fix autoinc cbranch

2019-11-13 Thread Bernd Schmidt
After the m68k cc0 conversion, there is one code quality regression that
I can see: we no longer generate autoinc addressing modes in
comparisons. This is because the parts of the compiler that generate
autoinc are unwilling to substitute into jumps.

If you look at the code in reload, you'll see that it's careful around
jumps at find_reload time, and the code to perform autoinc reloads does
try to put all the extra code before the instruction. LRA seems to have
copied most of that code.
Also, in the former cc0 reality, a compare wasn't really any different
from a jump on m68k: we can't have a reload after the instruction in
either case. Any kind of move or arithmetic would clobber the flags.

That leads me to believe that there is no issue with autoinc in jumps,
hence this patch. Bootstrapped and tested on the gcc135 machine
(powerpc64le-unknown-linux-gnu). I don't really expect this to get
approved; alternatively I could write some peepholes which would
generate the same code as long as register pressure doesn't get too high.


Bernd
	* auto-inc-dec.c (merge_in_block): Allow jumps.
	* combine.c (can_combine_p): Allow jumps in autoinc.

diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index bdb6efa..6dab135 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -1441,10 +1441,11 @@ merge_in_block (int max_reg, basic_block bb)
  continue;
}
 
-  /* This continue is deliberate.  We do not want the uses of the
-jump put into reg_next_use because it is not considered safe to
-combine a preincrement with a jump.  */
-  if (JUMP_P (insn))
+  /* We used to skip jump insns, but both reload and LRA seem to
+take precautions not to perform autoinc reloads after a jump or
+a comparison.  Allow them for regular autoinc only (for test
+coverage reasons more than anything).  */
+  if ((HAVE_PRE_MODIFY_REG || HAVE_POST_MODIFY_REG) && JUMP_P (insn))
continue;
 
   if (dump_file)
diff --git a/gcc/combine.c b/gcc/combine.c
index 857ea30..e9e1464 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2119,15 +2118,12 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
 
   /* If INSN contains an autoincrement or autodecrement, make sure that
  register is not used between there and I3, and not already used in
- I3 either.  Neither must it be used in PRED or SUCC, if they exist.
- Also insist that I3 not be a jump; if it were one
- and the incremented register were spilled, we would lose.  */
+ I3 either.  Neither must it be used in PRED or SUCC, if they exist.  */
 
   if (AUTO_INC_DEC)
 for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
   if (REG_NOTE_KIND (link) == REG_INC
- && (JUMP_P (i3)
- || reg_used_between_p (XEXP (link, 0), insn, i3)
+ && (reg_used_between_p (XEXP (link, 0), insn, i3)
  || (pred != NULL_RTX
  && reg_overlap_mentioned_p (XEXP (link, 0), PATTERN (pred)))
  || (pred2 != NULL_RTX


[PATCH 3/4] Set costs for jumps in combine

2019-11-13 Thread Bernd Schmidt
The combiner is somewhat strange about how it uses costs. If any of the
insns involved in a comparison have a cost of 0, it does not verify that
the substitution is cheaper. Also, it does not compute costs for jump
insns, so they are always set to zero. As a consequence, any possible
substitution is performed if a combination into a jump is possible,
which turns out isn't really desirable on m68k with cbranch patterns.

This patch simply removes a test for NONJUMP_INSN_P. Bootstrapped and
tested on the gcc135 machine (powerpc64le-unknown-linux-gnu).


Bernd
	* combine.c (combine_instructions): Record costs for jumps.

diff --git a/gcc/combine.c b/gcc/combine.c
index 857ea30dafd..9446d2769ab 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1234,8 +1234,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs)
 		insn);
 
 	/* Record the current insn_cost of this instruction.  */
-	if (NONJUMP_INSN_P (insn))
-	  INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p);
+	INSN_COST (insn) = insn_cost (insn, optimize_this_for_speed_p);
 	if (dump_file)
 	  {
 		fprintf (dump_file, "insn_cost %d for ", INSN_COST (insn));


[PATCH 2/4] The main m68k cc0 conversion

2019-11-13 Thread Bernd Schmidt
This achieves the conversion by using combined cbranch/cstore patterns,
and using a mechanism similar to the cc_status tracking to elide certain
comparisons.  Unlike cc_status, this is opt-in and requires a
flags_valid attribute to be set for suitable instructions.  Due to lack
of test hardware, this conversion is omitted for a number of coldfire
patterns as opposed to normal m68k.

For DImode comparisons, scc_di and beq0_di/bne0_di patterns already
existed and are reused.  The bgt0_di/ble0_di patterns are replaced with
expander code to test just the high word, along with some smarts in
m68k_find_flags_value.


Bernd


Re: [PATCH 1/4] Preliminary m68k patches

2019-11-13 Thread Bernd Schmidt
This tidies up a few spots in the m68k backend in preparation for the
large patch to follow.  This is purely for review purposes: this patch
has not been tested independently, and will be committed together with
the following one.

Noteworthy changes:

Some patterns and peepholes were unified through mode iterators.  The
m68k_subword_comparison_operator predicate was adapted to also work with
SImode.

There are already scc_di patterns, so there is no need to generate a cc0
set/use pair in cstoredi.

Without HAVE_cc0, combine sometimes substitutes a stack push into the
destination of a divmod instruction, and then gets confused because it
doesn't seem to expect it in a PARALLEL.  Since the instruction only
works on registers anyway, use register_operand.

There are patterns that use register_operand with "do" constraints which
allow memory. This works at reload time, but the instruction can not be
rerecognized later on.  This becomes a problem if such operands occur in
a jump instruction, as subsequent passes will try to redirect branches
and thus attempt to rerecognize the pattern.

movqi/movhi do not accept constants that are not CONST_INT.  The code to
output them would not set flags correctly and was changed to
gcc_unreachable.

Comments were added to some patterns which are not being generated due
to incorrect tests/predicates. Fixing these is out of scope for this
work, but the problems are at least documented.

All the passes working on conditional traps seem to assume
const_true_rtx is used for unconditional ones, rather than const1_rtx.


Bernd
* config/m68k/m68k.c (output_move_himode, output_move_qimode):
Replace code for non-CONST_INT constants with gcc_unreachable.
* config/m68k/m68k.md (cbranchdi): Don't generate individual
compare and test.
(CMPMODE): New mode_iterator.
(cbranchsi4, cbranchqi4, cbranchhi4): Replace expanders with
cbranch4.
(cstoresi4, cstoreqi4, cstorehi4): Replace expanders with
cstore4.
(cmp_68881): Remove 'F' constraint from first comparison
operand.
(bit test insns patterns): Use nonimmediate_operand, not
register_operand, for source operands that allow memory in
their constraints.
(divmodsi4, udivmodsi4, divmodhi4 and related unnamed patterns):
Use register_operand, not nonimmediate_operand, for the
destinations.
(DBCC): New mode_iterator.
(dbcc peepholes): Use it to reduce duplication.
(trap): Use const_true_rtx, not const1_rtx.
* config/m68k/predicates.md (m68k_comparison_operand): Renamed
from m68k_subword_comparison_operand and changed to handle
SImode.

diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index 1030dfa5957..4f3503b9118 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -3072,7 +3072,7 @@ output_move_simode (rtx *operands)
 const char *
 output_move_himode (rtx *operands)
 {
- if (GET_CODE (operands[1]) == CONST_INT)
+  if (GET_CODE (operands[1]) == CONST_INT)
 {
   if (operands[1] == const0_rtx
 	  && (DATA_REG_P (operands[0])
@@ -3094,7 +3094,7 @@ output_move_himode (rtx *operands)
 	return "move%.w %1,%0";
 }
   else if (CONSTANT_P (operands[1]))
-return "move%.l %1,%0";
+gcc_unreachable ();
   return "move%.w %1,%0";
 }
 
@@ -3103,7 +3103,7 @@ output_move_qimode (rtx *operands)
 {
   /* 68k family always modifies the stack pointer by at least 2, even for
  byte pushes.  The 5200 (ColdFire) does not do this.  */
-  
+
   /* This case is generated by pushqi1 pattern now.  */
   gcc_assert (!(GET_CODE (operands[0]) == MEM
 		&& GET_CODE (XEXP (operands[0], 0)) == PRE_DEC
@@ -3134,7 +3134,7 @@ output_move_qimode (rtx *operands)
   if (operands[1] == const0_rtx && ADDRESS_REG_P (operands[0]))
 return "sub%.l %0,%0";
   if (GET_CODE (operands[1]) != CONST_INT && CONSTANT_P (operands[1]))
-return "move%.l %1,%0";
+gcc_unreachable ();
   /* 68k family (including the 5200 ColdFire) does not support byte moves to
  from address registers.  */
   if (ADDRESS_REG_P (operands[0]) || ADDRESS_REG_P (operands[1]))
diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 31e8767e7e3..e60978150d1 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -456,19 +456,14 @@
 	  (match_operand:DI 3 "general_operand")]))]
   ""
 {
-  if (operands[3] == const0_rtx)
-emit_insn (gen_tstdi (operands[2]));
-  else
-emit_insn (gen_cmpdi (operands[2], operands[3]));
-  operands[2] = cc0_rtx;
-  operands[3] = const0_rtx;
 })
 
+(define_mode_iterator CMPMODE [QI HI SI])
 
-(define_expand "cbranchsi4"
+(define_expand "cbranch4"
   [(set (cc0)
-	(compare (match_operand:SI 1 "nonimmediate_operand" "")
-		 (match_operand:SI 2 "general_operand" "")))
+	(compare (match_operand:CMPMODE 1 "nonimmediate_operand" "")
+		 

[PATCH 0/4] Eliminate cc0 from m68k

2019-11-13 Thread Bernd Schmidt
This is a set of patches to convert m68k so that it no longer uses cc0.
The approach is to combine cc0 setter/user pairs into cbranch and cstore
patterns. It does not expose the flag register directly. Since m68k is a
target that is not under active development, and probably receives very
limited testing, I felt it was important to make it generate as close to
the same code as previously. Also, given that the target clobbers the
flags for pretty much every move, it seems unlikely that there's much
value to be had from anything more complex. Trying to model every
instruction's effect on the flags would be too error-prone for not
nearly enough gain.

The cc0 machinery allows for eliminating unnecessary comparisons by
examining the effect instructions have on the flags registers. I have
replicated that mechanism with a relatively modest amount of code based
on a final_postscan_insn hook, but it is now opt-in: an instruction
pattern can set the "flags_valid" attribute to a number of possible
values to indicate what effect it has. That should be more reliable (try
git log m68k.md to see recent sprinkling of CC_STATUS_INIT to squash
bugs with the previous mechanism).
We can remember either values where the flags indicate a comparison
against zero (after practically all arithmetic and move insns), or
alternatively record two comparison operands to eliminate identical
compares. I stopped adding optimizations once I found it hard to find
any meaningful differences in generated code. In particular, the
m68k.exp tests which verify that these optimizations are performed all
still pass.

Testing was done with the qemu-system-m68k/debian combination. I do not
have access to Coldfire hardware, and I tried to be somewhat
conservative, for example by not adding "flags_valid" everywhere it
would probably be possible. For someone with access to the hardware, it
should be trivial to add such attributes and test that everything still
works.
I'll have to rerun my final tests because test_summary made a mess of
things, but as far as I am able to tell, there are no regressions, and
the patch set even fixes some failures in libstdc++.

The first and second patch contain the m68k changes. They are separated
only to make the review easier, they were not tested separately (since
the time for a test run is measured in days). The first patch contains
preliminary cleanup and fixes, the second the main cc0 conversion. After
that, there are some changes in the rest of the compiler.


Bernd


Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-07 Thread Bernd Schmidt
On 08/27/2017 09:36 PM, Jon Beniston wrote:
> I have an out-of-tree GCC port and it is struggling supporting
> auto-vectorization on some dot product instructions.  For example, I have an
> instruction that takes three operands which are all 32-bit general
> registers. The second and third operands will be treated as V2HI then do dot
> product, and then generate an SI result which is then added to the first
> operand which is SI as well.

This seems to ring a bell. I think I submitted a patch for this here -
is this the same problem?
https://gcc.gnu.org/ml/gcc-patches/2010-12/msg01724.html


Bernd


Re: MAINTAINERS update

2017-07-03 Thread Bernd Schmidt
On 06/11/2017 08:03 PM, Gerald Pfeifer wrote:
> On Tue, 30 May 2017, Bernd Schmidt wrote:
>> On 05/30/2017 09:05 AM, Richard Biener wrote:
>>> This leaves the nvptx and c6x ports without a maintainer.  Do 
>>> you have any recommendations for a successor here?
>> Not really. It would be a shame to lose the C6X port though. If I'm 
>> CC'd on any bug reports I'm prepared to keep it working - if that's
>> considered sufficient, I can readd myself as maintainer.
> 
> I think that would be preferrable.  Even if practically it may
> not make a huge difference, people with less background/involvement
> will know who to contact, and having an entire port without maintainer
> just doesn't feel right.

I've done that now.


Bernd
Index: MAINTAINERS
===
--- MAINTAINERS	(revision 249919)
+++ MAINTAINERS	(working copy)
@@ -49,6 +49,7 @@ arm port		Richard Earnshaw	
 avr port		Denis Chertykov		<cherty...@gmail.com>
 bfin port		Jie Zhang		<jzhang...@gmail.com>
+c6x port		Bernd Schmidt		<bernds_...@t-online.de>
 cris port		Hans-Peter Nilsson	<h...@axis.com>
 epiphany port		Joern Rennecke		<g...@amylaar.uk>
 fr30 port		Nick Clifton		<ni...@redhat.com>
Index: ChangeLog
===
--- ChangeLog	(revision 249919)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2017-07-03  Bernd Schmidt  <bschm...@redhat.com>
+
+	* MAINTAINERS: Readd myself for c6x.
+
 2017-06-28  Martin Liska  <mli...@suse.cz>
 
 	PR bootstrap/81217


Re: MAINTAINERS update

2017-05-30 Thread Bernd Schmidt
On 05/30/2017 09:05 AM, Richard Biener wrote:
> This leaves the nvptx and c6x ports without a maintainer.  Do you have
> any recommendations for a successor here?

Not really. It would be a shame to lose the C6X port though. If I'm CC'd
on any bug reports I'm prepared to keep it working - if that's
considered sufficient, I can readd myself as maintainer. It hasn't
required much attention over the years anyway.


Bernd



Re: MAINTAINERS update

2017-05-29 Thread Bernd Schmidt

On 05/27/2017 12:52 PM, Bernd Schmidt wrote:

I am no longer working for Red Hat, so I've updated my email address.
Also, I don't expect to be around very much in the near future, so I've
removed myself as maintainer for some areas.


Judging by a reply I got, I may have been too terse. No need to worry, 
I'm just choosing to do something else for a while and reading gcc 
mailing lists will not be a priority in the near term.



Bernd



MAINTAINERS update

2017-05-27 Thread Bernd Schmidt
I am no longer working for Red Hat, so I've updated my email address. 
Also, I don't expect to be around very much in the near future, so I've 
removed myself as maintainer for some areas.



Bernd
Index: ChangeLog
===
--- ChangeLog	(revision 248535)
+++ ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2017-05-27  Bernd Schmidt  <bschm...@redhat.com>
+
+	* MAINTAINERS: Update my email address, and remove myself as
+	maintainer in some areas.
+
 2017-05-25  Eric Gallager  <eg...@gwmail.gwu.edu>
 
 	* MAINTAINERS: Add self to Write After Approval
Index: MAINTAINERS
===
--- MAINTAINERS	(revision 248535)
+++ MAINTAINERS	(working copy)
@@ -30,7 +30,7 @@ Michael Meissner<gnu@the-meissners.o
 Jason Merrill	<ja...@redhat.com>
 David S. Miller	<da...@redhat.com>
 Joseph Myers	<jos...@codesourcery.com>
-Bernd Schmidt	<bschm...@redhat.com>
+Bernd Schmidt	<bernds_...@t-online.de>
 Ian Lance Taylor<i...@airs.com>
 Jim Wilson	<wil...@tuliptree.org>
 
@@ -48,9 +48,7 @@ arm port		Nick Clifton		<ni...@redhat.co
 arm port		Richard Earnshaw	<richard.earns...@arm.com>
 arm port		Ramana Radhakrishnan	<ramana.radhakrish...@arm.com>
 avr port		Denis Chertykov		<cherty...@gmail.com>
-bfin port		Bernd Schmidt		<bschm...@redhat.com>
 bfin port		Jie Zhang		<jzhang...@gmail.com>
-c6x port		Bernd Schmidt		<bschm...@redhat.com>
 cris port		Hans-Peter Nilsson	<h...@axis.com>
 epiphany port		Joern Rennecke		<g...@amylaar.uk>
 fr30 port		Nick Clifton		<ni...@redhat.com>
@@ -85,7 +83,6 @@ nds32 port		Chung-Ju Wu		<jasonwucj@gmai
 nds32 port		Shiva Chen		<shiva0...@gmail.com>
 nios2 port		Chung-Lin Tang		<clt...@codesourcery.com>
 nios2 port		Sandra Loosemore	<san...@codesourcery.com>
-nvptx port		Bernd Schmidt		<bschm...@redhat.com>
 pdp11 port		Paul Koning		<n...@arrl.net>
 picochip port		Daniel Towner		<d...@picochip.com>
 riscv port		Kito Cheng		<kito.ch...@gmail.com>
@@ -231,7 +228,6 @@ tree browser/unparser	Sebastian Pop		
 profile feedback	Jan Hubicka		<hubi...@ucw.cz>
 reload			Ulrich Weigand		<uweig...@de.ibm.com>
-reload			Bernd Schmidt		<bschm...@redhat.com>
 dfp.c, related		Ben Elliston		<b...@gnu.org>
 RTL optimizers		Eric Botcazou		<ebotca...@libertysurf.fr>
 instruction combiner	Segher Boessenkool	<seg...@kernel.crashing.org>


PR78972, 80283: Extend TER with scheduling

2017-05-12 Thread Bernd Schmidt
If you look at certain testcases like the one for PR78972, you'll find 
that the code generated by TER is maximally pessimal in terms of 
register pressure: we can generate a large number of intermediate 
results, and defer all the statements that use them up.


Another observation one can make is that doing TER doesn't actually buy 
us anything for a large subset of the values it finds: only a handful of 
places in the expand phase actually make use of the information. In 
cases where we know we aren't going to be making use of it, we could 
move expressions freely without doing TER-style substitution.


This patch uses the information collected by TER about the moveability 
of statements and performs a mini scheduling pass with the aim of 
reducing register pressure. The heuristic is fairly simple: something 
that consumes more values than it produces is preferred. This could be 
tuned further, but it already produces pretty good results: for the 
78972 testcase, the stack size is reduced from 2448 bytes to 288, and 
for PR80283, the stackframe of 496 bytes vanishes with the pass enabled.


In terms of benchmarks I've run SPEC a few times, and the last set of 
results showed not much of a change. Getting reproducible results has 
been tricky but all runs I made have been within 0%-1% improvement.


In this patch, the changed behaviour is gated with a -fschedule-ter 
option which is off by default; with that default it bootstraps and 
tests without regressions. The compiler also bootstraps with the option 
enabled, in that case there are some optimization issues. I'll address 
some of them with two followup patches, the remaining failures are:

 * a handful of guality/PR43077.c failures
   Debug insn generation is somewhat changed, and the peephole2 pass
   drops one of them on the floor.
 * three target/i386/bmi-* tests fail. These expect the combiner to
   build certain instruction patterns, and that turns out to be a
   little fragile. It would be nice to be able to use match.pd to
   produce target-specific patterns during expand.

Thoughts? Ok to apply?


Bernd
	PR middle-end/80283
	PR tree-optimization/78972
	* cfgexpand.c (should_forward_stmt_p, decide_schedule_stmt,
	resolve_deps, rank_ready_insns, add_dep, set_bb_uids, set_prios,
	floating_point_op_p, simple_schedule, expand_one_gimple_stmt):
	New static functions.
	(struct gimple_dep, struct gimple_sched_state): New structs.
	(sched_map): New static variable.
	(expand_gimple_basic_block): Use expand_one_gimple_state, and
	use the results from simple_schedule if flag_schedule_ter.
	* common.opt (fschedule-ter): New option.
	* toplev.c (process_options): Set flag_tree_ter if flag_schedule_ter.
	* doc/invoke.texi (Optimization Options): Add -fschedule-ter.
	(-fschedule-ter): Document.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 66af699..e1b4898 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5337,6 +5337,345 @@ expand_debug_locations (void)
   flag_strict_aliasing = save_strict_alias;
 }
 
+/* Determine whether TER decided that a statment should be forwarded.  */
+
+static bool
+should_forward_stmt_p (gimple *stmt)
+{
+  if (!SA.values)
+return false;
+
+  def_operand_p def_p;
+  def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
+  if (def_p == NULL)
+return false;
+
+  /* Forward this stmt if it is in the list of
+ replaceable expressions.  */
+  if (bitmap_bit_p (SA.values,
+		SSA_NAME_VERSION (DEF_FROM_PTR (def_p
+return true;
+
+  return false;
+}
+
+/* Decide whether STMT should be scheduled, or left as a TER replacement.
+   Clear it as a replacement if we decide to schedule it.  */
+
+static bool
+decide_schedule_stmt (gimple *stmt)
+{
+  if (!SA.values)
+return false;
+
+  if (is_gimple_debug (stmt))
+return true;
+
+  def_operand_p def_p;
+  def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
+
+  if (def_p == NULL)
+return false;
+
+  tree def = DEF_FROM_PTR (def_p);
+  use_operand_p use_p;
+  gimple *use_stmt;
+  if (!single_imm_use (def, _p, _stmt))
+return false;
+
+  /* There are only a few cases where expansion can actually make use of
+ replaceable SSA names.  Since TER is pessimal for scheduling in
+ general, we try to limit what we do to cases where we know it is
+ beneficial, and schedule insns for the other cases.  */
+  tree_code def_c = ERROR_MARK;
+  if (gimple_code (stmt) == GIMPLE_ASSIGN)
+def_c = gimple_assign_rhs_code (stmt);
+
+  if (gimple_code (use_stmt) == GIMPLE_ASSIGN)
+{
+  tree_code c = gimple_assign_rhs_code (use_stmt);
+  if (TREE_CODE_CLASS (c) != tcc_comparison
+	  && c != FMA_EXPR
+	  && c != SSA_NAME
+	  && c != MEM_REF
+	  && c != TARGET_MEM_REF
+	  && def_c != VIEW_CONVERT_EXPR)
+	{
+	  if (bitmap_clear_bit (SA.values, SSA_NAME_VERSION (def)))
+	return true;
+	}
+}
+  return false;
+}
+
+struct gimple_dep;
+
+/* Record state for a single statement during simple_sched.  */
+struct gimple_sched_state
+{
+  gimple 

Re: [PATCH 1/5] nvptx: implement SIMT enter/exit insns

2017-03-27 Thread Bernd Schmidt

On 03/27/2017 12:56 PM, Alexander Monakov wrote:

Hello Bernd,

Can you have a look at this patch (unchanged from previous posting in January)?
The rest of the patches in the set are reviewed.

On Wed, 22 Mar 2017, Alexander Monakov wrote:


This patch adds handling of new omp_simt_enter/omp_simt_exit named insns
in the NVPTX backend.

* config/nvptx/nvptx-protos.h (nvptx_output_simt_enter): Declare.
(nvptx_output_simt_exit): Declare.
* config/nvptx/nvptx.c (nvptx_init_unisimt_predicate): Use
cfun->machine->unisimt_location.  Handle NULL unisimt_predicate.
(init_softstack_frame): Move initialization of crtl->is_leaf to...
(nvptx_declare_function_name): ...here.  Emit declaration of local
memory space buffer for omp_simt_enter insn.
(nvptx_output_unisimt_switch): New.
(nvptx_output_softstack_switch): New.
(nvptx_output_simt_enter): New.
(nvptx_output_simt_exit): New.
* config/nvptx/nvptx.h (struct machine_function): New fields
has_simtreg, unisimt_location, simt_stack_size, simt_stack_align.
* config/nvptx/nvptx.md (UNSPECV_SIMT_ENTER): New unspec.
(UNSPECV_SIMT_EXIT): Ditto.
(omp_simt_enter_insn): New insn.
(omp_simt_enter): New expansion.
(omp_simt_exit): New insn.
* config/nvptx/nvptx.opt (msoft-stack-reserve-local): New option.


Technically this whole series isn't a regression fix, but since Jakub 
has acked the rest, this is OK too.



Bernd



LRA fix for 80160

2017-03-24 Thread Bernd Schmidt
This fixes two PRs; we shouldn't try to avoid spilling a reg if it has 
an alternate class it can use.


Bootstrapped and tested on x86_64-linux, approved by Vlad, committed.


Bernd
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 246472)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2017-03-25  Bernd Schmidt  <bschm...@redhat.com>
+
+	PR rtl-optimization/80160
+	PR rtl-optimization/80159
+	* lra-assigns.c (must_not_spill_p): Tighten new test to also take
+	reg_alternate_class into account.
+
 2017-03-24  Vladimir Makarov  <vmaka...@redhat.com>
 
 	PR target/80148
Index: gcc/lra-assigns.c
===
--- gcc/lra-assigns.c	(revision 246472)
+++ gcc/lra-assigns.c	(working copy)
@@ -908,7 +908,8 @@ must_not_spill_p (unsigned spill_regno)
  does not solve the general case where existing reloads fully
  cover a limited register class.  */
   if (!bitmap_bit_p (_reload_pseudos, spill_regno)
-  && reg_class_size [reg_preferred_class (spill_regno)] == 1)
+  && reg_class_size [reg_preferred_class (spill_regno)] == 1
+  && reg_alternate_class (spill_regno) == NO_REGS)
 return true;
   return false;
 }
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(revision 246472)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2017-03-25  Bernd Schmidt  <bschm...@redhat.com>
+
+	PR rtl-optimization/80160
+	PR rtl-optimization/80159
+
+	* gcc.target/i386/pr80160.c: New test.
+
 2017-03-24  Jakub Jelinek  <ja...@redhat.com>
 
 	PR sanitizer/79904
Index: gcc/testsuite/gcc.target/i386/pr80160.c
===
--- gcc/testsuite/gcc.target/i386/pr80160.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr80160.c	(working copy)
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-omit-frame-pointer -w" } */
+/* { dg-additional-options "-march=pentium-mmx" { target ia32 } } */
+
+typedef struct { long long a; } a_t;
+int *a, b;
+a_t *e, c;
+long long f;
+void fn (int);
+void fn2 (void);
+int fn3 (a_t);
+void fn4 (a_t);
+a_t foo (long long val) { return (a_t){val}; }
+static void
+bar (int ka)
+{
+  unsigned i;
+  for (i = 0; i < 512; i++) {
+long d;
+c = (a_t){d};
+fn2 ();
+  }
+  fn (ka);
+}
+void
+test (void)
+{
+  a_t g;
+  a_t *h, j;
+  h = e;
+  j = *h;
+  if (e == (a_t *) 1) {
+a_t k = {fn3 (j)};
+fn4 (j);
+long l;
+g = foo((long long)b << 2 | l);
+k = g;
+if (j.a != k.a) {
+  a_t m = g;
+  int n = m.a, o = m.a >> 32;
+  asm ("# %0 %1 %2 %3" : "=m"(*a), "+A"(f) : "b"(n), "c"(o));
+}
+  }
+  bar ((int) h);
+}


Re: [PATCH] Decrease compile time memory with heavy find_base_{value,term} on i?86/x86_64 (PR rtl-optimization/63191, take 2)

2017-03-22 Thread Bernd Schmidt

On 03/22/2017 04:38 PM, Uros Bizjak wrote:


LGTM, but I don't want to step on Bernd's toes, so let's wait for his opinion.


I was waiting for yours really, that's the one that counts.


Bernd



Re: [PATCH] Fix tree-prof/pr66295.c

2017-03-16 Thread Bernd Schmidt

On 03/15/2017 09:59 PM, Segher Boessenkool wrote:

This testcase can only ever be built on x86 (it needs the "avx*"
attributes).  This patch skips the test elsewhere.

Is this okay for trunk?


Ok.


Bernd



Re: [PATCH] Remove dead stores and initializations

2017-03-16 Thread Bernd Schmidt

On 03/16/2017 01:31 PM, Markus Trippelsdorf wrote:

clang --analyze pointed out a number of dead stores and initializations.

Tested on ppc64le. Ok for trunk?


I'd say - not now.

Ideally someone would delve into the commit history to figure out what 
happened with each of these, and whether any of these indicate bugs.



Bernd



Document PR79806 as a non-bug

2017-03-15 Thread Bernd Schmidt
I suggest we apply the following and close the PR as INVALID (not a 
bug). Ok?



Bernd

Index: pr65693.c
===
--- pr65693.c   (revision 245685)
+++ pr65693.c   (working copy)
@@ -2,6 +2,11 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */

+/* This test relies on -O2 to optimize away the division operation
+   that's expanded as part of the alloca.  With -O0, the explicit use
+   of edx causes a compilation failure, which is expected
+   behaviour.  */
+
 int a;

 void


Re: Combiner fix for PR79910

2017-03-15 Thread Bernd Schmidt

On 03/15/2017 04:00 PM, Bernd Schmidt wrote:

On 03/15/2017 12:09 AM, Bernd Schmidt wrote:

 I'll retest with your
suggestion and with the bitmap creation conditional on i1 being nonnull.


Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok?


Oops, that one also has dbg_cnt stuff in it which I was going to remove. 
If you want to approve that along with the rest, the following bit is 
also needed:


Index: gcc/dbgcnt.def
===
--- gcc/dbgcnt.def  (revision 245685)
+++ gcc/dbgcnt.def  (working copy)
@@ -145,6 +145,7 @@ DEBUG_COUNTER (asan_use_after_scope)
 DEBUG_COUNTER (auto_inc_dec)
 DEBUG_COUNTER (ccp)
 DEBUG_COUNTER (cfg_cleanup)
+DEBUG_COUNTER (combine)
 DEBUG_COUNTER (cprop)
 DEBUG_COUNTER (cse2_move2add)
 DEBUG_COUNTER (dce)


Bernd


Re: Combiner fix for PR79910

2017-03-15 Thread Bernd Schmidt

On 03/15/2017 12:09 AM, Bernd Schmidt wrote:

 I'll retest with your
suggestion and with the bitmap creation conditional on i1 being nonnull.


Like this (also had to throw in a bitmap_empty_p). Retested as before. Ok?


Bernd

Index: gcc/combine.c
===
--- gcc/combine.c	(revision 245685)
+++ gcc/combine.c	(working copy)
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.
 #include "valtrack.h"
 #include "rtl-iter.h"
 #include "print-rtl.h"
+#include "dbgcnt.h"
 
 /* Number of attempts to combine instructions in this function.  */
 
@@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in
   return true;
 }
 
+/* Set up a set of registers used in an insn.  Called through note_uses,
+   arguments as described for that function.  */
+
+static void
+record_used_regs (rtx *xptr, void *data)
+{
+  bitmap set = (bitmap)data;
+  int i, j;
+  enum rtx_code code;
+  const char *fmt;
+  rtx x = *xptr;
+
+  /* repeat is used to turn tail-recursion into iteration since GCC
+ can't do it when there's no return value.  */
+ repeat:
+  if (x == 0)
+return;
+
+  code = GET_CODE (x);
+  if (REG_P (x))
+{
+  unsigned regno = REGNO (x);
+  unsigned end_regno = END_REGNO (x);
+  while (regno < end_regno)
+	bitmap_set_bit (set, regno++);
+  return;
+}
+
+  /* Recursively scan the operands of this expression.  */
+
+  for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i >= 0; i--)
+{
+  if (fmt[i] == 'e')
+	{
+	  /* If we are about to do the last recursive call
+	 needed at this level, change it into iteration.
+	 This function is called enough to be worth it.  */
+	  if (i == 0)
+	{
+	  x = XEXP (x, 0);
+	  goto repeat;
+	}
+
+	  record_used_regs ( (x, i), data);
+	}
+  else if (fmt[i] == 'E')
+	for (j = 0; j < XVECLEN (x, i); j++)
+	  record_used_regs ( (x, i, j), data);
+}
+}
+
 /* Try to combine the insns I0, I1 and I2 into I3.
Here I0, I1 and I2 appear earlier than I3.
I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into
@@ -2742,6 +2794,27 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 
   added_links_insn = 0;
 
+  /* For combinations that may result in two insns, we have to gather
+ some extra information about registers used, so that we can
+ update all relevant LOG_LINKS later.  */
+  auto_bitmap i2_regset, i3_regset, links_regset;
+  if (i1)
+{
+  note_uses ( (i2), record_used_regs, (bitmap)i2_regset);
+  note_uses ( (i3), record_used_regs, (bitmap)i3_regset);
+  insn_link *ll;
+  FOR_EACH_LOG_LINK (ll, i3)
+	bitmap_set_bit (links_regset, ll->regno);
+  FOR_EACH_LOG_LINK (ll, i2)
+	bitmap_set_bit (links_regset, ll->regno);
+  if (i1)
+	FOR_EACH_LOG_LINK (ll, i1)
+	  bitmap_set_bit (links_regset, ll->regno);
+  if (i0)
+	FOR_EACH_LOG_LINK (ll, i0)
+	  bitmap_set_bit (links_regset, ll->regno);
+}
+
   /* First check for one important special case that the code below will
  not handle.  Namely, the case where I1 is zero, I2 is a PARALLEL
  and I3 is a SET whose SET_SRC is a SET_DEST in I2.  In that case,
@@ -4004,6 +4077,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	}
 }
 
+  if (!dbg_cnt (combine))
+{
+  undo_all ();
+  return 0;
+}
+
   /* If it still isn't recognized, fail and change things back the way they
  were.  */
   if ((insn_code_number < 0
@@ -4051,6 +4130,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   return 0;
 }
 
+  auto_bitmap new_regs_in_i2;
+  if (newi2pat)
+{
+  /* We need to discover situations where we introduce a use of a
+	 register into I2, where none of the existing LOG_LINKS contain
+	 a reference to it.  This can happen if previously I3 referenced
+	 the reg, and there is an additional use between I2 and I3.  We
+	 must remove the LOG_LINKS entry from that additional use and
+	 distribute it along with our own ones.  */
+	note_uses (, record_used_regs, (bitmap)new_regs_in_i2);
+	bitmap_and_compl_into (new_regs_in_i2, i2_regset);
+	bitmap_and_compl_into (new_regs_in_i2, links_regset);
+
+	/* Here, we first look for situations where a hard register use
+	   moved, and just give up.  This should happen approximately
+	   never, and it's not worth it to deal with possibilities like
+	   multi-word registers.  Later, when fixing up LOG_LINKS, we
+	   deal with the case where a pseudo use moved.  */
+	if (!bitmap_empty_p (new_regs_in_i2)
+	&& prev_nonnote_insn (i3) != i2
+	&& bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER)
+	  {
+	undo_all ();
+	return 0;
+	  }
+}
+
   if (MAY_HAVE_DEBUG_INSNS)
 {
   struct undo *undo;
@@ -4494,6 +4600,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 			NULL_RTX, NULL_RTX, NULL_RTX);
   }
 
+if (newi2pat)
+  {
+	bitmap_itera

Fix C6X hwloop issue

2017-03-15 Thread Bernd Schmidt
This fixes a failure in the testsuite. When we transform the doloop 
pattern, the decrement of the old iteration register goes away, which is 
a problem if it's used after the loop (where it should have the value zero).


Committed.


Bernd
	* config/c6x/c6x.c (predicate_insn): Avoid rtl sharing failure.
	(hwloop_optimize): Handle case where the old iteration reg is used
	after the loop.

Index: gcc/config/c6x/c6x.c
===
--- gcc/config/c6x/c6x.c	(revision 245685)
+++ gcc/config/c6x/c6x.c	(working copy)
@@ -5788,8 +5789,9 @@ hwloop_optimize (hwloop_info loop)
   start_sequence ();
 
   insn = emit_insn (gen_mvilc (loop->iter_reg));
+  if (loop->iter_reg_used_outside)
+insn = emit_move_insn (loop->iter_reg, const0_rtx);
   insn = emit_insn (gen_sploop (GEN_INT (sp_ii)));
-
   seq = get_insns ();
 
   if (!single_succ_p (entry_bb) || vec_safe_length (loop->incoming) > 1)


Reload fix for an old aarch64 issue

2017-03-14 Thread Bernd Schmidt
This triggered a kernel miscompilation with an old (4.8 I think) aarch64 
toolchain.


Here's the reloads for the insn where things go wrong:

Reloads for insn # 210
Reload 0: reload_in (DI) = (reg/v/f:DI 80 [ pgdata ])
GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 0)
reload_in_reg: (reg/v/f:DI 80 [ pgdata ])
reload_reg_rtx: (reg:DI 5 x5)
Reload 1: reload_in (DI) = (plus:DI (reg/v/f:DI 80 [ pgdata ])
(const_int 7172 
[0x1c04]))

GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 0), inc by 2
reload_in_reg: (plus:DI (reg/v/f:DI 80 [ pgdata ])
(const_int 7172 
[0x1c04]))

reload_reg_rtx: (reg:DI 5 x5)

Note how the input and input_address reloads use the same reload 
register. This is intended as the two categories aren't supposed to 
conflict. The problem is that reload 1 is a PLUS expression, and when 
reloading these, gen_reload may have to resort to tricks, such as 
loading one of the operands (the constant 7172) into the reload 
register, and then adding the other operand to it. In effect that's an 
earlyclobber of the reload register, and that is not represented in the 
for_input/for_input_address classification.


Most likely we haven't tripped over this earlier because on other 
machines the addition can be done in a single insn.


The following fixes this by using RELOAD_OTHER in this case. This is 
pessimistic, but at that point I don't think we can really know what 
gen_reload will have to do. Bootstrapped and tested on x86_64-linux on 
the 4.7 branch - that's the best method of testing that I can think of 
(but I think I'll also run some c6x tests with trunk). If no one 
objects, I'll check this in soonish.



Bernd
	* reload.c (find_reloads): When reloading a nonoffsettable address,
	use RELOAD_OTHER for it and its address reloads.

Index: gcc/reload.c
===
--- gcc/reload.c	(revision 211480)
+++ gcc/reload.c	(working copy)
@@ -3989,14 +3989,14 @@ find_reloads (rtx insn, int replace, int
 			  (recog_data.operand[i], 0), (rtx*) 0,
 			 base_reg_class (VOIDmode, as, MEM, SCRATCH),
 			 address_mode,
-			 VOIDmode, 0, 0, i, RELOAD_FOR_INPUT);
+			 VOIDmode, 0, 0, i, RELOAD_OTHER);
 	rld[operand_reloadnum[i]].inc
 	  = GET_MODE_SIZE (GET_MODE (recog_data.operand[i]));
 
 	/* If this operand is an output, we will have made any
 	   reloads for its address as RELOAD_FOR_OUTPUT_ADDRESS, but
 	   now we are treating part of the operand as an input, so
-	   we must change these to RELOAD_FOR_INPUT_ADDRESS.  */
+	   we must change these to RELOAD_FOR_OTHER_ADDRESS.  */
 
 	if (modified[i] == RELOAD_WRITE)
 	  {
@@ -4005,10 +4005,10 @@ find_reloads (rtx insn, int replace, int
 		if (rld[j].opnum == i)
 		  {
 			if (rld[j].when_needed == RELOAD_FOR_OUTPUT_ADDRESS)
-			  rld[j].when_needed = RELOAD_FOR_INPUT_ADDRESS;
+			  rld[j].when_needed = RELOAD_FOR_OTHER_ADDRESS;
 			else if (rld[j].when_needed
  == RELOAD_FOR_OUTADDR_ADDRESS)
-			  rld[j].when_needed = RELOAD_FOR_INPADDR_ADDRESS;
+			  rld[j].when_needed = RELOAD_FOR_OTHER_ADDRESS;
 		  }
 		  }
 	  }


Re: Combiner fix for PR79910

2017-03-14 Thread Bernd Schmidt

On 03/15/2017 12:03 AM, Jeff Law wrote:

On 03/10/2017 04:24 PM, Bernd Schmidt wrote:


PR rtl-optimization/79910
* combine.c (record_used_regs): New static function.
(try_combine): Handle situations where there is an additional
instruction between I2 and I3 which needs to have a LOG_LINK
updated.

PR rtl-optimization/79910
* gcc.dg/torture/pr79910.c: New test.

What a nasty little problem.

I don't like that we have to build these bitmaps due to the compile-time
cost.  Would it make sense to only build them when i0/i1 exist?


I suppose at the moment we don't do 2->2 combinations, so we could 
conditionalize this on having an i1.



We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the
i2pattern where we might need to conjure up a LOG_LINK, right?


We don't do 4->3 combinations, right?  So we only have to care about
when there's going to be an newi2pat, right (3->2 or 4->2).



We don't ever create a newi1pat (for a 4->3 combination), right?  So we
only have to worry about testing when there's a newi2pat, right?


Yes to all, there isn't a newi1pat, only 4->2 and 3->2 can be an issue.


+if (prev_nonnote_insn (i3) != i2)
+  for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++)
+if (bitmap_bit_p (new_regs_in_i2, r))

if (prev_nonnote_insn (i3) != i2
&& bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER)


Ah. I had wondered about the loop but only thought in the direction of 
intersecting this bitmap with one of all hard regs (and I think there 
isn't such a bitmap, so I kept the loop). I'll retest with your 
suggestion and with the bitmap creation conditional on i1 being nonnull.



Bernd


Combiner fix for PR79910

2017-03-10 Thread Bernd Schmidt

In this PR, we have a few insns involved in two instruction combinations:

insn 16: set r100
insn 27: some calculation
insn 28: some calculation
insn 32: using r100
insn 33: using r100
insn 35: some calculation

Then we combine insns 27, 28 and 33, producing two output insns, As a 
result, insn 28 (i2) now obtains a use of r100. But insn 32, which is 
not involved in this combination, has the LOG_LINKS entry for that 
register, and we don't know that we need to update it. As a result, the 
second combination, involving regs 16, 32 and 35 (based on the remaining 
LOG_LINK for r100), produces incorrect code, as we don't realize there's 
a use of r100 before insn 32.


The following fixes it. Bootstrapped and tested on x86_64-linux, ok (on 
all branches)?



Bernd

	PR rtl-optimization/79910
	* combine.c (record_used_regs): New static function.
	(try_combine): Handle situations where there is an additional
	instruction between I2 and I3 which needs to have a LOG_LINK
	updated.

	PR rtl-optimization/79910
	* gcc.dg/torture/pr79910.c: New test.

Index: gcc/combine.c
===
--- gcc/combine.c	(revision 245685)
+++ gcc/combine.c	(working copy)
@@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in
   return true;
 }
 
+/* Set up a set of registers used in an insn.  Called through note_uses,
+   arguments as described for that function.  */
+
+static void
+record_used_regs (rtx *xptr, void *data)
+{
+  bitmap set = (bitmap)data;
+  int i, j;
+  enum rtx_code code;
+  const char *fmt;
+  rtx x = *xptr;
+
+  /* repeat is used to turn tail-recursion into iteration since GCC
+ can't do it when there's no return value.  */
+ repeat:
+  if (x == 0)
+return;
+
+  code = GET_CODE (x);
+  if (REG_P (x))
+{
+  unsigned regno = REGNO (x);
+  unsigned end_regno = END_REGNO (x);
+  while (regno < end_regno)
+	bitmap_set_bit (set, regno++);
+  return;
+}
+
+  /* Recursively scan the operands of this expression.  */
+
+  for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i >= 0; i--)
+{
+  if (fmt[i] == 'e')
+	{
+	  /* If we are about to do the last recursive call
+	 needed at this level, change it into iteration.
+	 This function is called enough to be worth it.  */
+	  if (i == 0)
+	{
+	  x = XEXP (x, 0);
+	  goto repeat;
+	}
+
+	  record_used_regs ( (x, i), data);
+	}
+  else if (fmt[i] == 'E')
+	for (j = 0; j < XVECLEN (x, i); j++)
+	  record_used_regs ( (x, i, j), data);
+}
+}
+
 /* Try to combine the insns I0, I1 and I2 into I3.
Here I0, I1 and I2 appear earlier than I3.
I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into
@@ -2742,6 +2794,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 
   added_links_insn = 0;
 
+  auto_bitmap i2_regset, i3_regset, links_regset;
+  note_uses ( (i2), record_used_regs, (bitmap)i2_regset);
+  note_uses ( (i3), record_used_regs, (bitmap)i3_regset);
+  insn_link *ll;
+  FOR_EACH_LOG_LINK (ll, i3)
+bitmap_set_bit (links_regset, ll->regno);
+  FOR_EACH_LOG_LINK (ll, i2)
+bitmap_set_bit (links_regset, ll->regno);
+  if (i1)
+   FOR_EACH_LOG_LINK (ll, i1)
+  bitmap_set_bit (links_regset, ll->regno);
+  if (i0)
+FOR_EACH_LOG_LINK (ll, i0)
+  bitmap_set_bit (links_regset, ll->regno);
+
   /* First check for one important special case that the code below will
  not handle.  Namely, the case where I1 is zero, I2 is a PARALLEL
  and I3 is a SET whose SET_SRC is a SET_DEST in I2.  In that case,
@@ -4051,6 +4124,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   return 0;
 }
 
+  auto_bitmap new_regs_in_i2;
+  if (newi2pat)
+{
+  /* We need to discover situations where we introduce a use of a
+	 register into I2, where none of the existing LOG_LINKS contain
+	 a reference to it.  This can happen if previously I3 referenced
+	 the reg, and there is an additional use between I2 and I3.  We
+	 must remove the LOG_LINKS entry from that additional use and
+	 distribute it along with our own ones.  */
+	note_uses (, record_used_regs, (bitmap)new_regs_in_i2);
+	bitmap_and_compl_into (new_regs_in_i2, i2_regset);
+	bitmap_and_compl_into (new_regs_in_i2, links_regset);
+
+	/* Here, we first look for situations where a hard register use
+	   moved, and just give up.  This should happen approximately
+	   never, and it's not worth it to deal with possibilities like
+	   multi-word registers.  Later, when fixing up LOG_LINKS, we
+	   deal with the case where a pseudo use moved.  */
+	if (prev_nonnote_insn (i3) != i2)
+	  for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++)
+	if (bitmap_bit_p (new_regs_in_i2, r))
+	  {
+		undo_all ();
+		return 0;
+	}
+}
+
   if (MAY_HAVE_DEBUG_INSNS)
 {
   struct undo *undo;
@@ -4494,6 +4594,45 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 			NULL_RTX, NULL_RTX, NULL_RTX);
   }
 
+if (newi2pat)
+  {
+	bitmap_iterator 

Re: Fix IRA issue, PR79728

2017-03-10 Thread Bernd Schmidt

Ping (minus the require-effective-target line, as Uros pointed out).


Bernd

On 03/03/2017 02:51 PM, Bernd Schmidt wrote:

This is an ICE where setup_pressure_classes fails if xmm0 is a global
reg. Instead of GENERAL/FLOAT/SSE/MMX_REGS, it computes only
SSE_FIRST_REG as the third register class. The problem is that the costs
for moving between SSE_FIRST_REG and SSE_REGS are inflated because we
think we have no available registers in SSE_FIRST_REG (since the only
register in that class is global), and that somewhat confuses the
algorithm.

The following fixes it by tweaking contains_regs_of_mode. Out of
caution, I've retained the old meaning for code in reload which uses this.

Bootstrapped and tested on x86_64-linux. Ok?


Bernd


Re: [PATCH] Fix out-of-bounds write in RTL function reader (PR bootstrap/79952)

2017-03-10 Thread Bernd Schmidt

On 03/10/2017 08:03 PM, David Malcolm wrote:

print-rtl.c:rtx_writer::print_rtx_operand_code_0 has some special
-casing for SYMBOL_REF, but if I'm reading things right we don't yet
dump SYMBOL_REF_BLOCK and SYMBOL_REF_BLOCK_OFFSET, so we'd need to dump
these somehow.


Yeah. Perhaps as an extra table or so to show the various blocks and the 
symbols in them sorted by offset - could be useful information for RTL 
dumps too.



Bernd



Re: [PATCH] Decrease compile time memory with heavy find_base_{value,term} on i?86/x86_64 (PR rtl-optimization/63191)

2017-03-10 Thread Bernd Schmidt

On 03/10/2017 06:53 PM, Jakub Jelinek wrote:

+
+template 
+static inline rtx
+ix86_delegitimize_address_tmpl (rtx x)
 {


Why is this a template and not a function arg?


Bernd


Re: [PATCH] Fix out-of-bounds write in RTL function reader (PR bootstrap/79952)

2017-03-09 Thread Bernd Schmidt

On 03/09/2017 08:28 PM, David Malcolm wrote:

The root cause is an out-of-bounds memory write in the RTL dump
reader when handling SYMBOL_REFs with SYMBOL_FLAG_HAS_BLOCK_INFO set.

Such SYMBOL_REFs are normally created by varasm.c:create_block_symbol,
which has:


Hmm, I don't actually recall seeing this stuff. It's for section anchors 
apparently.



OK for trunk in stage 4?

gcc/ChangeLog:
PR bootstrap/79952
* read-rtl-function.c (function_reader::read_rtx_operand): Update
x with result of extra_parsing_for_operand_code_0.
(function_reader::extra_parsing_for_operand_code_0): Convert
return type from void to rtx, returning x.  When reading
SYMBOL_REF with SYMBOL_FLAG_HAS_BLOCK_INFO, reallocate x to the
larger size containing struct block_symbol.


Looks OK for now, but longer term I think we should make it possible to 
reconstruct this data.



Bernd


Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)

2017-03-03 Thread Bernd Schmidt

On 03/03/2017 02:33 PM, Marek Polacek wrote:


2017-03-03  Marek Polacek  

PR c/79758
* c-decl.c (store_parm_decls_oldstyle): Check if the element of
current_function_prototype_arg_types is error_mark_node.  Fix
formatting.  Use TREE_VALUE instead of TREE_TYPE.

* gcc.dg/noncompile/pr79758.c: New test.


Ok.


Bernd



Fix IRA issue, PR79728

2017-03-03 Thread Bernd Schmidt
This is an ICE where setup_pressure_classes fails if xmm0 is a global 
reg. Instead of GENERAL/FLOAT/SSE/MMX_REGS, it computes only 
SSE_FIRST_REG as the third register class. The problem is that the costs 
for moving between SSE_FIRST_REG and SSE_REGS are inflated because we 
think we have no available registers in SSE_FIRST_REG (since the only 
register in that class is global), and that somewhat confuses the algorithm.


The following fixes it by tweaking contains_regs_of_mode. Out of 
caution, I've retained the old meaning for code in reload which uses this.


Bootstrapped and tested on x86_64-linux. Ok?


Bernd
	PR rtl-optimization/79728
	* regs.h (struct target_regs): New field
	x_contains_allocatable_regs_of_mode.
	(contains_allocatable_regs_of_mode): New macro.
	* reginfo.c (init_reg_sets_1): Initialize it, and change
	contains_reg_of_mode so it includes global regs as well.
	* reload.c (push_reload): Use contains_allocatable_regs_of_mode
	rather thanc ontains_regs_of_mode.

	PR rtl-optimization/79728
	* gcc.target/i386/sse-globalreg.c: New test.

Index: gcc/reginfo.c
===
--- gcc/reginfo.c	(revision 245685)
+++ gcc/reginfo.c	(working copy)
@@ -468,19 +468,29 @@ init_reg_sets_1 (void)
   memset (contains_reg_of_mode, 0, sizeof (contains_reg_of_mode));
   for (m = 0; m < (unsigned int) MAX_MACHINE_MODE; m++)
 {
-  HARD_REG_SET ok_regs;
+  HARD_REG_SET ok_regs, ok_regs2;
   CLEAR_HARD_REG_SET (ok_regs);
+  CLEAR_HARD_REG_SET (ok_regs2);
   for (j = 0; j < FIRST_PSEUDO_REGISTER; j++)
-	if (!fixed_regs [j] && HARD_REGNO_MODE_OK (j, (machine_mode) m))
-	  SET_HARD_REG_BIT (ok_regs, j);
+	if (!TEST_HARD_REG_BIT (fixed_nonglobal_reg_set, j)
+	&& HARD_REGNO_MODE_OK (j, (machine_mode) m))
+	  {
+	SET_HARD_REG_BIT (ok_regs, j);
+	if (!fixed_regs[j])
+	  SET_HARD_REG_BIT (ok_regs2, j);
+	  }
 
   for (i = 0; i < N_REG_CLASSES; i++)
 	if ((targetm.class_max_nregs ((reg_class_t) i, (machine_mode) m)
 	 <= reg_class_size[i])
 	&& hard_reg_set_intersect_p (ok_regs, reg_class_contents[i]))
 	  {
-	 contains_reg_of_mode [i][m] = 1;
-	 have_regs_of_mode [m] = 1;
+	 contains_reg_of_mode[i][m] = 1;
+	 if (hard_reg_set_intersect_p (ok_regs2, reg_class_contents[i]))
+	   {
+		 have_regs_of_mode[m] = 1;
+		 contains_allocatable_reg_of_mode[i][m] = 1;
+	   }
 	  }
  }
 }
Index: gcc/regs.h
===
--- gcc/regs.h	(revision 245685)
+++ gcc/regs.h	(working copy)
@@ -218,6 +218,10 @@ struct target_regs {
   /* 1 if the corresponding class contains a register of the given mode.  */
   char x_contains_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE];
 
+  /* 1 if the corresponding class contains a register of the given mode
+ which is not global and can therefore be allocated.  */
+  char x_contains_allocatable_reg_of_mode[N_REG_CLASSES][MAX_MACHINE_MODE];
+
   /* Record for each mode whether we can move a register directly to or
  from an object of that mode in memory.  If we can't, we won't try
  to use that mode directly when accessing a field of that mode.  */
@@ -243,6 +247,8 @@ extern struct target_regs *this_target_r
   (this_target_regs->x_have_regs_of_mode)
 #define contains_reg_of_mode \
   (this_target_regs->x_contains_reg_of_mode)
+#define contains_allocatable_reg_of_mode \
+  (this_target_regs->x_contains_allocatable_reg_of_mode)
 #define direct_load \
   (this_target_regs->x_direct_load)
 #define direct_store \
Index: gcc/reload.c
===
--- gcc/reload.c	(revision 245685)
+++ gcc/reload.c	(working copy)
@@ -1055,7 +1055,7 @@ push_reload (rtx in, rtx out, rtx *inloc
 #ifdef CANNOT_CHANGE_MODE_CLASS
   && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (in)), inmode, rclass)
 #endif
-  && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (in))]
+  && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (in))]
   && (CONSTANT_P (SUBREG_REG (in))
 	  || GET_CODE (SUBREG_REG (in)) == PLUS
 	  || strict_low
@@ -1164,7 +1164,7 @@ push_reload (rtx in, rtx out, rtx *inloc
 #ifdef CANNOT_CHANGE_MODE_CLASS
   && !CANNOT_CHANGE_MODE_CLASS (GET_MODE (SUBREG_REG (out)), outmode, rclass)
 #endif
-  && contains_reg_of_mode[(int) rclass][(int) GET_MODE (SUBREG_REG (out))]
+  && contains_allocatable_reg_of_mode[rclass][GET_MODE (SUBREG_REG (out))]
   && (CONSTANT_P (SUBREG_REG (out))
 	  || strict_low
 	  || (((REG_P (SUBREG_REG (out))
Index: gcc/testsuite/gcc.target/i386/sse-globalreg.c
===
--- gcc/testsuite/gcc.target/i386/sse-globalreg.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/sse-globalreg.c	(working copy)
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -w" } */
+/* { dg-require-effective-target sse2 } */
+
+register 

Fix (work around) LRA infinite loop, PR78911

2017-03-03 Thread Bernd Schmidt
In this PR, we have an endless cycle in LRA, generating ever more 
instructions. The relevant portions of the dump seem to be these:


** Local #9: **
  Creating newreg=130 from oldreg=128, assigning class CREG to r130
   18: 
{r117:DI=unspec/v[[r129:SI*0x8+r123:SI],r117:DI,r122:SI,r130:SI,0x8005] 
62;[r129:SI*0x8+r123:SI]=unspec/v[0] 62;flags:CCZ=unspec/v[0] 62;}


** Assignment #7: **
 Assigning to 130 (cl=CREG, orig=114, freq=1148, tfirst=130, 
tfreq=1148)...
  2nd iter for reload pseudo assignments:
 Reload r130 assignment failure
  Spill  r87(hr=5, freq=4296)
  Spill  r123(hr=4, freq=1148)
  Spill reload  r129(hr=2, freq=1148)

** Local #11: **

   Spilling non-eliminable hard regs: 6
  Creating newreg=131, assigning class GENERAL_REGS to address r131
   Change to class INDEX_REGS for r131

** Assignment #8: **

	 Assigning to 131 (cl=INDEX_REGS, orig=131, freq=1148, tfirst=131, 
tfreq=1148)...
	 Trying 0: spill 117(freq=1722)	 Now best 0(cost=574, bad_spills=0, 
insn_pseudos=1)


 Trying 1: spill 117(freq=1722)
	 Trying 2: spill 130(freq=1148)	 Now best 2(cost=0, bad_spills=0, 
insn_pseudos=1)


 Trying 3: spill 122(freq=1148)
 Trying 4: spill 129(freq=1148)

** Local #12: **
  Creating newreg=133 from oldreg=130, assigning class CREG to r133

What seems to happen is that in the "Assignemnt #8" phase, we're 
considering one new reload reg only, and we end up spilling r130, which 
only allows a register from a single class. Then we reload r130 again 
and make r133, and the cycle starts.


Reload is designed in a way to avoid cycles and to process all reloads 
for an insn in order of ascending class so as to avoid this kind of 
issue. With LRA I'm really not sure how to fix this properly, but the 
following patch seems to cure the PR at least, by recognizing when we're 
about to spill a reload reg whose assigned class contains only contains 
a single register.


Bootstrapped and tested on x86_64-linux, ok?


Bernd
	PR rtl-optimization/78911
	* lra-assigns.c (must_not_spill_p): New function.
	(spill_for): Use it.

	PR rtl-optimization/78911
	* gcc.target/i386/pr78911-1.c: New test.
	* gcc.target/i386/pr78911-2.c: New test.

Index: gcc/lra-assigns.c
===
--- gcc/lra-assigns.c	(revision 245685)
+++ gcc/lra-assigns.c	(working copy)
@@ -889,6 +889,30 @@ assign_temporarily (int regno, int hard_
   live_pseudos_reg_renumber[regno] = hard_regno;
 }
 
+/* Return true iff there is a reason why pseudo SPILL_REGNO should not
+   be spilled.  */
+static bool
+must_not_spill_p (unsigned spill_regno)
+{
+  if ((pic_offset_table_rtx != NULL
+   && spill_regno == REGNO (pic_offset_table_rtx))
+  || ((int) spill_regno >= lra_constraint_new_regno_start
+	  && ! bitmap_bit_p (_inheritance_pseudos, spill_regno)
+	  && ! bitmap_bit_p (_split_regs, spill_regno)
+	  && ! bitmap_bit_p (_subreg_reload_pseudos, spill_regno)
+	  && ! bitmap_bit_p (_optional_reload_pseudos, spill_regno)))
+return true;
+  /* A reload pseudo that requires a singleton register class should
+ not be spilled.
+ FIXME: this mitigates the issue on certain i386 patterns, but
+ does not solve the general case where existing reloads fully
+ cover a limited register class.  */
+  if (!bitmap_bit_p (_reload_pseudos, spill_regno)
+  && reg_class_size [reg_preferred_class (spill_regno)] == 1)
+return true;
+  return false;
+}
+
 /* Array used for sorting reload pseudos for subsequent allocation
after spilling some pseudo.	*/
 static int *sorted_reload_pseudos;
@@ -960,13 +984,7 @@ spill_for (int regno, bitmap spilled_pse
   /* Spill pseudos.	 */
   static_p = false;
   EXECUTE_IF_SET_IN_BITMAP (_pseudos_bitmap, 0, spill_regno, bi)
-	if ((pic_offset_table_rtx != NULL
-	 && spill_regno == REGNO (pic_offset_table_rtx))
-	|| ((int) spill_regno >= lra_constraint_new_regno_start
-		&& ! bitmap_bit_p (_inheritance_pseudos, spill_regno)
-		&& ! bitmap_bit_p (_split_regs, spill_regno)
-		&& ! bitmap_bit_p (_subreg_reload_pseudos, spill_regno)
-		&& ! bitmap_bit_p (_optional_reload_pseudos, spill_regno)))
+	if (must_not_spill_p (spill_regno))
 	  goto fail;
 	else if (non_spilled_static_chain_regno_p (spill_regno))
 	  static_p = true;
Index: gcc/testsuite/gcc.target/i386/pr78911-1.c
===
--- gcc/testsuite/gcc.target/i386/pr78911-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr78911-1.c	(working copy)
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/78911 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-strict-aliasing -fno-omit-frame-pointer" } */
+/* { dg-additional-options "-fPIC" { target fpic } } */
+/* { dg-additional-options "-march=pentium-m" { target ia32 } } */
+
+int a, b, d, e;
+long long *c;
+
+static 

Re: GCSE: Use HOST_WIDE_INT instead of int (PR, rtl-optimization/79574).

2017-03-02 Thread Bernd Schmidt

On 03/02/2017 06:50 PM, Martin Liška wrote:

Hello.

This is second part of fixes needed to not trigger integer overflow in gcse 
pass.


So, how is this intended to work? The min/max stored in the param is an 
int, and by using a HOST_WIDE_INT here, we expect that it is a larger 
type and therefore won't overflow?



   {
expr = flat_table[i];
fprintf (file, "Index %d (hash value %d; max distance %d)\n  ",
-expr->bitmap_index, hash_val[i], expr->max_distance);
+expr->bitmap_index, hash_val[i], (int)expr->max_distance);
print_rtl (file, expr->expr);
fprintf (file, "\n");


Use HOST_WIDE_INT_PRINT_DEC maybe? Otherwise OK, I guess.


Bernd


Re: C PATCH to fix c/79758 (ICE-on-invalid with function redefinition and old style decls)

2017-03-02 Thread Bernd Schmidt

On 03/02/2017 06:35 PM, Marek Polacek wrote:

While at it, I fixed wrong formatting in the nearby code.  Also use NULL_TREE
instead of 0 where appropriate.  I really dislike those zeros-as-trees; one
day I'll just go and turn them into NULL_TREEs.


I sympathize, but it makes it harder to see which parts of the patch are 
actual changes. Generally it's best to separate functional and 
formatting changes.



@@ -8996,7 +8999,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
c_arg_info *arg_info)
 declared for the arg.  ISO C says we take the unqualified
 type for parameters declared with qualified type.  */
  if (TREE_TYPE (parm) != error_mark_node
- && TREE_TYPE (type) != error_mark_node
+ && TREE_TYPE (TREE_VALUE (type)) != error_mark_node
  && ((TYPE_ATOMIC (DECL_ARG_TYPE (parm))
   != TYPE_ATOMIC (TREE_VALUE (type)))
  || !comptypes (TYPE_MAIN_VARIANT (DECL_ARG_TYPE (parm)),


Isn't this most likely intended to be TREE_VALUE (type) != error_mark_node?


@@ -9017,7 +9020,7 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
c_arg_info *arg_info)
  if (targetm.calls.promote_prototypes (TREE_TYPE 
(current_function_decl))
  && INTEGRAL_TYPE_P (TREE_TYPE (parm))
  && TYPE_PRECISION (TREE_TYPE (parm))
- < TYPE_PRECISION (integer_type_node))
+< TYPE_PRECISION (integer_type_node))


Should add the necessary parens while fixing formatting.


Bernd


Re: [PATCH] Fix ICE with multiple conditional traps turned into unconditional in one bb (PR rtl-optimization/79780)

2017-03-02 Thread Bernd Schmidt

On 03/02/2017 10:23 AM, Jakub Jelinek wrote:

2017-03-02  Jakub Jelinek  

PR rtl-optimization/79780
* cprop.c (one_cprop_pass): When second and further conditional trap
in a single basic block is turned into an unconditional trap, turn it
into a deleted note to avoid RTL verification failures.

* gcc.c-torture/compile/pr79780.c: New test.


Ok.


Bernd



Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Bernd Schmidt

On 02/23/2017 10:27 PM, Jakub Jelinek wrote:

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?


LGTM.


Bernd



Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Bernd Schmidt

On 02/23/2017 02:36 PM, Jakub Jelinek wrote:

and both UNLT and GE can be reversed.  But if the arguments of the condition
are canonicalized, we run into:
  /* Test for an integer condition, or a floating-point comparison
 in which NaNs can be ignored.  */
  if (CONST_INT_P (arg0)
  || (GET_MODE (arg0) != VOIDmode
  && GET_MODE_CLASS (mode) != MODE_CC
  && !HONOR_NANS (mode)))
return reverse_condition (code);
and thus always return UNKNOWN.


So... do you think we could add (in gcc-8, probably, although if it 
fixes this regression...)


   else if (GET_MODE (arg0) != VOIDmode
&& GET_MODE_CLASS (mode) != MODE_CC
&& HONOR_NANS (mode))
 return reverse_condition_maybe_unordered (code);

to make this work?


Bernd


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Bernd Schmidt

On 02/23/2017 12:46 PM, Jakub Jelinek wrote:

But as soon as we only have the (unlt (reg:DF 100) (reg:DF 97)),
reversed_comparison_code fails on it:
case UNLT:
case UNLE:
case UNGT:
case UNGE:
  /* We don't have safe way to reverse these yet.  */
  return UNKNOWN;


I do have to wonder why. The reverse_condition_maybe_unordered function 
knows how to reverse these, and I can't quite figure out what the 
problem is here. The comment isn't super helpful.



-  && (reversed_comparison_code (if_info->cond, if_info->jump)
-  != UNKNOWN))
+  && (if_info->rev_cond
+  || reversed_comparison_code (if_info->cond,
+   if_info->jump) != UNKNOWN))


Maybe have a reversed_cond_code (if_info) function? This pattern seems 
to occur a few times.



@@ -4676,7 +4713,12 @@ find_cond_trap (basic_block test_bb, edg
 {
   code = reversed_comparison_code (cond, jump);
   if (code == UNKNOWN)
-   return FALSE;
+   {
+ cond = noce_get_condition (jump, _earliest, true);
+ if (!cond)
+   return FALSE;
+ code = GET_CODE (cond);
+   }


This one is an extra optimization, similar but unrelated to the others, 
right? Can't you figure out the need to reverse a bit earlier and pass 
it into the first noce_get_condition call?


Otherwise ok.


Bernd



Re: [PATCH 1/6] c6x: Fix for RTL checking

2017-02-21 Thread Bernd Schmidt

On 02/21/2017 03:48 PM, Segher Boessenkool wrote:

2017-02-21  Segher Boessenkool  

* config/c6x/c6x.c (predicate_insn): Do not incorrectly share RTL.


Ok, thanks.


Bernd



Re: fwprop fix for PR79405

2017-02-20 Thread Bernd Schmidt

On 02/17/2017 10:11 AM, Richard Biener wrote:

Index: gcc/fwprop.c
===
--- gcc/fwprop.c(revision 245501)
+++ gcc/fwprop.c(working copy)
@@ -1478,7 +1478,8 @@ fwprop (void)
  Do not forward propagate addresses into loops until after unrolling.
  CSE did so because it was able to fix its own mess, but we are not.  */

-  for (i = 0; i < DF_USES_TABLE_SIZE (); i++)
+  unsigned sz = DF_USES_TABLE_SIZE ();
+  for (i = 0; i < sz; i++)
 {
   df_ref use = DF_USES_GET (i);
   if (use)

might work?  (not knowing too much about this detail of the DF data
structures - can the table shrink?)


This would probably work to fix the bug, but this behaviour is 
explicitly documented as intentional (in the comment the second half of 
which you've quoted). I assume it enables additional substitutions.



Bernd


Re: Improving code generation in the nvptx back end

2017-02-20 Thread Bernd Schmidt

On 02/17/2017 02:09 PM, Thomas Schwinge wrote:

Hi!

On Fri, 17 Feb 2017 14:00:09 +0100, I wrote:

[...] for "normal" functions there is no reason to use the
".param" space for passing arguments in and out of functions.  We can
then get rid of the boilerplate code to move ".param %in_ar*" into ".reg
%ar*", and the other way round for "%value_out"/"%value".  This will then
also simplify the call sites, where all that code "evaporates".  That's
actually something I started to look into, many months ago, and I now
just dug out those changes, and will post them later.

(Very likely, the PTX "JIT" compiler will do the very same thing without
difficulty, but why not directly generate code that is less verbose to
read?)


Using my WIP patch, the generated PTX code changes/is simplified as
follows:


It's probably a good idea to run cuobjdump -sass to see whether this has 
any effect at all.


The most important issue that needs solving is probably still the old 
issue that ptxas isn't reliable. Looks like the llvm folks ran into the 
same problem, as I discovered last week:

  https://bugs.llvm.org//show_bug.cgi?id=27738


Bernd


fwprop fix for PR79405

2017-02-16 Thread Bernd Schmidt

We have two registers being assigned to each other:

 (set (reg 213) (reg 209))
 (set (reg 209) (reg 213))

These being the only definitions, we are happy to forward propagate reg 
209 for reg 213 into a third insn, making a new use for reg 209. We are 
then happy to forward propagate reg 213 for it in the same insn... 
ending up in an infinite loop.


I don't really see an elegant way to prevent this, so the following just 
tries to detect the situation (and more general ones) by brute force. 
Bootstrapped and tested on x86_64-linux, verified that the test passes 
with a ppc cross, ok?



Bernd

	PR rtl-optimization/79405
	* fwprop.c (forward_propagate_into): Detect potentially cyclic
	replacements and bail out for them.

	PR rtl-optimization/79405
	* gcc.dg/torture/pr79405.c: New test.

Index: gcc/fwprop.c
===
--- gcc/fwprop.c	(revision 244815)
+++ gcc/fwprop.c	(working copy)
@@ -1374,13 +1374,42 @@ forward_propagate_into (df_ref use)
 
   /* Only consider uses that have a single definition.  */
   def = get_def_for_use (use);
-  if (!def)
+  if (!def || DF_REF_INSN_INFO (def) == NULL)
 return false;
   if (DF_REF_FLAGS (def) & DF_REF_READ_WRITE)
 return false;
   if (DF_REF_IS_ARTIFICIAL (def))
 return false;
 
+  df_ref tmp_def = def;
+  /* There is a problematic case where a chain of assignments
+   rA = rB; rB = rC;  ; rM = rN; rN = rA.
+ can cause us to replace these registers in an infinite cycle.
+ Walk backwards until we can guarantee that this situation is
+ not present.  */
+  for (;;)
+{
+  rtx_insn *insn = DF_REF_INSN (tmp_def);
+  rtx set = single_set (insn);
+  if (set == NULL_RTX)
+	break;
+  rtx src = SET_SRC (set);
+  rtx dst = SET_DEST (set);
+  if (GET_CODE (src) != REG || GET_CODE (dst) != REG)
+	break;
+  if (rtx_equal_p (src, DF_REF_REG (use)))
+	return false;
+  df_ref tmp_use = df_single_use (DF_REF_INSN_INFO (tmp_def));
+  if (!tmp_use)
+	break;
+  tmp_def = get_def_for_use (tmp_use);
+  if (!tmp_def || DF_REF_INSN_INFO (tmp_def) == NULL)
+	break;
+  if (DF_REF_FLAGS (tmp_def) & DF_REF_READ_WRITE)
+	break;
+  if (DF_REF_IS_ARTIFICIAL (tmp_def))
+	break;
+}
   /* Do not propagate loop invariant definitions inside the loop.  */
   if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father)
 return false;
Index: gcc/testsuite/gcc.dg/torture/pr79405.c
===
--- gcc/testsuite/gcc.dg/torture/pr79405.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr79405.c	(working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+char cz;
+long long int xx, u2;
+
+void
+qv (int js, int wl)
+{
+  if (js != 0)
+{
+  short int sc;
+  int *at = (int *)
+  long long int gx = 0;
+
+  for (;;)
+{
+  *at = 0;
+  js /= sc;
+
+  for (wl = 0; wl < 2; ++wl)
+{
+  xx = gx;
+  u2 %= xx > 0;
+  cz /= u2;
+
+ fa:
+  if (cz != u2)
+{
+  gx |= js;
+  cz = gx / js;
+}
+}
+}
+
+ yq:
+  wl /= 0x8000;
+  u2 = wl;
+  u2 |= (wl != 0) | (wl != 0 && gx != 0);
+  js = u2;
+  goto fa;
+}
+  goto yq;
+}


Re: C PATCH to fix ICE with -Wdouble-promotion (PR c/79515)

2017-02-15 Thread Bernd Schmidt

On 02/15/2017 12:49 PM, Marek Polacek wrote:

We ICEd on this testcase in do_warn_double_promotion because an invalid
conversion had produced an error result type and accessing that via
TYPE_MAIN_VARIANT crashes.  Fixed in an obvious way.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-02-15  Marek Polacek  

PR c/79515
* c-warn.c (do_warn_double_promotion): Don't warn if an invalid
conversion has occured.

* gcc.dg/dfp/pr79515.c: New.


Ok.


Bernd




Re: [PATCH] Replace XALLOCAVEC with XCNEWVEC (PR c/79471).

2017-02-13 Thread Bernd Schmidt

On 02/13/2017 02:06 PM, Martin Liška wrote:

On 02/13/2017 01:58 PM, Bernd Schmidt wrote:

On 02/13/2017 11:15 AM, Martin Liška wrote:

In order to not cause a stack overflow, lets use a vector allocated on heap 
instead of
the one created by XALLOCVEC.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.


Ok. I'm surprised this is marked as a regression, but it's simple enough that 
it probably ought to be fixed in any case.


Yep. I've just removed that. Is it also fine to both active branches after it 
finishes regression tests?


Sure.


Bernd


Re: [PATCH] Invalidate combiner's cached last value upon insn removal (PR rtl-optimization/79388, PR rtl-optimization/79450)

2017-02-13 Thread Bernd Schmidt

On 02/10/2017 08:50 PM, Jakub Jelinek wrote:


2017-02-10  Jakub Jelinek  

PR rtl-optimization/79388
PR rtl-optimization/79450
* combine.c (distribute_notes): When removing TEM_INSN for which
corresponding dest has last value recorded, invalidate that last
value.

* gcc.c-torture/execute/pr79388.c: New test.
* gcc.c-torture/execute/pr79450.c: New test.


Ok.


Bernd


Re: [PATCH] Replace XALLOCAVEC with XCNEWVEC (PR c/79471).

2017-02-13 Thread Bernd Schmidt

On 02/13/2017 11:15 AM, Martin Liška wrote:

In order to not cause a stack overflow, lets use a vector allocated on heap 
instead of
the one created by XALLOCVEC.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.


Ok. I'm surprised this is marked as a regression, but it's simple enough 
that it probably ought to be fixed in any case.



Bernd


Re: [patch] Fix PR middle-end/78468

2017-01-27 Thread Bernd Schmidt

On 01/27/2017 01:02 PM, Eric Botcazou wrote:

The attached
patch is a middle ground between the previously working and currently broken
situations: if the back-end defines STACK_DYNAMIC_OFFSET, then the middle-end
assumes that STACK_DYNAMIC_OFFSET maintains the alignment; if it doesn't,
which means the default value of STACK_DYNAMIC_OFFSET computed in function.c
is used instead, then the middle-end maintains the alignment itself.


I'd say let's not have a middle ground - this stuff is sufficiently 
brain-twisting that I'd rather go back to a known working state. If 
there was an error in the previous patch, let's roll it back until we 
fully understand the situation.



Bernd


One more cprop trap_if fix, PR79194

2017-01-27 Thread Bernd Schmidt
This PR seems to be curable by fixing up the CFG a little earlier. 
Bootstrapped and tested on x86_64-linux, and it seems to cure the 
testcase with a ppc cross. I'd appreciate if someone ran full ppc tests 
with this though. Ok?



Bernd
	PR rtl-optimization/79194
	* cprop.c (one_cprop_pass): Move deletion of code after unconditional
	traps before call to bypass_conditional_jumps.

	PR rtl-optimization/79194
	* gcc.dg/torture/pr79194.c: New test.

Index: gcc/cprop.c
===
--- gcc/cprop.c	(revision 244817)
+++ gcc/cprop.c	(working copy)
@@ -1863,8 +1863,6 @@ one_cprop_pass (void)
 	  }
 	}
 
-  changed |= bypass_conditional_jumps ();
-
   while (!uncond_traps.is_empty ())
 	{
 	  rtx_insn *insn = uncond_traps.pop ();
@@ -1873,6 +1871,8 @@ one_cprop_pass (void)
 	  emit_barrier_after_bb (to_split);
 	}
 
+  changed |= bypass_conditional_jumps ();
+
   FREE_REG_SET (reg_set_bitmap);
   free_cprop_mem ();
 }
Index: gcc/testsuite/gcc.dg/torture/pr79194.c
===
--- gcc/testsuite/gcc.dg/torture/pr79194.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr79194.c	(working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+
+int iw, vr;
+
+void
+d9 (unsigned int j3, long long int f5, int kp)
+{
+  int *qb = 
+
+  if (kp != 0)
+{
+  long long int oq;
+  unsigned int tl = 0;
+
+  for (j3 = 0; j3 < 1; ++j3)
+qb = 
+  goto ed;
+
+ l7:
+  oq = 1;
+  while (oq < 2)
+oq *= j3;
+
+ ed:
+  do
+{
+  oq -= *qb;
+  if (oq != 0)
+{
+  long long int ie = j3 & f5;
+  int ws = (j3 != 0 && kp != 0);
+
+  tl = ie > ws;
+  iw = vr = tl;
+}
+  else
+tl = (kp != 0 && (0 % 0) != 0); /* { dg-warning "division by zero" } */
+}
+  while (tl != 0);
+}
+  goto l7;
+}


Re: [RFC] sched: Do not move expensive insns speculatively (PR68664)

2017-01-27 Thread Bernd Schmidt

On 01/27/2017 02:19 AM, Segher Boessenkool wrote:


But what is "insn cost"?  Latency is no good at all -- we *want* insns
with higher latency to be earlier.  fsqrt is not pipelined, and that is
what makes it so costly.  (This isn't modeled in the scheduling
description btw: that would make the automata sizes explode, the usual
problem).


On other machines sqrt might be pipelined, so the patch as-is clearly 
isn't suitable. rtx_cost/insn_cost probably also won't do, since it 
doesn't model that property either.


Maybe instead of a hook you could have an insn attribute that the 
scheduler looks for.



Bernd


Re: Improve things for PR71724, in combine/if_then_else_cond

2017-01-26 Thread Bernd Schmidt

On 01/25/2017 08:46 PM, Segher Boessenkool wrote:


It turns out my patch (see the PR) causes (or at least triggers)
miscompilations on tilegx.  I will drop it for now.


Curious, it looked very reasonable. What's needed to reproduce this?


Bernd


Re: Improve things for PR71724, in combine/if_then_else_cond

2017-01-25 Thread Bernd Schmidt

On 01/25/2017 10:18 AM, Kyrill Tkachov wrote:

The test is supposed to test the generation of the vsel instruction.
I believe adding an -mcpu=cortex-a57 to the testcases would be best, as
VSEL isn't actually available on Cortex-A5, it's just enabled by the
-mfpu=fp-armv8 option.
A more realistic configuration would target an ARMv8-A CPU like the
Cortex-A57.


Ok, let me know if there's anything else you need from my side.


Bernd



Re: Improve things for PR71724, in combine/if_then_else_cond

2017-01-24 Thread Bernd Schmidt

On 01/24/2017 06:03 PM, Christophe Lyon wrote:

Ha... the regression occurred between r 244818  and r 244816,
and I read r 244816 ChangeLog too quickly and did not notice
it was modifying ifcvt.c in addition to x86-only files.

So it's likely that it's your other patch for pr78634
that caused the regression I mentioned. Does it make
more sense?


That's possible. That added a missing cost check, so the question 
becomes - is the change in generated assembly sensible, given the 
selected CPU type?



Bernd


Re: Improve things for PR71724, in combine/if_then_else_cond

2017-01-24 Thread Bernd Schmidt

On 01/24/2017 05:50 PM, Kyrill Tkachov wrote:


Actually trying it out with an explicit -mcpu=cortex-a5 (so -O2 -S
-mfpu=fp-armv8 -mcpu=cortex-a57 -mfloat-abi=hard) I get
the test failing before and after the patch. The code generated is
vcmp.f64d0, d1
vmrsAPSR_nzcv, FPSCR
vmovvs.f64  d0, d1
bx  lr

whereas the desired (e.g. with -mcpu=cortex-a57) is:
vcmp.f64d0, d1
vmrsAPSR_nzcv, FPSCR
vselvs.f64  d0, d1, d0
bx  lr


Yes, I've seen both of these generated with different options, but the 
patch did not make a difference here either.


For the moment I'll assume this was a false alarm, i.e. Christophe 
misidentified the patch and something else went wrong.



Bernd


Re: Improve things for PR71724, in combine/if_then_else_cond

2017-01-24 Thread Bernd Schmidt

On 01/24/2017 05:30 PM, Kyrill Tkachov wrote:


The -mfpu is overridden in the testcase to add the ARMv8 instructions.
So to reproduce the compilation in that testcase you'd want
-mfpu=fp-armv8 or
something equivalent rather than vfpv3-d16-fp16.


Exact steps please. No one who's not well-versed in all the ARM variants 
will be able to figure this out. I've been able to generate identical 
before/after code, both with and without vselvs.f64 instructions, after 
trying out a number of switch combinations, but I've not been able to 
find a way to show where the patch makes a difference.



Bernd



Re: Improve things for PR71724, in combine/if_then_else_cond

2017-01-24 Thread Bernd Schmidt

On 01/24/2017 09:38 AM, Christophe Lyon wrote:

It seems that Bernd's patch causes regressions on arm-linux-gnueabihf
--with-cpu=cortex-a5 --with-fpu=vfpv3-d16-fp16:

  gcc.target/arm/vselvcdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
  gcc.target/arm/vselvcsf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1
  gcc.target/arm/vselvsdf.c scan-assembler-times vselvs.f64\td[0-9]+ 1
  gcc.target/arm/vselvssf.c scan-assembler-times vselvs.f32\ts[0-9]+ 1

Maybe the upcoming patch from Segher intends to address this?


Not really. How exactly do I reproduce this - can you give me a command 
line and expected assembly output (I'm having some trouble figuring out 
the right set of switches)?



Bernd



Re: [PATCH v5] add -fprolog-pad=N,M option

2017-01-23 Thread Bernd Schmidt
There's still a a few details that need addressing, and some questions I 
have. Also Ccing Jakub to have another pair of eyes on the name of the 
section - I don't know if we want some sort of .gnu.something name.


On 01/13/2017 01:19 PM, Torsten Duwe wrote:


2017-01-13  Torsten Duwe :


First, some people seem to care, so I'll have to ask to please check the 
correct formatting of this line (spacing and all) by looking at existing 
entries. Also remove some of the blank lines in the ChangeLog, you could 
still group the doc entries separately from the code ones.



 * c-family/c-attribs.c : introduce prolog_pad attribute and create
   a handler for it.

 * lto/lto-lang.c : Likewise.


> * testsuite/c-c++-common/attribute-prolog_pad-1.c : New test.

These three directories have separate ChangeLogs. The top-level 
directory name should be stripped and the entries added to the 
appropriate file.



+  for (it = _ATTRIBUTES (newdecl); *it; it = _CHAIN(*it))
+   {
+ if (IDENTIFIER_LENGTH (get_attribute_name (*it)) !=
+ strlen("prolog_pad"))


Still several instances of bad formatting, in the entire block of code. 
Please refer to my previous email.



+static tree
+handle_prolog_pad_attribute (tree *, tree, tree, int,
+bool *)
+{
+  /* Nothing to be done here.  */
+  return NULL_TREE;
+}


Should still add at least one-liner comments before these empty 
functions to say what interface they're implementing. Also, probably 
don't need to wrap the line containing the arguments.



diff --git a/gcc/testsuite/c-c++-common/attribute-prolog_pad-1.c 
b/gcc/testsuite/c-c++-common/attribute-prolog_pad-1.c
new file mode 100644
index 000..2236aa8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attribute-prolog_pad-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-fprolog-pad=3,1" } */


This test does not actually seem to verify that the padding has the 
expected size.



  /* Do not use IPA optimizations for register allocation if profiler is active
+or prolog pads are inserted for run-time instrumentation
 or port does not emit prologue and epilogue as RTL.  */
-  if (profile_flag || !targetm.have_prologue () || !targetm.have_epilogue ())
+  if (profile_flag || prolog_nop_pad_size
+  || !targetm.have_prologue () || !targetm.have_epilogue ())
 flag_ipa_ra = 0;


Was this explained? Why would ipa-ra be a problem?


+  if (pad_size > pad_entry)
+targetm.asm_out.print_prolog_pad (asm_out_file, pad_size-pad_entry,
+ (pad_entry == 0));


Unnecessary parens around the last argument, and missing spaces around 
operators.



Bernd


Improve things for PR71724, in combine/if_then_else_cond

2017-01-20 Thread Bernd Schmidt
The PR is about infinite recursion in combine_simplify_rtx, because 
if_then_else_cond does strange things to an expression, and we end up 
simplifying something to itself. The patch below tries to address this 
by improving that function a little. As stated in the PR, the situation 
is that we have


(plus:SI (if_then_else:SI (eq (reg:CC 185)
(const_int 0 [0]))
(reg:SI 165)
(reg:SI 174 [ t9.0_1+4 ]))
(reg:SI 165))

Reg 165 is known to be zero or one, so it gets turned into a condition, 
and we have two different conditions on the operands. That causes us to 
fail to make the fairly obvious transformation to


 cond = reg:CC 185
 true_rtx = (plus r165 r165)
 false_rtx = (plus r174 r165)

So, when looking for situations where we have only one condition, we can 
try to undo the conversion of a plain REG into a condition, on the 
grounds that this is probably less helpful.


This seems to cure the testcase, but Segher also has a patch in the PR 
that looks like a good and more direct approach. IMO both should be 
applied. This one was bootstrapped and tested on x86_64-linux. Ok?



Bernd
	PR rtl-optimization/71724
	* combine.c (if_then_else_cond): Look for situations where it is
	beneficial to undo the work of one of the recursive calls.

Index: gcc/combine.c
===
--- gcc/combine.c	(revision 244528)
+++ gcc/combine.c	(working copy)
@@ -9044,11 +9044,31 @@ if_then_else_cond (rtx x, rtx *ptrue, rt
  the same value, compute the new true and false values.  */
   else if (BINARY_P (x))
 {
-  cond0 = if_then_else_cond (XEXP (x, 0), , );
-  cond1 = if_then_else_cond (XEXP (x, 1), , );
+  rtx op0 = XEXP (x, 0);
+  rtx op1 = XEXP (x, 1);
+  cond0 = if_then_else_cond (op0, , );
+  cond1 = if_then_else_cond (op1, , );
+
+  if ((cond0 != 0 && cond1 != 0 && !rtx_equal_p (cond0, cond1))
+	  && (REG_P (op0) || REG_P (op1)))
+	{
+	  /* Try to enable a simplification by undoing work done by
+	 if_then_else_cond if it converted a REG into something more
+	 complex.  */
+	  if (REG_P (op0))
+	{
+	  cond0 = 0;
+	  true0 = false0 = op0;
+	}
+	  else
+	{
+	  cond1 = 0;
+	  true1 = false1 = op1;
+	}
+	}
 
   if ((cond0 != 0 || cond1 != 0)
-	  && ! (cond0 != 0 && cond1 != 0 && ! rtx_equal_p (cond0, cond1)))
+	  && ! (cond0 != 0 && cond1 != 0 && !rtx_equal_p (cond0, cond1)))
 	{
 	  /* If if_then_else_cond returned zero, then true/false are the
 	 same rtl.  We must copy one of them to prevent invalid rtl


Another cprop trap_if fix, PR79125

2017-01-20 Thread Bernd Schmidt
This is essentially the same patch I sent for the previous instance of 
this problem, but this time applied to local_cprop_pass. Bootstrapped 
and tested on x86_64-linux, and it seems to fix the testcase with a ppc 
cross. Ok?



Bernd
	PR rtl-optimization/79125
	* cprop.c (local_cprop_pass): Handle cases where we make an
	unconditional trap.

	PR rtl-optimization/79125
	* gcc.dg/torture/pr79125.c: New test.

Index: gcc/cprop.c
===
--- gcc/cprop.c	(revision 244528)
+++ gcc/cprop.c	(working copy)
@@ -1248,6 +1248,8 @@ local_cprop_pass (void)
   bool changed = false;
   unsigned i;
 
+  auto_vec uncond_traps;
+
   cselib_init (0);
   FOR_EACH_BB_FN (bb, cfun)
 {
@@ -1255,6 +1257,9 @@ local_cprop_pass (void)
 	{
 	  if (INSN_P (insn))
 	{
+	  bool was_uncond_trap
+		= (GET_CODE (PATTERN (insn)) == TRAP_IF
+		   && XEXP (PATTERN (insn), 0) == const1_rtx);
 	  rtx note = find_reg_equal_equiv_note (insn);
 	  do
 		{
@@ -1273,6 +1278,13 @@ local_cprop_pass (void)
 			  break;
 			}
 		}
+		  if (!was_uncond_trap
+		  && GET_CODE (PATTERN (insn)) == TRAP_IF
+		  && XEXP (PATTERN (insn), 0) == const1_rtx)
+		{
+		  uncond_traps.safe_push (insn);
+		  break;
+		}
 		  if (insn->deleted ())
 		break;
 		}
@@ -1287,6 +1299,14 @@ local_cprop_pass (void)
 
   cselib_finish ();
 
+  while (!uncond_traps.is_empty ())
+{
+  rtx_insn *insn = uncond_traps.pop ();
+  basic_block to_split = BLOCK_FOR_INSN (insn);
+  remove_edge (split_block (to_split, insn));
+  emit_barrier_after_bb (to_split);
+}
+
   return changed;
 }
 
Index: gcc/testsuite/gcc.dg/torture/pr79125.c
===
--- gcc/testsuite/gcc.dg/torture/pr79125.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr79125.c	(working copy)
@@ -0,0 +1,32 @@
+int za;
+
+void
+hl (void)
+{
+  short int o8 = 0;
+  short int m6 = 1;
+  short int *ni = 
+
+  for (;;)
+{
+  int af;
+  short int *fd = (short int *)
+
+  if (ni != 0)
+{
+  if (m6 != 0)
+*ni = 0;
+  else
+za = 0;
+  af = (o8 * o8) || o8;
+  if (af == 0)
+m6 /= 0; /* { dg-warning "division" } */
+  while (za != 0)
+{
+}
+}
+  *fd =  /* { dg-warning "without a cast" } */
+  for (af = 0; af < 2; ++af)
+af = za;
+}
+}


Re: PR78634: ifcvt/i386 cost updates

2017-01-18 Thread Bernd Schmidt

On 12/09/2016 12:49 PM, Bernd Schmidt wrote:

On 12/03/2016 10:49 AM, Uros Bizjak wrote:


Based on the above explanation, the patch is OK.


I'll be treating the ifcvt part of it as obvious. However, testing
showed an issue with the i386 funcspec-11 test:

/* PR target/36936 */
/* { dg-do compile } */
/* { dg-require-effective-target ia32 } */
/* { dg-options "-O2 -march=i386" } */
/* { dg-final { scan-assembler "cmov" } } */

extern int foo (int) __attribute__((__target__("arch=i686")));

int
foo (int x)
{
  if (x < 0)
x = 1;
  return x;
}

It wants to test that we can temporarily switch the arch, but we're
still using the i386 cost table, with a branch cost of 1. This means the
cmov now won't get generated for cost reasons.

It seems like whether we're generating cmov when tuning for i386 isn't
the most critical thing to worry about, so I propose amending the test
to also pass in -mtune=i686. Ok with that change?


Ping?


Bernd



Re: [PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization

2017-01-17 Thread Bernd Schmidt

On 01/16/2017 08:26 PM, Jeff Law wrote:

On 01/13/2017 11:19 AM, Thomas Preudhomme wrote:

Ping? I'm not sure if an ok from Valdimir is enough or if I also need RM
approval.

Vlad's approval is all you need.


Is that a general rule? I'm never too certain on that.


Bernd



[i386] New lea patterns for PR71321

2016-12-20 Thread Bernd Schmidt
The problem here is that we don't have complete coverage of lea patterns 
for HImode/QImode: the combiner can't recognize a (plus (ashift reg 2) 
reg) pattern it builds.


My first idea was to canonicalize ASHIFT by constant inside PLUS to 
MULT. The docs say that this is done only inside a MEM context, but that 
seems misguided considering the existence of lea patterns. Maybe that's 
something to revisit for gcc-8. In the meantime, the patch below is more 
like a minimal fix, and it seems to produce better results at the moment 
anyway.


Bootstrapped and tested on x86_64-linux. Ok?


Bernd
	PR target/71321
	* config/i386/i386.md (lea_general_2b, lea_general_3b): New
	patterns.
	* config/i386/predicates.md (const123_operand): New.

	PR target/71321
	* gcc.target/i386/pr71321.c: New test.

Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md	(revision 243548)
+++ gcc/config/i386/i386.md	(working copy)
@@ -6266,6 +6266,27 @@ (define_insn_and_split "*lea_gener
   [(set_attr "type" "lea")
(set_attr "mode" "SI")])
 
+(define_insn_and_split "*lea_general_2b"
+  [(set (match_operand:SWI12 0 "register_operand" "=r")
+	(plus:SWI12
+	  (ashift:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+			(match_operand 2 "const123_operand" "n"))
+	  (match_operand:SWI12 3 "nonmemory_operand" "ri")))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(plus:SI
+	  (ashift:SI (match_dup 1) (match_dup 2))
+	  (match_dup 3)))]
+{
+  operands[0] = gen_lowpart (SImode, operands[0]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[3] = gen_lowpart (SImode, operands[3]);
+}
+  [(set_attr "type" "lea")
+   (set_attr "mode" "SI")])
+
 (define_insn_and_split "*lea_general_3"
   [(set (match_operand:SWI12 0 "register_operand" "=r")
 	(plus:SWI12
@@ -6284,6 +6305,32 @@ (define_insn_and_split "*lea_gener
 	(match_dup 3))
 	  (match_dup 4)))]
 {
+  operands[0] = gen_lowpart (SImode, operands[0]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  operands[3] = gen_lowpart (SImode, operands[3]);
+  operands[4] = gen_lowpart (SImode, operands[4]);
+}
+  [(set_attr "type" "lea")
+   (set_attr "mode" "SI")])
+
+(define_insn_and_split "*lea_general_3b"
+  [(set (match_operand:SWI12 0 "register_operand" "=r")
+	(plus:SWI12
+	  (plus:SWI12
+	(ashift:SWI12 (match_operand:SWI12 1 "index_register_operand" "l")
+			  (match_operand 2 "const123_operand" "n"))
+	(match_operand:SWI12 3 "register_operand" "r"))
+	  (match_operand:SWI12 4 "immediate_operand" "i")))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(plus:SI
+	  (plus:SI
+	(ashift:SI (match_dup 1) (match_dup 2))
+	(match_dup 3))
+	  (match_dup 4)))]
+{
   operands[0] = gen_lowpart (SImode, operands[0]);
   operands[1] = gen_lowpart (SImode, operands[1]);
   operands[3] = gen_lowpart (SImode, operands[3]);
Index: gcc/config/i386/predicates.md
===
--- gcc/config/i386/predicates.md	(revision 243548)
+++ gcc/config/i386/predicates.md	(working copy)
@@ -765,6 +765,14 @@ (define_predicate "const248_operand"
   return i == 2 || i == 4 || i == 8;
 })
 
+;; Match 1, 2, or 3.  Used for lea shift amounts.
+(define_predicate "const123_operand"
+  (match_code "const_int")
+{
+  HOST_WIDE_INT i = INTVAL (op);
+  return i == 1 || i == 2 || i == 3;
+})
+
 ;; Match 2, 3, 6, or 7
 (define_predicate "const2367_operand"
   (match_code "const_int")
Index: gcc/testsuite/gcc.target/i386/pr71321.c
===
--- gcc/testsuite/gcc.target/i386/pr71321.c	(nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr71321.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned char uint8_t;
+typedef unsigned int uint32_t;
+
+unsigned cvt_to_2digit(uint8_t i, uint8_t base)
+{
+  return ((i / base) | (uint32_t)(i % base)<<8);
+}
+unsigned cvt_to_2digit_ascii(uint8_t i)
+{
+  return cvt_to_2digit(i, 10) + 0x0a3030;
+}
+/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,4" 3 } } */
+/* { dg-final { scan-assembler-times "lea.\t\\(%\[0-9a-z\]+,%\[0-9a-z\]+,8" 1 } } */


Re: [PATCH v3] add -fprolog-pad=N,M option

2016-12-19 Thread Bernd Schmidt
I'll consider myself agnostic as to whether this is a feature we want or 
need, so I'll just comment on some style questions. There's a fair 
amount of coding style violations, I'll point some of them out but 
please read the documents we have linked on this page:

  https://gcc.gnu.org/contribute.html

On 12/16/2016 03:14 PM, Torsten Duwe wrote:

Signed-off-by: Torsten Duwe 


This is meaningless for the GCC project. We require a copyright 
assignment; I assume SuSE has a blanket one that covers you. You should 
also write a ChangeLog entry.



diff --git a/gcc/attribs.c b/gcc/attribs.c
index e66349a..6ff81a8 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -365,6 +365,28 @@ decl_attributes (tree *node, tree attributes, int flags)
   if (!attributes_initialized)
 init_attributes ();

+  /* If we're building NOP pads because of a command line arg, note the size
+ for LTO builds, unless the attribute has already been overridden. */


Two spaces at the end of a sentence, including at the end of a comment.


+  if (TREE_CODE (*node) == FUNCTION_DECL &&
+  prolog_nop_pad_size > 0)


Operators go on the next line when wrapping.


+),


Don't put closing parens on their own line.


diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index db293fe..7f3e558 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2322,6 +2322,34 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, 
tree oldtype)
   TREE_ASM_WRITTEN (olddecl) = 0;
 }

+  /* Prolog pad size may be set wrongly by a forward declaration;
+ fix it up by pulling the final value in front.
+  */


The "*/" should go on the previous line.


+  for (it = _ATTRIBUTES (newdecl); *it; it = _CHAIN(*it))


Space before opening parentheses (many occurrences).


+void
+default_print_prolog_pad (FILE *file, unsigned HOST_WIDE_INT pad_size,
+ bool record_p)


Every function needs a comment describing its purpose and that of its 
arguments. Look around for examples. There are cases in this patch of 
hook implementations which ignore all their args, there I believe we can 
relax this rule a little (but a comment saying which hook is implemented 
would still be good).




+extern void default_print_prolog_pad (FILE *, unsigned HOST_WIDE_INT , bool);


Careful with stray whitespace.

+  if (tree_fits_uhwi_p (prolog_pad_value1) )


Here too.

+   {
+ pad_size = tree_to_uhwi (prolog_pad_value1);
+   }


Lose { } braces around single statements. Several cases.

Am I missing it or is there no actual implementation of the target hook 
anywhere?



Bernd


Re: [PATCH 1/2] print-rtl.c: use '<' and '>' rather than % for pseudos in compact mode

2016-12-19 Thread Bernd Schmidt

On 12/16/2016 09:18 PM, David Malcolm wrote:


The following patch implements the change for print-rtl.c.

OK for trunk assuming it passes bootstrap?


Yes.


Bernd



Re: [PATCH] Fix assertions along default switch labels (PR tree-optimization/78819)

2016-12-16 Thread Bernd Schmidt

On 12/16/2016 12:49 PM, Marek Polacek wrote:


But as this testcase shows, this breaks when the default label shares a label
with another case.  On this testcase, when we reach the switch, we know that
argc is either 1, 2, or 3.  So by the time we reach vrp2, the IR will have
been optimized to

  switch (argc) {
default:
case 3:
  // argc will be considered 1 despite the case 3
  break;
case 2:
  ...
   }


Shouldn't we just remove the "case 3:" from the switch in this case? 
Would that fix things?



Bernd


Re: Problem with pseudo-reg syntax in RTL frontend

2016-12-16 Thread Bernd Schmidt

On 12/14/2016 05:57 PM, David Malcolm wrote:

Any preferences? (or other syntax ideas?).  My preference is one of the
currently-unused sigils e.g. "@3", or to wrap them in braces "{3}".


Either might work, I'd vaguely prefer <3> over {3} but they're 
equivalent really. Maybe using "@" is simplest if it's unused.



Bernd



Re: [PATCH] Formatting and spelling fixes for ipa-cp.c

2016-12-15 Thread Bernd Schmidt

On 12/15/2016 05:51 PM, Jakub Jelinek wrote:


This patch fixes what I've found quickly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?


Ok.


Bernd



Re: cprop fix for PR78626

2016-12-14 Thread Bernd Schmidt

On 12/12/2016 03:21 PM, Bernd Schmidt wrote:

On 12/10/2016 08:58 PM, Segher Boessenkool wrote:

On Thu, Dec 08, 2016 at 01:21:04PM +0100, Bernd Schmidt wrote:

This is another case where an optimization turns a trap_if
unconditional. We have to defer changing the CFG, since the rest of
cprop seems to blow up when we modify things while scanning.



The problem for PR78727 is that we also need to do this for insns that
already are the last insn in the block:


+  while (!uncond_traps.is_empty ())
+{
+  rtx_insn *insn = uncond_traps.pop ();
+  basic_block to_split = BLOCK_FOR_INSN (insn);
+  remove_edge (split_block (to_split, insn));
+  emit_barrier_after_bb (to_split);
+}


We need that barrier, and we also need the successor edges removed
(which split_block+remove_edge does).

(PR78727 works fine with just that BB_END test deleted).


Ah, ok. In that case I'll probably also add a test to make sure this is
only done for insns that weren't an unconditional trap before. Retesting...


That would be this patch. Tested as before. The two new testcases seem 
to pass with a ppc cross (but I would appreciate if someone were to run 
full tests on ppc).



Bernd

	PR rtl-optimization/78626
	PR rtl-optimization/78727
	* cprop.c (one_cprop_pass): Collect unconditional traps in the middle
	of a block, and split such blocks after everything else is finished.

	PR rtl-optimization/78626
	PR rtl-optimization/78727
	* gcc.dg/torture/pr78626.c: New test.
	* gcc.dg/torture/pr78727.c: New test.

Index: gcc/cprop.c
===
--- gcc/cprop.c	(revision 243548)
+++ gcc/cprop.c	(working copy)
@@ -1794,7 +1794,7 @@ one_cprop_pass (void)
   if (set_hash_table.n_elems > 0)
 {
   basic_block bb;
-  rtx_insn *insn;
+  auto_vec uncond_traps;
 
   alloc_cprop_mem (last_basic_block_for_fn (cfun),
 		   set_hash_table.n_elems);
@@ -1810,6 +1810,9 @@ one_cprop_pass (void)
 		  EXIT_BLOCK_PTR_FOR_FN (cfun),
 		  next_bb)
 	{
+	  bool seen_uncond_trap = false;
+	  rtx_insn *insn;
+
 	  /* Reset tables used to keep track of what's still valid [since
 	 the start of the block].  */
 	  reset_opr_set_tables ();
@@ -1817,6 +1820,10 @@ one_cprop_pass (void)
 	  FOR_BB_INSNS (bb, insn)
 	if (INSN_P (insn))
 	  {
+		bool was_uncond_trap
+		  = (GET_CODE (PATTERN (insn)) == TRAP_IF
+		 && XEXP (PATTERN (insn), 0) == const1_rtx);
+
 		changed |= cprop_insn (insn);
 
 		/* Keep track of everything modified by this insn.  */
@@ -1825,11 +1832,27 @@ one_cprop_pass (void)
 		   insn into a NOTE, or deleted the insn.  */
 		if (! NOTE_P (insn) && ! insn->deleted ())
 		  mark_oprs_set (insn);
+
+		if (!was_uncond_trap && !seen_uncond_trap
+		&& GET_CODE (PATTERN (insn)) == TRAP_IF
+		&& XEXP (PATTERN (insn), 0) == const1_rtx)
+		  {
+		seen_uncond_trap = true;
+		uncond_traps.safe_push (insn);
+		  }
 	  }
 	}
 
   changed |= bypass_conditional_jumps ();
 
+  while (!uncond_traps.is_empty ())
+	{
+	  rtx_insn *insn = uncond_traps.pop ();
+	  basic_block to_split = BLOCK_FOR_INSN (insn);
+	  remove_edge (split_block (to_split, insn));
+	  emit_barrier_after_bb (to_split);
+	}
+
   FREE_REG_SET (reg_set_bitmap);
   free_cprop_mem ();
 }
Index: gcc/testsuite/gcc.dg/torture/pr78626.c
===
--- gcc/testsuite/gcc.dg/torture/pr78626.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr78626.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+
+int qs;
+
+void
+ms (int g1)
+{
+  int cy;
+  int *fr = 
+
+  for (;;)
+{
+  *fr = 1;
+  fr = 
+
+  while (qs != 0)
+{
+  if (qs | cy)
+qs = g1 / 0; /* { dg-warning "division" } */
+  ++qs;
+}
+
+  cy = 1;
+  while (cy != 0)
+cy = 2;
+}
+}
Index: gcc/testsuite/gcc.dg/torture/pr78727.c
===
--- gcc/testsuite/gcc.dg/torture/pr78727.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr78727.c	(working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+
+int
+dd (int gj, unsigned int o7)
+{
+  long long int e8 = gj;
+
+  e8 |= gj + 1u;
+  if (e8 != 0)
+{
+  short int *mn = (short int *)
+  int pv;
+
+  e8 &= e8 > gj;
+  gj = o7 > e8;
+  pv = ((gj != 0) ? gj : *mn) && e8;
+  gj |= *mn / pv;
+}
+
+  return gj;
+}


Re: PR target/78213 revisited (was Re: [PATCH 5/9] Introduce selftest::locate_file (v4))

2016-12-14 Thread Bernd Schmidt

On 12/09/2016 08:32 PM, David Malcolm wrote:

Thanks.  Unfortunately, applying the "locate_file" patch
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01186.html
would now introduce a regression in a recently-added test case:



The problem is that this DejaGnu test case uses -fself-test, and
doesn't provide any arguments.  With the locate_file patch, we need to
pass the path to $(srcdir)/testsuite/selftests as an argument to -fself
-test, and it's not clear to me how to do that sanely in a DejaGnu test
case; if I pass in a dummy value (like for pr71591.c), then the
selftests that use locate_file fail.


I think this part of the selftest framework (how to locate files and 
tests) is possibly going to need some rethinking. But for now, I'm ok 
with just adding dg-skip-if *-*-* to the problematic testcase, with a 
comment describing the situation.



Bernd



Re: cprop fix for PR78626

2016-12-12 Thread Bernd Schmidt

On 12/10/2016 08:58 PM, Segher Boessenkool wrote:

On Thu, Dec 08, 2016 at 01:21:04PM +0100, Bernd Schmidt wrote:

This is another case where an optimization turns a trap_if
unconditional. We have to defer changing the CFG, since the rest of
cprop seems to blow up when we modify things while scanning.



The problem for PR78727 is that we also need to do this for insns that
already are the last insn in the block:


+  while (!uncond_traps.is_empty ())
+   {
+ rtx_insn *insn = uncond_traps.pop ();
+ basic_block to_split = BLOCK_FOR_INSN (insn);
+ remove_edge (split_block (to_split, insn));
+ emit_barrier_after_bb (to_split);
+   }


We need that barrier, and we also need the successor edges removed
(which split_block+remove_edge does).

(PR78727 works fine with just that BB_END test deleted).


Ah, ok. In that case I'll probably also add a test to make sure this is 
only done for insns that weren't an unconditional trap before. Retesting...



Bernd


Re: [nvptx] propagating conditionals in worker-vector partitioned loops

2016-12-09 Thread Bernd Schmidt

On 10/27/2016 12:29 AM, Cesar Philippidis wrote:

Currently, the nvptx backend is only neutering the worker axis when
propagating variables used in conditional expressions across the worker
and vector axes. That's a problem with the worker-state spill and fill
propagation implementation because all of the vector threads in worker 0
all write the the same address location being spilled. As the attached
test case demonstrates, this might cause an infinite loop depending on
the values in the vector threads being propagated.

This patch fixes this issue by introducing a new worker-vector
predicate, so that both the worker and vector threads can be predicated
together, not separately. I.e., instead of first neutering worker axis,
then neutering the vector axis, this patch uses a single predicate for
tid.x == 0 && tid.y == 0.

Is this patch ok for trunk?


This is more of an OpenACC patch than an nvptx patch. Nathan would be 
the best person to review it, but if he is disinclined, I'll just 
approve it on the grounds that you're probably in the best position to know.



Bernd



Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Bernd Schmidt

On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:


Regardless, 'reload_cse_simplify' would never perform the opposite
transformation.  It checks whether it can replace anything within the
first argument INSN, with the second argument TESTREG. As the name
implies this will always be a register. I double checked, the function
is only called in 'reload_cse_regs' and 'testreg' is created using
'gen_rtx_REG'.


Ok, let's go ahead with it.


Bernd



Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Bernd Schmidt

On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:


Regardless, the other testcases I add in this patch show a sub-optimal
transformation done by postreload, turning direct calls into indirect
calls, for targets which have specifically pointed out that no CSE
should be done on functions through 'NO_FUNCTION_CSE'.


What I'm wondering about is whether the patch wouldn't also prevent the 
opposite transformation. Is there a reason not to do that one? Can the 
problem be modeled by tweaking costs?



Would you prefer I create a new PR for the problem this is actually
fixing and refile this PATCH under that PR?


Well, as long as you're working on fixing it I see no reason to clutter 
the bug database for the function cse issue, but do keep the existing PR 
open if there also ought to be register class changes.



Bernd



Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Bernd Schmidt

On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:

This patch fixes the issue reported in PR78255 by making postreload
aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
defined to true.

Bootstrap and full regression on arm-none-linux-gnueabihf and
aarch64-unknown-linux-gnu.

Also checked this fixed the reported issue on arm-none-eabi.

Is this OK for trunk?


Hmm, it probably doesn't hurt, but looking at the PR I think the 
originally reported problem suggests you need a different fix: a 
separate register class to be used for indirect sibling calls. I 
remember seeing similar issues on other targets.



Bernd


  1   2   3   4   5   6   7   8   9   10   >