GCC doc patch committed

2017-02-23 Thread Ian Lance Taylor
I committed this patch to wwwdocs/htdocs/gcc-7/changes.html to
describe the status of Go in the GCC 7 release.

Ian
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.65
diff -u -r1.65 changes.html
--- changes.html19 Feb 2017 21:55:56 -  1.65
+++ changes.html24 Feb 2017 05:18:33 -
@@ -749,7 +749,20 @@
   Improved diagnostics (polymorphic results in pure functions).
 
 
-
+Go
+
+  GCC 7 provides a complete implementation of the Go 1.8
+user packages.
+
+  Compared to the Go 1.8 toolchain, the garbage collector is more
+conservative and less concurrent.
+
+  Escape analysis is available for experimental use via
+the -fgo-optimize-allocs option.
+The -fgo-debug-escape option prints information useful for
+debugging escape analysis choices.
+
+
 
 Java (GCJ)
 The GCC Java frontend and associated libjava runtime library have been


Re: [PATCH] PR79584, lra ICE in base_to_reg

2017-02-23 Thread Alan Modra
I'm going to wait for Vlad's opinion.  I've written a couple of
replies and erased them, since I figure whatever I have to say doesn't
carry much weight.

-- 
Alan Modra
Australia Development Lab, IBM


enable -Wformat-truncation with -Og (PR 79691)

2017-02-23 Thread Martin Sebor

Bug 79691 - -Wformat-truncation suppressed by (and only by) -Og
points out that the gimple-ssa-sprintf pass doesn't run when
this optimization option is used.  That's because I forgot to
add it to the set of optimization passes that run with that
option.  The attached trivial patch tested on x86_64 corrects
the oversight.

Is this okay for 7.0?

Martin
PR tree-optimization/79691 - -Wformat-truncation suppressed by (and only by) -Og

gcc/ChangeLog:

	PR c/79691
	* passes.def (pass_all_optimizations_g): Enable pass_sprintf_length.

gcc/testsuite/ChangeLog:

	PR c/79691
	* gcc.dg/tree-ssa/pr79691.c: New test.

diff --git a/gcc/passes.def b/gcc/passes.def
index c09ec22..6b0f05b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -364,6 +364,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_object_sizes);
   /* Fold remaining builtins.  */
   NEXT_PASS (pass_fold_builtins);
+  NEXT_PASS (pass_sprintf_length, true);
   /* Copy propagation also copy-propagates constants, this is necessary
  to forward object-size and builtin folding results properly.  */
   NEXT_PASS (pass_copy_prop);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
new file mode 100644
index 000..badbfcd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
@@ -0,0 +1,27 @@
+/* PR tree-optimization/79691 - -Wformat-truncation suppressed by
+   (and only by) -Og
+
+   { dg-compile }
+   { dg-options "-Og -Wall" } */
+
+char d[2];
+
+/* Verify -Wformat-overflow works.  */
+void f (void)
+{
+  __builtin_sprintf (d, "%i", 123);   /* { dg-warning "directive writing 3 bytes" } */
+}
+
+/* Verify -Wformat-truncation works.  */
+void g (void)
+{
+  __builtin_snprintf (d, sizeof d, "%i", 1234);   /* { dg-warning "output truncated writing 4 bytes" } */
+}
+
+/* Verify -fprintf-return-value works.  */
+int h (void)
+{
+  return __builtin_snprintf (0, 0, "%i", 12345);
+}
+
+/* { dg-final { scan-tree-dump-not "snprintf" "optimized" } } */


Re: [PATCH] PR79584, lra ICE in base_to_reg

2017-02-23 Thread Richard Sandiford
Alan Modra  writes:
> On Thu, Feb 23, 2017 at 11:41:09AM +1030, Alan Modra wrote:
>> lo_sum is indeed not valid for mem:SD.  simplify_operand_subreg is
>> where the subreg disappears.
>
> Richard, doesn't the following say that lra is expecting to reload
> exactly the lo_sum address you seem to think it should not handle in
> process_address?
>
> /* We still can reload address and if the address is
>valid, we can remove subreg without reloading its
>inner memory.  */
> && valid_address_p (GET_MODE (subst),
> regno_reg_rtx
> [ira_class_hard_regs
>  [base_reg_class (GET_MODE (subst),
>   MEM_ADDR_SPACE (subst),
>   ADDRESS, SCRATCH)][0]],
> MEM_ADDR_SPACE (subst

Yeah, I think that's a bit too broad.  It was added in:

2016-02-03  Vladimir Makarov  
Alexandre Oliva  

PR target/69461
* lra-constraints.c (simplify_operand_subreg): Check additionally
address validity after potential reloading.
(process_address_1): Check insns validity.  In case of failure do
nothing.

to allow the subreg to be simplified for the stack mem:

(mem/c:V2DF (plus:DI (reg/f:DI 113 sfp)
(const_int 96 [0x60])) [6 %sfp+96 S16 A128])

Coping with that kind of stack address is no problem.  But the patch
seems to allow any other address through as well, even though
process_address_1 still has the old assumption that the address is
"basically" valid.  E.g. as well as the assert you were patching,
there's:

  /* Any index existed before LRA started, so we can assume that the
 presence and shape of the index is valid.  */
  push_to_sequence (*before);
  lra_assert (ad.disp == ad.disp_term);

which could also fire if we allow an address that has the right shape
for one mode to be used with any other mode.

The patch made process_address_1 punt and return false for the MEM
quoted above.  I think it'd be dangerous to extend that to all
other types of MEM.  Wouldn't that require the target to provide a
load-address pattern for every possible MEM address?  E.g. if the
address requires relocation operators, the load-address version might
need a different relocation sequence from the load/store version.

Thanks,
Richard


Re: [PATCH] restore -Wunused-variable on a typedef'd variable in a function template (PR 79548)

2017-02-23 Thread Jason Merrill
On Thu, Feb 23, 2017 at 12:56 PM, Martin Sebor  wrote:
> On 02/22/2017 05:43 PM, Jason Merrill wrote:
>> On Wed, Feb 22, 2017 at 3:44 PM, Martin Sebor  wrote:
>>> On 02/22/2017 11:02 AM, Jason Merrill wrote:

>>> The TREE_USED bit on the type (i.e., on
>>> TREE_TYPE(decl) where decl is the u in the test case above) is
>>> set when the function template is instantiated, in
>>> set_underlying_type called from tsubst_decl.
>>
>> Aha!  That seems like the problem.  Does removing that copy of
>> TREE_USED from the decl to the type break anything?
>
> As far as I can see it breaks just gcc.dg/unused-3.c which is:
>
>   typedef short unused_type __attribute__ ((unused));
>
>   void f (void)
>   {
> unused_type y;
>   }

Ah.  So the problem here is that we're using TREE_USED on a TYPE_DECL
for two things: to track whether the typedef has been used, and to
determine whether to treat a variable of that type as already used.
Normally this isn't too much of a problem because we copy the
TREE_USED flag from decl to type before there could be any uses, but
in a template I guess we mark the typedef within the template when it
is used, and then when we instantiate the typedef we incorrectly pass
that mark along to the type.

Your patch deals with this by ignoring TREE_USED on the type, so it
doesn't matter that it's wrong.  Another approach would be to correct
the setting of TREE_USED by setting it based on looking up the unused
attribute, rather than copying it from the TYPE_DECL.  So also going
back to the attribute, but in set_underlying_type rather than
poplevel.
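
Purely as an illustration of that suggestion (a sketch, not a tested
patch; it assumes set_underlying_type's TYPE_DECL argument is x and that
the "unused" attribute is still present in DECL_ATTRIBUTES at that point),
the idea would be something like:

  /* Derive TREE_USED on the variant type from the "unused" attribute
     instead of copying the decl's TREE_USED bit, which in a template
     may only record that the typedef itself was referenced.  */
  if (lookup_attribute ("unused", DECL_ATTRIBUTES (x)))
    TREE_USED (TREE_TYPE (x)) = 1;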

Jason


[visium] Clean up C testsuite

2017-02-23 Thread Eric Botcazou
This eliminates all the regressions that recently crept in (except for the 
famous gcc.dg/tree-ssa/ssa-thread-14.c present on several other platforms).

Tested on visium-elf, applied on the mainline.


2017-02-23  Eric Botcazou  

* config/visium/visium.md (type): Add trap.
(b): New mode attribute.
(*btst): Rename into...
(*btst): ...this and adjust.
(*cbranchsi4_btst_insn): Rename into...
(*cbranch4_btst_insn): ...this and adjust.
(trap): New define_insn.


2017-02-23  Eric Botcazou  

* gcc.target/visium/bit_test.c: Accept any lsr form.
* gcc.target/visium/block_move.c: Tweak.

-- 
Eric Botcazou

Index: config/visium/visium.md
===
--- config/visium/visium.md	(revision 245625)
+++ config/visium/visium.md	(working copy)
@@ -137,9 +137,10 @@ (define_c_enum "unspecv" [
 ;nop   No operation.
 ;multi Multiple instructions which split.
 ;asm   User asm instructions.
+;trap  Trap instructions.
 
 (define_attr "type"
-"imm_reg,mem_reg,eam_reg,fp_reg,reg_mem,reg_eam,reg_fp,arith,arith2,logic,abs_branch,branch,bmi,call,ret,rfi,dsi,cmp,div,divd,mul,shiftdi,fdiv,fsqrt,ftoi,itof,fmove,fcmp,fp,nop,multi,asm" (const_string "logic"))
+"imm_reg,mem_reg,eam_reg,fp_reg,reg_mem,reg_eam,reg_fp,arith,arith2,logic,abs_branch,branch,bmi,call,ret,rfi,dsi,cmp,div,divd,mul,shiftdi,fdiv,fsqrt,ftoi,itof,fmove,fcmp,fp,nop,multi,asm,trap" (const_string "logic"))
 
 ; Those insns that occupy 4 bytes.
 (define_attr "single_insn" "no,yes"
@@ -205,6 +206,7 @@ (define_attr "cpu" "gr5,gr6" (const (sym
 
 (define_mode_iterator QHI [QI HI])
 (define_mode_iterator I [QI HI SI])
+(define_mode_attr b [(QI "8") (HI "16") (SI "32")])
 (define_mode_attr s [(QI ".b") (HI ".w") (SI ".l")])
 
 ; This code iterator allows signed and unsigned widening multiplications
@@ -1986,15 +1988,15 @@ (define_insn_and_split "*zero_extendsidi
 
 ; BITS_BIG_ENDIAN is defined to 1 so operand #1 counts from the MSB.
 
-(define_insn "*btst"
+(define_insn "*btst"
   [(set (reg:CCC R_FLAGS)
-	(compare:CCC (zero_extract:SI
-		   (match_operand:SI 0 "register_operand" "r")
+	(compare:CCC (zero_extract:I
+		   (match_operand:I 0 "register_operand" "r")
 		   (const_int 1)
 		   (match_operand:QI 1 "const_shift_operand" "K"))
 		 (const_int 0)))]
   "reload_completed"
-  "lsr.l   r0,%0,32-%1"
+  "lsr   r0,%0,-%1"
   [(set_attr "type" "logic")])
 
 ;;
@@ -2373,11 +2375,11 @@ (define_insn_and_split "*cbranch4_
 }
   [(set_attr "type" "cmp")])
 
-(define_insn_and_split "*cbranchsi4_btst_insn"
+(define_insn_and_split "*cbranch4_btst_insn"
   [(set (pc)
 	(if_then_else (match_operator 0 "visium_equality_comparison_operator"
-		   [(zero_extract:SI
-			   (match_operand:SI 1 "register_operand" "r")
+		   [(zero_extract:I
+			   (match_operand:I 1 "register_operand" "r")
 			   (const_int 1)
 			   (match_operand:QI 2 "const_shift_operand" "K"))
 		(const_int 0)])
@@ -2512,6 +2514,20 @@ (define_insn "tablejump"
 
 ;;
 ;;;
+;;
+;; trap instructions
+;;
+;;;
+;;
+
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "stop0,r0"
+  [(set_attr "type" "trap")])
+
+;;
+;;;
 ;;
 ;; Subprogram call instructions
 ;;


[PATCH,MIPS] Handle paired single test changes

2017-02-23 Thread Matthew Fortune
Hi Catherine,

I missed a couple of testsuite changes that are needed to deal with the
fallout of fixing the ABI issues for floating point vectors.  I had them
in my tree but forgot to post.  The ABI change for V2SF, i.e. paired
single, is a bug fix: the behaviour was unintended and violates the goal
of making FP64 a compatible ABI extension for o32.  The probability of
code in the wild depending on this corner case of the calling convention
is exceptionally low, so I still see no significant risk.

The tests for paired single just need a little encouragement to still
produce the necessary instructions now that paired single is not returned
in registers.

Does it look OK to you?

Thanks,
Matthew

gcc/testsuite/

* gcc.target/mips/mips-ps-type-2.c (move): Force generation
of mov.ps.
* gcc.target/mips/mips-ps-type.c (move): Likewise.
(cond_move1): Simplify condition to force generation of
mov[nz].ps.
(cond_move2): Likewise.
---

diff --git a/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c 
b/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c
index fecc35b..ed5d6ee 100644
--- a/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c
+++ b/gcc/testsuite/gcc.target/mips/mips-ps-type-2.c
@@ -32,6 +32,11 @@ NOMIPS16 v2sf init (float a, float b)
 /* Move between registers */
 NOMIPS16 v2sf move (v2sf a)
 {
+  register v2sf b __asm__("$f0") = a;
+  register v2sf c __asm__("$f2");
+  __asm__ __volatile__ ("" : "+f" (b));
+  c = b;
+  __asm__ __volatile__ ("" : : "f" (c));
   return a;
 }
 
diff --git a/gcc/testsuite/gcc.target/mips/mips-ps-type.c 
b/gcc/testsuite/gcc.target/mips/mips-ps-type.c
index d74d4b5..731649c 100644
--- a/gcc/testsuite/gcc.target/mips/mips-ps-type.c
+++ b/gcc/testsuite/gcc.target/mips/mips-ps-type.c
@@ -30,6 +30,11 @@ NOMIPS16 v2sf init (float a, float b)
 /* Move between registers */
 NOMIPS16 v2sf move (v2sf a)
 {
+  register v2sf b __asm__("$f0") = a;
+  register v2sf c __asm__("$f2");
+  __asm__ __volatile__ ("" : "+f" (b));
+  c = b;
+  __asm__ __volatile__ ("" : : "f" (c));
   return a;
 }
 
@@ -96,7 +101,7 @@ NOMIPS16 v2sf nmsub (v2sf a, v2sf b, v2sf c)
 /* Conditional Move */ 
 NOMIPS16 v2sf cond_move1 (v2sf a, v2sf b, long i)
 {
-  if (i > 0)
+  if (i != 0)
 return a;
   else
 return b;
@@ -105,7 +110,7 @@ NOMIPS16 v2sf cond_move1 (v2sf a, v2sf b, long i)
 /* Conditional Move */ 
 NOMIPS16 v2sf cond_move2 (v2sf a, v2sf b, int i)
 {
-  if (i > 0)
+  if (i != 0)
 return a;
   else
 return b;
-- 
2.2.1



Re: fwprop fix for PR79405

2017-02-23 Thread Jeff Law

On 02/23/2017 01:57 AM, Richard Biener wrote:

On Wed, Feb 22, 2017 at 6:19 PM, Jeff Law  wrote:

On 02/16/2017 12:41 PM, Bernd Schmidt wrote:


We have two registers being assigned to each other:

 (set (reg 213) (reg 209))
 (set (reg 209) (reg 213))

These being the only definitions, we are happy to forward propagate reg
209 for reg 213 into a third insn, making a new use for reg 209. We are
then happy to forward propagate reg 213 for it in the same insn...
ending up in an infinite loop.

I don't really see an elegant way to prevent this, so the following just
tries to detect the situation (and more general ones) by brute force.
Bootstrapped and tested on x86_64-linux, verified that the test passes
with a ppc cross, ok?
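
A hypothetical sketch of such a brute-force detection, assuming GCC's
rtl.h and df.h are available (the helper name and the depth bound are
made up; the actual patch is attached to the original message and not
quoted here):

  /* Return true if SRC is, through a bounded chain of single-definition
     register copies, just a copy of register DEST_REGNO, i.e. the two
     registers merely copy each other.  */
  static bool
  copies_back_to_reg_p (unsigned int dest_regno, rtx src)
  {
    for (int depth = 0; depth < 8 && REG_P (src); depth++)
      {
        if (REGNO (src) == dest_regno)
          return true;
        if (DF_REG_DEF_COUNT (REGNO (src)) != 1)
          return false;
        rtx set = single_set (DF_REF_INSN (DF_REG_DEF_CHAIN (REGNO (src))));
        if (!set)
          return false;
        src = SET_SRC (set);
      }
    return false;
  }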


Bernd


79405.diff


PR rtl-optimization/79405
* fwprop.c (forward_propagate_into): Detect potentially cyclic
replacements and bail out for them.

PR rtl-optimization/79405
* gcc.dg/torture/pr79405.c: New test.


OK.


Err - this looks quite costly done for each fwprop.  And placing it before
less costly bailouts even...

See my discussion with Bernd anyway.
I read your last message as being OK with Bernd's approach?  Did I
misunderstand?


jeff



Re: [PATCH, doc]: Mention that -mfpmath=sse is the default on 32bit x86 w/ SSE2 and -ffast-math

2017-02-23 Thread Martin Sebor

On 02/23/2017 12:01 PM, Uros Bizjak wrote:

On Thu, Feb 23, 2017 at 7:07 PM, Martin Sebor  wrote:


A minor grammatical nit:

  +This is the default choice for most of x86-32 targets.

"for most x86-32 targets" is correct unless the targets are some
specific subset, in which case "most of the [previously mentioned]
x86-32 targets" would work.


Maybe we can say "This is the default choice for non-Darwin x86-32
targets." here?

And further extend:
"This is the default choice for the x86-64 compiler, Darwin x86-32
targets, and the default choice for x86-32 targets with SSE2
instruction set when @option{-ffast-math} is enabled."


The phrasing looks good to me.  There should be an article before
SSE2, i.e., "targets with the SSE2 instruction set"

Martin



Re: [PATCH] Move -Wrestrict warning later in the FEs and fix some issues in it (PR c++/79588)

2017-02-23 Thread Jeff Law

On 02/20/2017 01:35 PM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, -Wrestrict warning is done way too early, where
e.g. default arguments aren't filled up yet (reason for ICE on first
testcase) or where arguments in templates aren't instantiated yet (reason
why we don't diagnose anything on the second testcase).

This patch moves it later where e.g. -Wformat is diagnosed and fixes
some issues I found while looking at the code.
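
As a hypothetical illustration of the template case (this is not one of
the PR's testcases; the function and names are made up), a check done at
parse time cannot see the self-aliasing call below, because it only
materialises once the template is instantiated:

  void copy (char *__restrict dst, const char *__restrict src);

  template <typename T>
  void g (T *p)
  {
    copy (p, p);   // argument 1 aliases argument 2
  }

  template void g<char> (char *);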

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-20  Jakub Jelinek  

PR c++/79588
c-family/
* c-common.c (check_function_arguments): Add FNDECL argument.
Handle -Wrestrict here.
* c-warn.c (warn_for_restrict): Remove ARGS argument, add ARGARRAY
and NARGS.  Use auto_vec for ARG_POSITIONS, simplify.
* c-common.h (check_function_arguments): Add FNDECL argument.
(warn_for_restrict): Remove ARGS argument, add ARGARRAY and NARGS.
c/
* c-parser.c (c_parser_postfix_expression_after_primary): Don't
handle -Wrestrict here.
* c-typeck.c (build_function_call_vec): Adjust
check_function_arguments caller.
cp/
* call.c (build_over_call): Call check_function_arguments even for
-Wrestrict, adjust check_function_arguments caller.
* parser.c (cp_parser_postfix_expression): Don't handle -Wrestrict
here.
* typeck.c (cp_build_function_call_vec): Adjust
check_function_arguments caller.
testsuite/
* g++.dg/warn/Wrestrict-1.C: New test.
* g++.dg/warn/Wrestrict-2.C: New test.
Please refactor the restrict warning bits into their own function, then
call that from check_function_arguments.  That's the style already
used there.


OK with that change.

jeff



Re: [PATCH PR79663]Only reversely combine refs for ZERO length chains in predcom

2017-02-23 Thread Jeff Law

On 02/23/2017 04:24 AM, Bin Cheng wrote:

Hi,
This patch resolves the spec2k/mgrid regression reported in PR79663.  The root
cause is described thoroughly in comments #1/#2 of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79663
This patch handles ZERO and non-ZERO length chains differently and only
reversely combines refs for ZERO length chains.  It also makes a small
improvement by swapping lists in place.
Bootstrapped and tested on x86_64 and AArch64, is it OK?

Thanks,
bin

2017-01-21  Bin Cheng  

PR tree-optimization/79663
* tree-predcom.c (combine_chains): Process refs in reverse order
only for ZERO length chains, and add explaining comment.


I went ahead and installed this.

jeff


Re: [PATCH,testsuite] Use logical_op_short_circuit to skip targets in ssa-thread-14.c.

2017-02-23 Thread Jeff Law

On 02/23/2017 04:04 AM, Toma Tabacu wrote:

Hi,

The ssa-thread-14.c test has been failing for MIPS for a while.

According to Patrick Palka, who modified this test's target selector in the fix
for PR71314, this test fails on targets which don't use non-short-circuit
logical ops and should be skipped for such targets.
In the case of MIPS, LOGICAL_OP_NON_SHORT_CIRCUIT is set to 0, so the test
should be skipped for MIPS targets.

This patch adds the !logical_op_short_circuit requirement (defined in
testsuite/lib/target-supports.exp:7965) to ssa-thread-14.c's dg-options, which
will exclude MIPS targets. It also removes the "-mbranch-cost" options from
being passed to targets which will be skipped because of the newly added
!logical_op_short_circuit requirement.

This makes ssa-thread-14.c's target selector more similar to the one from
ssa-thread-11.c (which was mentioned as a solution to PR71314 in the Bugzilla
thread).
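
For reference, a sketch of how such a requirement is typically spelled in
a test's target selector (illustrative only; the actual directive change
is in the patch itself, which is not quoted in this message):

  /* { dg-do compile { target { ! logical_op_short_circuit } } } */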

Here are some links, for your convenience:
PR71314 on Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71314
The patch submission for fixing PR71314:
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02359.html

Does this look OK?
Is the !logical_op_short_circuit requirement too heavy-handed here?

Regards,
Toma Tabacu

gcc/testsuite/

* gcc.dg/tree-ssa/ssa-thread-14.c (dg-options): Use
logical_op_short_circuit to skip targets.
(dg-additional-options): Don't pass -mbranch-cost=2 for MIPS, AVR
and s390.

I don't think using !logical_op_short_circuit is too heavy handed here.

We get good coverage from the x86 target, so I don't mind losing 
coverage from avr/s390 as the target selector is a lot more likely to be 
correct after your change for the various targets now and in the future.


OK for the trunk.

Thanks,

jeff




[PATCH][PR tree-optimization/79578] Use OEP_ADDRESS_OF

2017-02-23 Thread Jeff Law
Per Richi's request use OEP_ADDRESS_OF in the call to operand_equal_p. 
Bootstrapped and regression tested on x86_64-linux-gnu.  Installed on 
the trunk.


jeff
commit 9715bea8d2c2a8332acca572afdd6c0403e677a9
Author: law 
Date:   Thu Feb 23 21:43:03 2017 +

PR tree-optimization/79578
* tree-ssa-dse.c (clear_bytes_written_by): Use OEP_ADDRESS_OF
in call to operand_equal_p.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245688 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d003ab1..37ae06a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-02-23  Jeff Law  
+
+   PR tree-optimization/79578
+   * tree-ssa-dse.c (clear_bytes_written_by): Use OEP_ADDRESS_OF
+   in call to operand_equal_p.
+
 2017-01-23  Dominique d'Humieres  
 
PR target/71017
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index a82e164..53feaf3 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -176,7 +176,7 @@ clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, 
ao_ref *ref)
   /* Verify we have the same base memory address, the write
  has a known size and overlaps with REF.  */
   if (valid_ao_ref_for_dse ()
-  && operand_equal_p (write.base, ref->base, 0)
+  && operand_equal_p (write.base, ref->base, OEP_ADDRESS_OF)
   && write.size == write.max_size
   && ((write.offset < ref->offset
   && write.offset + write.size > ref->offset)


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Bernd Schmidt

On 02/23/2017 10:27 PM, Jakub Jelinek wrote:

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?


LGTM.


Bernd



Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Jakub Jelinek
On Thu, Feb 23, 2017 at 03:07:14PM +0100, Jakub Jelinek wrote:
> On Thu, Feb 23, 2017 at 02:47:11PM +0100, Bernd Schmidt wrote:
> > On 02/23/2017 02:36 PM, Jakub Jelinek wrote:
> > > and both UNLT and GE can be reversed.  But if the arguments of the 
> > > condition
> > > are canonicalized, we run into:
> > >   /* Test for an integer condition, or a floating-point comparison
> > >  in which NaNs can be ignored.  */
> > >   if (CONST_INT_P (arg0)
> > >   || (GET_MODE (arg0) != VOIDmode
> > >   && GET_MODE_CLASS (mode) != MODE_CC
> > >   && !HONOR_NANS (mode)))
> > > return reverse_condition (code);
> > > and thus always return UNKNOWN.
> > 
> > So... do you think we could add (in gcc-8, probably, although if it fixes
> > this regression...)
> > 
> >else if (GET_MODE (arg0) != VOIDmode
> > && GET_MODE_CLASS (mode) != MODE_CC
> > && HONOR_NANS (mode))
> >  return reverse_condition_maybe_unordered (code);
> > 
> > to make this work?
> 
> Maybe, though I'd feel safer about trying it only in gcc 8.  I can certainly
> test such a change on a couple of targets.  It would not be sufficient, we'd
> either need to also reverse_condition_maybe_unordered for the UN* codes
> we don't handle yet, or break so that we perhaps reach this spot.
> 
> Updated (not yet tested) version of the patch is below.

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?

> 2017-02-23  Jakub Jelinek  
> 
>   PR tree-optimization/79389
>   * ifcvt.c (struct noce_if_info): Add rev_cond field.
>   (noce_reversed_cond_code): New function.
>   (noce_emit_store_flag): Use rev_cond if non-NULL instead of
>   reversed_comparison_code.  Formatting fix.
>   (noce_try_store_flag): Test rev_cond != NULL in addition to
>   reversed_comparison_code.
>   (noce_try_store_flag_constants): Likewise.
>   (noce_try_store_flag_mask): Likewise.
>   (noce_try_addcc): Use rev_cond if non-NULL instead of
>   reversed_comparison_code.
>   (noce_try_cmove_arith): Likewise.  Formatting fixes.
>   (noce_try_minmax, noce_try_abs): Clear rev_cond.
>   (noce_find_if_block): Initialize rev_cond.
>   (find_cond_trap): Call noce_get_condition with then_bb == trap_bb
>   instead of false as last argument never attempt to reverse it
>   afterwards.

Jakub


Re: [PATCH][PR tree-optimization/79578] Use operand_equal_p rather than pointer equality for base test

2017-02-23 Thread Jeff Law

On 02/23/2017 02:02 AM, Richard Biener wrote:


diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 84c0b11..a82e164 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -176,7 +176,7 @@ clear_bytes_written_by (sbitmap live_bytes, gimple
*stmt, ao_ref *ref)
   /* Verify we have the same base memory address, the write
  has a known size and overlaps with REF.  */
   if (valid_ao_ref_for_dse ()
-  && write.base == ref->base
+  && operand_equal_p (write.base, ref->base, 0)


As you've identified size and offset match you are really interested
in comparing the base addresses and thus should use OEP_ADDRESS_OF.
I pondered that, but (perhaps incorrectly) thought that OEP_ADDRESS_OF
was an optimization and that a simpler o_e_p with no flags was safer.


I'm happy to change it, particularly if it's a correctness issue (in 
which case I think we've designed a horrible API for o_e_p, but such is 
life).  In fact, I've already bootstrapped and regression tested that 
change.


jeff



Re: [PATCH] restore -Wunused-variable on a typedef'd variable in a function template (PR 79548)

2017-02-23 Thread Martin Sebor

On 02/22/2017 05:43 PM, Jason Merrill wrote:

On Wed, Feb 22, 2017 at 3:44 PM, Martin Sebor  wrote:

On 02/22/2017 11:02 AM, Jason Merrill wrote:


On Tue, Feb 21, 2017 at 4:27 PM, Martin Sebor  wrote:


Ah, I see, your patch changes attribute unused handling for local
variables from tracking TREE_USED to lookup_attribute.  I'm not
opposed to this change, but I'd like to understand why the TREE_USED
handling wasn't working.




In the test case in the bug:

  template 
  void g ()
  {
T t;   // warning, ok

typedef T U;
U u;   // no warning, bug
  }

  template void g();

both TREE_USED(T) and TREE_USED(t) are zero in initialize_local_var
so the function doesn't set already_used or TREE_USED(t) and we get
a warning as expected.

But because TREE_USED(U) is set to 1 in maybe_record_typedef_use
(to implement -Wunused-local-typedefs),  initialize_local_var then
sets already_used to 1 and later also TREE_USED(u) to 1, suppressing
the warning.


Hmm, I would expect maybe_record_typedef_use to set TREE_USED on the
TYPE_DECL, not on the *_TYPE which initialize_local_var checks.


That's what it does:

  void
  maybe_record_typedef_use (tree t)
  {
if (!is_typedef_decl (t))
  return;

TREE_USED (t) = true;
  }

Here, t is a TYPE_DECL of the typedef U.


Yes.  It is a TYPE_DECL, not a type.


It has the effect of TREE_USED (TREE_TYPE (decl)) being set in
initialize_local_var.  The TREE_USED bit on the type (i.e., on
TREE_TYPE(decl) where decl is the u in the test case above) is
set when the function template is instantiated, in
set_underlying_type called from tsubst_decl.


Aha!  That seems like the problem.  Does removing that copy of
TREE_USED from the decl to the type break anything?


As far as I can see it breaks just gcc.dg/unused-3.c which is:

  typedef short unused_type __attribute__ ((unused));

  void f (void)
  {
unused_type y;
  }

The reason is that, without that copy, TREE_USED isn't set on the variable
(I didn't look at where the bit from the type is copied into the var).

This could be fixed by also looking at the type's attribute in this
case.  That would seem to me like the better approach because it
more faithfully represents what's going on in the code.  I.e., that
the variable is in fact unused, but that its type says not to complain
about it.

Let me know what you prefer.

While looking into this I noticed that regardless of this change,
the C++ front end warns on the following modified version of the
test case:

  static unused_type x;   // bogus -Wunused-variable

  void f (void)
  {
static unused_type y;   // bogus -Wunused-variable
  }

I opened bug 79695 for it.

Martin


Re: [PATCH, doc]: Mention that -mfpmath=sse is the default on 32bit x86 w/ SSE2 and -ffast-math

2017-02-23 Thread Uros Bizjak
On Thu, Feb 23, 2017 at 7:07 PM, Martin Sebor  wrote:

> A minor grammatical nit:
>
>   +This is the default choice for most of x86-32 targets.
>
> "for most x86-32 targets" is correct unless the targets are some
> specific subset, in which case "most of the [previously mentioned]
> x86-32 targets" would work.

Maybe we can say "This is the default choice for non-Darwin x86-32
targets." here?

And further extend:
"This is the default choice for the x86-64 compiler, Darwin x86-32
targets, and the default choice for x86-32 targets with SSE2
instruction set when @option{-ffast-math} is enabled."

Uros.


[PATCH] Ensure <experimental/iterator> includes <iterator>

2017-02-23 Thread Jonathan Wakely

Lars noticed that <experimental/iterator> didn't include <iterator> as
it's supposed to. I checked the other TS headers and they're OK.

* include/experimental/iterator: Include <iterator>.
* testsuite/experimental/iterator/requirements.cc: Check for contents
of <iterator>.

Tested powerpc64le-linux, committed to trunk.


commit 42ee67b0131274b1c3405d6cab1d83d38dc17c55
Author: Jonathan Wakely 
Date:   Thu Feb 23 17:48:27 2017 +

Ensure <experimental/iterator> includes <iterator>

	* include/experimental/iterator: Include <iterator>.
	* testsuite/experimental/iterator/requirements.cc: Check for contents
	of <iterator>.

diff --git a/libstdc++-v3/include/experimental/iterator b/libstdc++-v3/include/experimental/iterator
index e8ecb34..8a8395d 100644
--- a/libstdc++-v3/include/experimental/iterator
+++ b/libstdc++-v3/include/experimental/iterator
@@ -39,10 +39,9 @@
 # include 
 #else
 
-#include 
+#include 
 #include 
-#include 
-#include 
+#include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
diff --git a/libstdc++-v3/testsuite/experimental/iterator/requirements.cc b/libstdc++-v3/testsuite/experimental/iterator/requirements.cc
index 8a8e79e..5fa5872 100644
--- a/libstdc++-v3/testsuite/experimental/iterator/requirements.cc
+++ b/libstdc++-v3/testsuite/experimental/iterator/requirements.cc
@@ -19,7 +19,7 @@
 
 // This is a compile-only test with minimal includes
 #include 
-#include 
+#include  // No guarantee that  includes this!
 
 using namespace std::experimental;
 
@@ -55,3 +55,13 @@ tester ic;
 tester ww;
 tester iw;
 #endif
+
+std::ostream& os();
+
+// Ensure that contents of  are defined by :
+std::reverse_iterator ii;
+std::move_iterator mi;
+std::istream_iterator isi;
+std::ostream_iterator osi(os());
+std::istreambuf_iterator isbi;
+std::ostreambuf_iterator osbi(os());


Re: [PATCH, doc]: Mention that -mfpmath=sse is the default on 32bit x86 w/ SSE2 and -ffast-math

2017-02-23 Thread Martin Sebor

On 02/23/2017 04:09 AM, Uros Bizjak wrote:

Hello!

This patch documents a little gcc secret...

2017-02-23  Uros Bizjak  

* doc/invoke.texi (x86 Options, -mfpmath=sse): Mention that
-mfpmath=sse is the default also for x86-32 targets with SSE2
instruction set when @option{-ffast-math} is enabled

Bootstrapped on x86_64-linux-gnu.

Uros.



A minor grammatical nit:

  +This is the default choice for most of x86-32 targets.

"for most x86-32 targets" is correct unless the targets are some
specific subset, in which case "most of the [previously mentioned]
x86-32 targets" would work.

Martin


Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-02-23 Thread Wilco Dijkstra
    

ping



From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.
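
The equivalence being relied on, shown as a trivial sketch (for any
integer x the two functions below compute the same value, so the 64-bit
left shift by one can always be emitted as an add/adc pair):

  unsigned long long shift_left_one (unsigned long long x) { return x << 1; }
  unsigned long long add_to_itself (unsigned long long x) { return x + x; }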

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit from the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" 

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-02-23 Thread Wilco Dijkstra

    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 18:00
To: GCC Patches
Cc: nd; Kyrylo Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns
    
After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This means the Thumb-2 iordi_notdi_di
patterns are no longer used - the split ORR and NOT instructions are merged
into ORN by Combine.  With -mfpu=neon the iordi_notdi_di patterns are used
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN).  After this there are no Thumb-2 specific DImode 
patterns.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/thumb2.md (iordi_notdi_di): Remove pattern.
    (iordi_notzesidi_di): Likewise.
    (iordi_notdi_zesidi): Likewise.
    (iordi_notsesidi_di): Likewise.

--

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
2e7580f220eae1524fef69719b1796f50f5cf27c..91471d4650ecae4f4e87b549d84d11adf3014ad2
 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1434,103 +1434,6 @@
    (set_attr "type" "alu_sreg")]
 )
 
-; Constants for op 2 will never be given to these patterns.
-(define_insn_and_split "*iordi_notdi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r"))
-   (match_operand:DI 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2)))
-   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[5] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notzesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (zero_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,?r")))]
-  "TARGET_THUMB2"
-  "#"
-  ; (not (zero_extend...)) means operand0 will always be 0x
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (const_int -1))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "4,8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notdi_zesidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
-   (zero_extend:DI
-    (match_operand:SI 1 "s_register_operand" "r,r"]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (not:SI (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[4] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (sign_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int 31)))
-  (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
 (define_insn "*orsi_notsi_si"
   [(set (match_operand:SI 0 

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-02-23 Thread Wilco Dijkstra

    

ping


From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
    
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the question is whether we should allow
largish offsets outside of the bounds of symbols (v1), no offsets (this 
version), or
small offsets (small negative and positive offsets just outside a symbol are 
common).
The only thing we can't allow is any offset like we currently do...

In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations.
This means the offset can use all of the +/-4GB offset, leaving no offset 
available
for the symbol itself.  This results in relocation overflow and link-time errors
for simple expressions like _char + 0xff00.
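
A minimal illustration of the failure mode (the symbol name and offset
below are made up, not those from the original report):

  /* The 3GB addend is folded into the relocation against "base", so if
     "base" itself is then placed more than about 1GB away from the
     reference, the final address no longer fits the +/-4GB range and
     the link fails.  */
  extern char base[];
  char *p = base + 0xc0000000;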

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within 
a
3GB offset from its references.  For the tiny code model use a 64KB offset, 
allowing
most of the 1MB range for code/data between the symbol and its references.
For symbols with a defined size, limit the offset to be within the size of the 
symbol.


ChangeLog:
2017-01-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Apply reasonable limit to symbol offsets.

    testsuite/
    * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
    * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
 return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
 {
 case AARCH64_CMODEL_TINY:
@@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
  we have no way of knowing the address of symbol at compile time
  so we can't accurately say if the distance between the PC and
  symbol + offset is outside the addressible range of +/-1M in the
-    TINY code model.  So we rely on images not being greater than
-    1M and cap the offset at 1M and anything beyond 1M will have to
-    be loaded using an alternative mechanism.  Furthermore if the
-    symbol is a weak reference to something that isn't known to
-    resolve to a symbol in this module, then force to memory.  */
+    TINY code model.  So we limit the maximum offset to +/-64KB and
+    assume the offset to the symbol is not larger than +/-(1M - 64KB).
+    Furthermore force to memory if the symbol is a weak reference to
+    something that doesn't resolve to a symbol in this module.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x1, 0x1))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_TINY_ABSOLUTE;
 
 case AARCH64_CMODEL_SMALL:
   /* Same reasoning as the tiny code model, but the offset cap here is
-    4G.  */
+    1G, allowing +/-3G for the offset to the symbol.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x4000, 0x4000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_SMALL_ABSOLUTE;
 
 case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
--- 

Re: [PATCH][ARM] Fix ldrd offsets

2017-02-23 Thread Wilco Dijkstra

    
ping


From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
    
Fix ldrd offsets for Thumb-2: with TARGET_LDRD the range is +-1020,
without it -255..4091.  This reduces the number of addressing instructions
when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.

ChangeLog:
2015-11-03  Wilco Dijkstra  

    gcc/
    * config/arm/arm.c (arm_legitimate_index_p): Add comment.
    (thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
 {
   HOST_WIDE_INT val = INTVAL (index);
 
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
   if (TARGET_LDRD)
 return val > -256 && val < 256;
   else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
index, int strict_p)
   if (code == CONST_INT)
 {
   HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-    However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+    Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
 }
   else
 return 0;    

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-02-23 Thread Wilco Dijkstra

    

ping


From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
    
Improve the logic when setting max_insns_skipped.  Limit the maximum size of
an IT block to MAX_INSN_PER_IT_BLOCK, as otherwise multiple IT instructions
are needed, increasing code size.  Given that 4 works well for Thumb-2, use
the same limit for ARM for consistency.

ChangeLog:
2016-11-04  Wilco Dijkstra  

    * config/arm/arm.c (arm_option_params_internal): Improve setting of
    max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }
 
-  if (optimize_size)
-    {
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
 
-  /* For THUMB2, we limit the conditional sequence to one IT block.  */
-  if (TARGET_THUMB2)
-    max_insns_skipped = arm_restrict_it ? 1 : 4;
-    }
-  else
-    /* When -mrestrict-it is in use tone down the if-conversion.  */
-    max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+    max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
 }
 
 /* True if -mflip-thumb should next add an attribute for the default

    

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-02-23 Thread Wilco Dijkstra

ping



From: Wilco Dijkstra
Sent: 29 November 2016 11:05
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8
    
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only to one of the duplicates.

Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

    * config/arm/vfp.md (movdi_vfp): Merge changes from movdi_vfp_cortexa8.
    * (movdi_vfp_cortexa8): Remove pattern.
--

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
2051f1018f1cbff9c5bf044e71304d78e615458e..a917aa625a7b15f6c9e2b549ab22e5219bb9b99c
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -304,9 +304,9 @@
 ;; DImode moves
 
 (define_insn "*movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,r,w,w, 
Uv")
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
"=r,r,r,r,q,q,m,w,!r,w,w, Uv")
    (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune != TARGET_CPU_cortexa8
+  "TARGET_32BIT && TARGET_HARD_FLOAT
    && (   register_operand (operands[0], DImode)
    || register_operand (operands[1], DImode))
    && !(TARGET_NEON && CONST_INT_P (operands[1])
@@ -339,71 +339,25 @@
 }
   "
   [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1,4,5,6") (const_int 8)
+   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
   (eq_attr "alternative" "2") (const_int 12)
   (eq_attr "alternative" "3") (const_int 16)
+ (eq_attr "alternative" "4,5,6")
+  (symbol_ref "arm_count_output_move_double_insns 
(operands) * 4")
   (eq_attr "alternative" "9")
    (if_then_else
  (match_test "TARGET_VFP_SINGLE")
  (const_int 8)
  (const_int 4))]
   (const_int 4)))
+   (set_attr "predicable"    "yes")
    (set_attr "arm_pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
    (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
    (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
+   (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4"))
    (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
 )
 
-(define_insn "*movdi_vfp_cortexa8"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
"=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-   (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
-    && (   register_operand (operands[0], DImode)
-    || register_operand (operands[1], DImode))
-    && !(TARGET_NEON && CONST_INT_P (operands[1])
-    && neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
-  "*
-  switch (which_alternative)
-    {
-    case 0: 
-    case 1:
-    case 2:
-    case 3:
-  return \"#\";
-    case 4:
-    case 5:
-    case 6:
-  return output_move_double (operands, true, NULL);
-    case 7:
-  return \"vmov%?\\t%P0, %Q1, %R1\\t%@ int\";
-    case 8:
-  return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
-    case 9:
-  return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
-    case 10: case 11:
-  return output_move_vfp (operands);
-    default:
-  gcc_unreachable ();
-    }
-  "
-  [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
-   (eq_attr "alternative" "2") (const_int 12)
-   (eq_attr "alternative" "3") (const_int 16)
-   (eq_attr "alternative" "4,5,6") 
-  (symbol_ref 
-   "arm_count_output_move_double_insns (operands) \
- * 4")]
-  (const_int 4)))
-   (set_attr "predicable"    "yes")
-   (set_attr "arm_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
-   (set (attr "ce_count") 
-   (symbol_ref "get_attr_length (insn) / 4"))
-   (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
- )
-
 ;; HFmode moves
 
 (define_insn "*movhf_vfp_fp16"
    

Re: [C++ Patch] PR 79361

2017-02-23 Thread Jason Merrill
Ok.

On Thu, Feb 23, 2017 at 6:02 AM, Paolo Carlini  wrote:
> Hi,
>
> in this error recovery regression, we ICE after (a lot after) a sensible
> diagnostic, when lower_function_body encounters an error_mark_node. I worked
> quite a bit on the issue, and, all in all, I propose to simply check the
> return value of duplicate_decls as called by register_specialization and
> bail out.
>
> In principle it may make sense to continue and, for example, also emit
> diagnostic about '= default' making sense only for special member functions
> - returning spec instead of error_mark_node would achieve that without
> regressions for the second testcase - but I'm not sure we want to do this
> kind of change right here right now together with fixing the ICE, because we
> do *not* emit additional diagnostic in the non-template case, eg for:
>
> void foo(int) {}
> void foo(int) = default;
>
> Tested x86_64-linux.
>
> Thanks, Paolo.
>
> /
>
>


Re: [PR 78140] Reuse same IPA bits and VR info

2017-02-23 Thread Martin Jambor
Hi,

On Wed, Feb 22, 2017 at 11:11:14AM +0100, Martin Jambor wrote:
> Hello,
> 
> this is a fix for PR 78140 which is about LTO WPA of Firefox taking
> 1GB memory more than gcc 6.
> 
> It works by reusing the ipa_bits and value_range that we previously
> had directly in jump functions and which are just too big to be
> allocated for each actual argument in all of Firefox.  Reusing is
> achieved by two hash table traits derived from ggc_cache_remove which
> apparently has been created just for this purpose and once I
> understood them they made my life a lot easier.  In future, I will
> have a look at applying this method to other parts of jump functions
> as well.
> 
> According to my measurements, the patch saves about 1.2 GB of memory.
> The problem is that some change last week (between revision 245382 and
> 245595) has more than invalidated this:
> 
>   | compiler| WPA mem (GB) |
>   |-+--|
>   | gcc 6 branch| 3.86 |
>   | trunk rev. 245382   | 5.21 |
>   | patched rev. 245382 | 4.06 |
>   | trunk rev. 245595   | 6.59 |
>   | patched rev. 245595 | 5.25 |
> 

I have found out that previously I was comparing a build with
--enable-gather-detailed-mem-stats with one configured without it,
which explains the difference.  So the further 1GB increase was
fortunately only a mistake in measurements, the real data (without
detailed memory statistics overhead) are below and the proposed patch
does fix lion's share of the gcc 7 memory consumption increase:

  | compiler| wpa mem (KB) | wpa mem (GB) |
  |-+--+--|
  | gcc 6 branch|  4046451 | 3.86 |
  | trunk rev. 245382   |  5468227 | 5.21 |
  | patched rev. 245382 |  4255799 | 4.06 |
  | trunk rev. 245595   |  5452515 | 5.20 |
  | patched rev. 245595 |  4240379 | 4.04 |

I suppose we can look for the remaining ~200MB after the patch is
approved.

Sorry for the noise,

Martin


> (I have verified this by Martin's way of measuring things.)  I will
> try to bisect what commit has caused the increase.  Still, the patch
> helps a lot.
> 
> There is one thing in the patch that intrigues me, I do not understand
> why I had to mark value_range with GTY((for_user)) - as opposed to
> just GTY(()) that was there before - whereas ipa_bits does not need
> it.  If anyone could enlighten me, that would be great.  But I suppose
> this is not an indication of anything being wrong under the hood.
> 
> I have bootstrapped and LTO-bootstrapped the patch on x86_64-linux and
> also bootstrapped (C, C++ and Fortran) on an aarch64 and i686 cfarm
> machine.  I have also LTO-built Firefox with the patch and used it to
> browse for a while and it seemed fine.
> 
> OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 
> 2017-02-20  Martin Jambor  
> 
>   PR lto/78140
>   * ipa-prop.h (ipa_bits): Removed field known.
>   (ipa_jump_func): Removed field vr_known.  Changed fields bits and m_vr
>   to pointers.  Adjusted their comments to warn about their sharing.
>   (ipcp_transformation_summary): Change bits to a vector of pointers.
>   (ipa_check_create_edge_args): Moved to ipa-prop.c, declare.
>   (ipa_get_ipa_bits_for_value): Declare.
>   * tree-vrp.h (value_range): Mark as GTY((for_user)).
>   * ipa-prop.c (ipa_bit_ggc_hash_traits): New.
>   (ipa_bits_hash_table): Likewise.
>   (ipa_vr_ggc_hash_traits): Likewise.
>   (ipa_vr_hash_table): Likewise.
>   (ipa_print_node_jump_functions_for_edge): Adjust for bits and m_vr
>   being pointers and vr_known being removed.
>   (ipa_set_jf_unknown): Likewise.
>   (ipa_get_ipa_bits_for_value): New function.
>   (ipa_set_jfunc_bits): Likewise.
>   (ipa_get_value_range): New overloaded functions.
>   (ipa_set_jfunc_vr): Likewise.
>   (ipa_compute_jump_functions_for_edge): Use the above functions to
>   construct bits and vr parts of jump functions.
>   (ipa_check_create_edge_args): Move here from ipa-prop.h, also allocate
>   ipa_bits_hash_table and ipa_vr_hash_table if they do not already
>   exist.
>   (ipcp_grow_transformations_if_necessary): Also allocate
>   ipa_bits_hash_table and ipa_vr_hash_table if they do not already
>   exist.
>   (ipa_node_params_t::duplicate): Do not copy bits, just pointers to
>   them.  Fix too long lines.
>   (ipa_write_jump_function): Adjust for bits and m_vr being pointers and
>   vr_known being removed.
>   (ipa_read_jump_function): Use new setter functions to construct bits
>   and vr parts of jump functions or set them to NULL.
>   (write_ipcp_transformation_info): Adjust for bits being pointers.
>   (read_ipcp_transformation_info): Likewise.
>   (ipcp_update_bits): Likewise.  Fix excessively long lines a trailing
>   space.
>   Include 
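
To make the sharing scheme described above concrete, here is a minimal
stand-alone sketch of the idea in plain C++.  It is illustrative only: the
actual patch uses GTY-marked hash tables with traits derived from
ggc_cache_remove, and the real ipa_bits holds widest_int value/mask fields
rather than the plain integers used here.

#include <unordered_set>

/* Simplified stand-in for ipa_bits: a known-bits value/mask pair.  */
struct ipa_bits_stub
{
  long value;
  long mask;
  bool operator== (const ipa_bits_stub &o) const
  { return value == o.value && mask == o.mask; }
};

struct ipa_bits_stub_hash
{
  size_t operator() (const ipa_bits_stub &b) const
  { return std::hash<long> () (b.value) ^ (std::hash<long> () (b.mask) << 1); }
};

/* Canonicalization table: each distinct value/mask pair is stored once,
   and every jump function keeps only a pointer to the shared copy.  */
static std::unordered_set<ipa_bits_stub, ipa_bits_stub_hash> bits_table;

const ipa_bits_stub *
get_ipa_bits_for_value (long value, long mask)
{
  /* insert returns the already-present element if an equal one exists,
     so repeated bits info is shared instead of duplicated.  */
  return &*bits_table.insert (ipa_bits_stub { value, mask }).first;
}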

[gomp4] DEV-PHASE change

2017-02-23 Thread Tom de Vries

[ was: r241221 [1/2] - in /branches/gomp-4_0-branch:  ]

On 16/10/16 22:13, tschwi...@gcc.gnu.org wrote:

Author: tschwinge
Date: Sun Oct 16 20:13:18 2016
New Revision: 241221

URL: https://gcc.gnu.org/viewcvs?rev=241221&root=gcc&view=rev
Log:
svn merge -r 235033:240831 svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch

Modified:
branches/gomp-4_0-branch/gcc/DEV-PHASE


Hi Thomas,

In this merge, the DEV-PHASE file was cleared:
...
$ svn diff -c 241221 DEV-PHASE
Index: DEV-PHASE
===
--- DEV-PHASE (revision 241220)
+++ DEV-PHASE (revision 241221)
@@ -1 +0,0 @@
-experimental
...

Consequently, when bootstrapping the branch, -Werror is not enabled by 
default (as we can see here in configure):

...
else
  if test -d ${srcdir}/gcc && test x"`cat $srcdir/gcc/DEV-PHASE`" = xexperimental; then
    enable_werror=yes
  else
    enable_werror=no
  fi
...

Was this change unintentionally merged from the 6 branch?

Thanks,
- Tom



Re: [PATCH] PR 68749: S/390: Disable ifcvt-4.c for -m31.

2017-02-23 Thread Andreas Krebbel
On 02/15/2017 12:19 PM, Dominik Vogt wrote:
> The attached patch disables the test ifcvt-4.c on s390 and on
> s390x with -31, and adds -march=z196 for s390x.  It should no
> longer fail on s390 and s390x.
> 
> Tested on s390x biarch.
> 
> Ciao
> 
> Dominik ^_^  ^_^
> 
Applied. Thanks!

-Andreas-



[PATCH] PR79389, path-splitting

2017-02-23 Thread Richard Biener

This PR shows another defect in path-splitting's cost model, which the
following patch tries to improve further.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-02-23  Richard Biener  

PR tree-optimization/79389
* gimple-ssa-split-paths.c (is_feasible_trace): Verify more
properly that a threading opportunity exists.  Detect conditional
copy/constant propagation opportunities.

* gcc.dg/tree-ssa/split-path-10.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/split-path-10.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/split-path-10.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/split-path-10.c   (working copy)
@@ -0,0 +1,49 @@
+/* PR tree-optimization/79389  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-split-paths-details" } */
+
+typedef struct
+{
+  int m[17];
+  int seed; 
+  int i;
+  int j;
+  int haveRange;
+  double left;
+  double right;
+  double width;
+}
+Random_struct, *Random;
+
+Random new_Random_seed(int seed);
+double Random_nextDouble(Random R);
+void Random_delete(Random R);
+
+static const int SEED = 113;
+
+double MonteCarlo_integrate(int Num_samples)
+{
+
+
+  Random R = new_Random_seed(SEED);
+
+
+  int under_curve = 0;
+  int count;
+
+  for (count=0; count

Re: PR79286, ira combine_and_move_insns in loops

2017-02-23 Thread Jeff Law

On 02/23/2017 02:52 AM, Alan Modra wrote:

On Thu, Feb 23, 2017 at 12:46:17AM -0700, Jeff Law wrote:

And thus we keep the equivalence.  Ultimately may_trap_p considers a PIC
memory reference as non-trapping.


Which is obviously a bug, because this access is segfaulting..
Not that I want to poke at the bug.  :)
They're explicitly called out as not causing faults.  I'm not sure I
want to revisit that history right now, whatever it is.  It's clearly
wrong IMHO though.


Jeff


Re: [gomp4] add -finform-parallelism

2017-02-23 Thread Cesar Philippidis
On 02/22/2017 12:17 AM, Thomas Schwinge wrote:
> On Mon, 20 Feb 2017 20:42:59 -0800, Cesar Philippidis 
>  wrote:

>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
> 
>> +/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
>> +   children.  */
>> +
>> +static void
>> +inform_oacc_loop (oacc_loop *loop)
>> +{
>> +  const char *seq = loop->mask == 0 ? " SEQ" : "";
>> +  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
>> +? " GANG" : "";
>> +  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
>> +? " WORKER" : "";
>> +  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
>> +? " VECTOR" : "";
>> +
>> +  inform (loop->loc, "ACC LOOP%s%s%s%s", seq, gang, worker, vector);
> 
> Likewise.

This is now lower case.

> Per
> <http://mid.mail-archive.com/8737f68y3r.fsf@euler.schwinge.homeip.net>
> I'm suggesting this to be done a bit differently: instead of "inform",
> this would then use the appropriate "-fopt-info-note-omp" option group
> output.

Thank you for finding that. I see that you want to rename
OPTGROUP_OPENMP to OPTGROUP_OMP. In order to keep gomp-4_0-branch
somewhat consistent with trunk, I kept the original OPENMP name. We can
fix that later.

> If that's not yet there, possibly there could be some new flag added for
> "-fopt-info" to display the "rich" location, which will also print the
> original source code?

No, it doesn't support rich locations yet, but supporting it is a good
idea; we can add that later. For the time being, the line number shall
suffice.

Cesar

2017-02-23  Cesar Philippidis  

	gcc/
	* omp-low.c (inform_oacc_loop): New function.
	(execute_oacc_device_lower): Use it to display loop parallelism.

	gcc/testsuite/
	* c-c++-common/goacc/note-parallelism.c: New test.
	* gfortran.dg/goacc/note-parallelism.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b25fe27..40f2003 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -20399,6 +20399,30 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " seq" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+? " gang" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+? " worker" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+? " vector" : "";
+
+  dump_printf_loc (MSG_NOTE, loop->loc,
+		   "Detected parallelism <acc loop%s%s%s%s>\n", seq, gang,
+		   worker, vector);
+
+  if (loop->child)
+inform_oacc_loop (loop->child);
+  if (loop->sibling)
+inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
structures as we go.  By construction these loops are properly
nested.  */
@@ -21069,6 +21093,8 @@ execute_oacc_device_lower ()
   dump_oacc_loop (dump_file, loops, 0);
   fprintf (dump_file, "\n");
 }
+  if (dump_enabled_p () && loops->child)
+inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
  dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
new file mode 100644
index 000..ddbce99
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -0,0 +1,76 @@
+/* Test the output of -fopt-info-not-openmp.  */
+
+/* { dg-additional-options "-fopt-info-note-openmp" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop worker
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop vector
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang vector
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang worker
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop worker vector
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang worker vector
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+for (y = 0; y < 10; y++)
+  ;
+
+#pragma acc parallel loop gang
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker
+for (y = 0; y < 10; y++)
+#pragma acc loop vector
+  for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
+
+/* { dg-message "note-parallelism.c:10:9: note: Detected parallelism " "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:14:9: note: Detected parallelism " "" { target *-*-* } 0 } */
+/* { dg-message "note-parallelism.c:18:9: note: Detected parallelism " "" { target *-*-* } 0 } */
+/* 

Re: [PATCH PR79663]Only reversely combine refs for ZERO length chains in predcom

2017-02-23 Thread Richard Biener
On Thu, Feb 23, 2017 at 12:24 PM, Bin Cheng  wrote:
> Hi,
> This patch resolves spec2k/mgrid regression as reported by PR79663.  Root 
> cause has been
> described thoroughly in comment #1/#2 of 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79663
> This patch handles ZERO/non-ZERO length chains differently and only reversely 
> combines refs
> for ZERO length chains.  It also does small improvement with in place list 
> swap.
> Bootstrap and test on x86_64 and AArch64, is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2017-01-21  Bin Cheng  
>
> PR tree-optimization/79663
> * tree-predcom.c (combine_chains): Process refs in reverse order
> only for ZERO length chains, and add explaining comment.


[gomp4] backport OPTGROUP_OPENMP

2017-02-23 Thread Cesar Philippidis
This patch backports OPTGROUP_OPENMP from trunk. The patch was
originally posted by Martin Jambor here.

Thomas, I know that you have a patch to rename this to OPTGROUP_OMP. But
in order to keep things consistent with trunk, I kept it
OPTGROUP_OPENMP. If you want, I can backport your patch once it gets
merged into trunk.

I applied this patch to gomp-4_0-branch.

Cesar
2017-02-23  Cesar Philippidis  

	Backport from trunk:
	2016-11-23  Martin Jambor  
		Martin Liska  

	gcc/
	* doc/optinfo.texi (Optimization groups): Document OPTGROUP_OPENMP.
	* dumpfile.c (optgroup_options): Added OPTGROUP_OPENMP.
	* dumpfile.h (OPTGROUP_OPENMP): Define.
	* omp-low.c (pass_data_expand_omp): Changed optinfo_flags to
	OPTGROUP_OPENMP.
	(pass_data_expand_omp_ssa): Likewise.
	(pass_data_omp_device_lower): Likewsie.
	(pass_data_lower_omp): Likewise.
	(pass_data_diagnose_omp_blocks): Likewise.
	(pass_data_oacc_device_lower): Likewise.
	(pass_data_omp_target_link): Likewise.


diff --git a/gcc/doc/optinfo.texi b/gcc/doc/optinfo.texi
index 3c8fdba..20ca560 100644
--- a/gcc/doc/optinfo.texi
+++ b/gcc/doc/optinfo.texi
@@ -59,6 +59,9 @@ Loop optimization passes. Enabled by @option{-loop}.
 @item OPTGROUP_INLINE
 Inlining passes. Enabled by @option{-inline}.
 
+@item OPTGROUP_OPENMP
+OpenMP passes. Enabled by @option{-openmp}.
+
 @item OPTGROUP_VEC
 Vectorization passes. Enabled by @option{-vec}.
 
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 144e371..f2430f3 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -136,6 +136,7 @@ static const struct dump_option_value_info optgroup_options[] =
   {"ipa", OPTGROUP_IPA},
   {"loop", OPTGROUP_LOOP},
   {"inline", OPTGROUP_INLINE},
+  {"openmp", OPTGROUP_OPENMP},
   {"vec", OPTGROUP_VEC},
   {"optall", OPTGROUP_ALL},
   {NULL, 0}
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index c168cbf..72f696b 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -97,7 +97,8 @@ enum tree_dump_index
 #define OPTGROUP_LOOP(1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE  (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC (1 << 4)   /* Vectorization passes */
-#define OPTGROUP_OTHER   (1 << 5)   /* All other passes */
+#define OPTGROUP_OPENMP  (1 << 5)	/* OpenMP specific transformations */
+#define OPTGROUP_OTHER   (1 << 6)   /* All other passes */
 #define OPTGROUP_ALL	 (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
   | OPTGROUP_VEC | OPTGROUP_OTHER)
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6ea8738..f5e66e0 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -14771,7 +14771,7 @@ const pass_data pass_data_expand_omp =
 {
   GIMPLE_PASS, /* type */
   "ompexp", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -14818,7 +14818,7 @@ const pass_data pass_data_expand_omp_ssa =
 {
   GIMPLE_PASS, /* type */
   "ompexpssa", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg | PROP_ssa, /* properties_required */
   PROP_gimple_eomp, /* properties_provided */
@@ -19087,7 +19087,7 @@ const pass_data pass_data_lower_omp =
 {
   GIMPLE_PASS, /* type */
   "omplower", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   PROP_gimple_lomp, /* properties_provided */
@@ -19569,7 +19569,7 @@ const pass_data pass_data_diagnose_omp_blocks =
 {
   GIMPLE_PASS, /* type */
   "*diagnose_omp_blocks", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
   0, /* properties_provided */
@@ -21250,7 +21250,7 @@ const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
   "oaccdevlow", /* name */
-  OPTGROUP_NONE, /* optinfo_flags */
+  OPTGROUP_OPENMP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
   0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
@@ -21295,7 +21295,7 @@ const pass_data pass_data_omp_target_link =
 {
   GIMPLE_PASS,			/* type */
   "omptargetlink",		/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
+  OPTGROUP_OPENMP,		/* optinfo_flags */
   TV_NONE,			/* tv_id */
   PROP_ssa,			/* properties_required */
   0,/* properties_provided */


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Jakub Jelinek
On Thu, Feb 23, 2017 at 03:07:14PM +0100, Jakub Jelinek wrote:
> On Thu, Feb 23, 2017 at 02:47:11PM +0100, Bernd Schmidt wrote:
> > On 02/23/2017 02:36 PM, Jakub Jelinek wrote:
> > > and both UNLT and GE can be reversed.  But if the arguments of the 
> > > condition
> > > are canonicalized, we run into:
> > >   /* Test for an integer condition, or a floating-point comparison
> > >  in which NaNs can be ignored.  */
> > >   if (CONST_INT_P (arg0)
> > >   || (GET_MODE (arg0) != VOIDmode
> > >   && GET_MODE_CLASS (mode) != MODE_CC
> > >   && !HONOR_NANS (mode)))
> > > return reverse_condition (code);
> > > and thus always return UNKNOWN.
> > 
> > So... do you think we could add (in gcc-8, probably, although if it fixes
> > this regression...)
> > 
> >else if (GET_MODE (arg0) != VOIDmode
> > && GET_MODE_CLASS (mode) != MODE_CC
> > && HONOR_NANS (mode))
> >  return reverse_condition_maybe_unordered (code);
> > 
> > to make this work?
> 
> Maybe, though I'd feel safer about trying it only in gcc 8.  I can certainly
> test such a change on a couple of targets.  It would not be sufficient, we'd
> either need to also reverse_condition_maybe_unordered for the UN* codes
> we don't handle yet, or break so that we perhaps reach this spot.

This stuff has been introduced with
http://gcc.gnu.org/ml/gcc-patches/2001-01/msg00454.html
where the initial submission indeed just returned
reverse_condition_maybe_unordered for UNLT etc.
but then Richard complained about that:
http://gcc.gnu.org/ml/gcc-patches/2001-01/msg00470.html

By keeping jump.c as is and just changing ifcvt.c we allow the targets
to say what exactly they do support and what is and what isn't reversible
through REVERSIBLE_CC_MODE and REVERSE_CONDITION macros.

Jakub


[PATCH, wwwdocs] Update AArch64 entry in readings.html

2017-02-23 Thread Richard Earnshaw (lists)
This patch tweaks the wording in the entry for AArch64 and also adds a
Manufacturer: entry similar to that for ARM.

Committed.

R.
Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.261
diff -u -r1.261 readings.html
--- readings.html   22 Feb 2017 11:22:44 -  1.261
+++ readings.html   22 Feb 2017 11:43:38 -
@@ -60,9 +60,10 @@
 
 
  AArch64
-  The 64-bit execution state introduced in the ARMv8-A architecture 
profile.
-  http://infocenter.arm.com/help/index.jsp;>
-   ARM Documentation
+  The 64-bit execution state of the ARM Architecture, first introduced
+  by the ARMv8-A architecture.
+  Manufacturer: Various, by license from ARM.
+  http://infocenter.arm.com/help/index.jsp;>ARM 
Documentation
  
 
  andes (nds32)


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Jakub Jelinek
On Thu, Feb 23, 2017 at 02:47:11PM +0100, Bernd Schmidt wrote:
> On 02/23/2017 02:36 PM, Jakub Jelinek wrote:
> > and both UNLT and GE can be reversed.  But if the arguments of the condition
> > are canonicalized, we run into:
> >   /* Test for an integer condition, or a floating-point comparison
> >  in which NaNs can be ignored.  */
> >   if (CONST_INT_P (arg0)
> >   || (GET_MODE (arg0) != VOIDmode
> >   && GET_MODE_CLASS (mode) != MODE_CC
> >   && !HONOR_NANS (mode)))
> > return reverse_condition (code);
> > and thus always return UNKNOWN.
> 
> So... do you think we could add (in gcc-8, probably, although if it fixes
> this regression...)
> 
>else if (GET_MODE (arg0) != VOIDmode
> && GET_MODE_CLASS (mode) != MODE_CC
> && HONOR_NANS (mode))
>  return reverse_condition_maybe_unordered (code);
> 
> to make this work?

Maybe, though I'd feel safer about trying it only in gcc 8.  I can certainly
test such a change on a couple of targets.  It would not be sufficient, we'd
either need to also reverse_condition_maybe_unordered for the UN* codes
we don't handle yet, or break so that we perhaps reach this spot.

Updated (not yet tested) version of the patch is below.

2017-02-23  Jakub Jelinek  

PR tree-optimization/79389
* ifcvt.c (struct noce_if_info): Add rev_cond field.
(noce_reversed_cond_code): New function.
(noce_emit_store_flag): Use rev_cond if non-NULL instead of
reversed_comparison_code.  Formatting fix.
(noce_try_store_flag): Test rev_cond != NULL in addition to
reversed_comparison_code.
(noce_try_store_flag_constants): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_addcc): Use rev_cond if non-NULL instead of
reversed_comparison_code.
(noce_try_cmove_arith): Likewise.  Formatting fixes.
(noce_try_minmax, noce_try_abs): Clear rev_cond.
(noce_find_if_block): Initialize rev_cond.
(find_cond_trap): Call noce_get_condition with then_bb == trap_bb
instead of false as last argument never attempt to reverse it
afterwards.

--- gcc/ifcvt.c.jj  2017-02-22 22:32:34.411724499 +0100
+++ gcc/ifcvt.c 2017-02-23 14:45:54.011715821 +0100
@@ -777,6 +777,9 @@ struct noce_if_info
   /* The jump condition.  */
   rtx cond;
 
+  /* Reversed jump condition.  */
+  rtx rev_cond;
+
   /* New insns should be inserted before this one.  */
   rtx_insn *cond_earliest;
 
@@ -843,6 +846,17 @@ static int noce_try_minmax (struct noce_
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* Return the comparison code for reversed condition for IF_INFO,
+   or UNKNOWN if reversing the condition is not possible.  */
+
+static inline enum rtx_code
+noce_reversed_cond_code (struct noce_if_info *if_info)
+{
+  if (if_info->rev_cond)
+return GET_CODE (if_info->rev_cond);
+  return reversed_comparison_code (if_info->cond, if_info->jump);
+}
+
 /* Return TRUE if SEQ is a good candidate as a replacement for the
if-convertible sequence described in IF_INFO.  */
 
@@ -888,6 +902,14 @@ noce_emit_store_flag (struct noce_if_inf
   if (if_info->then_else_reversed)
reversep = !reversep;
 }
+  else if (reversep
+  && if_info->rev_cond
+  && general_operand (XEXP (if_info->rev_cond, 0), VOIDmode)
+  && general_operand (XEXP (if_info->rev_cond, 1), VOIDmode))
+{
+  cond = if_info->rev_cond;
+  reversep = false;
+}
 
   if (reversep)
 code = reversed_comparison_code (cond, if_info->jump);
@@ -898,7 +920,7 @@ noce_emit_store_flag (struct noce_if_inf
   && (normalize == 0 || STORE_FLAG_VALUE == normalize))
 {
   rtx src = gen_rtx_fmt_ee (code, GET_MODE (x), XEXP (cond, 0),
-   XEXP (cond, 1));
+   XEXP (cond, 1));
   rtx set = gen_rtx_SET (x, src);
 
   start_sequence ();
@@ -1209,8 +1231,7 @@ noce_try_store_flag (struct noce_if_info
   else if (if_info->b == const0_rtx
   && CONST_INT_P (if_info->a)
   && INTVAL (if_info->a) == STORE_FLAG_VALUE
-  && (reversed_comparison_code (if_info->cond, if_info->jump)
-  != UNKNOWN))
+  && noce_reversed_cond_code (if_info) != UNKNOWN)
 reversep = 1;
   else
 return FALSE;
@@ -1371,9 +1392,7 @@ noce_try_store_flag_constants (struct no
 
   diff = trunc_int_for_mode (diff, mode);
 
-  can_reverse = (reversed_comparison_code (if_info->cond, if_info->jump)
-!= UNKNOWN);
-
+  can_reverse = noce_reversed_cond_code (if_info) != UNKNOWN;
   reversep = false;
   if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
{
@@ -1553,11 +1572,18 @@ noce_try_addcc (struct noce_if_info *if_
 
   if (GET_CODE (if_info->a) == PLUS
   && rtx_equal_p (XEXP (if_info->a, 0), if_info->b)
-  && 

[C++ Patch] PR 79361

2017-02-23 Thread Paolo Carlini

Hi,

in this error recovery regression, we ICE after (a lot after) a sensible 
diagnostic, when lower_function_body encounters an error_mark_node. I 
worked quite a bit on the issue, and, all in all, I propose to simply 
check the return value of duplicate_decls as called by 
register_specialization and bail out.


In principle it may make sense to continue and, for example, also emit 
diagnostic about '= default' making sense only for special member 
functions - returning spec instead of error_mark_node would achieve that 
without regressions for the second testcase - but I'm not sure we want 
to do this kind of change right here right now together with fixing the 
ICE, because we do *not* emit additional diagnostic in the non-template 
case, eg for:


void foo(int) {}
void foo(int) = default;

Tested x86_64-linux.

Thanks, Paolo.

/


/cp
2017-02-23  Paolo Carlini  

PR c++/79361
* pt.c (register_specialization): Check duplicate_decls return value
for error_mark_node and pass it back.

/testsuite
2017-02-23  Paolo Carlini  

PR c++/79361
* g++.dg/cpp0x/pr79361-1.C: New.
* g++.dg/cpp0x/pr79361-2.C: Likewise.
Index: cp/pt.c
===
--- cp/pt.c (revision 245655)
+++ cp/pt.c (working copy)
@@ -1599,7 +1599,12 @@ register_specialization (tree spec, tree tmpl, tre
}
   else if (DECL_TEMPLATE_SPECIALIZATION (fn))
{
- if (!duplicate_decls (spec, fn, is_friend) && DECL_INITIAL (spec))
+ tree dd = duplicate_decls (spec, fn, is_friend);
+ if (dd == error_mark_node)
+   /* We've already complained in duplicate_decls.  */
+   return error_mark_node;
+
+ if (dd == NULL_TREE && DECL_INITIAL (spec))
/* Dup decl failed, but this is a new definition. Set the
   line number so any errors match this new
   definition.  */
Index: testsuite/g++.dg/cpp0x/pr79361-1.C
===
--- testsuite/g++.dg/cpp0x/pr79361-1.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/pr79361-1.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/79361
+// { dg-do compile { target c++11 } }
+
+template void foo(T);
+
+template<> void foo(int) {}   // { dg-message "declared" }
+template<> void foo(int) = delete;  // { dg-error "redefinition" }
Index: testsuite/g++.dg/cpp0x/pr79361-2.C
===
--- testsuite/g++.dg/cpp0x/pr79361-2.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/pr79361-2.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/79361
+// { dg-do compile { target c++11 } }
+
+template void foo(T);
+
+template<> void foo(int) {}   // { dg-message "declared" }
+template<> void foo(int) = default;  // { dg-error "redefinition" }


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Bernd Schmidt

On 02/23/2017 02:36 PM, Jakub Jelinek wrote:

and both UNLT and GE can be reversed.  But if the arguments of the condition
are canonicalized, we run into:
  /* Test for an integer condition, or a floating-point comparison
 in which NaNs can be ignored.  */
  if (CONST_INT_P (arg0)
  || (GET_MODE (arg0) != VOIDmode
  && GET_MODE_CLASS (mode) != MODE_CC
  && !HONOR_NANS (mode)))
return reverse_condition (code);
and thus always return UNKNOWN.


So... do you think we could add (in gcc-8, probably, although if it 
fixes this regression...)


   else if (GET_MODE (arg0) != VOIDmode
&& GET_MODE_CLASS (mode) != MODE_CC
&& HONOR_NANS (mode))
 return reverse_condition_maybe_unordered (code);

to make this work?


Bernd


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Jakub Jelinek
On Thu, Feb 23, 2017 at 02:26:24PM +0100, Bernd Schmidt wrote:
> On 02/23/2017 12:46 PM, Jakub Jelinek wrote:
> > But as soon as we only have the (unlt (reg:DF 100) (reg:DF 97)),
> > reversed_comparison_code fails on it:
> > case UNLT:
> > case UNLE:
> > case UNGT:
> > case UNGE:
> >   /* We don't have safe way to reverse these yet.  */
> >   return UNKNOWN;
> 
> I do have to wonder why. The reverse_condition_maybe_unordered function
> knows how to reverse these, and I can't quite figure out what the problem is
> here. The comment isn't super helpful.

Yeah, and it is very old code, r44846 apparently.
Note that the function returns UNKNOWN not just for these, but also for
GT/GE/LT/LE.  If the condition is not canonicalized yet, we hit the
  if (GET_MODE_CLASS (mode) == MODE_CC
  && REVERSIBLE_CC_MODE (mode))
return REVERSE_CONDITION (code, mode);
case or
  if (GET_MODE_CLASS (mode) == MODE_CC || CC0_P (arg0))
{
and both UNLT and GE can be reversed.  But if the arguments of the condition
are canonicalized, we run into:
  /* Test for an integer condition, or a floating-point comparison
 in which NaNs can be ignored.  */
  if (CONST_INT_P (arg0)
  || (GET_MODE (arg0) != VOIDmode
  && GET_MODE_CLASS (mode) != MODE_CC
  && !HONOR_NANS (mode)))
return reverse_condition (code);
and thus always return UNKNOWN.

> > -  && (reversed_comparison_code (if_info->cond, if_info->jump)
> > -  != UNKNOWN))
> > +  && (if_info->rev_cond
> > +  || reversed_comparison_code (if_info->cond,
> > +   if_info->jump) != UNKNOWN))
> 
> Maybe have a reversed_cond_code (if_info) function? This pattern seems to
> occur a few times.

Ok, will do.

> > @@ -4676,7 +4713,12 @@ find_cond_trap (basic_block test_bb, edg
> >  {
> >code = reversed_comparison_code (cond, jump);
> >if (code == UNKNOWN)
> > -   return FALSE;
> > +   {
> > + cond = noce_get_condition (jump, &cond_earliest, true);
> > + if (!cond)
> > +   return FALSE;
> > + code = GET_CODE (cond);
> > +   }
> 
> This one is an extra optimization, similar but unrelated to the others,
> right?

Yes.

> Can't you figure out the need to reverse a bit earlier and pass it
> into the first noce_get_condition call?

That is indeed possible.  Let me adjust the patch.

Jakub


Re: [PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Bernd Schmidt

On 02/23/2017 12:46 PM, Jakub Jelinek wrote:

But as soon as we only have the (unlt (reg:DF 100) (reg:DF 97)),
reversed_comparison_code fails on it:
case UNLT:
case UNLE:
case UNGT:
case UNGE:
  /* We don't have safe way to reverse these yet.  */
  return UNKNOWN;


I do have to wonder why. The reverse_condition_maybe_unordered function 
knows how to reverse these, and I can't quite figure out what the 
problem is here. The comment isn't super helpful.



-  && (reversed_comparison_code (if_info->cond, if_info->jump)
-  != UNKNOWN))
+  && (if_info->rev_cond
+  || reversed_comparison_code (if_info->cond,
+   if_info->jump) != UNKNOWN))


Maybe have a reversed_cond_code (if_info) function? This pattern seems 
to occur a few times.



@@ -4676,7 +4713,12 @@ find_cond_trap (basic_block test_bb, edg
 {
   code = reversed_comparison_code (cond, jump);
   if (code == UNKNOWN)
-   return FALSE;
+   {
+ cond = noce_get_condition (jump, &cond_earliest, true);
+ if (!cond)
+   return FALSE;
+ code = GET_CODE (cond);
+   }


This one is an extra optimization, similar but unrelated to the others, 
right? Can't you figure out the need to reverse a bit earlier and pass 
it into the first noce_get_condition call?


Otherwise ok.


Bernd



[PATCH] Improve ifcvt (PR tree-optimization/79389)

2017-02-23 Thread Jakub Jelinek
Hi!

Uros noted in the PR that in many cases with floating point comparisons
ifcvt fails to RTL if-convert, or the code ends up RTL if-converted in a worse
way than it could be (e.g. something that ought to be noce_try_addcc
optimizable is only if-converted using noce_try_cmove_arith, resulting
in worse performance).

The problem is that reversed_comparison_code fails (returns UNKNOWN)
in many cases, because important information is lost during
canonicalize_condition.  Initially we e.g. have
(ge (reg:CCFP 17 flags)
(const_int 0 [0]))
which is canonicalized non-reversed as:
(ge (reg:DF 100)
(reg:DF 97))
and reversed as:
(unlt (reg:DF 100)
(reg:DF 97))
But as soon as we only have the (unlt (reg:DF 100) (reg:DF 97)),
reversed_comparison_code fails on it:
case UNLT:
case UNLE:
case UNGT:
case UNGE:
  /* We don't have safe way to reverse these yet.  */
  return UNKNOWN;
On the benchmark (at -O2; at -O3 we run into the split-path pass messing
it up, so ifcvt is not possible), due to the order/content of the then/else
branches, we canonicalize the condition as reversed, so if_info->cond
is that UNLT, and when we want to reverse it again in e.g. noce_try_addcc
(i.e. get the original non-reversed, just canonicalized), it fails.

The following patch fixes that by remembering both the condition and
reverse condition in noce_if_info structure (if the latter is successful),
and then using the rev_cond instead of cond if we need to reverse cond.
The reversed_comparison_code calls are kept as fallback, e.g. some
functions change if_info->cond and then we don't have the reverse condition
for that, or canonicalize_condition could fail for the reverse.
If rev_cond is available, the advantage is also that that condition is
in canonic form, while just by using reversed_comparison_code and
the original cond's arguments it can be non-canonic.
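
As a stand-alone illustration (not part of the patch) of why these reversals
are delicate: with IEEE NaNs the logical negation of an ordered comparison is
an unordered one, so !(a < b) corresponds to UNGE rather than GE, and UNLT can
only be reversed to GE when it is known that NaNs do not have to be honored:

#include <cstdio>
#include <cmath>

int
main ()
{
  double a = NAN, b = 1.0;
  std::printf ("a < b   : %d\n", a < b);    /* 0: comparisons with NaN are false */
  std::printf ("!(a < b): %d\n", !(a < b)); /* 1 */
  std::printf ("a >= b  : %d\n", a >= b);   /* 0: so a >= b is NOT the negation */
  return 0;
}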

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-23  Jakub Jelinek  

PR tree-optimization/79389
* ifcvt.c (struct noce_if_info): Add rev_cond field.
(noce_emit_store_flag): Use rev_cond if non-NULL instead of
reversed_comparison_code.  Formatting fix.
(noce_try_store_flag): Test rev_cond != NULL in addition to
reversed_comparison_code.
(noce_try_store_flag_constants): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_addcc): Use rev_cond if non-NULL instead of
reversed_comparison_code.
(noce_try_cmove_arith): Likewise.  Formatting fixes.
(noce_try_minmax, noce_try_abs): Clear rev_cond.
(noce_find_if_block): Initialize rev_cond.
(find_cond_trap): If reversed_comparison_code fails, try
noce_get_condition with true as last argument.

--- gcc/ifcvt.c.jj  2017-02-22 22:32:34.411724499 +0100
+++ gcc/ifcvt.c 2017-02-23 10:24:09.364416631 +0100
@@ -777,6 +777,9 @@ struct noce_if_info
   /* The jump condition.  */
   rtx cond;
 
+  /* Reversed jump condition.  */
+  rtx rev_cond;
+
   /* New insns should be inserted before this one.  */
   rtx_insn *cond_earliest;
 
@@ -888,6 +891,14 @@ noce_emit_store_flag (struct noce_if_inf
   if (if_info->then_else_reversed)
reversep = !reversep;
 }
+  else if (reversep
+  && if_info->rev_cond
+  && general_operand (XEXP (if_info->rev_cond, 0), VOIDmode)
+  && general_operand (XEXP (if_info->rev_cond, 1), VOIDmode))
+{
+  cond = if_info->rev_cond;
+  reversep = false;
+}
 
   if (reversep)
 code = reversed_comparison_code (cond, if_info->jump);
@@ -898,7 +909,7 @@ noce_emit_store_flag (struct noce_if_inf
   && (normalize == 0 || STORE_FLAG_VALUE == normalize))
 {
   rtx src = gen_rtx_fmt_ee (code, GET_MODE (x), XEXP (cond, 0),
-   XEXP (cond, 1));
+   XEXP (cond, 1));
   rtx set = gen_rtx_SET (x, src);
 
   start_sequence ();
@@ -1209,8 +1220,9 @@ noce_try_store_flag (struct noce_if_info
   else if (if_info->b == const0_rtx
   && CONST_INT_P (if_info->a)
   && INTVAL (if_info->a) == STORE_FLAG_VALUE
-  && (reversed_comparison_code (if_info->cond, if_info->jump)
-  != UNKNOWN))
+  && (if_info->rev_cond
+  || reversed_comparison_code (if_info->cond,
+   if_info->jump) != UNKNOWN))
 reversep = 1;
   else
 return FALSE;
@@ -1371,8 +1383,9 @@ noce_try_store_flag_constants (struct no
 
   diff = trunc_int_for_mode (diff, mode);
 
-  can_reverse = (reversed_comparison_code (if_info->cond, if_info->jump)
-!= UNKNOWN);
+  can_reverse = (if_info->rev_cond
+|| reversed_comparison_code (if_info->cond,
+ if_info->jump) != UNKNOWN);
 
   reversep = false;
   if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
@@ -1553,11 +1566,20 

[PATCH PR79663]Only reversely combine refs for ZERO length chains in predcom

2017-02-23 Thread Bin Cheng
Hi,
This patch resolves the spec2k/mgrid regression reported in PR79663.  The root
cause has been described thoroughly in comments #1/#2 of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79663
This patch handles ZERO/non-ZERO length chains differently and only reversely
combines refs for ZERO length chains.  It also makes a small improvement by
swapping the list in place.
Bootstrap and test on x86_64 and AArch64, is it OK?

Thanks,
bin

2017-01-21  Bin Cheng  

PR tree-optimization/79663
* tree-predcom.c (combine_chains): Process refs in reverse order
only for ZERO length chains, and add explaining comment.diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 9723e9c..57d8f7d 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -2283,7 +2283,7 @@ combine_chains (chain_p ch1, chain_p ch2)
   enum tree_code op = ERROR_MARK;
   bool swap = false;
   chain_p new_chain;
-  unsigned i;
+  int i, j, num;
   gimple *root_stmt;
   tree rslt_type = NULL_TREE;
 
@@ -2305,6 +2305,9 @@ combine_chains (chain_p ch1, chain_p ch2)
return NULL;
 }
 
+  ch1->combined = true;
+  ch2->combined = true;
+
   if (swap)
 std::swap (ch1, ch2);
 
@@ -2317,39 +2320,41 @@ combine_chains (chain_p ch1, chain_p ch2)
   new_chain->length = ch1->length;
 
   gimple *insert = NULL;
-  auto_vec tmp_refs;
-  gcc_assert (ch1->refs.length () == ch2->refs.length ());
-  /* Process in reverse order so dominance point is ready when it comes
- to the root ref.  */
-  for (i = ch1->refs.length (); i > 0; i--)
-{
-  r1 = ch1->refs[i - 1];
-  r2 = ch2->refs[i - 1];
+  num = ch1->refs.length ();
+  i = (new_chain->length == 0) ? num - 1 : 0;
+  j = (new_chain->length == 0) ? -1 : 1;
+  /* For ZERO length chain, process refs in reverse order so that dominant
+ position is ready when it comes to the root ref.
+ For non-ZERO length chain, process refs in order.  See PR79663.  */
+  for (; num > 0; num--, i += j)
+{
+  r1 = ch1->refs[i];
+  r2 = ch2->refs[i];
   nw = XCNEW (struct dref_d);
   nw->distance = r1->distance;
-  nw->stmt = stmt_combining_refs (r1, r2, i == 1 ? insert : NULL);
 
-  /* Record dominance point where root combined stmt should be inserted
-for chains with 0 length.  Though all root refs dominate following
-refs, it's possible the combined stmt doesn't.  See PR70754.  */
-  if (ch1->length == 0
+  /* For ZERO length chain, insert combined stmt of root ref at dominant
+position.  */
+  nw->stmt = stmt_combining_refs (r1, r2, i == 0 ? insert : NULL);
+  /* For ZERO length chain, record dominant position where combined stmt
+of root ref should be inserted.  In this case, though all root refs
+dominate following ones, it's possible that combined stmt doesn't.
+See PR70754.  */
+  if (new_chain->length == 0
  && (insert == NULL || stmt_dominates_stmt_p (nw->stmt, insert)))
insert = nw->stmt;
 
-  tmp_refs.safe_push (nw);
+  new_chain->refs.safe_push (nw);
 }
-
-  /* Restore the order for new chain's refs.  */
-  for (i = tmp_refs.length (); i > 0; i--)
-new_chain->refs.safe_push (tmp_refs[i - 1]);
-
-  ch1->combined = true;
-  ch2->combined = true;
-
-  /* For chains with 0 length, has_max_use_after must be true since root
- combined stmt must dominates others.  */
   if (new_chain->length == 0)
 {
+  /* Restore the order for ZERO length chain's refs.  */
+  num = new_chain->refs.length () >> 1;
+  for (i = 0, j = new_chain->refs.length () - 1; i < num; i++, j--)
+   std::swap (new_chain->refs[i], new_chain->refs[j]);
+
+  /* For ZERO length chain, has_max_use_after must be true since root
+combined stmt must dominates others.  */
   new_chain->has_max_use_after = true;
   return new_chain;
 }


[PATCH, doc]: Mention that -mfpmath=sse is the default on 32bit x86 w/ SSE2 and -ffast-math

2017-02-23 Thread Uros Bizjak
Hello!

This patch documents a little gcc secret...

2017-02-23  Uros Bizjak  

* doc/invoke.texi (x86 Options, -mfpmath=sse): Mention that
-mfpmath=sse is the default also for x86-32 targets with SSE2
instruction set when @option{-ffast-math} is enabled

Bootstrapped on x86_64-linux-gnu.

Uros.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6e5fa56..9640b1b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -24591,7 +24591,7 @@ The temporary results are computed in 80-bit precision 
instead of the precision
 specified by the type, resulting in slightly different results compared to most
 of other chips.  See @option{-ffloat-store} for more detailed description.
 
-This is the default choice for x86-32 targets.
+This is the default choice for most of x86-32 targets.
 
 @item sse
 Use scalar floating-point instructions present in the SSE instruction set.
@@ -24611,7 +24611,8 @@ The resulting code should be considerably faster in the 
majority of cases and av
 the numerical instability problems of 387 code, but may break some existing
 code that expects temporaries to be 80 bits.
 
-This is the default choice for the x86-64 compiler.
+This is the default choice for the x86-64 compiler, and the default choice for
+x86-32 targets with SSE2 instruction set when @option{-ffast-math} is enabled.
 
 @item sse,387
 @itemx sse+387
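
A quick way to see the documented default in action (my own example, not part
of the patch): for 32-bit x86 with SSE2 enabled, adding -ffast-math switches
the default FP math unit to SSE, so a simple multiply should come out as mulss
rather than an x87 fmul.

/* t.c -- compile with: gcc -m32 -msse2 -O2 -ffast-math -S t.c
   With -ffast-math, -mfpmath=sse is the default here, so the multiply
   below is expected to appear as mulss in t.s instead of an x87 fmul.  */
float
f (float a, float b)
{
  return a * b;
}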


[PATCH,testsuite] Use logical_op_short_circuit to skip targets in ssa-thread-14.c.

2017-02-23 Thread Toma Tabacu
Hi,

The ssa-thread-14.c test has been failing for MIPS for a while.

According to Patrick Palka, who modified this test's target selector in the fix
for PR71314, this test fails on targets which don't use non-short-circuit
logical ops and should be skipped for such targets.
In the case of MIPS, LOGICAL_OP_NON_SHORT_CIRCUIT is set to 0, so the test
should be skipped for MIPS targets.
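
Roughly speaking, on targets where LOGICAL_OP_NON_SHORT_CIRCUIT is non-zero a
condition like the one in the sketch below is folded into a single combined
test, which is what exposes the jump threading opportunities that
ssa-thread-14.c counts; on MIPS, where it is 0, the two tests stay as separate
branches and the expected "Threaded jump" count no longer matches.
(Illustrative example, not taken from the testcase.)

void foo (void);

void
bar (int a, int b)
{
  /* With non-short-circuit logical ops this becomes roughly
       t = (a > 10) & (b < 5); if (t) foo ();
     with short-circuit lowering it stays as two separate branches.  */
  if (a > 10 && b < 5)
    foo ();
}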

This patch adds the !logical_op_short_circuit requirement (defined in
testsuite/lib/target-supports.exp:7965) to ssa-thread-14.c's dg-do target
selector, which will exclude MIPS targets. It also removes the "-mbranch-cost"
option from being passed to targets which will be skipped because of the newly
added !logical_op_short_circuit requirement.

This makes ssa-thread-14.c's target selector more similar to the one from
ssa-thread-11.c (which was mentioned as a solution to PR71314 in the Bugzilla
thread).

Here are some links, for your convenience:
PR71314 on Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71314
The patch submission for fixing PR71314:
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02359.html

Does this look OK?
Is the !logical_op_short_circuit requirement too heavy-handed here?

Regards,
Toma Tabacu

gcc/testsuite/

* gcc.dg/tree-ssa/ssa-thread-14.c (dg-options): Use
logical_op_short_circuit to skip targets.
(dg-additional-options): Don't pass -mbranch-cost=2 for MIPS, AVR
and s390.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
index c754b5b..aa1323a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
@@ -1,6 +1,6 @@
-/* { dg-do compile { target { ! { m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-* nios2*-*-* } } } }  */
+/* { dg-do compile { target { ! { logical_op_short_circuit || { m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-* hppa*-*-* nios2*-*-* } } } } }  */
 /* { dg-additional-options "-O2 -fdump-tree-vrp-details" }  */
-/* { dg-additional-options "-mbranch-cost=2" { target mips*-*-* avr-*-* s390*-*-* i?86-*-* x86_64-*-* } }  */
+/* { dg-additional-options "-mbranch-cost=2" { target i?86-*-* x86_64-*-* } }  */
 /* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
 
 void foo (void);



Re: [PATCH, i386]: Fix PR79593: Poor/Worse code generation for FPU on versions after 6

2017-02-23 Thread Uros Bizjak
On Tue, Feb 21, 2017 at 7:52 PM, Uros Bizjak  wrote:
> Hello!
>
> Attached patch fixes oversight in standard_x87sse_constant_load
> splitter and its float-extend counterpart, where a FP reg-reg move
> insn RTX can be tagged with REG_EQUIV or REG_EQUAL const_double RTX.
>
> find_constant_src and ix86_standard_x87sse_constant_load_p predicate
> are able to handle this situation, and patched splitters emit direct
> constant load instead of a reg-reg move. This also lowers regstack
> register pressure, as evident from the testcase:
>
> --- pr79593.s_  2017-02-21 19:41:36.615740647 +0100
> +++ pr79593.s   2017-02-21 19:41:47.251622966 +0100
> @@ -15,21 +15,16 @@
> fldz
>  .L2:
> fld1
> -   fld %st(0)
> -   fcomp   %st(2)
> +   fcomp   %st(1)
> fnstsw  %ax
> sahf
> -   jnb .L5
> -   fstp%st(1)
> -   jmp .L3
> -   .p2align 4,,10
> -   .p2align 3
> -.L5:
> +   jnb .L3
> fstp%st(0)
> +   fld1
>  .L3:
> rep ret
> .cfi_endproc
>  .LFE2:
> .size   bar, .-bar
> -   .ident  "GCC: (GNU) 7.0.0 20170117 (experimental) [trunk
> revision 244540]"
> +   .ident  "GCC: (GNU) 7.0.1 20170221 (experimental) [trunk
> revision 245630]"
> .section.note.GNU-stack,"",@progbits
>
> Patched compiler also removed a jump to a BB where only compensating
> regstack pop was emitted.
>
> 2017-02-21  Uros Bizjak  
>
> PR target/79593
> * config/i386/i386.md (standard_x87sse_constant_load splitter):
> Use nonimmediate_operand instead of memory_operand for operand 1.
> (float-extend standard_x87sse_constant_load splitter): Ditto.
>
> testsuite/ChangeLog:
>
> 2017-02-21  Uros Bizjak  
>
> PR target/79593
> * gcc.target/i386/pr79593.c: New test.
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Committed to mainline SVN.

Now with a patch.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cfbe0b0..23f2ea0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3660,7 +3660,7 @@
 
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
-   (match_operand 1 "memory_operand"))]
+   (match_operand 1 "nonimmediate_operand"))]
   "reload_completed
&& (GET_MODE (operands[0]) == TFmode
|| GET_MODE (operands[0]) == XFmode
@@ -3672,7 +3672,7 @@
 
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
-   (float_extend (match_operand 1 "memory_operand")))]
+   (float_extend (match_operand 1 "nonimmediate_operand")))]
   "reload_completed
&& (GET_MODE (operands[0]) == TFmode
|| GET_MODE (operands[0]) == XFmode
/* PR target/79593 */
/* { dg-do compile } */
/* { dg-options "-Ofast -mfpmath=387" } */

extern float global_data[1024];

static long double MIN (long double a, long double b) { return a < b ? a : b; }
static long double MAX (long double a, long double b) { return a > b ? a : b; }

float bar (void)
{
  long double delta = (global_data[0]);

  return (MIN (MAX (delta, 0.0l), 1.0l));
}

/* { dg-final { scan-assembler-not "fld\[ \t\]+%st" } } */


Re: PR79286, ira combine_and_move_insns in loops

2017-02-23 Thread Alan Modra
On Thu, Feb 23, 2017 at 12:46:17AM -0700, Jeff Law wrote:
> And thus we keep the equivalence.  Ultimately may_trap_p considers a PIC
> memory reference as non-trapping.

Which is obviously a bug, because this access is segfaulting..
Not that I want to poke at the bug.  :)

> I really wonder if we should just drop the may_trap_p test and always do the
> domination check.

Yeah, seems reasonable to me.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] Fix PR79684

2017-02-23 Thread Richard Biener

The following makes sure to initialize the location range for c_exprs
the gimple parser routines eventually return.

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2017-02-23  Richard Biener  

PR c/79684
* gimple-parser.c (c_parser_gimple_statement): Use set_error
to initialize c_exprs to return.
(c_parser_gimple_binary_expression): Likewise.
(c_parser_gimple_unary_expression): Likewise.
(c_parser_gimple_postfix_expression): Likewise.

Index: gcc/c/gimple-parser.c
===
--- gcc/c/gimple-parser.c   (revision 245676)
+++ gcc/c/gimple-parser.c   (working copy)
@@ -270,7 +270,7 @@ c_parser_gimple_statement (c_parser *par
 
   lhs = c_parser_gimple_unary_expression (parser);
   loc = EXPR_LOCATION (lhs.value);
-  rhs.value = error_mark_node;
+  rhs.set_error ();
 
   /* GIMPLE call statement without LHS.  */
   if (c_parser_next_token_is (parser, CPP_SEMICOLON)
@@ -455,7 +455,7 @@ c_parser_gimple_binary_expression (c_par
   /* Location of the binary operator.  */
   struct c_expr ret, lhs, rhs;
   enum tree_code code = ERROR_MARK;
-  ret.value = error_mark_node;
+  ret.set_error ();
   lhs = c_parser_gimple_postfix_expression (parser);
   if (c_parser_error (parser))
 return ret;
@@ -553,9 +553,7 @@ c_parser_gimple_unary_expression (c_pars
   struct c_expr ret, op;
   location_t op_loc = c_parser_peek_token (parser)->location;
   location_t finish;
-  ret.original_code = ERROR_MARK;
-  ret.original_type = NULL;
-  ret.value = error_mark_node;
+  ret.set_error ();
   switch (c_parser_peek_token (parser)->type)
 {
 case CPP_AND:
@@ -723,11 +721,10 @@ c_parser_parse_ssa_name (c_parser *parse
 static struct c_expr
 c_parser_gimple_postfix_expression (c_parser *parser)
 {
-  struct c_expr expr;
   location_t loc = c_parser_peek_token (parser)->location;
   source_range tok_range = c_parser_peek_token (parser)->get_range ();
-  expr.original_code = ERROR_MARK;
-  expr.original_type = NULL;
+  struct c_expr expr;
+  expr.set_error ();
   switch (c_parser_peek_token (parser)->type)
 {
 case CPP_NUMBER:


Re: [PATCH][PR tree-optimization/79578] Use operand_equal_p rather than pointer equality for base test

2017-02-23 Thread Richard Biener
On Thu, Feb 23, 2017 at 6:49 AM, Jeff Law  wrote:
>
> tree-ssa-dse.c needs to verify when two writes have the same base address.
> Right now it uses pointer equality.  The testcase in BZ79578 shows that we
> should have been using operand_equal_p.
>
> This one-liner fixes that oversight.  Bootstrapped and regression tested on
> x86_64-linux-gnu.  Installed on the trunk.
>
> Jeff
>
> commit ef506ec9114a7fe27d9ee892c17edd100f72a963
> Author: law 
> Date:   Thu Feb 23 05:47:43 2017 +
>
> PR tree-optimization/79578
> * tree-ssa-dse.c (clear_bytes_written_by): Use operand_equal_p
> to compare base operands.
>
> PR tree-optimization/79578
> * g++.dg/tree-ssa/ssa-dse-3.C: New test.
>
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245675
> 138bc75d-0d04-0410-961f-82ee72b054a4
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 7155850..6da1d74 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2017-02-22 Jeff Law  
> +
> +   PR tree-optimization/79578
> +   * tree-ssa-dse.c (clear_bytes_written_by): Use operand_equal_p
> +   to compare base operands.
> +
>  2017-02-22  Segher Boessenkool  
>
> PR target/79211
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index ea5e251..d900cc3 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-02-22  Jeff Law  
> +
> +   PR tree-optimization/79578
> +   * g++.dg/tree-ssa/ssa-dse-3.C: New test.
> +
>  2017-02-22  Sameera Deshpande  
>
> * gcc.target/mips/msa-fp-cc.c: New test.
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-3.C
> b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-3.C
> new file mode 100644
> index 000..fe8f309
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-3.C
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c++14 -O3 -fdump-tree-dse1-details" } */
> +
> +#include 
> +#include 
> +
> +struct A
> +{
> +std::uint16_t a, b;
> +};
> +
> +A* f(char* b) __attribute__((noinline));
> +
> +A* f(char* b) {
> +auto a = new(b) A{};
> +a->a = 1;
> +a->b = 2;
> +return a;
> +}
> +
> +int main() {
> +char b[sizeof(A)] alignas(A);
> +f(b);
> +}
> +
> +
> +/* { dg-final { scan-tree-dump "Deleted dead store: " "dse1" } } */
> +
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 84c0b11..a82e164 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -176,7 +176,7 @@ clear_bytes_written_by (sbitmap live_bytes, gimple
> *stmt, ao_ref *ref)
>/* Verify we have the same base memory address, the write
>   has a known size and overlaps with REF.  */
> >if (valid_ao_ref_for_dse (&write)
> -  && write.base == ref->base
> +  && operand_equal_p (write.base, ref->base, 0)

As you've identified size and offset match you are really interested
in comparing the base addresses and thus should use OEP_ADDRESS_OF.

Richard.

>&& write.size == write.max_size
>&& ((write.offset < ref->offset
>&& write.offset + write.size > ref->offset)
>
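
For concreteness, the adjustment Richard suggests would amount to a one-line
tweak along these lines (a sketch, not a tested or committed change):

-      && operand_equal_p (write.base, ref->base, 0)
+      && operand_equal_p (write.base, ref->base, OEP_ADDRESS_OF)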


Re: fwprop fix for PR79405

2017-02-23 Thread Richard Biener
On Wed, Feb 22, 2017 at 6:19 PM, Jeff Law  wrote:
> On 02/16/2017 12:41 PM, Bernd Schmidt wrote:
>>
>> We have two registers being assigned to each other:
>>
>>  (set (reg 213) (reg 209))
>>  (set (reg 209) (reg 213))
>>
>> These being the only definitions, we are happy to forward propagate reg
>> 209 for reg 213 into a third insn, making a new use for reg 209. We are
>> then happy to forward propagate reg 213 for it in the same insn...
>> ending up in an infinite loop.
>>
>> I don't really see an elegant way to prevent this, so the following just
>> tries to detect the situation (and more general ones) by brute force.
>> Bootstrapped and tested on x86_64-linux, verified that the test passes
>> with a ppc cross, ok?
>>
>>
>> Bernd
>>
>>
>> 79405.diff
>>
>>
>> PR rtl-optimization/79405
>> * fwprop.c (forward_propagate_into): Detect potentially cyclic
>> replacements and bail out for them.
>>
>> PR rtl-optimization/79405
>> * gcc.dg/torture/pr79405.c: New test.
>
> OK.

Err - this looks quite costly done for each fwprop.  And placing it before
less costly bailouts even...

See my discussion with Bernd anyway.

Richard.

> jeff
>


[PATCH] Fix PR79683

2017-02-23 Thread Richard Biener

The following fixes BB vectorization to not forget addr-space 
qualifications of loads.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-02-23  Richard Biener  

PR tree-optimization/79683
* tree-vect-stmts.c (vect_analyze_stmt): Do not overwrite
vector types for data-refs.

* gcc.target/i386/pr79683.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 245676)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -8486,37 +8486,42 @@ vect_analyze_stmt (gimple *stmt, bool *n
 {
   gcc_assert (PURE_SLP_STMT (stmt_info));
 
-  scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
-  if (dump_enabled_p ())
-{
-  dump_printf_loc (MSG_NOTE, vect_location,
-   "get vectype for scalar type:  ");
-  dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
-  dump_printf (MSG_NOTE, "\n");
-}
+  /* Memory accesses already got their vector type assigned
+ in vect_analyze_data_refs.  */
+  if (! STMT_VINFO_DATA_REF (stmt_info))
+   {
+ scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "get vectype for scalar type:  ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
+ dump_printf (MSG_NOTE, "\n");
+   }
 
-  vectype = get_vectype_for_scalar_type (scalar_type);
-  if (!vectype)
-{
-  if (dump_enabled_p ())
-{
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"not SLPed: unsupported data-type ");
-   dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-  scalar_type);
-  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-}
-  return false;
-}
+ vectype = get_vectype_for_scalar_type (scalar_type);
+ if (!vectype)
+   {
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "not SLPed: unsupported data-type ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+scalar_type);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+   }
+ return false;
+   }
 
-  if (dump_enabled_p ())
-{
-  dump_printf_loc (MSG_NOTE, vect_location, "vectype:  ");
-  dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
-  dump_printf (MSG_NOTE, "\n");
-}
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location, "vectype:  ");
+ dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
+ dump_printf (MSG_NOTE, "\n");
+   }
 
-  STMT_VINFO_VECTYPE (stmt_info) = vectype;
+ STMT_VINFO_VECTYPE (stmt_info) = vectype;
+   }
}
 
   if (STMT_VINFO_RELEVANT_P (stmt_info))
Index: gcc/testsuite/gcc.target/i386/pr79683.c
===
--- gcc/testsuite/gcc.target/i386/pr79683.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr79683.c (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2" } */
+
+struct s {
+__INT64_TYPE__ a;
+__INT64_TYPE__ b;
+};
+void test(struct s __seg_gs *x) {
+x->a += 1;
+x->b -= 1;
+}
+
+/* We get the function vectorized, verify the load and store are
+   address-space qualified.  */
+/* { dg-final { scan-assembler-times "padd" 1 } } */
+/* { dg-final { scan-assembler-times "%gs" 2 } } */