Re: [PATCH] Honour DriverOnly for enum values in error messages.
On 07/27/2015 01:45 PM, Dominik Vogt wrote:
> /gcc/ChangeLog
>
> 	* opts-common.c (read_cmdline_option): List DriverOnly enum
> 	values as valid only in the error message of the driver, not in
> 	the messages of the language compilers.

Applied.  Thanks!

-Andreas-
[PATCH] Refactoring masked built-in decls
Hello,

This patch converts the mask type for masked builtins from signed to unsigned. Furthermore, several redundant builtin definitions were removed. Please have a look. Is it ok for trunk?

Thanks,
Petr

2015-07-27  Petr Murzin  petr.mur...@intel.com

	* config/i386/i386.c (bdesc_special_args): Convert mask type from
	signed to unsigned for masked builtins.
	(ix86_expand_args_builtin): Do not handle UINT_FTYPE_V2DF,
	UINT64_FTYPE_V2DF, UINT64_FTYPE_V4SF, V16QI_FTYPE_V8DI,
	V16HI_FTYPE_V16SI, V16SI_FTYPE_V16SI, V16SF_FTYPE_FLOAT,
	V8HI_FTYPE_V8DI, V8UHI_FTYPE_V8UHI, V8SI_FTYPE_V8DI,
	V8SF_FTYPE_V8DF, V8DI_FTYPE_INT64, V8DI_FTYPE_V4DI,
	V8DI_FTYPE_V8DI, V8DF_FTYPE_DOUBLE, V8DF_FTYPE_V8SI,
	V16SI_FTYPE_V16SI_V16SI, V16SF_FTYPE_V16SF_V16SI,
	V8DI_FTYPE_V8DI_V8DI, V8DF_FTYPE_V8DF_V8DI,
	V4SI_FTYPE_V4SF_V4SF, V4SF_FTYPE_V4SF_UINT64,
	V2UDI_FTYPE_V4USI_V4USI, V2DI_FTYPE_V2DF_V2DF,
	V2DF_FTYPE_V2DF_UINT64, V4UDI_FTYPE_V8USI_V8USI,
	QI_FTYPE_V8DI_V8DI, HI_FTYPE_V16SI_V16SI, HI_FTYPE_HI_INT,
	V16SF_FTYPE_V16SF_V16SF_V16SF, V16SF_FTYPE_V16SF_V16SI_V16SF,
	V16SF_FTYPE_V16SI_V16SF_HI, V16SF_FTYPE_V16SI_V16SF_V16SF,
	V16SI_FTYPE_V16SF_V16SI_HI, V8DI_FTYPE_V8SF_V8DI_QI,
	V8SF_FTYPE_V8DI_V8SF_QI, V8DI_FTYPE_PV4DI,
	V8DF_FTYPE_V8DI_V8DF_QI, V16SI_FTYPE_V16SI_V16SI_V16SI,
	V2DI_FTYPE_V2DI_V2DI_V2DI, V8DI_FTYPE_V8DF_V8DI_QI,
	V8DF_FTYPE_PV4DF, V8SI_FTYPE_V8SI_V8SI_V8SI,
	V8DF_FTYPE_V8DF_V8DF_V8DF, UINT_FTYPE_V4SF,
	V8DF_FTYPE_V8DF_V8DI_V8DF, V8DF_FTYPE_V8DI_V8DF_V8DF,
	V8DF_FTYPE_V8SF_V8DF_QI, V8DI_FTYPE_V8DI_V8DI_V8DI,
	V16SF_FTYPE_PV4SF, V8SF_FTYPE_V8DF_V8SF_QI,
	V8SI_FTYPE_V8DF_V8SI_QI, V16SI_FTYPE_PV4SI,
	V2DF_FTYPE_V2DF_V4SF_V2DF_QI, V4SF_FTYPE_V4SF_V2DF_V4SF_QI,
	V8DI_FTYPE_V8DI_SI_V8DI_V8DI, QI_FTYPE_V8DF_V8DF_INT_QI,
	HI_FTYPE_V16SF_V16SF_INT_HI, V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI,
	VOID_FTYPE_PDOUBLE_V2DF_QI, VOID_FTYPE_PFLOAT_V4SF_QI,
	V2DF_FTYPE_PCDOUBLE_V2DF_QI, V4SF_FTYPE_PCFLOAT_V4SF_QI.
	* config/i386/i386-builtin-types.def (V16QI_FTYPE_V16SI): Remove.
	(V8DF_FTYPE_V8SI): Ditto.
	(V8HI_FTYPE_V8DI): Ditto.
(V8SI_FTYPE_V8DI): Ditto. (V8SF_FTYPE_V8DF): Ditto. (V8SF_FTYPE_V8DF_V8SF_QI): Ditto. (V16HI_FTYPE_V16SI): Ditto. (V16SF_FTYPE_V16HI): Ditto. (V16SF_FTYPE_V16HI_V16SF_HI): Ditto. (V16SF_FTYPE_V16SI): Ditto. (V4DI_FTYPE_V4DI): Ditto. (V16SI_FTYPE_V16SF): Ditto. (V16SF_FTYPE_FLOAT): Ditto. (V8DF_FTYPE_DOUBLE): Ditto. (V8DI_FTYPE_INT64): Ditto. (V8DI_FTYPE_V4DI): Ditto. (V16QI_FTYPE_V8DI): Ditto. (UINT_FTYPE_V4SF): Ditto. (UINT64_FTYPE_V4SF): Ditto. (UINT_FTYPE_V2DF): Ditto. (UINT64_FTYPE_V2DF): Ditto. (V16SI_FTYPE_V16SI): Ditto. (V8DI_FTYPE_V8DI): Ditto. (V16SI_FTYPE_PV4SI): Ditto. (V16SF_FTYPE_PV4SF): Ditto. (V8DI_FTYPE_PV2DI): Ditto. (V8DF_FTYPE_PV2DF): Ditto. (V4DI_FTYPE_PV2DI): Ditto. (V4DF_FTYPE_PV2DF): Ditto. (V16SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV2SF): Ditto. (V8DI_FTYPE_PV4DI): Ditto. (V8DF_FTYPE_PV4DF): Ditto. (V8SF_FTYPE_FLOAT): Ditto. (V4SF_FTYPE_FLOAT): Ditto. (V4DF_FTYPE_DOUBLE): Ditto. (V8SF_FTYPE_PV4SF): Ditto. (V8SI_FTYPE_PV4SI): Ditto. (V4SI_FTYPE_PV2SI): Ditto. (V8SF_FTYPE_PV2SF): Ditto. (V8SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV8SF): Ditto. (V16SI_FTYPE_PV8SI): Ditto. (V8DI_FTYPE_V8SF): Ditto. (V4DI_FTYPE_V4SF): Ditto. (V2DI_FTYPE_V4SF): Ditto. (V64QI_FTYPE_QI): Ditto. (V32HI_FTYPE_HI): Ditto. (V8UHI_FTYPE_V8UHI): Ditto. (V16UHI_FTYPE_V16UHI): Ditto. (V32UHI_FTYPE_V32UHI): Ditto. (V2UDI_FTYPE_V2UDI): Ditto. (V4UDI_FTYPE_V4UDI): Ditto. (V8UDI_FTYPE_V8UDI): Ditto. (V4USI_FTYPE_V4USI): Ditto. (V8USI_FTYPE_V8USI): Ditto. (V16USI_FTYPE_V16USI): Ditto. (V2DF_FTYPE_V2DF_UINT64): Ditto. (V2DI_FTYPE_V2DF_V2DF): Ditto. (V2UDI_FTYPE_V4USI_V4USI): Ditto. (V8DF_FTYPE_V8DF_V8DI): Ditto. (V4SF_FTYPE_V4SF_UINT64): Ditto. (V4SI_FTYPE_V4SF_V4SF): Ditto. (V16SF_FTYPE_V16SF_V16SI): Ditto. (V64QI_FTYPE_V32HI_V32HI): Ditto. (V32HI_FTYPE_V16SI_V16SI): Ditto. (V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI): Ditto. (V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI): Ditto. (V32HI_FTYPE_V64QI_V64QI): Ditto. (V32HI_FTYPE_V32HI_V32HI): Ditto. (V16HI_FTYPE_V16HI_V16HI_INT_V16HI_HI): Ditto. 
(V16SI_FTYPE_V16SI_V4SI): Ditto. (V16SI_FTYPE_V16SI_V16SI): Ditto. (V16SI_FTYPE_V32HI_V32HI): Ditto. (V16SI_FTYPE_V16SI_SI): Ditto. (V8DI_FTYPE_V8DI_V8DI): Ditto. (V4UDI_FTYPE_V8USI_V8USI): Ditto. (V8DI_FTYPE_V16SI_V16SI): Ditto. (V8DI_FTYPE_V8DI_V2DI): Ditto. (QI_FTYPE_QI): Ditto. (SI_FTYPE_SI): Ditto. (DI_FTYPE_DI): Ditto. (QI_FTYPE_QI_QI): Ditto. (QI_FTYPE_QI_INT): Ditto. (HI_FTYPE_HI_INT): Ditto. (SI_FTYPE_SI_INT): Ditto. (DI_FTYPE_DI_INT): Ditto. (HI_FTYPE_V16QI_V16QI): Ditto. (SI_FTYPE_V32QI_V32QI): Ditto. (DI_FTYPE_V64QI_V64QI): Ditto. (QI_FTYPE_V8HI_V8HI): Ditto. (HI_FTYPE_V16HI_V16HI): Ditto. (SI_FTYPE_V32HI_V32HI): Ditto. (QI_FTYPE_V4SI_V4SI): Ditto. (QI_FTYPE_V8SI_V8SI): Ditto. (QI_FTYPE_V2DI_V2DI): Ditto. (QI_FTYPE_V4DI_V4DI): Ditto. (QI_FTYPE_V8DI_V8DI): Ditto. (HI_FTYPE_V16SI_V16SI): Ditto. (HI_FTYPE_V16SI_V16SI_INT_HI): Ditto. (QI_FTYPE_V8DF_V8DF_INT_QI): Ditto. (HI_FTYPE_V16SF_V16SF_INT_HI): Ditto. (V32HI_FTYPE_V32HI_V32HI_V32HI): Ditto. (V4SF_FTYPE_V4SF_V2DF_V4SF_QI):
Re: [wwwdocs] Mention -Wshift-overflow
On Mon, Jul 27, 2015 at 12:34:06AM +0200, Gerald Pfeifer wrote:
> Hi Marek,
>
> On Tue, 21 Jul 2015, Marek Polacek wrote:
> > +<li>A new command-line option <code>-Wshift-overflow</code> has been
> > +  added for the C and C++ compilers, which warns about left shift
> > +  overflows.  <code>-Wshift-overflow=2</code> also warns about
> > +  left-shifting 1 into the sign bit.</li>
>
> from what I can tell, this is enabled by default for modern dialects
> of C and C++ (so many people will see new warnings when upgrading).
> Shouldn't this be mentioned here?

Ok, fixed.  Thanks.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.15
diff -u -r1.15 changes.html
--- changes.html	21 Jul 2015 09:30:14 -	1.15
+++ changes.html	27 Jul 2015 11:14:11 -
@@ -46,7 +46,8 @@
   <li>A new command-line option <code>-Wshift-overflow</code> has been
     added for the C and C++ compilers, which warns about left shift
     overflows.  <code>-Wshift-overflow=2</code> also warns about
-    left-shifting 1 into the sign bit.</li>
+    left-shifting 1 into the sign bit.  This warning is enabled by
+    default.</li>
 </ul>

 <h3 id="c">C</h3>

	Marek
RFA: RL78: Add an optimization to the addsi3 pattern
Hi DJ,

This patch adds a small optimization to the RL78 addsi3 pattern so that it can skip adding in the high part of a constant symbolic address when -mes0 is active.  Under these circumstances we know that the address has to be in low memory and that it is invalid to attempt to access anything in high memory relative to this symbol.

Tested with no regressions on an rl78-elf toolchain.

OK to apply?

Cheers
Nick

gcc/ChangeLog
2015-07-27  Nick Clifton  ni...@redhat.com

	* config/rl78/rl78.c (rl78_addsi3_internal): New function.
	Optimizes the case where -mes0 is active and a constant symbolic
	address is used.
	* config/rl78/rl78-protos.h: Prototype the new function.
	* config/rl78/rl78.md (addsi3_internal_real): Call new function.

Index: gcc/config/rl78/rl78-protos.h
===
--- gcc/config/rl78/rl78-protos.h	(revision 226240)
+++ gcc/config/rl78/rl78-protos.h	(working copy)
@@ -18,6 +18,7 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */

+const char * rl78_addsi3_internal (rtx *, unsigned int);
 void rl78_emit_eh_epilogue (rtx);
 void rl78_expand_compare (rtx *);
 void rl78_expand_movsi (rtx *);

Index: gcc/config/rl78/rl78.c
===
--- gcc/config/rl78/rl78.c	(revision 226240)
+++ gcc/config/rl78/rl78.c	(working copy)
@@ -4638,6 +4641,34 @@
   return res;
 }

+
+const char *
+rl78_addsi3_internal (rtx * operands, unsigned int alternative)
+{
+  /* Issue 5905133:
+     If we are adding in a constant symbolic address when -mes0
+     is active then we know that the address must be < 64K and
+     that it is invalid to access anything above 64K relative to
+     this address.  So we can skip adding in the high bytes.  */
+  if (TARGET_ES0
+      && GET_CODE (operands[2]) == SYMBOL_REF
+      && TREE_CODE (SYMBOL_REF_DECL (operands[2])) == VAR_DECL
+      && TREE_READONLY (SYMBOL_REF_DECL (operands[2]))
+      && ! TREE_SIDE_EFFECTS (SYMBOL_REF_DECL (operands[2])))
+    return "movw ax, %h1\n\taddw ax, %h2\n\tmovw %h0, ax";
+
+  switch (alternative)
+    {
+    case 0:
+    case 1:
+      return "movw ax, %h1\n\taddw ax, %h2\n\tmovw %h0, ax\n\tmovw ax, %H1\n\tsknc\n\tincw ax\n\taddw ax, %H2\n\tmovw %H0, ax";
+    case 2:
+      return "movw ax, %h1\n\taddw ax,%h2\n\tmovw bc, ax\n\tmovw ax, %H1\n\tsknc\n\tincw ax\n\taddw ax, %H2\n\tmovw %H0, ax\n\tmovw ax, bc\n\tmovw %h0, ax";
+    default:
+      gcc_unreachable ();
+    }
+}
+
 #undef  TARGET_PREFERRED_RELOAD_CLASS
 #define TARGET_PREFERRED_RELOAD_CLASS rl78_preferred_reload_class

Index: gcc/config/rl78/rl78.md
===
--- gcc/config/rl78/rl78.md	(revision 226240)
+++ gcc/config/rl78/rl78.md	(working copy)
@@ -243,10 +243,7 @@
    (clobber (reg:HI BC_REG))
   ]
   "rl78_real_insns_ok ()"
-  "@
-   movw ax,%h1 \;addw ax,%h2 \;movw %h0, ax \;movw ax,%H1 \;sknc \;incw ax \;addw ax,%H2 \;movw %H0,ax
-   movw ax,%h1 \;addw ax,%h2 \;movw %h0, ax \;movw ax,%H1 \;sknc \;incw ax \;addw ax,%H2 \;movw %H0,ax
-   movw ax,%h1 \;addw ax,%h2 \;movw bc, ax \;movw ax,%H1 \;sknc \;incw ax \;addw ax,%H2 \;movw %H0,ax \;movw ax,bc \;movw %h0, ax
+  { return rl78_addsi3_internal (operands, which_alternative); }
   [(set_attr "valloc" "macax")]
 )
Re: [PATCH 3/16][ARM] Add float16x4_t intrinsics
Ramana Radhakrishnan wrote:
> I haven't seen the patch yet but here are my thoughts on where this
> should be going. Thus in summary -
>
> 1. -mfpu=neon implies the presence of the float16x(4/8) types and all
>    the intrinsics that treat these values as bags of bits.
> 2. -mfpu=neon-fp16 implies the presence of the vcvt* intrinsics that
>    are needed for the float16 types.

So I think the problems are statements in ACLE that (a) we should only have float16x(4/8)_t types when we have scalar types as well; (b) whenever we have a scalar __fp16 type, we should have one or other of the __ARM_FP16_FORMAT_(IEEE/ALTERNATIVE) macros defined to indicate the format that's in use. Sadly these seem to forbid the current situation whereby we expose hardware conversion instructions (that work with either fp16 format, according to the status of the FPSCR bit) and allow compiling a binary that will work with either format :(.

The situation is further complicated by GCC's support for the alternative format (not mandated by ACLE), by the fact that we can support either format in the absence of any hardware (as we have software emulation routines for scalar conversions in either format, as long as we know which at compile time), and by the fact that object files compiled with different -mfp16-format cannot be linked together (the ABI attributes conflict).

However, I think we can still go with Ramana's point 1, albeit _only_ when a -mfp16-format is specified; _2_ similarly (i.e. -mfpu=neon-fp16 will not provide any additional intrinsics unless an -mfp16-format is specified). I'll repost the patch series shortly with those changes implemented.

In the meantime: are patches 1 & 2 (ARM __builtin_arm_neon_lane_bounds and qualifier_lane_index) OK to commit? These contain nothing float16-specific, and would unblock Charles Baylis' work on PR63870 (https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00545.html). I'll ping the AArch64 changes separately.

Cheers,
Alan
Re: [gomp4] Additional tests for routine directive
Hi!

On Fri, 24 Jul 2015 15:43:36 -0500, James Norris jnor...@codesourcery.com wrote:
> The attached patch adds additional test for the routine directive for
> C/C++/Fortran. Committed to gomp-4_0-branch.

Thanks, but I see a number of FAILs, including the following:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors)
UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 compilation failed to produce executable

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match foo not found

FAIL: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 (test for excess errors)
UNRESOLVED: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 compilation failed to produce executable
[same for other torture testing flags]

PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 (test for excess errors)
PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 execution test
FAIL: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 output pattern test, is , should match not found
[same for other torture testing flags]

(I have not reviewed your test case changes.)

Grüße,
 Thomas
Re: [gomp4] acc routines bugfix
Hi Cesar!

On Fri, 24 Jul 2015 08:05:00 -0700, Cesar Philippidis ce...@codesourcery.com wrote:
> [...] I couldn't think of a way to test the lto error message because
> that involves having two compilers present. I wonder if it's ok to
> have libgomp check for compiler expected compiler errors? However,
> that's more of a gcc/testsuite type of check.

Yes, such tests need to live in libgomp, even if they're compilation tests and not execution tests: the libgomp testsuite is the only place in the GCC testsuite where offloading (compilation) testing is possible.

> applied this patch to gomp-4_0-branch for now.

Thanks!

Grüße,
 Thomas
Re: OMP. More constification
On 07/27/15 04:23, Jakub Jelinek wrote:
> On Sun, Jul 26, 2015 at 11:01:09AM -0400, Nathan Sidwell wrote:
>> I found some more missing consts.  The size, kind, var and function
>> arrays emitted by omp-low are read only, but are not so marked.  This
>> patch
> First of all, the hostaddrs array is going to be written by the
> library side for GOMP_MAP_FIRSTPRIVATE, so can't be const and
> shouldn't be const on the compiler emitted side either.  Similarly for
> the use_device_ptr clause on GOMP_target_data* handling.
> The size array is const from the library side, sure, but on the
> compiler side is only conditionally static (static only if all the
> values written to it are INTEGER_CSTs), so not sure if we want to make
> it use const qualified type unconditionally, rather than only change
> it to the const qualified type if it is TREE_STATIC at the end.
> The kinds array I'm afraid might very well soon follow the size array
> model.

Ok, thanks for the explanation.  I'm puzzled though about writing the sizes and kinds arrays.  They're global objects, so that doesn't seem thread-safe?  Or is it ill-formed to have multiple host threads execute the same potentially offloaded function concurrently?

nathan
[PATCH][AArch64] Change aarch64 vector cost to match vectorizer
Hello,

I've applied the changes for the AArch64 vectorizer config suggested by Richard Biener (here https://gcc.gnu.org/ml/gcc/2015-04/msg00086.html) and checked the effect on spec2006 and geekbench benchmarks. Benchmarks were compiled with and without the patch, in both cases with and without '-mcpu=cortex-a57'.

As it turns out, there is no effect of this change on the generated code. With the cortex-a57 flag, the mentioned loop is not vectorized before or after the patch, and the binaries for the benchmark test cases are identical. Without the flag the loop is vectorized in both cases. However, this change makes the aarch64 vector costs consistent with the vectorizer code, while the approach based on loop nesting level clearly underestimates the cost of those statements.

Thanks,
Pawel

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 10df325..ffafc3f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-07-27  Pawel Kupidura  pawel.kupid...@arm.com
+
+	* config/aarch64/aarch64.c: Changed inner loop statement cost
+	to be consistent with vectorizer code.
+
 2015-07-26  Uros Bizjak  ubiz...@gmail.com

 	* config/alpha/alpha.c: Use SUBREG_P predicate.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 020f63c..3b6f8c5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7079,15 +7079,9 @@ aarch64_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,

       /* Statements in an inner loop relative to the loop being
 	 vectorized are weighted more heavily.  The value here is
-	 a function (linear for now) of the loop nest level.  */
+	 arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info && stmt_in_inner_loop_p (stmt_info))
-	{
-	  loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
-	  struct loop *loop =  LOOP_VINFO_LOOP (loop_info);
-	  unsigned nest_level = loop_depth (loop);
-
-	  count *= nest_level;
-	}
+	count *= 50; /*  FIXME  */

       retval = (unsigned) (count * stmt_cost);
       cost[where] += retval;
[PATCH] Honour DriverOnly for enum values in error messages.
The attached patch fixes a glitch in the error message generated for invalid values of enum options. When a DriverOnly option was passed directly to the compiler, it would still be listed as valid in the error message, e.g. on s390:

  $ cc1 ... -march=native
  cc1: error: unrecognized argument in option ‘-march=native’
  cc1: note: valid arguments to ‘-march=’ are: g5 g6 native z10 z13 z196 z9-109 z9-ec z900 z990 zEC12

The patched code prints DriverOnly enum values only when the driver generates the error message.

Ciao

Dominik ^_^  ^_^

--
Dominik Vogt
IBM Germany

/gcc/ChangeLog
	* opts-common.c (read_cmdline_option): List DriverOnly enum
	values as valid only in the error message of the driver, not in
	the messages of the language compilers.

From 6d59f56f33804e70ab5be6ced734ed2e95eeea7b Mon Sep 17 00:00:00 2001
From: Dominik Vogt v...@linux.vnet.ibm.com
Date: Mon, 27 Jul 2015 12:39:30 +0100
Subject: [PATCH] Honour DriverOnly for enum values in error messages.

---
 gcc/opts-common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index 8e51974..3bcbaf1 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -1079,6 +1079,8 @@ read_cmdline_option (struct gcc_options *opts,
 	  p = s;
 	  for (i = 0; e->values[i].arg != NULL; i++)
 	    {
+	      if (!enum_arg_ok_for_language (&e->values[i], lang_mask))
+		continue;
 	      size_t arglen = strlen (e->values[i].arg);
 	      memcpy (p, e->values[i].arg, arglen);
 	      p[arglen] = ' ';
-- 
2.3.0
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
On Fri, Jul 24, 2015 at 03:10:17PM -0600, Jeff Law wrote:
> OK.  Note I think the pt.c change will need trivial updating as it
> conflicts with another change of yours that I approved recently.

Committed after resolving the trivial merge conflict.  Thanks for the review,

	Marek
Re: [PATCH][RFT] Reduce gimple-match.c compile-time(?)
On Fri, 24 Jul 2015, Richard Biener wrote:
> The following patch implements the simplest approach of splitting the
> huge functions in gimple-match.c (not yet generic-match.c).
> ...
> Can people who reported huge compile-times test this patch and report
> back?

Positive feedback on IRC and thus I am applying the following simplified variant.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2015-07-27  Richard Biener  rguent...@suse.de

	* genmatch.c (decision_tree::gen_gimple): Split out large subtrees
	into separate functions.
	(decision_tree::gen_generic): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c	(revision 226240)
+++ gcc/genmatch.c	(working copy)
@@ -2951,6 +2964,32 @@ decision_tree::gen_gimple (FILE *f)
   for (unsigned n = 1; n <= 3; ++n)
     {
+      /* First generate split-out functions.  */
+      for (unsigned i = 0; i < root->kids.length (); i++)
+	{
+	  dt_operand *dop = static_cast <dt_operand *> (root->kids[i]);
+	  expr *e = static_cast <expr *> (dop->op);
+	  if (e->ops.length () != n)
+	    continue;
+
+	  fprintf (f, "\nstatic bool\n"
+		   "gimple_simplify_%s (code_helper *res_code, tree *res_ops,\n"
+		   "                 gimple_seq *seq, tree (*valueize)(tree) "
+		   "ATTRIBUTE_UNUSED,\n"
+		   "                 code_helper ARG_UNUSED (code), tree "
+		   "ARG_UNUSED (type)\n",
+		   e->operation->id);
+	  for (unsigned i = 0; i < n; ++i)
+	    fprintf (f, ", tree op%d", i);
+	  fprintf (f, ")\n");
+	  fprintf (f, "{\n");
+	  dop->gen_kids (f, 2, true);
+	  fprintf (f, "  return false;\n");
+	  fprintf (f, "}\n");
+	}
+
+      /* Then generate the main entry with the outermost switch and
+	 tail-calls to the split-out functions.  */
       fprintf (f, "\nstatic bool\n"
	        "gimple_simplify (code_helper *res_code, tree *res_ops,\n"
	        "                 gimple_seq *seq, tree (*valueize)(tree),\n"
@@ -2976,10 +3015,11 @@ decision_tree::gen_gimple (FILE *f)
	    fprintf (f, "    case %s%s:\n",
		     is_a <fn_id *> (e->operation) ? "-" : "",
		     e->operation->id);
-	    fprintf (f, "      {\n");
-	    dop->gen_kids (f, 8, true);
-	    fprintf (f, "        break;\n");
-	    fprintf (f, "      }\n");
+	    fprintf (f, "      return gimple_simplify_%s (res_code, res_ops, "
+		     "seq, valueize, code, type", e->operation->id);
+	    for (unsigned i = 0; i < n; ++i)
+	      fprintf (f, ", op%d", i);
+	    fprintf (f, ");\n");
	}
       fprintf (f, "    default:;\n"
	        "    }\n");
@@ -3003,6 +3043,34 @@ decision_tree::gen_generic (FILE *f)
   for (unsigned n = 1; n <= 3; ++n)
     {
+      /* First generate split-out functions.  */
+      for (unsigned i = 0; i < root->kids.length (); i++)
+	{
+	  dt_operand *dop = static_cast <dt_operand *> (root->kids[i]);
+	  expr *e = static_cast <expr *> (dop->op);
+	  if (e->ops.length () != n
+	      /* Builtin simplifications are somewhat premature on
+		 GENERIC.  The following drops patterns with outermost
+		 calls.  It's easy to emit overloads for function code
+		 though if necessary.  */
+	      || e->operation->kind != id_base::CODE)
+	    continue;
+
+	  fprintf (f, "\nstatic tree\n"
+		   "generic_simplify_%s (location_t ARG_UNUSED (loc), enum "
+		   "tree_code ARG_UNUSED (code), tree ARG_UNUSED (type)",
+		   e->operation->id);
+	  for (unsigned i = 0; i < n; ++i)
+	    fprintf (f, ", tree op%d", i);
+	  fprintf (f, ")\n");
+	  fprintf (f, "{\n");
+	  dop->gen_kids (f, 2, false);
+	  fprintf (f, "  return NULL_TREE;\n");
+	  fprintf (f, "}\n");
+	}
+
+      /* Then generate the main entry with the outermost switch and
+	 tail-calls to the split-out functions.  */
       fprintf (f, "\ntree\n"
	        "generic_simplify (location_t loc, enum tree_code code, "
	        "tree type ATTRIBUTE_UNUSED");
@@ -3030,10 +3098,11 @@ decision_tree::gen_generic (FILE *f)
	        fprintf (f, "    CASE_CONVERT:\n");
	      else
	        fprintf (f, "    case %s:\n", e->operation->id);
-	      fprintf (f, "      {\n");
-	      dop->gen_kids (f, 8, false);
-	      fprintf (f, "        break;\n"
-		       "      }\n");
+	      fprintf (f, "      return generic_simplify_%s (loc, code, type",
+		       e->operation->id);
+	      for (unsigned i = 0; i < n; ++i)
+		fprintf (f, ", op%d", i);
+	      fprintf (f, ");\n");
	}
       fprintf (f, "    default:;\n"
	        "    }\n");
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
On Mon, 27 Jul 2015, Richard Biener wrote:

I noticed that the code we generate for a simple

  gimple_assign_rhs1 (stmt)

is quite bad as we have two checking pieces.  The implementation is now

static inline tree
gimple_assign_rhs1 (const_gimple gs)
{
  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
  return gimple_op (gs, 1);
}

and the hidden checking is due to gimple_op being

static inline tree *
gimple_ops (gimple gs)
{
  size_t off;

  /* All the tuples have their operand vector at the very bottom
     of the structure.  Note that those structures that do not
     have an operand vector have a zero offset.  */
  off = gimple_ops_offset_[gimple_statement_structure (gs)];
  gcc_gimple_checking_assert (off != 0);

  return (tree *) ((char *) gs + off);

(ugly in itself considering that we just verified we have a gassign which has an operand vector as member...).  gimple_op incurs two additional memory references to compute gimple_statement_structure and gimple_ops_offset_ plus the checking bits.

The simplest thing would be to use

  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
  return as_a <const gassign *> (gs)->op[1];

but that's not optimal either because we check we have a gassign again (not optimized in stage1).  So I decided to invent a fancy as_a <> which reports errors similar to GIMPLE_CHECK and makes the code look like

  const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs);
  return ass->op[1];

for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?).  I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess.  I just changed gimple_assign_rhs1 for now.  So this is a RFC.

I understand in the end the whole GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon and this simple proof-of-concept patch already reduces unoptimized text size of gimple-match.c from 836080 to 836359 (too bad) and optimized from 585731 to 193190 bytes (yay!) on x86_64 using trunk.

Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c.  .optimized shows all calls remaining as

  bb 89:
  _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47);

which is

tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs)
{
  bb 2:
  gimple_check_failed (gs_1(D), "/space/rguenther/tramp3d/trunk/gcc/gimple.h"[0], 2381, "gimple_assign_rhs1"[0], 6, 0);
}

so eventually we just do a better job with profile estimation.

Ok, numbers are off because of a typo I introduced late.

+  if (ret)
+    gimple_check_failed (gs, file, line, fun, T ()->code_, ERROR_MARK);

should be "if (!ret)" of course ...

Optimized code size after the patch is 588878 (we still avoid a lot of code and the checking fails are still split out).  Not sure where the code size increase is from.  Unoptimized code size after the patch is 836233.

Richard.

I think inliner-wise we are even as we get rid of the gimple_op () inline but instead get the GIMPLE_CHECK2 inline.

So - any comments?

Thanks,
Richard.

2015-07-27  Richard Biener  rguent...@suse.de

	* gimple.h (GIMPLE_CHECK2): New inline template function.
	(gassign::code_): New constant static member.
	(is_a_helper <const gassign *>): Add.
	(gimple_assign_rhs1): Use GIMPLE_CHECK2 and a cheaper way
	to access operand 1.
	* gimple.c (gassign::code_): Define.

Index: gcc/gimple.h
===
--- gcc/gimple.h	(revision 226240)
+++ gcc/gimple.h	(working copy)
@@ -51,6 +51,16 @@ extern void gimple_check_failed (const_g
     gimple_check_failed (__gs, __FILE__, __LINE__, __FUNCTION__,	\
 			 (CODE), ERROR_MARK);				\
   } while (0)
+template <typename T>
+static inline T
+GIMPLE_CHECK2 (const_gimple gs, const char *file = __builtin_FILE (),
+	       int line = __builtin_LINE (),
+	       const char *fun = __builtin_FUNCTION ())
+{
+  T ret = dyn_cast <T> (gs);
+  if (ret)
+    gimple_check_failed (gs, file, line, fun, T ()->code_, ERROR_MARK);
+  return ret;
+}
 #else  /* not ENABLE_GIMPLE_CHECKING  */
 #define gcc_gimple_checking_assert(EXPR) ((void)(0 && (EXPR)))
 #define GIMPLE_CHECK(GS, CODE)			(void)0
@@ -832,6 +842,7 @@ struct GTY((tag("GSS_WITH_OPS")))

 struct GTY((tag("GSS_WITH_MEM_OPS")))
   gassign : public gimple_statement_with_memory_ops
 {
+  static const enum gimple_code code_ = GIMPLE_ASSIGN;
   /* no additional fields; this uses the layout for GSS_WITH_MEM_OPS.  */
 };

@@ -864,6 +875,14 @@ is_a_helper <gassign *>::test (gimple gs

 template <>
 template <>
 inline bool
+is_a_helper <const gassign *>::test (const_gimple gs)
+{
+  return gs->code == GIMPLE_ASSIGN;
+}
+
+template <>
+template <>
+inline bool
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
On Mon, 27 Jul 2015, Richard Biener wrote:

On Mon, 27 Jul 2015, Richard Biener wrote:

I noticed that the code we generate for a simple gimple_assign_rhs1 (stmt) is quite bad as we have two checking pieces.  The implementation is now

  static inline tree
  gimple_assign_rhs1 (const_gimple gs)
  {
    GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
    return gimple_op (gs, 1);
  }

and the hidden checking is due to gimple_op being

  static inline tree *
  gimple_ops (gimple gs)
  {
    size_t off;

    /* All the tuples have their operand vector at the very bottom
       of the structure.  Note that those structures that do not
       have an operand vector have a zero offset.  */
    off = gimple_ops_offset_[gimple_statement_structure (gs)];
    gcc_gimple_checking_assert (off != 0);

    return (tree *) ((char *) gs + off);

(ugly in itself considering that we just verified we have a gassign which has an operand vector as member...).  gimple_op incurs two additional memory references to compute gimple_statement_structure and gimple_ops_offset_ plus the checking bits.

The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1).  So I decided to invent a fancy as_a <> which reports errors similar to GIMPLE_CHECK and makes the code look like

  const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs);
  return ass->op[1];

for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?).  I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess.  I just changed gimple_assign_rhs1 for now.  So this is a RFC.

I understand in the end the whole GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon and this simple proof-of-concept patch already reduces unoptimized text size of gimple-match.c from 836080 to 836359 (too bad) and optimized from 585731 to 193190 bytes (yay!) on x86_64 using trunk.

Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c.  .optimized shows all calls remaining as

  bb 89:
  _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47);

which is

  tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs)
  {
    bb 2:
    gimple_check_failed (gs_1(D), "/space/rguenther/tramp3d/trunk/gcc/gimple.h"[0], 2381, "gimple_assign_rhs1"[0], 6, 0);
  }

so eventually we just do a better job with profile estimation.

Ok, numbers are off because of a typo I introduced late.

+  if (ret)
+    gimple_check_failed (gs, file, line, fun, T ()->code_, ERROR_MARK);

should be "if (!ret)" of course ...

Optimized code size after the patch is 588878 (we still avoid a lot of code and the checking fails are still split out).  Not sure where the code size increase is from.  Unoptimized code size after the patch is 836233.

And here is a more complete variant (covering all op accesses of gimple assigns).  stage2 cc1plus size drops from

    text	   data	    bss	     dec	    hex	filename
24970991	  73776	1392952	26437719	1936857	/abuild/rguenther/obj2/prev-gcc/cc1plus

to

    text	   data	    bss	     dec	    hex	filename
24809991	  69608	1388536	26268135	190d1e7	../obj/prev-gcc/cc1plus

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Thanks,
Richard.

2015-07-27  Richard Biener  rguent...@suse.de

	* gimple.h (GIMPLE_CHECK2): New inline template function.
	(gassign::code_): New constant static member.
	(is_a_helper <const gassign *>): Add.
	(gimple_assign_lhs): Use GIMPLE_CHECK2 and a cheaper way
	to access the operand.
	(gimple_assign_lhs_ptr): Likewise.
	(gimple_assign_set_lhs): Likewise.
	(gimple_assign_rhs1, gimple_assign_rhs1_ptr,
	gimple_assign_set_rhs1): Likewise.
	(gimple_assign_rhs2, gimple_assign_rhs2_ptr,
	gimple_assign_set_rhs2): Likewise.
	(gimple_assign_rhs3, gimple_assign_rhs3_ptr,
	gimple_assign_set_rhs3): Likewise.
	(gimple_assign_rhs_code): Likewise.
	* gimple.c (gassign::code_): Define.

Index: gcc/gimple.h
===
--- gcc/gimple.h	(revision 226240)
+++ gcc/gimple.h	(working copy)
@@ -51,9 +51,57 @@ extern void gimple_check_failed (const_g
     gimple_check_failed (__gs, __FILE__, __LINE__, __FUNCTION__,	\
 			 (CODE), ERROR_MARK);				\
   } while (0)
+template <typename T>
+static inline T
+GIMPLE_CHECK2 (const_gimple gs,
+#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)
+	       const char *file = __builtin_FILE
Re: OMP. More constification
On Mon, Jul 27, 2015 at 07:33:59AM -0400, Nathan Sidwell wrote:
> On 07/27/15 04:23, Jakub Jelinek wrote:
>> On Sun, Jul 26, 2015 at 11:01:09AM -0400, Nathan Sidwell wrote:
>>> I found some more missing consts.  The size, kind, var and function
>>> arrays emitted by omp-low are read only, but are not so marked.
>>> This patch
>> First of all, the hostaddrs array is going to be written by the
>> library side for GOMP_MAP_FIRSTPRIVATE, so can't be const and
>> shouldn't be const on the compiler emitted side either.  Similarly
>> for the use_device_ptr clause on GOMP_target_data* handling.
>> The size array is const from the library side, sure, but on the
>> compiler side is only conditionally static (static only if all the
>> values written to it are INTEGER_CSTs), so not sure if we want to
>> make it use const qualified type unconditionally, rather than only
>> change it to the const qualified type if it is TREE_STATIC at the
>> end.
>> The kinds array I'm afraid might very well soon follow the size
>> array model.
> Ok, thanks for the explanation.  I'm puzzled though about writing the
> sizes and kinds arrays.  They're global objects, so that doesn't seem
> thread-safe?  Or is it ill-formed to have multiple host threads
> execute the same potentially offloaded function concurrently?

They are only global objects if they are filled with constants.  Right now for the kinds array that is true always, for the sizes array if no VLAs nor variable length array sections are used.  They are initially TREE_STATIC (var) = 1; but we later on clear it with TREE_STATIC (var) = 0; if it is constant (this is still before the decls are finalized).

	Jakub
Re: OMP. More constification
On 07/27/15 07:43, Jakub Jelinek wrote: On Mon, Jul 27, 2015 at 07:33:59AM -0400, Nathan Sidwell wrote: They are only global objects if they are filled with constants. Right now for the kinds array that is true always, for the sizes array only if no VLAs nor variable length array sections are used. They are initially TREE_STATIC (var) = 1; but we later on clear it with TREE_STATIC (var) = 0; if it is not constant (this is still before the decls are finalized). Ok, I'd missed that. thanks.
Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing
On 26/07/15 05:20 -0700, Tim Shen wrote: @@ -389,7 +391,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif } - void + _StringT _M_add_collating_element(const _StringT& __s) { auto __st = _M_traits.lookup_collatename(__s.data(), Changing this return type is an ABI change. When compiled with older versions of GCC this function will not return anything, so code compiled with newer versions will read garbage off the stack if linked against the old definition. (This isn't a problem for _M_expression_term because the return type is part of the mangled name for function templates). One solution would be to rename the function, so that code compiled with the new GCC will use the new function, not link to the old version that doesn't return anything.
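The mangling distinction Jonathan mentions is easy to demonstrate with the ABI demangler: an ordinary member function's mangled name does not encode its return type, while a template instantiation's does. A small sketch (GCC/Clang-specific cxxabi.h; the symbol names are hand-written Itanium-ABI manglings chosen for illustration, not taken from libstdc++):

```cpp
#include <cxxabi.h>
#include <cassert>
#include <cstdlib>
#include <string>

// Wrap abi::__cxa_demangle with sane memory handling.
std::string demangle(const char *mangled)
{
  int status = 0;
  char *s = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string r = (status == 0 && s) ? s : "<error>";
  std::free(s);
  return r;
}

// "_ZN1C3addEi" mangles C::add(int) *whatever* its return type is, so
// changing void -> _StringT silently reuses the old symbol at link time.
// "_Z3fooIiET_v" mangles foo<int>(), whose return type T is part of the
// name -- which is why _M_expression_term is safe.
```

`demangle("_ZN1C3addEi")` yields `C::add(int)` with no return type; `demangle("_Z3fooIiET_v")` yields `int foo<int>()`, return type included.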
[RFC AArch64][PR 63304] Handle literal pools for functions > 1 MiB in size.
Hi, This patch appears to fix the issue in PR63304 where we have functions that are > 1MiB. The idea is to use adrp / ldr or adrp / add instructions to address the literal pools under the use of a command line option. The patch as attached turns this feature on by default as that is the mode I've used for testing. My thought is that we turn this on by default on trunk but keep this disabled by default for the release branches in order to get some serious testing for this feature while it bakes on trunk. As a follow-up I would like to try and see if estimate_num_insns or something else can give us a heuristic to turn this on for large functions. After all the number of incidences of this are quite low in real life, so maybe we should look to restrict this use as much as possible on the grounds that this code generation implies an extra integer register for addressing for every floating point and vector constant and I don't think that's great in code that already may have high register pressure. Tested on aarch64-none-elf with no regressions. Bootstrapped and regression tested on aarch64-none-linux-gnu. Additionally a test run of SPEC2k showed no issues other than a miscompare for eon which also appeared in the base run. Thus for now I'm confident this is reasonably sane for the minute. I will also note that this patch will also need rebasing on top of Kyrill's work with target attributes and thus cannot be final for trunk - however this version is still applicable for all release branches. More testing is still underway but in the meanwhile I'd like to put this up for some comments please. regards Ramana DATE Ramana Radhakrishnan ramana.radhakrish...@arm.com PR target/63304 * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Handle nopcrelative_literal_loads. (aarch64_classify_address): Likewise. (aarch64_constant_pool_reload_icode): Define. (aarch64_secondary_reload): Handle secondary reloads for literal pools. 
(aarch64_override_options): Handle nopcrelative_literal_loads. (aarch64_classify_symbol): Handle nopcrelative_literal_loads. * config/aarch64/aarch64.md (aarch64_reload_movcpALLTF:modeP:mode): Define. (aarch64_reload_movcpVALL:modeP:mode): Likewise. * config/aarch64/aarch64.opt: New option mnopc-relative-literal-loads * config/aarch64/predicates.md (aarch64_constant_pool_symref): New predicate. * doc/invoke.texi (mnopc-relative-literal-loads): Document. --- gcc/config/aarch64/aarch64.c | 102 +-- gcc/config/aarch64/aarch64.md| 26 ++ gcc/config/aarch64/aarch64.opt | 4 ++ gcc/config/aarch64/predicates.md | 4 ++ gcc/doc/invoke.texi | 8 +++ 5 files changed, 140 insertions(+), 4 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 020f63c..f37a031 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1612,11 +1612,27 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) aarch64_emit_move (dest, base); return; } + mem = force_const_mem (ptr_mode, imm); gcc_assert (mem); + + /* If we aren't generating PC relative literals, then +we need to expand the literal pool access carefully. +This is something that needs to be done in a number +of places, so could well live as a separate function. 
*/ + if (nopcrelative_literal_loads) + { + gcc_assert (can_create_pseudo_p ()); + base = gen_reg_rtx (ptr_mode); + aarch64_expand_mov_immediate (base, XEXP (mem, 0)); + mem = gen_rtx_MEM (ptr_mode, base); + } + if (mode != ptr_mode) mem = gen_rtx_ZERO_EXTEND (mode, mem); + emit_insn (gen_rtx_SET (dest, mem)); + return; case SYMBOL_SMALL_TLSGD: @@ -3728,9 +3744,10 @@ aarch64_classify_address (struct aarch64_address_info *info, rtx sym, addend; split_const (x, &sym, &addend); - return (GET_CODE (sym) == LABEL_REF - || (GET_CODE (sym) == SYMBOL_REF - && CONSTANT_POOL_ADDRESS_P (sym))); + return ((GET_CODE (sym) == LABEL_REF + || (GET_CODE (sym) == SYMBOL_REF + && CONSTANT_POOL_ADDRESS_P (sym) + && !nopcrelative_literal_loads))); } return false; @@ -4918,12 +4935,69 @@ aarch64_legitimize_reload_address (rtx *x_p, } +/* Return the reload icode required for a constant pool in mode. */ +static enum insn_code +aarch64_constant_pool_reload_icode (machine_mode mode) +{ + switch (mode) +{ +case SFmode: + return CODE_FOR_aarch64_reload_movcpsfdi; + +case DFmode: + return CODE_FOR_aarch64_reload_movcpdfdi; + +case TFmode: + return
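The two addressing schemes the patch switches between can be sketched as follows (the assembly in the comments illustrates the sequences described above; it is not the patch's literal output):

```cpp
#include <cassert>

// A floating-point constant the compiler will typically place in the
// literal pool rather than synthesize with integer moves:
double scale(double x) { return x * 3.141592653589793; }

// Default PC-relative literal load -- the ldr-literal form only reaches
// +/- 1 MiB from the instruction, which is what overflows in the very
// large functions of PR 63304:
//     ldr   d1, .LC0
//
// With the proposed option the address is materialized first via
// adrp (+/- 4 GiB reach), at the cost of an extra integer register
// per constant -- the register-pressure concern raised above:
//     adrp  x0, .LC0
//     ldr   d1, [x0, #:lo12:.LC0]
```

The C function itself behaves identically under either scheme; only the generated addressing sequence (and its range) differs.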
Re: Fix logic error in Fortran OpenACC parsing
Hi! On Fri, 8 May 2015 14:24:15 +0300, Ilmir Usmanov i.usma...@samsung.com wrote: On 06.05.2015 14:38, Thomas Schwinge wrote: On Tue, 5 May 2015 15:38:03 -0400, David Malcolm dmalc...@redhat.com wrote: On Wed, 2015-04-29 at 14:10 +0200, Mikael Morin wrote: Le 29/04/2015 02:02, David Malcolm a écrit : diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 2c7c554..30e4eab 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -4283,7 +4283,7 @@ parse_oacc_structured_block (gfc_statement acc_st) unexpected_eof (); else if (st != acc_end_st) gfc_error (Expecting %s at %C, gfc_ascii_statement (acc_end_st)); -reject_statement (); + reject_statement (); } while (st != acc_end_st); I think this one is a bug; there should be braces around 'gfc_error' and 'reject_statement'. If 'st' is 'acc_end_st', as it shall be, the statement is rejected. So, this is a bug. At least that's the pattern in 'parse_oacc_loop', and how the 'unexpected_statement' function is used. FWIW, Jeff had approved that patch, so I've committed the patch to trunk (as r222823), making the indentation reflect the block structure. Thomas: should the reject_statement (); call in the above be guarded by the else if (st != acc_end_st) clause? Indeed, this seems to be a bug that has been introduced very early in the OpenACC Fortran front end development -- see how the parse_oacc_structured_block function evolved in the patches posted in http://news.gmane.org/find-root.php?message_id=%3C52E1595D.907%40samsung.com%3E and following (Ilmir, CCed just in case). I also see that the corresponding OpenMP code, parse_omp_structured_block, just calls unexpected_statement, which Ilmir's initial patch also did, but at some point, he then changed this to the current code: gfc_error followed by reject_statement, as cited above -- I would guess for the reason to get a better error message? (Tobias, should this thus also be done for OpenMP, and/or extend unexpected_statement accordingly?) That's true. 
I've checked the abandoned openacc-1_0-branch and I used unexpected_statement there (there is still the odd *_acc_* naming present instead of the new-and-shiny *_oacc_* one), but, as you mentioned, I've changed this for better error reporting... and introduced the bug. And then, I'm a bit confused: is it OK that despite this presumed logic error, which affects all (?) valid executions of this parsing code, we're not running into any issues with the OpenACC Fortran front end test cases? I think this is OK, since this is an !$ACC END _smth_ statement and it shall not be present in the AST. So, it is abandoned later anyway ;) (if I remember correctly, during the gfc_clear_new_st call). Although the bug does not affect the logic, it is still a bug. OK for trunk? From my point of view, OK. Thanks for your review! Nobody else had any comments, so I have now committed my patch to trunk in r226246: commit 1ed4ddb2615c807c239e1b3eb214655a4cc87f1a Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Jul 27 14:26:41 2015 + Fix logic error in Fortran OpenACC parsing gcc/fortran/ * parse.c (parse_oacc_structured_block): Fix logic error. Reported by Mikael Morin mikael.mo...@sfr.fr. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@226246 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/fortran/ChangeLog |5 + gcc/fortran/parse.c |6 -- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git gcc/fortran/ChangeLog gcc/fortran/ChangeLog index 0ed6b9b..e5b7681 100644 --- gcc/fortran/ChangeLog +++ gcc/fortran/ChangeLog @@ -1,3 +1,8 @@ +2015-07-27 Thomas Schwinge tho...@codesourcery.com + + * parse.c (parse_oacc_structured_block): Fix logic error. + Reported by Mikael Morin mikael.mo...@sfr.fr. 
+ 2015-07-24 Mikael Morin mik...@gcc.gnu.org PR fortran/64986 diff --git gcc/fortran/parse.c gcc/fortran/parse.c index 45ad63f..04b4c80 100644 --- gcc/fortran/parse.c +++ gcc/fortran/parse.c @@ -4383,8 +4383,10 @@ parse_oacc_structured_block (gfc_statement acc_st) if (st == ST_NONE) unexpected_eof (); else if (st != acc_end_st) - gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st)); - reject_statement (); + { + gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st)); + reject_statement (); + } } while (st != acc_end_st); Grüße, Thomas
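The bug fixed here is the classic misleading-indentation trap: without braces, only the first statement is governed by the `else if`, and the second runs unconditionally. The same shape in a self-contained sketch (hypothetical handle/handle_fixed names):

```cpp
#include <cassert>
#include <string>

static std::string log_;

void handle(bool ok)        // the pre-r226246 shape
{
  if (!ok)
    log_ += "error;";
    log_ += "rejected;";    // indented like the error path, but always runs
}

void handle_fixed(bool ok)  // the committed fix: braces around both calls
{
  if (!ok)
    {
      log_ += "error;";
      log_ += "rejected;";  // now reached only on the error path
    }
}
```

As Ilmir notes, the stray call happened to be harmless in this parser; GCC's later -Wmisleading-indentation warning targets exactly this pattern.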
Re: [gomp4] fiuxup openacc default handling
On 26/07/15 19:09, Nathan Sidwell wrote: I've committed this update to my earlier breakout of default handling. After complaining about something because of 'none', we should fall through to the default handling, to prevent ICEing later (on a patch series I'm working on). This matches the OMP default handling. Also tweaked the setting of GOVD_ flags slightly, to make the firstprivate handling I'm working on less invasive. Hi, this causes PR 67027 - [gomp4] FAIL: gfortran.dg/goacc/modules.f95 -O (internal compiler error). Thanks, - Tom nathan gomp4-gimp-fix.patch 2015-07-26 Nathan Sidwell nat...@codesourcery.com * gimplify.c (oacc_default_clause): Fallthrough to unspecified handling. Propagate mapping from outer scope. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 226226) +++ gcc/gimplify.c (working copy) @@ -5930,7 +5930,7 @@ oacc_default_clause (struct gimplify_omp DECL_NAME (lang_hooks.decls.omp_report_decl (decl)), rkind); error_at (ctx->location, "enclosing OpenACC %s construct", rkind); } -break; + /* FALLTHRU. */ case OMP_CLAUSE_DEFAULT_UNSPECIFIED: { @@ -5944,33 +5944,39 @@ oacc_default_clause (struct gimplify_omp continue; if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET))) break; - if (splay_tree_lookup (octx->variables, (splay_tree_key) decl)) + splay_tree_node n2 + = splay_tree_lookup (octx->variables, (splay_tree_key) decl); + if (n2) + { + flags |= n2->value & GOVD_MAP; goto found_outer; + } } } - { - tree type = TREE_TYPE (decl); - /* Should this be REFERENCE_TYPE_P? */ - if (POINTER_TYPE_P (type)) - type = TREE_TYPE (type); + if (is_global_var (decl) && device_resident_p (decl)) + flags |= GOVD_MAP_TO_ONLY | GOVD_MAP; + /* Scalars under kernels are default 'copy'. */ + else if (ctx->acc_region_kind == ARK_KERNELS) + flags |= GOVD_FORCE_MAP | GOVD_MAP; + else if (ctx->acc_region_kind == ARK_PARALLEL) + { + tree type = TREE_TYPE (decl); + + /* Should this be REFERENCE_TYPE_P? 
*/ + if (POINTER_TYPE_P (type)) + type = TREE_TYPE (type); - /* For OpenACC regions, array and aggregate variables -default to present_or_copy, while scalar variables -by default are firstprivate (gang-local) in parallel. */ - if (!AGGREGATE_TYPE_P (type)) - { - if (is_global_var (decl) && device_resident_p (decl)) - flags |= GOVD_MAP_TO_ONLY; - else if (ctx->acc_region_kind == ARK_PARALLEL) - flags |= (GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY); - /* Scalars under kernels are default 'copy'. */ - else if (ctx->acc_region_kind == ARK_KERNELS) - flags |= GOVD_FORCE_MAP; - else - gcc_unreachable (); - } + if (AGGREGATE_TYPE_P (type)) + /* Aggregates default to 'copy'. This should really +include GOVD_FORCE_MAP. */ + flags |= GOVD_MAP; + else + /* Scalars default to 'firstprivate'. */ + flags |= GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY | GOVD_MAP; } + else + gcc_unreachable (); found_outer:; } break; @@ -6020,7 +6026,8 @@ omp_notice_variable (struct gimplify_omp if (is_oacc) flags = oacc_default_clause (ctx, decl, in_code, flags); - flags |= GOVD_MAP; + else + flags |= GOVD_MAP; if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (decl), is_oacc)) {
Re: [gomp4] fiuxup openacc default handling
On 07/27/15 11:21, Tom de Vries wrote: On 26/07/15 19:09, Nathan Sidwell wrote: I've committed this update to my earlier breakout of default handling. After complaining about something because of 'none', we should fall through to the default handling, to prevent ICEing later (on a patch series I'm working on). This matches the OMP default handling. Also tweaked the setting of GOVD_ flags slightly, to make the firstprivate handling I'm working on less invasive. Hi, this causes PR 67027 - [gomp4] FAIL: gfortran.dg/goacc/modules.f95 -O (internal compiler error). Will take a look ...
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
On 27/07/15 16:59, Kyrill Tkachov wrote: Hi Marek, On 27/07/15 13:41, Marek Polacek wrote: On Fri, Jul 24, 2015 at 03:10:17PM -0600, Jeff Law wrote: OK. Note I think the pt.c change will need trivial updating as it conflicts with another change of yours that I approved recently. Committed after resolving trivial merge conflict. This caused an arm bootstrap failure with -Werror. https://gcc.gnu.org/bugzilla/process_bug.cgi Err, the correct link is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67030 Thanks, Kyrill Thanks for review, Marek
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working from the assumption that memoization ought not fail, but it seems that was a bad assumption on my part. So given noce_try_cmove_arith can change the last insn and make it non-recognizable, this code seems reasonable. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. 
I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. jeff
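The two-pass scheme Kyrill describes — record the registers set in BB_A, then walk every sub-rtx of each SET_SRC in BB_B — can be modelled with plain data structures. In this sketch `Insn` is a toy stand-in for real RTL and the inner loop stands in for the FOR_EACH_SUBRTX walk:

```cpp
#include <cassert>
#include <set>
#include <vector>

struct Insn
{
  int dest;               // register number written by this insn's SET
  std::vector<int> srcs;  // register numbers read by its SET_SRC
};

// Return true if no insn in bb_b reads a register that bb_a writes,
// i.e. the blocks have no cross-dependence blocking if-conversion.
bool no_cross_dependence(const std::vector<Insn> &bb_a,
                         const std::vector<Insn> &bb_b)
{
  std::set<int> set_in_a;                 // pass 1: defs of BB_A
  for (const Insn &insn : bb_a)
    set_in_a.insert(insn.dest);

  for (const Insn &insn : bb_b)           // pass 2: uses of BB_B
    for (int reg : insn.srcs)
      if (set_in_a.count(reg))
        return false;                     // BB_B reads a BB_A result
  return true;
}
```

The cost profile matches the discussion: one linear walk over each block, with the per-operand scan being the unavoidable "full look at those rtxs".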
RE: [PATCH][AArch64] Improve spill code - swap order in shl pattern
ping -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 27 April 2015 14:37 To: GCC Patches Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shl pattern generates far more optimal spill code. A few more patterns are the wrong way around, which I'll address in a separate patch. I'm also looking into fixing lra-constraints to generate the expected code by taking the allocno class into account in the cost calculations during spilling. 2015-04-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_<mode>3): Place integer variant first. 
--- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 7163025..baef56a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3334,17 +3334,17 @@ ;; Logical left shift using SISD or Integer instruction (define_insn *aarch64_ashl_sisd_or_int_mode3 - [(set (match_operand:GPI 0 register_operand =w,w,r) + [(set (match_operand:GPI 0 register_operand =r,w,w) (ashift:GPI - (match_operand:GPI 1 register_operand w,w,r) - (match_operand:QI 2 aarch64_reg_or_shift_imm_mode Uscmode,w,rUscmode)))] + (match_operand:GPI 1 register_operand r,w,w) + (match_operand:QI 2 aarch64_reg_or_shift_imm_mode rUscmode,Uscmode,w)))] @ + lsl\t%w0, %w1, %w2 shl\t%rtn0vas, %rtn1vas, %2 - ushl\t%rtn0vas, %rtn1vas, %rtn2vas - lsl\t%w0, %w1, %w2 - [(set_attr simd yes,yes,no) - (set_attr type neon_shift_immq, neon_shift_regq,shift_reg)] + ushl\t%rtn0vas, %rtn1vas, %rtn2vas + [(set_attr simd no,yes,yes) + (set_attr type shift_reg,neon_shift_immq, neon_shift_regq)] ) ;; Logical right shift using SISD or Integer instruction -- 1.9.1
Re: [gomp4, fortran] Patch to fix continuation checks of OpenACC and OpenMP directives
Hi! On Tue, 30 Jun 2015 03:39:42 +0300, Ilmir Usmanov m...@ilmir.us wrote: 08.06.2015, 17:59, Cesar Philippidis ce...@codesourcery.com: On 06/07/2015 02:05 PM, Ilmir Usmanov wrote: 08.06.2015, 00:01, Ilmir Usmanov m...@ilmir.us: This patch fixes checks of OpenMP and OpenACC continuations in case if someone mixes them (i.e. continues OpenMP directive with !$ACC sentinel or vice versa). Thanks for working on this! OK for gomp branch? The same applies to GCC trunk, as far as I can tell -- any reason not to apply the patch to trunk? Thanks for working on this. Does this fix PR63858 by any chance? No problem. I had a feeling that something is wrong in the scanner since I've committed an initial support of OpenACC ver. 1.0 to gomp branch (more than a year ago). Now it does fix the PR, because I've added support of fixed form to the patch. BTW, your test in the PR has a wrong continuation. Fixed test added to the patch. I understand correctly that your patch does fix https://gcc.gnu.org/PR63858, right? In that case, please assign the PR to yourself, and close it as fixed, once it is fixed in trunk. --- a/gcc/fortran/ChangeLog.gomp +++ b/gcc/fortran/ChangeLog.gomp @@ -1,3 +1,13 @@ +2015-06-30 Ilmir Usmanov m...@ilmir.us + + PR/63858 The format to use is PR [component]/[number], so here: PR fortran/63858. --- a/gcc/testsuite/ChangeLog.gomp +++ b/gcc/testsuite/ChangeLog.gomp @@ -1,3 +1,10 @@ +2015-06-30 Ilmir Usmanov m...@ilmir.us + + PR/63858 Same. --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/omp-fixed.f @@ -0,0 +1,32 @@ +! { dg-do compile } +! { dg-additional-options -fopenmp } For C, C++, we have a goacc-gomp directory, where both OpenACC and OpenMP are enabled -- does that make sense to have for Fortran, too? Grüße, Thomas signature.asc Description: PGP signature
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
Hi, On Mon, 27 Jul 2015, Richard Biener wrote: static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); I once experimented with making the multiple checking and removing the actual memory accesses by exposing the real content of gss_for_code_[] (for gss_for_code(), used by gimple_statement_structure()) and gimple_ops_offset_[] to the inlined functions, i.e. by defining them in gimple.h, not just declaring them. Still wouldn't help the -O0 case, but with -O2 only the first check for being GIMPLE_ASSIGN remains, everything else is constant folded. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like I'm confused. as_a already does checking: as_a (U *p) { gcc_checking_assert (is_a <T> (p)); return is_a_helper <T>::cast (p); } (where the is_a variant here has the same test as GIMPLE_CHECK). So, if you have the as_a call, you can as well simply remove the GIMPLE_CHECK, and be done with it, without ... const gassign *ass = GIMPLE_CHECK2<const gassign *> (gs); ... introducing this. What am I missing? Ciao, Michael.
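The shape of the `as_a` machinery under discussion, reduced to a self-contained sketch (toy type names; GCC's real versions live in is-a.h and gimple.h): the predicate is tested once at the downcast, and later accesses through the typed pointer need no further checking — which is exactly the duplication a GIMPLE_CHECK followed by `as_a` reintroduces.

```cpp
#include <cassert>

enum stmt_code { CODE_ASSIGN, CODE_CALL };

struct stmt        { stmt_code code; };
struct assign_stmt : stmt { int rhs1; };

// Predicate in the style of GCC's is_a<>: one cheap code comparison.
template <typename T> bool is_a (const stmt *);

template <> bool is_a<assign_stmt> (const stmt *s)
{
  return s->code == CODE_ASSIGN;
}

// Checked downcast in the style of GCC's as_a<>: assert the predicate
// once, then hand back a typed pointer so follow-up accessors need no
// re-check -- letting the optimizer fold away repeated checks.
template <typename T> const T *as_a (const stmt *s)
{
  assert (is_a<T> (s));
  return static_cast<const T *> (s);
}
```

With optimization on, once the typed pointer exists, accessors like a hypothetical `rhs1 ()` on the derived type compile to plain field loads with no residual code check.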
Re: [PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.
Kyrill Tkachov wrote: Hi Alan, Can you please add a comment on top of this saying that this builtin only exists to perform the lane check, just to make it explicit for the future. Done, and pushed as r226252. Charles, thanks for your patience, and I hope this lets you move forwards. I realize this leaves you slightly picking up the pieces in terms of what's there and what isn't, but in theory it corresponds to what we've done previously in AArch64 - give me a shout if you are unclear how to proceed! Thanks, Alan
Re: [PATCH][AArch64] Improve spill code - swap order in shr patterns
On Mon, Jul 27, 2015 at 03:38:55PM +0100, Wilco Dijkstra wrote: This is basically the same as the shl version but for the shr patterns. Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen eventhough some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shr pattern generates far more optimal spill code. 2015-07-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_mode3): Place integer variant first. (aarch64_ashr_sisd_or_int_mode3): Likewise. This ChangeLog looks odd to me. It should include details of the new alternative added to aarch64_lshr_sis_or_int_mode3, and also be formatted as so: * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_mode3): Place integer variant first. (aarch64_ashr_sisd_or_int_mode3): Likewise. Otherwise, this patch looks OK to me, so I've applied it on your behalf as revision 226253. 
Thanks, James --- gcc/config/aarch64/aarch64.md | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index baef56a..06256e3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3349,17 +3349,18 @@ ;; Logical right shift using SISD or Integer instruction (define_insn *aarch64_lshr_sisd_or_int_mode3 - [(set (match_operand:GPI 0 register_operand =w,w,r) + [(set (match_operand:GPI 0 register_operand =r,w,w,w) (lshiftrt:GPI - (match_operand:GPI 1 register_operand w,w,r) - (match_operand:QI 2 aarch64_reg_or_shift_imm_mode Uscmode,w,rUscmode)))] + (match_operand:GPI 1 register_operand r,w,w,w) + (match_operand:QI 2 aarch64_reg_or_shift_imm_mode rUscmode,Uscmode,w,0)))] @ + lsr\t%w0, %w1, %w2 ushr\t%rtn0vas, %rtn1vas, %2 # - lsr\t%w0, %w1, %w2 - [(set_attr simd yes,yes,no) - (set_attr type neon_shift_immq,neon_shift_regq,shift_reg)] + # + [(set_attr simd no,yes,yes,yes) + (set_attr type shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq)] ) (define_split @@ -3394,18 +3395,18 @@ ;; Arithmetic right shift using SISD or Integer instruction (define_insn *aarch64_ashr_sisd_or_int_mode3 - [(set (match_operand:GPI 0 register_operand =w,w,w,r) + [(set (match_operand:GPI 0 register_operand =r,w,w,w) (ashiftrt:GPI - (match_operand:GPI 1 register_operand w,w,w,r) - (match_operand:QI 2 aarch64_reg_or_shift_imm_di Uscmode,w,0,rUscmode)))] + (match_operand:GPI 1 register_operand r,w,w,w) + (match_operand:QI 2 aarch64_reg_or_shift_imm_di rUscmode,Uscmode,w,0)))] @ + asr\t%w0, %w1, %w2 sshr\t%rtn0vas, %rtn1vas, %2 # - # - asr\t%w0, %w1, %w2 - [(set_attr simd yes,yes,yes,no) - (set_attr type neon_shift_immq,neon_shift_regq,neon_shift_regq,shift_reg)] + # + [(set_attr simd no,yes,yes,yes) + (set_attr type shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq)] ) (define_split -- 1.9.1
Re: [PATCH] Refactoring masked built-in decls
On Mon, Jul 27, 2015 at 2:46 PM, Petr Murzin petrmurz...@gmail.com wrote: Hello, This patch converts mask type for masked builtins from signed to unsigned. Furthermore, several redundant builtins definitions were removed. Please have a look. It it ok for trunk? I gave the patch a quick look and didn't find anything problematic. Assuming the patch is bootstrapped and properly regression tested, it is OK for trunk. Thanks, Uros. 2015-07-27 Petr Murzin petr.mur...@intel.com * config/i386/i386.c (bdesc_special_args): Convert mask type from signed to unsigned for masked builtins. (ix86_expand_args_builtin): Do not handle UINT_FTYPE_V2DF, UINT64_FTYPE_V2DF, UINT64_FTYPE_V4SF, V16QI_FTYPE_V8DI, V16HI_FTYPE_V16SI, V16SI_FTYPE_V16SI, V16SF_FTYPE_FLOAT, V8HI_FTYPE_V8DI, V8UHI_FTYPE_V8UHI, V8SI_FTYPE_V8DI, V8SF_FTYPE_V8DF, V8DI_FTYPE_INT64, V8DI_FTYPE_V4DI, V8DI_FTYPE_V8DI, V8DF_FTYPE_DOUBLE, V8DF_FTYPE_V8SI, V16SI_FTYPE_V16SI_V16SI, V16SF_FTYPE_V16SF_V16SI, V8DI_FTYPE_V8DI_V8DI, V8DF_FTYPE_V8DF_V8DI, V4SI_FTYPE_V4SF_V4SF, V4SF_FTYPE_V4SF_UINT64, V2UDI_FTYPE_V4USI_V4USI, V2DI_FTYPE_V2DF_V2DF, V2DF_FTYPE_V2DF_UINT64, V4UDI_FTYPE_V8USI_V8USI, QI_FTYPE_V8DI_V8DI, HI_FTYPE_V16SI_V16SI, HI_FTYPE_HI_INT, V16SF_FTYPE_V16SF_V16SF_V16SF, V16SF_FTYPE_V16SF_V16SI_V16SF, V16SF_FTYPE_V16SI_V16SF_HI, V16SF_FTYPE_V16SI_V16SF_V16SF, V16SI_FTYPE_V16SF_V16SI_HI, V8DI_FTYPE_V8SF_V8DI_QI, V8SF_FTYPE_V8DI_V8SF_QI, V8DI_FTYPE_PV4DI, V8DF_FTYPE_V8DI_V8DF_QI, V16SI_FTYPE_V16SI_V16SI_V16SI, V2DI_FTYPE_V2DI_V2DI_V2DI, V8DI_FTYPE_V8DF_V8DI_QI, V8DF_FTYPE_PV4DF, V8SI_FTYPE_V8SI_V8SI_V8SI, V8DF_FTYPE_V8DF_V8DF_V8DF, UINT_FTYPE_V4SF, V8DF_FTYPE_V8DF_V8DI_V8DF, V8DF_FTYPE_V8DI_V8DF_V8DF, V8DF_FTYPE_V8SF_V8DF_QI, V8DI_FTYPE_V8DI_V8DI_V8DI, V16SF_FTYPE_PV4SF, V8SF_FTYPE_V8DF_V8SF_QI, V8SI_FTYPE_V8DF_V8SI_QI, V16SI_FTYPE_PV4SI, V2DF_FTYPE_V2DF_V4SF_V2DF_QI, V4SF_FTYPE_V4SF_V2DF_V4SF_QI, V8DI_FTYPE_V8DI_SI_V8DI_V8DI, QI_FTYPE_V8DF_V8DF_INT_QI, HI_FTYPE_V16SF_V16SF_INT_HI, V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI, 
VOID_FTYPE_PDOUBLE_V2DF_QI, VOID_FTYPE_PFLOAT_V4SF_QI, V2DF_FTYPE_PCDOUBLE_V2DF_QI, V4SF_FTYPE_PCFLOAT_V4SF_QI. * config/i386/i386-builtin-types.def (V16QI_FTYPE_V16SI): Remove. (V8DF_FTYPE_V8SI): Ditto. (V8HI_FTYPE_V8DI): Ditto. (V8SI_FTYPE_V8DI): Ditto. (V8SF_FTYPE_V8DF): Ditto. (V8SF_FTYPE_V8DF_V8SF_QI): Ditto. (V16HI_FTYPE_V16SI): Ditto. (V16SF_FTYPE_V16HI): Ditto. (V16SF_FTYPE_V16HI_V16SF_HI): Ditto. (V16SF_FTYPE_V16SI): Ditto. (V4DI_FTYPE_V4DI): Ditto. (V16SI_FTYPE_V16SF): Ditto. (V16SF_FTYPE_FLOAT): Ditto. (V8DF_FTYPE_DOUBLE): Ditto. (V8DI_FTYPE_INT64): Ditto. (V8DI_FTYPE_V4DI): Ditto. (V16QI_FTYPE_V8DI): Ditto. (UINT_FTYPE_V4SF): Ditto. (UINT64_FTYPE_V4SF): Ditto. (UINT_FTYPE_V2DF): Ditto. (UINT64_FTYPE_V2DF): Ditto. (V16SI_FTYPE_V16SI): Ditto. (V8DI_FTYPE_V8DI): Ditto. (V16SI_FTYPE_PV4SI): Ditto. (V16SF_FTYPE_PV4SF): Ditto. (V8DI_FTYPE_PV2DI): Ditto. (V8DF_FTYPE_PV2DF): Ditto. (V4DI_FTYPE_PV2DI): Ditto. (V4DF_FTYPE_PV2DF): Ditto. (V16SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV2SF): Ditto. (V8DI_FTYPE_PV4DI): Ditto. (V8DF_FTYPE_PV4DF): Ditto. (V8SF_FTYPE_FLOAT): Ditto. (V4SF_FTYPE_FLOAT): Ditto. (V4DF_FTYPE_DOUBLE): Ditto. (V8SF_FTYPE_PV4SF): Ditto. (V8SI_FTYPE_PV4SI): Ditto. (V4SI_FTYPE_PV2SI): Ditto. (V8SF_FTYPE_PV2SF): Ditto. (V8SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV8SF): Ditto. (V16SI_FTYPE_PV8SI): Ditto. (V8DI_FTYPE_V8SF): Ditto. (V4DI_FTYPE_V4SF): Ditto. (V2DI_FTYPE_V4SF): Ditto. (V64QI_FTYPE_QI): Ditto. (V32HI_FTYPE_HI): Ditto. (V8UHI_FTYPE_V8UHI): Ditto. (V16UHI_FTYPE_V16UHI): Ditto. (V32UHI_FTYPE_V32UHI): Ditto. (V2UDI_FTYPE_V2UDI): Ditto. (V4UDI_FTYPE_V4UDI): Ditto. (V8UDI_FTYPE_V8UDI): Ditto. (V4USI_FTYPE_V4USI): Ditto. (V8USI_FTYPE_V8USI): Ditto. (V16USI_FTYPE_V16USI): Ditto. (V2DF_FTYPE_V2DF_UINT64): Ditto. (V2DI_FTYPE_V2DF_V2DF): Ditto. (V2UDI_FTYPE_V4USI_V4USI): Ditto. (V8DF_FTYPE_V8DF_V8DI): Ditto. (V4SF_FTYPE_V4SF_UINT64): Ditto. (V4SI_FTYPE_V4SF_V4SF): Ditto. (V16SF_FTYPE_V16SF_V16SI): Ditto. (V64QI_FTYPE_V32HI_V32HI): Ditto. 
(V32HI_FTYPE_V16SI_V16SI): Ditto. (V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI): Ditto. (V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI): Ditto. (V32HI_FTYPE_V64QI_V64QI): Ditto. (V32HI_FTYPE_V32HI_V32HI): Ditto. (V16HI_FTYPE_V16HI_V16HI_INT_V16HI_HI): Ditto. (V16SI_FTYPE_V16SI_V4SI): Ditto. (V16SI_FTYPE_V16SI_V16SI): Ditto. (V16SI_FTYPE_V32HI_V32HI): Ditto. (V16SI_FTYPE_V16SI_SI): Ditto. (V8DI_FTYPE_V8DI_V8DI): Ditto. (V4UDI_FTYPE_V8USI_V8USI): Ditto. (V8DI_FTYPE_V16SI_V16SI): Ditto. (V8DI_FTYPE_V8DI_V2DI): Ditto. (QI_FTYPE_QI): Ditto. (SI_FTYPE_SI): Ditto. (DI_FTYPE_DI): Ditto. (QI_FTYPE_QI_QI): Ditto. (QI_FTYPE_QI_INT): Ditto. (HI_FTYPE_HI_INT): Ditto. (SI_FTYPE_SI_INT): Ditto. (DI_FTYPE_DI_INT): Ditto. (HI_FTYPE_V16QI_V16QI): Ditto. (SI_FTYPE_V32QI_V32QI): Ditto. (DI_FTYPE_V64QI_V64QI): Ditto. (QI_FTYPE_V8HI_V8HI): Ditto. (HI_FTYPE_V16HI_V16HI): Ditto. (SI_FTYPE_V32HI_V32HI): Ditto.
Re: [PATCH][RFC][match.pd] optimize (X C) == N when C is power of 2
On Mon, Jul 27, 2015 at 09:11:12AM +0100, Kyrill Tkachov wrote: On 25/07/15 03:19, Segher Boessenkool wrote: On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote: This transformation folds (X % C) == N into (X & ((1 << (size - 1)) | (C - 1))) == N for constants C and N where N is positive and C is a power of 2. For N = 0 you can transform it to ((unsigned)X % C) == N and for 0 < N < C you can transform it to X > 0 && ((unsigned)X % C) == N (or X >= 0) and for -C < N < 0 it is X < 0 && ((unsigned)X % C) == N + C (or X <= 0) and for other N it is 0. For N not a constant, well, do you really care? :-) (That second case might eventually fold to your original expression). Yeah, these avoid the potentially expensive mask, Fun fact: the current code ends up using the exact same mask, for some targets. but introduce more operations, which I believe may not be desirable at this stage. It is getting rid of the (expensive) division/modulo. In many cases it could get rid of the sign test, or hoist it to some outer structure, hard to test here though (at least, I have no idea how to do that). Unless these transformations are ok for match.pd I'll try to implement this transformation at RTL expansion time. If you have to do conditional jumps, the RTL optimisers will not be able to do very much :-( Segher
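The mask form and the modulo form can be cross-checked by brute force for one instance (C = 4, N = 1, assuming 32-bit int; the sign-bit mask is written in unsigned arithmetic to avoid shifting into the sign bit):

```cpp
#include <cassert>

// (X % C) == N with C a power of 2 and 0 < N < C ...
bool via_mod (int x)  { return x % 4 == 1; }

// ... equals (X & ((1 << (size - 1)) | (C - 1))) == N: the sign bit in
// the mask rejects negative X, whose truncated remainder is <= 0, while
// (C - 1) extracts the low bits that form the remainder of positive X.
bool via_mask (int x)
{
  return (static_cast<unsigned> (x) & (0x80000000u | 3u)) == 1u;
}
```

For example, via_mod(-3) is false (C++ truncating division gives -3 % 4 == -3), and the mask form agrees because the sign bit survives the AND.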
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
Hi Marek, On 27/07/15 13:41, Marek Polacek wrote: On Fri, Jul 24, 2015 at 03:10:17PM -0600, Jeff Law wrote: OK. Note I think the pt.c change will need trivial updating as it conflicts with another change of yours that I approved recently. Committed after resolving trivial merge conflict. This caused an arm bootstrap failure with -Werror. https://gcc.gnu.org/bugzilla/process_bug.cgi Thanks, Kyrill Thanks for review, Marek
Re: [PATCH 2/16][ARM] PR/63870 Add __builtin_arm_lane_check.
Hi Alan, On 07/07/15 13:34, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01334.html + if (fcode == ARM_BUILTIN_NEON_LANE_CHECK) +{ + tree nlanes = CALL_EXPR_ARG (exp, 0); + gcc_assert (TREE_CODE (nlanes) == INTEGER_CST); + rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1)); + if (CONST_INT_P (lane_idx)) + neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp); + else + error ("%Klane index must be a constant immediate", exp); + /* Don't generate any RTL. */ + return const0_rtx; +} Can you please add a comment on top of this saying that this builtin only exists to perform the lane check, just to make it explicit for the future. Ok with a comment added. Thanks, Kyrill
Re: [PATCH 1/16][ARM] PR/63870 Add qualifier to check lane bounds in expand
Hi Alan, On 07/07/15 13:34, Alan Lawrence wrote: As per https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01333.html (While this falls under PR/63870, and I will link to that in the ChangeLog, it is only a small step towards fixing that PR.) This is ok for trunk. Thanks, Kyrill
Re: [gomp4] Remove device-specific filtering during parsing for OpenACC
On Fri, 17 Jul 2015 14:57:14 +0200 Thomas Schwinge tho...@codesourcery.com wrote: In combination with the equivalent change to gcc/cp/parser.c:cp_parser_oacc_all_clauses, gcc/c-family/c-omp.c:c_oacc_filter_device_types, and transitively also the struct identifier_hasher and c_oacc_extract_device_id function preceding it, are now unused. (Not an exhaustive list; have not checked which other auxiliary functions etc. Cesar has added in his device_type changes.) Does it make any sense to keep these for later, or dump them now? The attached patch removes this dead code... --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -12568,6 +12568,10 @@ c_finish_omp_clauses (tree clauses, bool oacc) pc = OMP_CLAUSE_CHAIN (c); continue; +case OMP_CLAUSE_DEVICE_TYPE: + pc = &OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); + continue; + case OMP_CLAUSE_INBRANCH: case OMP_CLAUSE_NOTINBRANCH: if (branch_seen) From a quick glance only, this seems to be different from the C++ front end (have not checked Fortran). I have not looked at what the front end parsing is now actually doing; is it just attaching any clauses following a device_type clause to the latter? (The same should be done for all front ends, obviously. Even if it's not important right now, because of the sorry diagnostic that will be emitted later on as soon as there is one device_type clause, this should best be addressed now, while you still remember what's going on here ;-) so that there will be no bad surprises once we actually implement the handling in OMP lowering/streaming/device compilers.) Do we need to manually take care to finalize (c_finish_omp_clauses et al.) such masked clause chains, or will the right thing happen automatically? ...and fixes the C and C++ front ends to finalize parsed device_type clauses properly (although so far finalization doesn't do anything for the clauses that can be associated with a device_type clause anyway, so there's no actual change in behaviour).
I haven't moved the sorry reporting for the unsupported device_type clause to scan_sharing_clauses because it doesn't seem to be a particularly more logical place, and doing so breaks the tests that scan the omp-low dumps. I will apply to gomp4 branch as obvious, shortly. Thanks, Julian ChangeLog gcc/ * c-family/c-omp.c (c_oacc_extract_device_id, identifier_hasher) (c_oacc_filter_device_types): Remove dead code. * c/c-typeck.c (c_finish_omp_clauses): Add scanning for sub-clauses of device_type clause. * cp/semantics.c (finish_omp_clauses): Likewise. commit e24a9cd14d4b8b5dab8b37218b29844787809648 Author: Julian Brown jul...@codesourcery.com Date: Mon Jul 27 07:31:10 2015 -0700 Clause finalization cleanups and dead code removal. diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c index b76de69..10190d7 100644 --- a/gcc/c-family/c-omp.c +++ b/gcc/c-family/c-omp.c @@ -1081,132 +1081,6 @@ c_omp_predetermined_sharing (tree decl) return OMP_CLAUSE_DEFAULT_UNSPECIFIED; } -/* Return a numerical code representing the device_type. Currently, - only device_type(nvidia) is supported. All device_type parameters - are treated as case-insensitive keywords. */ - -static int -c_oacc_extract_device_id (const char *device) -{ - if (!strcasecmp (device, "nvidia")) -return GOMP_DEVICE_NVIDIA_PTX; - else if (!strcmp (device, "*")) -return GOMP_DEVICE_DEFAULT; - return GOMP_DEVICE_NONE; -} - -struct identifier_hasher : ggc_cache_ptr_hash<tree_node> -{ - static hashval_t hash (tree t) { return htab_hash_pointer (t); } - static bool equal (tree a, tree b) - { -return !strcmp (IDENTIFIER_POINTER (a), IDENTIFIER_POINTER (b)); - } -}; - -/* Filter out the list of unsupported OpenACC device_types. */ - -tree -c_oacc_filter_device_types (tree clauses) -{ - tree c, prev; - tree dtype = NULL_TREE; - tree seen_nvidia = NULL_TREE; - tree seen_default = NULL_TREE; - hash_table<identifier_hasher> *dt_htab -= hash_table<identifier_hasher>::create_ggc (10); - - /* First scan for all device_type clauses.
*/ - for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) -{ - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE) - { - tree t; - - for (t = OMP_CLAUSE_DEVICE_TYPE_DEVICES (c); t; t = TREE_CHAIN (t)) - { - if (dt_htab->find (t)) - { - error_at (OMP_CLAUSE_LOCATION (c), - "duplicate device_type (%s)", - IDENTIFIER_POINTER (t)); - goto filter_dtype; - } - - int code = c_oacc_extract_device_id (IDENTIFIER_POINTER (t)); - - if (code == GOMP_DEVICE_DEFAULT) - seen_default = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); - else if (code == GOMP_DEVICE_NVIDIA_PTX) - seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c); - else - { - /* The OpenACC technical committee advises compilers - to silently ignore unknown devices. */ - } - - tree *slot = dt_htab->find_slot (t, INSERT); - *slot = t; - }
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
On July 27, 2015 5:18:55 PM GMT+02:00, Michael Matz m...@suse.de wrote: Hi, On Mon, 27 Jul 2015, Richard Biener wrote: static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); I once experimented with making the multiple checking and removing the actual memory accesses by exposing the real content of gss_for_code_[] (for gss_for_code(), used by gimple_statement_structure()) and gimple_ops_offset_[] to the inlined functions, i.e. by defining them in gimple.h, not just declaring them. Still wouldn't help the -O0 case, but with -O2 only the first check for being GIMPLE_ASSIGN remains, everything else is constant folded. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like I'm confused. as_a already does checking: as_a <T> (U *p) { gcc_checking_assert (is_a <T> (p)); return is_a_helper <T>::cast (p); } (where the is_a variant here has the same test as GIMPLE_CHECK). So, if you have the as_a call, you can as well simply remove the GIMPLE_CHECK, and be done with it, without ... const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs); ... introducing this. What am I missing? It would remove the fancy checking fail message in favor of a simple ICE. Otherwise nothing, of course. Oh, and the ability to separately turn on/off GIMPLE checking. Richard. Ciao, Michael.
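For readers unfamiliar with the idiom under discussion, here is a toy sketch of an is_a/as_a-style checked downcast (illustrative names, not GCC's actual gimple classes, which live in is-a.h and gimple.h):

```cpp
#include <cassert>

// A tiny statement hierarchy mimicking the gimple/gassign relationship.
enum stmt_code { CODE_NOP, CODE_ASSIGN };
struct stmt { stmt_code code; };
struct assign_stmt : stmt { int rhs1; };

// is_a-style predicate: does this stmt have the assign kind?
inline bool is_assign (const stmt *s) { return s->code == CODE_ASSIGN; }

// as_a-style checked cast: assert the kind, then downcast.  With checking
// enabled this is exactly one kind test; a GIMPLE_CHECK followed by a
// second as_a would test the kind twice -- the redundancy debated above.
inline const assign_stmt *as_assign (const stmt *s)
{
  assert (is_assign (s));
  return static_cast<const assign_stmt *> (s);
}
```

The design question in the thread is only whether the single check should report a verbose failure message (GIMPLE_CHECK style) or a plain assertion failure (as_a style).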
Re: offload data version number
On 07/24/15 09:32, Nathan Sidwell wrote: On 07/21/15 11:21, Nathan Sidwell wrote: I committed a version to gomp4 branch, but would still like to get this to trunk ASAP. I committed this update to the gomp4 branch to match the updated version currently under review for trunk. nathan 2015-07-27 Nathan Sidwell nat...@codesourcery.com include/ * gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Restore to zero. libgomp/ * libgomp.map: Rename GOMP_offload_{,un}register_2 to GOMP_offload_{,un}register_ver. * target.c (gomp_load_image_to_device): Check version function to determine interface to plugin. (gomp_unload_image_from_device): Likewise. (GOMP_offload_register_2): Rename to ... (GOMP_offload_register_ver): ... here. (GOMP_offload_unregister_2): Rename to ... (GOMP_offload_unregister_ver): ... here. (GOMP_offload_register, GOMP_offload_unregister): Adjust forwarding. (gomp_load_plugin_for_device): Reimplement DLSYM DLSYM_OPT macros. Look for versioning function and check it. Fetch versioning loader and unloader if available. * libgomp.h (gomp_device_descr): Add version function field. Put loader and unloader fields in unions. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_version): New. (GOMP_OFFLOAD_load_image_2): Renamed to ... (GOMP_OFFLOAD_load_image_ver): ... here. (GOMP_OFFLOAD_unload_image_2): Renamed to ... (GOMP_OFFLOAD_unload_image_ver): ... here. (GOMP_OFFLOAD_load_image, GOMP_OFFLOAD_unload_image): Delete. * oacc-host.c (host_dispatch): Adjust. gcc/ * config/nvptx/mkoffload.c (process): Rename _2 functions to _ver. 
Index: libgomp/libgomp.map === --- libgomp/libgomp.map (revision 226237) +++ libgomp/libgomp.map (working copy) @@ -236,8 +236,8 @@ GOMP_4.0.1 { GOMP_4.0.2 { global: - GOMP_offload_register_2; - GOMP_offload_unregister_2; + GOMP_offload_register_ver; + GOMP_offload_unregister_ver; } GOMP_4.0.1; OACC_2.0 { Index: libgomp/target.c === --- libgomp/target.c (revision 226237) +++ libgomp/target.c (working copy) @@ -671,17 +671,17 @@ gomp_load_image_to_device (unsigned vers struct addr_pair *target_table = NULL; int i, num_target_entries; - if (devicep->load_image_2_func) + if (devicep->version_func) num_target_entries - = devicep->load_image_2_func (version, devicep->target_id, -target_data, target_table); + = devicep->load_image.ver_func (version, devicep->target_id, + target_data, target_table); else if (GOMP_VERSION_DEV (version)) gomp_fatal ("Plugin too old for offload data (0 < %u)", GOMP_VERSION_DEV (version)); else num_target_entries - = devicep->load_image_func (devicep->target_id, - target_data, target_table); + = devicep->load_image.unver_func (devicep->target_id, + target_data, target_table); if (num_target_entries != num_funcs + num_vars) { @@ -790,11 +790,11 @@ gomp_unload_image_from_device (unsigned node = splay_tree_lookup (devicep->mem_map, k); } - if (devicep->unload_image_2_func) -devicep->unload_image_2_func (version, - devicep->target_id, target_data); + if (devicep->version_func) +devicep->unload_image.ver_func (version, +devicep->target_id, target_data); else -devicep->unload_image_func (devicep->target_id, target_data); +devicep->unload_image.unver_func (devicep->target_id, target_data); /* Remove mappings from splay tree. */ for (j = 0; j < num_funcs; j++) @@ -823,8 +823,8 @@ gomp_unload_image_from_device (unsigned the target, and TARGET_DATA needed by target plugin.
*/ void -GOMP_offload_register_2 (unsigned version, const void *host_table, - int target_type, const void *target_data) +GOMP_offload_register_ver (unsigned version, const void *host_table, + int target_type, const void *target_data) { int i; @@ -863,7 +863,7 @@ void GOMP_offload_register (const void *host_table, int target_type, const void *target_data) { - GOMP_offload_register_2 (0, host_table, target_type, target_data); + GOMP_offload_register_ver (0, host_table, target_type, target_data); } /* This function should be called from every offload image while unloading. @@ -871,8 +871,8 @@ GOMP_offload_register (const void *host_ the target, and TARGET_DATA needed by target plugin. */ void -GOMP_offload_unregister_2 (unsigned version, const void *host_table, - int target_type, const void *target_data) +GOMP_offload_unregister_ver (unsigned version, const void *host_table, + int target_type, const void *target_data) { int i; @@ -904,7 +904,7 @@ void GOMP_offload_unregister (const void *host_table, int target_type, const void *target_data) { - GOMP_offload_unregister_2 (0, host_table, target_type, target_data); + GOMP_offload_unregister_ver (0, host_table, target_type, target_data); } /* This function initializes the target device, specified
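The dispatch scheme in gomp_load_image_to_device can be sketched in isolation (simplified, hypothetical types; the real descriptor lives in libgomp.h): the presence of a version entry point selects which member of the loader union is valid.

```cpp
#include <cassert>
#include <cstddef>

// Simplified plugin descriptor: old plugins export only the unversioned
// loader; new ones also export a version query, in which case the
// versioned union member is the one that may be called.
typedef int (*load_ver_fn) (unsigned version, int device, const void *data);
typedef int (*load_unver_fn) (int device, const void *data);

struct device_descr
{
  unsigned (*version_func) (void);  // null for old-style plugins
  union { load_ver_fn ver_func; load_unver_fn unver_func; } load_image;
};

// Mirrors the devicep->version_func test in gomp_load_image_to_device.
int load_image (const device_descr *dev, unsigned version,
                int device, const void *data)
{
  if (dev->version_func)
    return dev->load_image.ver_func (version, device, data);
  return dev->load_image.unver_func (device, data);
}

// Dummy plugin entry points for demonstration only.
unsigned demo_version (void) { return 2; }
int demo_ver_loader (unsigned v, int, const void *) { return 100 + (int) v; }
int demo_unver_loader (int, const void *) { return -1; }
```

The union is safe precisely because the version query acts as the discriminant: exactly one member is ever stored and read for a given plugin.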
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. Trevor, maybe a quick note on the motivation would help here... jeff
[wwwdocs] Add -Wtautological-compare to changes.html
Another one. Committed to CVS. Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v retrieving revision 1.17 diff -u -r1.17 changes.html --- changes.html 27 Jul 2015 11:16:15 - 1.17 +++ changes.html 27 Jul 2015 14:35:59 - @@ -48,6 +48,10 @@ overflows. <code>-Wshift-overflow=2</code> also warns about left-shifting 1 into the sign bit. This warning is enabled by default.</li> +<li>A new command-line option <code>-Wtautological-compare</code> has been + added for the C and C++ compilers, which warns if a self-comparison + always evaluates to true or false. This warning is enabled by + <code>-Wall</code>.</li> </ul> <h3 id="c">C</h3> Marek
[PATCH][AArch64] Improve spill code - swap order in shr patterns
This is basically the same as the shl version but for the shr patterns. Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shr pattern generates far more optimal spill code. 2015-07-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_<mode>3): Place integer variant first. (aarch64_ashr_sisd_or_int_<mode>3): Likewise. --- gcc/config/aarch64/aarch64.md | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index baef56a..06256e3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3349,17 +3349,18 @@ ;; Logical right shift using SISD or Integer instruction (define_insn "*aarch64_lshr_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w,w") (lshiftrt:GPI - (match_operand:GPI 1 "register_operand" "w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "rUs<cmode>,Us<cmode>,w,0")))] "@ + lsr\t%<w>0, %<w>1, %<w>2 ushr\t%<rtn>0<vas>, %<rtn>1<vas>, %2 # - lsr\t%<w>0, %<w>1, %<w>2 - [(set_attr "simd" "yes,yes,no") - (set_attr "type" "neon_shift_immq,neon_shift_regq,shift_reg")] + # + [(set_attr "simd" "no,yes,yes,yes") + (set_attr "type" "shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq")] ) (define_split @@ -3394,18 +3395,18 @@ ;; Arithmetic right shift using SISD or Integer instruction (define_insn "*aarch64_ashr_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w,w") (ashiftrt:GPI - (match_operand:GPI 1 "register_operand" "w,w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "Us<cmode>,w,0,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "rUs<cmode>,Us<cmode>,w,0")))] "@ + asr\t%<w>0, %<w>1, %<w>2 sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2 # - # - asr\t%<w>0, %<w>1, %<w>2 - [(set_attr "simd" "yes,yes,yes,no") - (set_attr "type" "neon_shift_immq,neon_shift_regq,neon_shift_regq,shift_reg")] + # + [(set_attr "simd" "no,yes,yes,yes") + (set_attr "type" "shift_reg,neon_shift_immq,neon_shift_regq,neon_shift_regq")] ) (define_split -- 1.9.1
Re: [PATCH][AArch64] Improve spill code - swap order in shl pattern
On Mon, Jul 27, 2015 at 03:38:40PM +0100, Wilco Dijkstra wrote: ping Various instructions are supported as integer operations as well as SIMD on AArch64. When register pressure is high, lra-constraints inserts spill code without taking the allocation class into account, and basically chooses the first available pattern that matches. Since this instruction has the SIMD version first it is usually chosen even though some of the operands are eventually allocated to integer registers. The result is inefficient code not only due to the higher latency of SIMD instructions but also due to the extra int-FP moves. Placing the integer variant first in the shl pattern generates far more optimal spill code. A few more patterns are the wrong way around, which I'll address in a separate patch. I'm also looking into fixing lra-constraints to generate the expected code by taking the allocno class into account in the cost calculations during spilling. 2015-04-27 Wilco Dijkstra wdijk...@arm.com * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_<mode>3): Place integer variant first. I approved this in April ( https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01652.html ). I've committed it on your behalf as revision 226247.
Thanks, James -Original Message- From: Wilco Dijkstra [mailto:wdijk...@arm.com] Sent: 27 April 2015 14:37 To: GCC Patches Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern --- gcc/config/aarch64/aarch64.md | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 7163025..baef56a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3334,17 +3334,17 @@ ;; Logical left shift using SISD or Integer instruction (define_insn "*aarch64_ashl_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=w,w,r") + [(set (match_operand:GPI 0 "register_operand" "=r,w,w") (ashift:GPI - (match_operand:GPI 1 "register_operand" "w,w,r") - (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "Us<cmode>,w,rUs<cmode>")))] + (match_operand:GPI 1 "register_operand" "r,w,w") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_<mode>" "rUs<cmode>,Us<cmode>,w")))] "@ + lsl\t%<w>0, %<w>1, %<w>2 shl\t%<rtn>0<vas>, %<rtn>1<vas>, %2 - ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas> - lsl\t%<w>0, %<w>1, %<w>2 - [(set_attr "simd" "yes,yes,no") - (set_attr "type" "neon_shift_immq, neon_shift_regq,shift_reg")] + ushl\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas> + [(set_attr "simd" "no,yes,yes") - (set_attr "type" "shift_reg,neon_shift_immq, neon_shift_regq")] ) ;; Logical right shift using SISD or Integer instruction -- 1.9.1
Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)
On Mon, Jul 27, 2015 at 05:01:30PM +0100, Kyrill Tkachov wrote: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67030 I hope the patch I've posted in the PR helps. Can you please check? Marek
Re: [PATCH][RFC][match.pd] optimize (X % C) == N when C is power of 2
On 07/27/2015 01:36 AM, Richard Biener wrote: I think it also aligns with the idea of pushing back RTL expansion and expose some target specifics after another GIMPLE lowering phase. I'm also thinking of addressing-mode selection and register promotion. So at least if we think of that target specific match.pd pass as containing all RTL expansion tricks done with TER only then it should be quite simple to make it work. Yea -- I was also thinking it might allow us to clean up some of the actual expansion code as well. It seems promising enough for some experimentation. jeff
Re: [PATCH] Honour DriverOnly for enum values in error messages.
On 07/27/2015 05:45 AM, Dominik Vogt wrote: The attached patch fixes a glitch in the error message generated for invalid values of enum options. When a DriverOnly option was passed directly to the compiler, it would still list that as valid in the error message, e.g. on s390: $ cc1 ... -march=native cc1: error: unrecognized argument in option ‘-march=native’ cc1: note: valid arguments to ‘-march=’ are: g5 g6 native z10 z13 z196 z9-109 z9-ec z900 z990 zEC12 The patched code prints DriverOnly enum values only when the driver generates the error message. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany 0001-ChangeLog /gcc/ChangeLog * opts-common.c (read_cmdline_option): List DriverOnly enum values as valid only in the error message of the driver, not in the messages of the language compilers. OK. jeff
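The fix can be pictured with a small sketch (illustrative names, not the actual opts-common.c code): when building the list of valid arguments for the diagnostic, values flagged DriverOnly are included only if the error is being reported by the driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One accepted value of an Enum option, with its DriverOnly flag.
struct enum_value { std::string name; bool driver_only; };

// Collect the names to print in the "valid arguments to ... are:" note.
// `in_driver` distinguishes gcc (the driver) from cc1 and friends.
std::vector<std::string>
valid_args (const std::vector<enum_value> &vals, bool in_driver)
{
  std::vector<std::string> out;
  for (const enum_value &v : vals)
    if (in_driver || !v.driver_only)
      out.push_back (v.name);
  return out;
}
```

With this filtering, cc1 no longer advertises "native" (which only the driver can expand) as a valid -march= argument.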
Re: [PATCH] Use lowpart_subreg instead of simplify_gen_subreg
On 07/26/2015 02:02 PM, Anatoliy Sokolov wrote: Hello. This patch replaces calls to simplify_gen_subreg (omode, x, imode, subreg_lowpart_offset (omode, imode)) with lowpart_subreg (omode, x, imode) and moves the lowpart_subreg function from loop-iv.c to simplify-rtx.c. Bootstrapped and reg-tested on x86_64-unknown-linux-gnu. OK for trunk? Anatoliy. 2015-07-26 Anatoly Sokolov ae...@post.ru * rtl.h (lowpart_subreg): Move in file. * loop-iv.c (lowpart_subreg): Move to... * simplify-rtx.c (lowpart_subreg): ...here. (simplify_binary_operation_1): Use lowpart_subreg instead of simplify_gen_subreg. * expr.c (expand_expr_real_2): Ditto. * emit-rtl.c (gen_lowpart_common): Ditto. * combine.c (gen_lowpart_for_combine): Ditto. * cfgexpand.c (convert_debug_memory_address, expand_debug_expr, expand_debug_source_expr): Ditto. OK. jeff
Re: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode
On 07/27/2015 04:25 AM, David Sherwood wrote: Hi, Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER (m) so that it returns m in cases where there is no inner mode. This simplifies some of the calling code by removing the need to check for VOIDmode and allows calling it unconditionally. I also removed element_precision () as it was only called in one place and thought it neater to call GET_MODE_PRECISION explicitly. Parts 2-4 will include further tidy-ups and optimisations based on [1/N]. Good to go? Regards, David Sherwood. 2015-07-17 David Sherwood david.sherw...@arm.com gcc/ * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call GET_MODE_INNER unconditionally. * config/spu/spu.c (arith_immediate_p): Likewise. * config/i386/i386.c (ix86_build_signbit_mask): Likewise. New variable. * expmed.c (synth_mult): Remove check for VOIDmode result from GET_MODE_INNER. (expand_mult_const): Likewise. * fold-const.c (): Replace call to element_precision with call to GET_MODE_PRECISION. * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with m->name. (emit_mode_inner): Likewise. * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER result check. * machmode.h (GET_MODE_UNIT_SIZE): Simplify. (GET_MODE_UNIT_PRECISION): Likewise. * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally. * simplify-rtx.c (simplify_immed_subreg): Likewise. * stor-layout.c (bitwise_type_for_mode): Update assert. (element_precision): Remove. Somehow my brain kept translating INNER into NARROWER. Naturally I was having considerable trouble seeing how the patch could be correct ;-) Looking at insn-modes.h cleared things up quickly. In a lot of ways this makes GET_MODE_INNER act more like GET_MODE_NUNITS, which is probably good. You need to update the comment for GET_MODE_INNER in machmode.h to reflect the change in its return value for non-vector modes. With that update, this patch is fine. jeff
Re: [gomp4] Additional tests for routine directive
Thomas, The attached patch adds XFAILs so as to quiet the errors until such time as the development is complete. Committed to gomp-4_0-branch. Thank you, thank you Jim On 07/27/2015 08:36 AM, Thomas Schwinge wrote: Hi! On Fri, 24 Jul 2015 15:43:36 -0500, James Norris jnor...@codesourcery.com wrote: The attached patch adds additional test for the routine directive for C/C++/Fortran. Committed to gomp-4_0-branch. Thanks, but I see a number of FAILs, including the following: FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 (test for excess errors) UNRESOLVED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 compilation failed to produce executable PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 execution test FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/routine-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match foo not found FAIL: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 (test for excess errors) UNRESOLVED: libgomp.oacc-fortran/routine-8.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -O0 compilation failed to produce executable [same for other torture testing flags] PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 (test for excess errors) PASS: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 execution test FAIL: libgomp.oacc-fortran/routine-6.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -O0 output pattern test, is , should match not found [same for other torture testing flags] (I have not reviewed your test case changes.) 
Grüße, Thomas diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c index 73ca528..a191758 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-3.c @@ -1,5 +1,6 @@ /* { dg-do run } */ +/* { dg-xfail-if "foo not found" { openacc_host_selected } } */ #include <stdlib.h> @@ -28,5 +29,3 @@ main() return 0; } - -/* { dg-output "foo not found" { target openacc_host_selected } } */ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c index 6d0fbe3..4e34f78 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-5.c @@ -1,7 +1,9 @@ /* { dg-do run } */ +/* { dg-warning "TODO implicit" { xfail *-*-* } 17 } */ +/* { dg-warning "TODO implicit" { xfail *-*-* } 27 } */ +/* { dg-xfail-if "unresolved symbol" { *-*-* } } */ -#include <stdio.h> #include <stdlib.h> #pragma acc routine bind (foo) diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 index 4b7b707..9ba6da8 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-6.f90 @@ -1,4 +1,5 @@ ! { dg-do run } +! { dg-xfail-if "not found" { openacc_host_selected } } program main integer :: a, n @@ -25,4 +26,3 @@ end function end program main -! { dg-output "not found" { target openacc_host_selected } } diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 index 2060740..5c58b43 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-8.f90 @@ -1,5 +1,6 @@ ! { dg-do run } +! { dg-error "Invalid TODO" { xfail *-*-* } 51 } program main integer, parameter :: n = 10
Re: [PATCH] Optimize certain end of loop conditions into min/max operation
On 07/27/2015 03:25 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison michael.colli...@linaro.org wrote: This patch is designed to optimize end-of-loop conditions of the form i < x && i < y into i < min (x, y). Loop conditions involving '>' are handled similarly using max (x, y). As an example: #define N 1024 int a[N], b[N], c[N]; void add (unsigned int m, unsigned int n) { unsigned int i, bound = (m < n) ? m : n; for (i = 0; i < m && i < n; ++i) a[i] = b[i] + c[i]; } Performed bootstrap and make check on: x86_64_unknown-linux-gnu, arm-linux-gnueabihf, and aarch64-linux-gnu. Okay for trunk? So this works only for && that has been lowered to non-CFG form (I suppose phiopt would catch that? If not, ifcombine would be the place to implement it I guess). phiopt is supposed to be generating MIN/MAX expressions for us. If it isn't it'd be good to see any testcases where it isn't. I think that raises a general question though. Does it make more sense to capture MIN/MAX (and others) in phiopt or in the match.pd framework? Jeff
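The claimed equivalence is easy to state and check directly: for unsigned i, m, n, the compound exit test equals a single comparison against the minimum (sketch with illustrative helper names):

```cpp
#include <algorithm>
#include <cassert>

// The loop's original exit condition.
bool cond_direct (unsigned i, unsigned m, unsigned n)
{
  return i < m && i < n;
}

// The optimized form: one comparison against min (m, n).
bool cond_min (unsigned i, unsigned m, unsigned n)
{
  return i < std::min (m, n);
}
```

The win is that the min can be hoisted out of the loop, leaving a single comparison per iteration instead of two comparisons and a logical AND.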
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 27/07/15 17:09, Jeff Law wrote: On 07/27/2015 04:17 AM, Kyrill Tkachov wrote: I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. Sigh. resource.c probably isn't going to be useful then. My other attempt was to go over BB_A and mark the set registers in a bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? I think the latter is reasonable. Ultimately we have to do a full look at those rtxs, so it's unavoidable to some extent. The only other possibility would be to use the DF framework. I'm not sure if it's even initialized for the ifcvt code. If it is, then you might be able to avoid some of the walking of the insns and instead walk the DF structures. I think it is initialized (I look at df_get_live_out earlier on in the call chain). I suppose what we want is for the live in regs for BB_B to not include any of the set regs in BB_A? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. 
OK, I was working from the assumption that memoization ought not fail, but it seems that was a bad assumption on my part. So given that noce_try_cmove_arith can change the last insn and make it non-recognizable, this code seems reasonable. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? I don't think we have any good generic machinery for this. I think every pass that needs this capability unlinks the insn from the chain and patches it back in at the new location. That's the SET_PREV_INSN, SET_NEXT_INSN functions, right? The current way the top-level noce_process_if_block is structured it expects the various ifcvt functions (like noce_try_cmove_arith) to generate a sequence, then it takes it, unshares it and removes the empty basic blocks. If we're to instead move insns around we'd need to further modify noce_process_if_block to handle differently this one case where we move insns instead of re-emitting them. I think this would make that function more convoluted than it needs to be. With the current approach we always call unshare_all_rtl_in_chain on the emitted sequence which should take care of any RTL sharing issues and in practice I don't expect to have more than 3-4 insns in these sequences since they will be guarded by the branch cost. So I would rather argue for re-emitting insns in this case to keep consistent with the dozen or so similar functions in ifcvt.c that already work that way. Thanks, Kyrill jeff
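The bitmap scheme described above can be sketched outside of RTL (simplified, hypothetical insn representation): mark every register set in BB_A, then reject if any source operand in BB_B reads one of them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy insn: one destination register and the registers its SET_SRC reads.
struct insn { int set_reg; std::vector<int> used_regs; };

// Returns true when no insn in B reads a register written in A, mirroring
// the walk over BB_A's SETs and the FOR_EACH_SUBRTX walk of BB_B's sources.
bool no_read_after_write (const std::vector<insn> &a,
                          const std::vector<insn> &b)
{
  std::uint64_t set_in_a = 0;   // one bit per (pseudo) register number
  for (const insn &i : a)
    set_in_a |= UINT64_C (1) << i.set_reg;
  for (const insn &i : b)
    for (int r : i.used_regs)
      if (set_in_a & (UINT64_C (1) << r))
        return false;
  return true;
}
```

This is one pass over each block, so the cost is linear in the number of operands, which matters little here since the branch-cost guard keeps these sequences to a handful of insns.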
[committed, PATCH]: Add more tests for PR target/66232
Index: ChangeLog === --- ChangeLog (revision 226252) +++ ChangeLog (working copy) @@ -1,3 +1,10 @@ +2015-07-27 H.J. Lu hongjiu...@intel.com + + * gcc.target/i386/pr66232-6.c: New tests. + * gcc.target/i386/pr66232-7.c: Likewise. + * gcc.target/i386/pr66232-8.c: Likewise. + * gcc.target/i386/pr66232-9.c: Likewise. + 2015-07-27 Marek Polacek pola...@redhat.com PR c++/66555 Index: gcc.target/i386/pr66232-6.c === --- gcc.target/i386/pr66232-6.c (revision 0) +++ gcc.target/i386/pr66232-6.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern void bar (void) __attribute__((visibility(hidden))); + +void +foo (void) +{ + bar (); +} + +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOTPCREL { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOT { target ia32 } } } */ Index: gcc.target/i386/pr66232-7.c === --- gcc.target/i386/pr66232-7.c (revision 0) +++ gcc.target/i386/pr66232-7.c (working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern void bar (void) __attribute__((visibility(hidden))); + +int +foo (void) +{ + bar (); + return 0; +} + +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOTPCREL { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOT { target ia32 } } } */ Index: gcc.target/i386/pr66232-8.c === --- gcc.target/i386/pr66232-8.c (revision 0) +++ gcc.target/i386/pr66232-8.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern int bar (void) __attribute__((visibility(hidden))); + +int +foo (void) +{ + return bar (); +} + +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOTPCREL { target { ! 
ia32 } } } } */ +/* { dg-final { scan-assembler-not jmp\[ \t\]*.bar@GOT { target ia32 } } } */ Index: gcc.target/i386/pr66232-9.c === --- gcc.target/i386/pr66232-9.c (revision 0) +++ gcc.target/i386/pr66232-9.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -O2 -fno-pic -fno-plt } */ + +extern int bar (void) __attribute__((visibility(hidden))); + +int +foo (void) +{ + return bar () + 1; +} + +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOTPCREL { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not call\[ \t\]*.bar@GOT { target ia32 } } } */
Re: [MOXIE] Hookize PRINT_OPERAND and PRINT_OPERAND_ADDRESS
On 07/26/2015 02:02 PM, Anatoliy Sokolov wrote: Hi. This patch removes obsolete PRINT_OPERAND and PRINT_OPERAND_ADDRESS macros from the MOXIE back end in the GCC and introduces equivalent TARGET_PRINT_OPERAND and TARGET_PRINT_OPERAND_ADDRESS target hooks. Regression tested on moxie-unknown-elf. OK for trunk? Anatoly. 2015-07-26 Anatoly Sokolov ae...@post.ru * config/moxie/moxie.h (PRINT_OPERAND, PRINT_OPERAND_ADDRESS): Remove macros. * config/moxie/moxie-protos.h (moxie_print_operand, moxie_print_operand_address): Remove declaration. * config/moxie/moxie.c (TARGET_PRINT_OPERAND, TARGET_PRINT_OPERAND_ADDRESS): Define. (moxie_print_operand, moxie_print_operand_address): Make static. OK. jeff
Re: RFA: RL78: Add an optimization to the addsi3 pattern
Ok, but please don't put non-public issue numbers in the comments.
Re: C++ delayed folding branch review
I've trimmed this to the previously mentioned issues that still need to be addressed; I'll do another full review after these are dealt with. On 06/13/2015 12:15 AM, Jason Merrill wrote: On 06/12/2015 12:11 PM, Kai Tietz wrote: @@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp) { if (TREE_TYPE (temp) == type) return temp; + STRIP_NOPS (temp); + if (TREE_TYPE (temp) == type) +return temp; @@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, bool reduced_constant_expression_p (tree t) { + /* Make sure we remove useless initial NOP_EXPRs. */ + STRIP_NOPS (t); Within the constexpr code we should be folding away NOPs as they are generated, they shouldn't live this long. Well, we might see them on overflows ... We shouldn't within the constexpr code. NOPs for expressions that are non-constant due to overflow are added in cxx_eval_outermost_constant_expr, so we shouldn't see them in the middle of constexpr evaluation. @@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, is_dummy_object (x)) { x = ctx-object; - x = cp_build_addr_expr (x, tf_warning_or_error); + if (x) + x = cp_build_addr_expr (x, tf_warning_or_error); + else + x = get_nth_callarg (t, i); This still should not be necessary. Yeah, most likely. But I got initially here some issues, so I don't see that this code would worsen things. If this code path is hit, that means something has broken my design, and I don't want to just paper over that. Please revert this change. case SIZEOF_EXPR: + if (processing_template_decl + (!COMPLETE_TYPE_P (TREE_TYPE (t)) + || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST)) + return t; Why is this necessary? We don't want to resolve SIZEOF_EXPR within template-declarations for incomplete types, of if its size isn't fixed. Issue is that we otherwise get issues about expressions without existing type (as usual within template-declarations for some expressions). 
Yes, but we shouldn't have gotten this far with a dependent sizeof; maybe_constant_value just returns if instantiation_dependent_expression_p is true. @@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, case CONVERT_EXPR: case VIEW_CONVERT_EXPR: case NOP_EXPR: +case UNARY_PLUS_EXPR: { + enum tree_code tcode = TREE_CODE (t); tree oldop = TREE_OPERAND (t, 0); + + if (tcode == NOP_EXPR TREE_TYPE (t) == TREE_TYPE (oldop) TREE_OVERFLOW_P (oldop)) + { + if (!ctx-quiet) + permerror (input_location, overflow in constant expression); + /* If we're being permissive (and are in an enforcing + context), ignore the overflow. */ + if (!flag_permissive) + *overflow_p = true; + *non_constant_p = true; + + return t; + } tree op = cxx_eval_constant_expression (ctx, oldop, Why doesn't the call to cxx_eval_constant_expression at the bottom here handle oldop having TREE_OVERFLOW set? I just handled the case that we see here a wrapping NOP_EXPR around an overflow. As this isn't handled by cxx_eval_constant_expression. How does it need to be handled? A NOP_EXPR wrapped around an overflow is there to indicated that the expression is non-constant, and it can't be simplified any farther. Please give an example of what was going wrong. @@ -565,6 +571,23 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) switch (code) { +case SIZEOF_EXPR: + if (SIZEOF_EXPR_TYPE_P (*expr_p)) + *expr_p = cxx_sizeof_or_alignof_type (TREE_TYPE (TREE_OPERAND (*expr_p, + 0)), + SIZEOF_EXPR, false); + else if (TYPE_P (TREE_OPERAND (*expr_p, 0))) + *expr_p = cxx_sizeof_or_alignof_type (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + else + *expr_p = cxx_sizeof_or_alignof_expr (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + if (*expr_p == error_mark_node) + *expr_p = size_one_node; + + *expr_p = maybe_constant_value (*expr_p); + ret = GS_OK; + break; Why are these surviving until gimplification time? This might be still necessary. 
I will retest when bootstrap works. As we now added SIZEOF_EXPR folding to cp_fold, if we catch all expressions in which a sizeof can occur, this shouldn't be necessary anymore. AFAIR I saw here some issues about initialization of global variables, which weren't caught. Hmm, I wonder why you would see issues with global initializers that aren't seen on trunk? In any case, if the issue is with global initializers, they should be handled sooner, not here. @@ -608,9 +608,13 @@ cp_fold_convert (tree
Re: [committed] Typo in ipa-devirt.c
On July 27, 2015 9:35:35 PM GMT+02:00, Marek Polacek pola...@redhat.com wrote: This patch fixes an obvious typo in types_same_for_odr detected by the new -Wtautological-compare warning. Bootstrapped/regtested on x86_64-linux, applying to trunk. Might want to back port this. Looks like a latent wrong-code bug to me. Richard. 2015-07-27 Marek Polacek pola...@redhat.com * ipa-devirt.c (types_same_for_odr): Fix typo. diff --git gcc/ipa-devirt.c gcc/ipa-devirt.c index b7afc3b..0a92768 100644 --- gcc/ipa-devirt.c +++ gcc/ipa-devirt.c @@ -550,7 +550,7 @@ types_same_for_odr (const_tree type1, const_tree type2, bool strict) return false; if (TREE_CODE (type1) == RECORD_TYPE && (TYPE_BINFO (type1) == NULL_TREE) -!= (TYPE_BINFO (type1) == NULL_TREE)) +!= (TYPE_BINFO (type2) == NULL_TREE)) return false; if (TREE_CODE (type1) == RECORD_TYPE && TYPE_BINFO (type1) && (BINFO_VTABLE (TYPE_BINFO (type1)) == NULL_TREE) Marek
[committed] Update outer-4.c and uns-outer-4.c
Hi, this patch cleans up testcases autopar/outer-4.c and autopar/uns-outer-4.c. Committed as obvious. Thanks, - Tom Update outer-4.c and uns-outer-4.c 2015-07-27 Tom de Vries t...@codesourcery.com * gcc.dg/autopar/outer-4.c (parloop): Remove superfluous noinline attribute. Update comment. (main): Remove. Add scan for not parallelizing inner loop. * gcc.dg/autopar/uns-outer-4.c (parloop): Remove superfluous noinline attribute. (main): Remove. --- gcc/testsuite/gcc.dg/autopar/outer-4.c | 19 +++ gcc/testsuite/gcc.dg/autopar/uns-outer-4.c | 11 +-- 2 files changed, 8 insertions(+), 22 deletions(-) diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c index 2027499..681cf85 100644 --- a/gcc/testsuite/gcc.dg/autopar/outer-4.c +++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c @@ -6,15 +6,16 @@ void abort (void); int g_sum=0; int x[500][500]; -__attribute__((noinline)) -void parloop (int N) +void +parloop (int N) { int i, j; int sum; - /* Double reduction is currently not supported, outer loop is not - parallelized. Inner reduction is detected, inner loop is - parallelized. */ + /* The inner reduction is not recognized as reduction because we cannot assume + that int wraps on overflow. The way to fix this is to implement the + reduction operation in unsigned type, but we've not yet implemented + this. 
*/ sum = 0; for (i = 0; i N; i++) for (j = 0; j N; j++) @@ -23,13 +24,7 @@ void parloop (int N) g_sum = sum; } -int main(void) -{ - parloop(500); - - return 0; -} - +/* { dg-final { scan-tree-dump-times parallelizing inner loop 0 parloops } } */ /* { dg-final { scan-tree-dump-times parallelizing outer loop 1 parloops { xfail *-*-* } } } */ /* { dg-final { scan-tree-dump-times loopfn 4 optimized { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c index 8365a89..30ead25 100644 --- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c +++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c @@ -6,7 +6,7 @@ void abort (void); unsigned int g_sum=0; unsigned int x[500][500]; -void __attribute__((noinline)) +void parloop (int N) { int i, j; @@ -23,14 +23,5 @@ parloop (int N) g_sum = sum; } -int -main (void) -{ - parloop (500); - - return 0; -} - - /* { dg-final { scan-tree-dump-times parallelizing outer loop 1 parloops { xfail *-*-* } } } */ /* { dg-final { scan-tree-dump-times loopfn 4 optimized } } */ -- 1.9.1
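The updated comment in outer-4.c turns on a subtle point: a signed int accumulation is not a wrap-around reduction, because signed overflow is undefined behaviour, while the same additions carried out in an unsigned type are defined to wrap modulo 2^N. A minimal sketch of the workaround the comment mentions (the function name is invented here; this is illustration, not the parloops implementation):

```c
#include <assert.h>

/* Perform the additions in unsigned arithmetic, which wraps
   modulo 2^N, then convert back.  The conversion back to int is
   implementation-defined for out-of-range values; GCC defines it
   as the expected modulo reduction.  */
static int
sum_wrapping (const int *a, int n)
{
  unsigned int s = 0;
  for (int i = 0; i < n; i++)
    s += (unsigned int) a[i];  /* Well-defined wrap-around.  */
  return (int) s;
}
```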
[C/C++ PATCH, committed] Fix -Wtautological-compare (PR bootstrap/67030)
This PR exposed a defect in my -Wtautological-compare warning: we don't want to warn when the expression comes from a macro expansion. Bootstrapped/regtested on x86_64-linux, applying to trunk. 2015-07-27 Marek Polacek pola...@redhat.com PR bootstrap/67030 * c-common.c (warn_tautological_cmp): Don't warn for macro expansion. * c-c++-common/Wtautological-compare-2.c: New test. diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c index 6a79b95..caa801e 100644 --- gcc/c-family/c-common.c +++ gcc/c-family/c-common.c @@ -1890,6 +1890,12 @@ warn_tautological_cmp (location_t loc, enum tree_code code, tree lhs, tree rhs) if (TREE_CODE_CLASS (code) != tcc_comparison) return; + /* Don't warn for various macro expansions. */ + if (from_macro_expansion_at (loc) + || from_macro_expansion_at (EXPR_LOCATION (lhs)) + || from_macro_expansion_at (EXPR_LOCATION (rhs))) +return; + /* We do not warn for constants because they are typical of macro expansions that test for features, sizeof, and similar. */ if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs)) diff --git gcc/testsuite/c-c++-common/Wtautological-compare-2.c gcc/testsuite/c-c++-common/Wtautological-compare-2.c index e69de29..c8aecef 100644 --- gcc/testsuite/c-c++-common/Wtautological-compare-2.c +++ gcc/testsuite/c-c++-common/Wtautological-compare-2.c @@ -0,0 +1,15 @@ +/* PR bootstrap/67030 */ +/* { dg-do compile } */ +/* { dg-options -Wtautological-compare } */ + +extern int foo (void); + +#define A a +#define B A +#define FOO (A B) + +void +fn1 (int a) +{ + if (FOO); +} Marek
[committed] Typo in ipa-devirt.c
This patch fixes an obvious typo in types_same_for_odr detected by the new -Wtautological-compare warning. Bootstrapped/regtested on x86_64-linux, applying to trunk. 2015-07-27 Marek Polacek pola...@redhat.com * ipa-devirt.c (types_same_for_odr): Fix typo. diff --git gcc/ipa-devirt.c gcc/ipa-devirt.c index b7afc3b..0a92768 100644 --- gcc/ipa-devirt.c +++ gcc/ipa-devirt.c @@ -550,7 +550,7 @@ types_same_for_odr (const_tree type1, const_tree type2, bool strict) return false; if (TREE_CODE (type1) == RECORD_TYPE && (TYPE_BINFO (type1) == NULL_TREE) - != (TYPE_BINFO (type1) == NULL_TREE)) + != (TYPE_BINFO (type2) == NULL_TREE)) return false; if (TREE_CODE (type1) == RECORD_TYPE && TYPE_BINFO (type1) && (BINFO_VTABLE (TYPE_BINFO (type1)) == NULL_TREE) Marek
Re: [committed] Typo in ipa-devirt.c
On Mon, Jul 27, 2015 at 09:45:19PM +0200, Richard Biener wrote: On July 27, 2015 9:35:35 PM GMT+02:00, Marek Polacek pola...@redhat.com wrote: This patch fixes an obvious typo in types_same_for_odr detected by the new -Wtautological-compare warning. Bootstrapped/regtested on x86_64-linux, applying to trunk. Might want to back port this. Looks like a latent wrong-code bug to me. Yeah, I'm going to put this into 5 as well. 4.9 probably doesn't have this problem. Marek
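The class of bug the warning caught here, comparing a subexpression against itself instead of against its counterpart, can be modelled with a simplified stand-in for the TYPE_BINFO check (the function and parameter names are inventions for this sketch; plain pointers stand in for TYPE_BINFO):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two types should be reported as differing if exactly one of them
   lacks binfo.  */

static bool
binfo_mismatch_buggy (const void *binfo1, const void *binfo2)
{
  (void) binfo2;
  /* The typo: both sides test binfo1, so the comparison is
     tautologically false and the mismatch is never detected.  */
  return (binfo1 == NULL) != (binfo1 == NULL);
}

static bool
binfo_mismatch_fixed (const void *binfo1, const void *binfo2)
{
  return (binfo1 == NULL) != (binfo2 == NULL);
}
```

-Wtautological-compare flags exactly the buggy variant, which is how the latent wrong-code issue above surfaced.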
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
Jeff Law l...@redhat.com writes: On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. Trevor, maybe a quick note on the motivation would help here... I think at least we should use data hooks rather than function hooks, since this value should be a constant within a subtarget. It should only change for target_reinit. Alternatively we could have a new target_globals structure that is initialised with the result of calling the hook. If we do that though, it might make sense to consolidate the hooks rather than have one for every value. E.g. having one function for UNITS_PER_WORD, one for POINTER_SIZE, one for Pmode, etc., would lead to some very verbose target code. Perhaps the main problem with these approaches is ensuring that the value is set up early enough. Thanks, Richard
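The caching alternative raised above, calling each (possibly expensive) hook once per target switch and storing the result in a globals structure so hot code reads a field instead of making an indirect call every time, can be illustrated with a toy model. Every name below is invented for this sketch; in GCC the corresponding pieces are the targetm hook vector, target_globals and target_reinit.

```c
#include <assert.h>

/* Per-target hook vector: one function per value.  */
struct toy_target_hooks
{
  unsigned int (*pointer_size) (void);
  unsigned int (*units_per_word) (void);
};

/* Cached values, filled in once per target (re)initialisation.  */
struct toy_target_globals
{
  unsigned int pointer_size;
  unsigned int units_per_word;
};

/* Hooks for a hypothetical 64-bit subtarget.  */
static unsigned int toy_64_pointer_size (void) { return 64; }
static unsigned int toy_64_units_per_word (void) { return 8; }

static const struct toy_target_hooks toy_targetm =
{
  toy_64_pointer_size,
  toy_64_units_per_word
};

static struct toy_target_globals toy_globals;

/* Run once from a target_reinit analogue; afterwards all users read
   toy_globals directly instead of calling through the hook vector.  */
static void
toy_init_target_globals (const struct toy_target_hooks *hooks)
{
  toy_globals.pointer_size = hooks->pointer_size ();
  toy_globals.units_per_word = hooks->units_per_word ();
}
```

Consolidating several values into one structure, as suggested in the thread, keeps per-target code from turning into dozens of one-line hook functions.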
Re: [PATCH 1/4] convert ASM_OUTPUT_ASCII to a hook
tbsaunde+...@tbsaunde.org writes: +/* The routine used to output sequences of byte values. We use a special + version of this for most svr4 targets because doing so makes the + generated assembly code more compact (and thus faster to assemble) + as well as more readable. Note that if we find subparts of the + character sequence which end with NUL (and which are shorter than + ELF_STRING_LIMIT) we output those using ASM_OUTPUT_LIMITED_STRING. */ + +void +ix86_elf_output_ascii (FILE *f, const char *string, size_t length) +{ + const unsigned char *_ascii_bytes = (const unsigned char *) string; + const unsigned char *limit = _ascii_bytes + length; + unsigned bytes_in_chunk = 0; + for (; _ascii_bytes limit; _ascii_bytes++) +{ + const unsigned char *p; + if (bytes_in_chunk = 64) + { + fputc ('\n', f); + bytes_in_chunk = 0; + } + for (p = _ascii_bytes; p limit *p != '\0'; p++) + continue; + if (p limit (p - _ascii_bytes) = (long) ELF_STRING_LIMIT) + { + if (bytes_in_chunk 0) + { + fputc ('\n', f); + bytes_in_chunk = 0; + } + ASM_OUTPUT_LIMITED_STRING (f, (const char *) _ascii_bytes); + _ascii_bytes = p; + } + else + { + if (bytes_in_chunk == 0) + fputs (ASM_BYTE, f); + else + fputc (',', f); + fprintf (f, 0x%02x, *_ascii_bytes); \ + bytes_in_chunk += 5; + } _ascii_bytes = p; onwards seems to be indented too far. +DEFHOOK +(output_ascii, + A target hook to output an assembly instruction to assemble a string\n\ + constant containing the @var{length} bytes at @var{str}.\n\ +\n\ +The defalt hook uses the @code{.ascii} pseudo op as found in the Berkely Unix assembler., + void, (FILE *f, const char *str, size_t length), + default_output_ascii) Preexisting, but how about s/an assembly instruction/assembly instructions/ (or assembly directives?), just to emphasise that more than one instruction can be used. @@ -7586,6 +7586,37 @@ make_debug_expr_from_rtl (const_rtx exp) return dval; } +/* Default implementation of TARGET_ASM_OUTPUT_ASCII using .ascii. 
*/ + +void +default_output_ascii (FILE *f, const char *str, size_t length) +{ + const unsigned char *p = (const unsigned char *) str; + + fprintf (f, \t.ascii \); +for (unsigned int i = 0; i length; i++) + { Some strange indentation here too. Thanks, Richard
Re: [PR64164] drop copyrename, integrate into expand
On Jul 24, 2015, David Edelsohn dje@gmail.com wrote: On Fri, Jul 24, 2015 at 4:02 PM, Alexandre Oliva aol...@redhat.com wrote: On Jul 23, 2015, David Edelsohn dje@gmail.com wrote: I request that this patch be reverted (again). Might I kindly ask you to please do so for me. I've just found out that, after yesterday's memory upgrade on my local build machine, the filesystem that I normally use for GCC development got corrupted, and I don't want to mess with it before running an fsck which will take me a while. I have reverted the patch. Thank you very much. Long story short, the filesystem got corrupted beyond repair before I realized something was wrong, so I spent my weekend backing up the bits I still could and recreating it all from scratch. *fun* :-/ I even ran memtest before booting up, but everything was fine in the single-threaded tests it runs by default. It was only with all cores actively using memory intensely that something overheated (memory modules? chipset? cpu? no clue) and started randomly corrupting bits. So, I'm now back at lower memory clock speeds, and everything appears to be rock solid again. Phew! So, I'm back to debugging the newly-reported problems and thinking how much further I should extend testing coverage so that the next round doesn't have to be reverted again ;-) Thanks again, -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [PR64164] drop copyrename, integrate into expand
On Mon, Jul 27, 2015 at 2:22 PM, Alexandre Oliva aol...@redhat.com wrote: On Jul 24, 2015, David Edelsohn dje@gmail.com wrote: On Fri, Jul 24, 2015 at 4:02 PM, Alexandre Oliva aol...@redhat.com wrote: On Jul 23, 2015, David Edelsohn dje@gmail.com wrote: I request that this patch be reverted (again). Might I kindly ask you to please do so for me. I've just found out that, after yesterday's memory upgrade on my local build machine, the filesystem that I normally use for GCC development got corrupted, and I don't want to mess with it before running an fsck which will take me a while. I have reverted the patch. Thank you very much. Long story short, the filesystem got corrupted beyond repair before I realized something was wrong, so I spend my weekend backing up the bits I still could and recreating it all from scratch. *fun* :-/ I even ran memtest before booting up, but everything was fine in the single-threaded tests it runs by default. It was only with all cores actively using memory intensely that something overheated (memory modules? chipset? cpu? no clue) and started randomly corrupting bits. So, I'm now back at lower memory clock speeds, and everything appears to be rock solid again. Phew! So, I'm back to debugging the The exactly same thing happened to my machine. It took me several weeks before I lowered memory clock. My machine has been running fine for over a year under very heavy load. BTW, this is what I use to test ia32 on Intel64: PATH=/usr/local32/bin:/bin:/usr/bin .../configure --prefix=/usr/6.0.0 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-libmpx i686-linux --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc where /usr/local32/bin has ia32 binutils. -- H.J.
Eliminate duplication of get_catalogs in different abi
Hi This is the patch to get rid of the duplication of the get_catalogs functions in the .so. I used c++locale_internal.h that seems to be there for this kind of purpose. * config/locale/gnu/messages_members.cc (Catalog_info, Catalogs): Move... * config/locale/gnu/c++locale_internal.h: ...here in std namespace. * config/locale/gnu/c_locale.cc: Move implementation of latter here. Tested under linux x86_64. Ok to commit ? François diff --git libstdc++-v3/config/locale/gnu/c++locale_internal.h libstdc++-v3/config/locale/gnu/c++locale_internal.h index f1959d6..eeac620 100644 --- libstdc++-v3/config/locale/gnu/c++locale_internal.h +++ libstdc++-v3/config/locale/gnu/c++locale_internal.h @@ -36,8 +36,13 @@ #include cstddef #include langinfo.h +#include vector +#include string.h // ::strdup + +#include ext/concurrence.h + #if __GLIBC__ 2 || (__GLIBC__ == 2 __GLIBC_MINOR__ 2) - + extern C __typeof(nl_langinfo_l) __nl_langinfo_l; extern C __typeof(strcoll_l) __strcoll_l; extern C __typeof(strftime_l) __strftime_l; @@ -61,3 +66,49 @@ extern C __typeof(wctype_l) __wctype_l; #endif #endif // GLIBC 2.3 and later + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + struct Catalog_info + { +Catalog_info(messages_base::catalog __id, const char* __domain, + locale __loc) + : _M_id(__id), _M_domain(strdup(__domain)), _M_locale(__loc) +{ } + +~Catalog_info() +{ free(_M_domain); } + +messages_base::catalog _M_id; +char* _M_domain; +locale _M_locale; + }; + + class Catalogs + { + public: +Catalogs() : _M_catalog_counter(0) { } +~Catalogs(); + +messages_base::catalog +_M_add(const char* __domain, locale __l); + +void +_M_erase(messages_base::catalog __c); + +const Catalog_info* +_M_get(messages_base::catalog __c) const; + + private: +mutable __gnu_cxx::__mutex _M_mutex; +messages_base::catalog _M_catalog_counter; +vectorCatalog_info* _M_infos; + }; + + Catalogs + get_catalogs(); + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace diff --git 
libstdc++-v3/config/locale/gnu/c_locale.cc libstdc++-v3/config/locale/gnu/c_locale.cc index 4612c64..0d6d204 100644 --- libstdc++-v3/config/locale/gnu/c_locale.cc +++ libstdc++-v3/config/locale/gnu/c_locale.cc @@ -31,9 +31,12 @@ #include locale #include stdexcept #include limits +#include algorithm #include langinfo.h #include bits/c++locale_internal.h +#include backward/auto_ptr.h + namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -170,6 +173,85 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __changed; } + struct _CatalogIdComp + { +bool +operator()(messages_base::catalog __cat, const Catalog_info* __info) const +{ return __cat __info-_M_id; } + +bool +operator()(const Catalog_info* __info, messages_base::catalog __cat) const +{ return __info-_M_id __cat; } + }; + + Catalogs::~Catalogs() + { +for (vectorCatalog_info*::iterator __it = _M_infos.begin(); + __it != _M_infos.end(); ++__it) + delete *__it; + } + + messages_base::catalog + Catalogs::_M_add(const char* __domain, locale __l) + { +__gnu_cxx::__scoped_lock lock(_M_mutex); + +// The counter is not likely to roll unless catalogs keep on being +// opened/closed which is consider as an application mistake for the +// moment. +if (_M_catalog_counter == numeric_limitsmessages_base::catalog::max()) + return -1; + +auto_ptrCatalog_info info(new Catalog_info(_M_catalog_counter++, + __domain, __l)); + +// Check if we managed to allocate memory for domain. +if (!info-_M_domain) + return -1; + +_M_infos.push_back(info.get()); +return info.release()-_M_id; + } + + void + Catalogs::_M_erase(messages_base::catalog __c) + { +__gnu_cxx::__scoped_lock lock(_M_mutex); + +vectorCatalog_info*::iterator __res = + lower_bound(_M_infos.begin(), _M_infos.end(), __c, _CatalogIdComp()); +if (__res == _M_infos.end() || (*__res)-_M_id != __c) + return; + +delete *__res; +_M_infos.erase(__res); + +// Just in case closed catalog was the last open. 
+if (__c == _M_catalog_counter - 1) + --_M_catalog_counter; + } + + const Catalog_info* + Catalogs::_M_get(messages_base::catalog __c) const + { +__gnu_cxx::__scoped_lock lock(_M_mutex); + +vectorCatalog_info*::const_iterator __res = + lower_bound(_M_infos.begin(), _M_infos.end(), __c, _CatalogIdComp()); + +if (__res != _M_infos.end() (*__res)-_M_id == __c) + return *__res; + +return 0; + } + + Catalogs + get_catalogs() + { +static Catalogs __catalogs; +return __catalogs; + } + _GLIBCXX_END_NAMESPACE_VERSION } // namespace diff --git libstdc++-v3/config/locale/gnu/messages_members.cc libstdc++-v3/config/locale/gnu/messages_members.cc index 2e6122d..90f4b8d
[committed] Simplify gather_scalar_reductions
Hi, this patch adds an early-out to gather_scalar_reductions, and further simplifies the function structure. Bootstrapped on x86_64. Reg-tested with relevant testcases. Committed as obvious. Thanks, - Tom Simplify gather_scalar_reductions 2015-07-27 Tom de Vries t...@codesourcery.com * tree-parloops.c (gather_scalar_reductions): Simplify function structure. --- gcc/tree-parloops.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 6e75cb1..daf23f2 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2370,6 +2370,8 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list loop_vec_info simple_loop_info; simple_loop_info = vect_analyze_loop_form (loop); + if (simple_loop_info == NULL) +return; for (gsi = gsi_start_phis (loop-header); !gsi_end_p (gsi); gsi_next (gsi)) { @@ -2381,15 +2383,16 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list if (virtual_operand_p (res)) continue; - if (!simple_iv (loop, loop, res, iv, true) - simple_loop_info) - { - gimple reduc_stmt - = vect_force_simple_reduction (simple_loop_info, phi, true, - double_reduc, true); - if (reduc_stmt !double_reduc) - build_new_reduction (reduction_list, reduc_stmt, phi); -} + if (simple_iv (loop, loop, res, iv, true)) + continue; + + gimple reduc_stmt + = vect_force_simple_reduction (simple_loop_info, phi, true, + double_reduc, true); + if (!reduc_stmt || double_reduc) + continue; + + build_new_reduction (reduction_list, reduc_stmt, phi); } destroy_loop_vec_info (simple_loop_info, true); -- 1.9.1
[committed] Remove unused decl in a test
Applied. 2015-07-27 Marek Polacek pola...@redhat.com * c-c++-common/Wtautological-compare-2.c: Remove unused declaration. diff --git gcc/testsuite/c-c++-common/Wtautological-compare-2.c gcc/testsuite/c-c++-common/Wtautological-compare-2.c index c8aecef..260d9f7 100644 --- gcc/testsuite/c-c++-common/Wtautological-compare-2.c +++ gcc/testsuite/c-c++-common/Wtautological-compare-2.c @@ -2,8 +2,6 @@ /* { dg-do compile } */ /* { dg-options -Wtautological-compare } */ -extern int foo (void); - #define A a #define B A #define FOO (A B) Marek
Re: PATCH: PR bootstrap/66978: [6 Regression] bootstrap failure with --with-multilib-list=m32,m64,mx32
On Jul 24, 2015, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Jul 24, 2015 at 10:53 AM, Alexandre Oliva aol...@redhat.com wrote: On Jul 24, 2015, H.J. Lu hongjiu...@intel.com wrote: Static chain returned from get_rtl_for_parm_ssa_default_def may not have Pmode. This patch converts static chain to Pmode if needed. Tested on Linux/x86-64. OK for trunk? Yeah, this looks good to me, thanks. Even the static chain parm_decl undergoes promotion through promote_decl_mode. I'd have placed the test and convert_to_mode call as an else to the if (!local), though. Ok with that change too. I tried it and it doesn't work due to mark_reg_pointer (local, TYPE_ALIGN (TREE_TYPE (TREE_TYPE (parm; mark_reg_pointer won't take a SUBREG. That is why I placed it after mark_reg_pointer . Is my orginal patch OK? Aah! Yes, thanks, I will fold it into a future updated version of the patch for PR64164. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: C++ delayed folding branch review
2015-07-27 18:51 GMT+02:00 Jason Merrill ja...@redhat.com: I've trimmed this to the previously mentioned issues that still need to be addressed; I'll do another full review after these are dealt with. Thanks for doing this summary of missing parts of prior review. On 06/13/2015 12:15 AM, Jason Merrill wrote: On 06/12/2015 12:11 PM, Kai Tietz wrote: @@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp) { if (TREE_TYPE (temp) == type) return temp; + STRIP_NOPS (temp); + if (TREE_TYPE (temp) == type) +return temp; @@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, bool reduced_constant_expression_p (tree t) { + /* Make sure we remove useless initial NOP_EXPRs. */ + STRIP_NOPS (t); Within the constexpr code we should be folding away NOPs as they are generated, they shouldn't live this long. Well, we might see them on overflows ... We shouldn't within the constexpr code. NOPs for expressions that are non-constant due to overflow are added in cxx_eval_outermost_constant_expr, so we shouldn't see them in the middle of constexpr evaluation. @@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, is_dummy_object (x)) { x = ctx-object; - x = cp_build_addr_expr (x, tf_warning_or_error); + if (x) + x = cp_build_addr_expr (x, tf_warning_or_error); + else + x = get_nth_callarg (t, i); This still should not be necessary. Yeah, most likely. But I got initially here some issues, so I don't see that this code would worsen things. If this code path is hit, that means something has broken my design, and I don't want to just paper over that. Please revert this change. case SIZEOF_EXPR: + if (processing_template_decl + (!COMPLETE_TYPE_P (TREE_TYPE (t)) + || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST)) + return t; Why is this necessary? We don't want to resolve SIZEOF_EXPR within template-declarations for incomplete types, of if its size isn't fixed. 
Issue is that we otherwise get issues about expressions without existing type (as usual within template-declarations for some expressions). Yes, but we shouldn't have gotten this far with a dependent sizeof; maybe_constant_value just returns if instantiation_dependent_expression_p is true. @@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, case CONVERT_EXPR: case VIEW_CONVERT_EXPR: case NOP_EXPR: +case UNARY_PLUS_EXPR: { + enum tree_code tcode = TREE_CODE (t); tree oldop = TREE_OPERAND (t, 0); + + if (tcode == NOP_EXPR TREE_TYPE (t) == TREE_TYPE (oldop) TREE_OVERFLOW_P (oldop)) + { + if (!ctx-quiet) + permerror (input_location, overflow in constant expression); + /* If we're being permissive (and are in an enforcing + context), ignore the overflow. */ + if (!flag_permissive) + *overflow_p = true; + *non_constant_p = true; + + return t; + } tree op = cxx_eval_constant_expression (ctx, oldop, Why doesn't the call to cxx_eval_constant_expression at the bottom here handle oldop having TREE_OVERFLOW set? I just handled the case that we see here a wrapping NOP_EXPR around an overflow. As this isn't handled by cxx_eval_constant_expression. How does it need to be handled? A NOP_EXPR wrapped around an overflow is there to indicated that the expression is non-constant, and it can't be simplified any farther. Please give an example of what was going wrong. 
@@ -565,6 +571,23 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p) switch (code) { + case SIZEOF_EXPR: + if (SIZEOF_EXPR_TYPE_P (*expr_p)) + *expr_p = cxx_sizeof_or_alignof_type (TREE_TYPE (TREE_OPERAND (*expr_p, + 0)), + SIZEOF_EXPR, false); + else if (TYPE_P (TREE_OPERAND (*expr_p, 0))) + *expr_p = cxx_sizeof_or_alignof_type (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + else + *expr_p = cxx_sizeof_or_alignof_expr (TREE_OPERAND (*expr_p, 0), + SIZEOF_EXPR, false); + if (*expr_p == error_mark_node) + *expr_p = size_one_node; + + *expr_p = maybe_constant_value (*expr_p); + ret = GS_OK; + break; Why are these surviving until gimplification time? This might still be necessary. I will retest when bootstrap works. As we now added SIZEOF_EXPR folding to cp_fold, and if we catch all expressions in which a sizeof can occur, this shouldn't be necessary anymore. AFAIR I saw here some issues about initialization of global variables, which weren't caught.
Re: [patch] [fixincludes] Ignore .DS_Store junk files when running make check
On 7/6/15, Jeff Law l...@redhat.com wrote: On 07/05/2015 04:58 PM, Eric Gallager wrote: I was just matching the code that was already used there... should the lines to ignore the CVS and .svn folders be re-written into the style you propose, too? Might as well have a consistent style. Embedding them into the find ought to be marginally faster than a pipeline of fgrep processes. jeff Okay, I tried embedding ! -name CVS/ ! -name .svn/ into the find command, too, but that just led to warnings such as: find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `CVS/'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `CVS/''. find: warning: Unix filenames usually don't contain slashes (though pathnames do). That means that '-name `.svn/'' will probably evaluate to false all the time on this system. You might find the '-wholename' test more useful, or perhaps '-samefile'. Alternatively, if you are using GNU grep, you could use 'find ... -print0 | grep -FzZ `.svn/''. I'm not really sure which of these alternatives to go with... cc-ing the fixincludes maintainer to see which he wants to go with...
Re: [PING][PATCH, PR66851] Handle double reduction in parloops
On 24/07/15 12:30, Tom de Vries wrote: On 13/07/15 16:55, Tom de Vries wrote: Hi, this patch fixes PR66851. In parloops, we manage to parallelize outer loops, but not if the inner loop contains a reduction. There is an xfail in autopar/outer-4.c for this: ... /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */ ... This patch allows outer loops with a reduction in the inner loop to be parallelized. Updated patch checks that we actually have an inner reduction that we can parallelize. So, uns-outer-4.c with unsigned int reduction will be parallelized, while outer-4.c with signed int reduction will not be parallelized. Bootstrapped on x86_64, reg-test in progress. OK for trunk? Thanks, - Tom Handle double reduction in parloops 2015-07-27 Tom de Vries t...@codesourcery.com * tree-parloops.c (reduc_stmt_res): New function. (initialize_reductions, add_field_for_reduction) (create_phi_for_local_result, create_loads_for_reductions) (create_stores_for_reduction, build_new_reduction): Handle case that reduc_stmt is a phi. (gather_scalar_reductions): Allow double_reduc reductions. * gcc.dg/autopar/uns-outer-4.c: Remove xfail on scan for parallelizing outer loop. * testsuite/libgomp.c/uns-outer-4.c: New test. --- gcc/testsuite/gcc.dg/autopar/uns-outer-4.c | 6 +-- gcc/tree-parloops.c| 73 ++ libgomp/testsuite/libgomp.c/uns-outer-4.c | 36 +++ 3 files changed, 102 insertions(+), 13 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/uns-outer-4.c diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c index 30ead25..5eb67ea 100644 --- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c +++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c @@ -12,9 +12,7 @@ parloop (int N) int i, j; unsigned int sum; - /* Double reduction is currently not supported, outer loop is not - parallelized. Inner reduction is detected, inner loop is - parallelized.
*/ + /* Double reduction is detected, outer loop is parallelized. */ sum = 0; for (i = 0; i < N; i++) for (j = 0; j < N; j++) @@ -23,5 +21,5 @@ parloop (int N) g_sum = sum; } -/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */ /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */ diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index daf23f2..b06265c 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -549,6 +549,14 @@ take_address_of (tree obj, tree type, edge entry, return name; } +static tree +reduc_stmt_res (gimple stmt) +{ + return (gimple_code (stmt) == GIMPLE_PHI + ? gimple_phi_result (stmt) + : gimple_assign_lhs (stmt)); +} + /* Callback for htab_traverse. Create the initialization statement for reduction described in SLOT, and place it at the preheader of the loop described in DATA. */ @@ -575,7 +583,7 @@ initialize_reductions (reduction_info **slot, struct loop *loop) c = build_omp_clause (gimple_location (reduc->reduc_stmt), OMP_CLAUSE_REDUCTION); OMP_CLAUSE_REDUCTION_CODE (c) = reduc->reduction_code; - OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (gimple_assign_lhs (reduc->reduc_stmt)); + OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (reduc_stmt_res (reduc->reduc_stmt)); init = omp_reduction_init (c, TREE_TYPE (bvar)); reduc->init = init; @@ -982,7 +990,7 @@ add_field_for_reduction (reduction_info **slot, tree type) { struct reduction_info *const red = *slot; - tree var = gimple_assign_lhs (red->reduc_stmt); + tree var = reduc_stmt_res (red->reduc_stmt); tree field = build_decl (gimple_location (red->reduc_stmt), FIELD_DECL, SSA_NAME_IDENTIFIER (var), TREE_TYPE (var)); @@ -1042,12 +1050,12 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop) e = EDGE_PRED (store_bb, 1); else e = EDGE_PRED (store_bb, 0); - local_res = copy_ssa_name (gimple_assign_lhs (reduc->reduc_stmt)); + tree lhs = reduc_stmt_res (reduc->reduc_stmt);
+ local_res = copy_ssa_name (lhs); locus = gimple_location (reduc->reduc_stmt); new_phi = create_phi_node (local_res, store_bb); add_phi_arg (new_phi, reduc->init, e, locus); - add_phi_arg (new_phi, gimple_assign_lhs (reduc->reduc_stmt), - FALLTHRU_EDGE (loop->latch), locus); + add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus); reduc->new_phi = new_phi; return 1; @@ -1140,7 +1148,7 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data) struct reduction_info *const red = *slot; gimple stmt; gimple_stmt_iterator gsi; - tree type = TREE_TYPE (gimple_assign_lhs (red->reduc_stmt)); + tree type = TREE_TYPE (reduc_stmt_res (red->reduc_stmt)); tree load_struct; tree name; tree x; @@ -1205,7 +1213,7 @@ create_stores_for_reduction (reduction_info **slot,
Re: [PATCH] Cleaning up incomplete type warning.
On 5 January 2015 at 21:14, Jeff Law l...@redhat.com wrote: On 12/24/14 11:07, Luis Felipe Strano Moraes wrote: This removes an unnecessary static variable from the code and also makes it a single warning instead of two. Patch originally proposed by Manuel López-Ibáñez. Luis Strano clean_incomplete_warning.log gcc/c/ChangeLog: 2014-12-24 Luis Felipe Strano Moraes luis.str...@gmail.com * c-decl.c (get_parm_info): Clean up incomplete type warning. OK for the trunk assuming it passed bootstrap and regression testing. This was never committed. I regtested it and committed as https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=226274 Cheers, Manuel.
Re: [patch] [fixincludes] Ignore .DS_Store junk files when running make check
Eric Gallager eg...@gwmail.gwu.edu writes: Okay, I tried embedding ! -name CVS/ ! -name .svn/ into the find -name does an exact match, so you don't need the slash. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
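Putting Andreas's point together with the earlier warnings, a sketch of embedding the exclusions directly in find (hypothetical demo paths, not the fixincludes layout): -name matches a single path component, so no trailing slash is needed, and -prune skips the version-control directories entirely instead of piping through a chain of fgrep -v processes.

```shell
# Build a small demo tree (hypothetical paths).
mkdir -p /tmp/fixinc_demo/sub/CVS /tmp/fixinc_demo/sub/.svn
touch /tmp/fixinc_demo/sub/a.h /tmp/fixinc_demo/sub/CVS/Entries \
      /tmp/fixinc_demo/sub/.DS_Store

# Prune CVS/.svn directories and skip .DS_Store junk files in one find.
find /tmp/fixinc_demo \( -name CVS -o -name .svn \) -prune \
     -o -type f ! -name .DS_Store -print
```

Run as written, this prints only /tmp/fixinc_demo/sub/a.h.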
Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation
On Tue, Jun 23, 2015 at 3:05 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Mon, Jun 22, 2015 at 2:56 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Mon, Jun 22, 2015 at 1:29 PM, Jeff Law l...@redhat.com wrote: On 06/09/2015 11:31 AM, Patrick Palka wrote: This patch refactors the entry point of -Wmisleading-indentation from: void warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, enum cpp_ttype next_tok_type, const char *guard_kind); to struct token_indent_info { location_t location; cpp_ttype type; rid keyword; }; void warn_for_misleading_indentation (const token_indent_info &guard_tinfo, const token_indent_info &body_tinfo, const token_indent_info &next_tinfo); The purpose of this refactoring is to expose more information to the -Wmisleading-indentation implementation to allow for more advanced heuristics and for better coverage. (I decided to keep the usage of const references because nobody seems to mind. Also I added a new header file, c-indentation.h.) gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Define. (get_token_indent_info): Define. (warn_for_misleading_indentation): Declare. * c-common.h (warn_for_misleading_indentation): Remove. * c-indentation.c (warn_for_misleading_indentation): Change declaration to take three token_indent_infos. Adjust accordingly. * c-indentation.c (should_warn_for_misleading_indentation): Likewise. Bail out early if the body is a compound statement. (guard_tinfo_to_string): Define. gcc/c/ChangeLog: * c-parser.c (c_parser_if_body): Take token_indent_info argument. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. (c_parser_else_body): Likewise. (c_parser_if_statement): Likewise. (c_parser_while_statement): Likewise. (c_parser_for_statement): Likewise.
gcc/cp/ChangeLog: * parser.c (cp_parser_selection_statement): Move handling of semicolon body to ... (cp_parser_implicitly_scoped_statement): .. here. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. Take token_indent_info argument. (cp_parser_already_scoped_statement): Likewise. (cp_parser_selection_statement, cp_parser_iteration_statement): Extract a token_indent_info corresponding to the guard token. The only question in my mind is bootstrap regression testing. From reading the thread for the earlier version of this patch I got the impression you had bootstrapped and regression tested earlier versions. If you could confirm that you've bootstrapped and regression tested this version it'd be appreciated. You can do it on the individual patches or the set as a whole. I think I successfully bootstrapped + regtested this exact version but I'm not sure. I was going to do so again before committing anyway. I will fire off a build tonight and confirm the results tomorrow. Bootstrap + regtest on x86_64-linux-gnu was successful with no new regressions. Jeff Hi Jeff. Sorry for the late reply. I would like to commit this series now (after another bootstrap + regtest cycle) to work on further refinements to this warning. You clearly approved the last two out of three patches, but it is not clear whether you approved this first patch. Could you clarify your stance on this patch? Thanks again for reviewing.
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
Hi Jeff, On 24/07/15 19:43, Jeff Law wrote: On 07/24/2015 03:31 AM, Kyrill Tkachov wrote: Wouldn't it be better to walk BB_A, gathering the set of all the registers modified, then do a single walk through BB_B testing for uses of those registers? I think so, yes. I'll try that. You might look at resource.c -- I haven't looked at it in a long time, but it might have many of the interfaces you need to do this kind of book keeping. I experimented with resource.c and the roadblock I hit is that it seems to have an assumption that it operates on hard regs (in fact the struct it uses to describe the resources has a HARD_REG_SET for the regs) and so it triggers various HARD_REGISTER_P asserts when I try to use the functions there. if-conversion runs before register allocation, so we're dealing with pseudos here. My other attempt was to go over BB_A and mark the set registers in a bitmap, then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A we return false. This seemed to do the job and testing worked out ok. That would require one walk over BB_A, one walk over BB_B but I don't know how expensive FOR_EACH_SUBRTX walks are... Would that be an acceptable solution? snip It fails when the last insn is not recognised, because noce_try_cmove_arith can modify the last insn, but I have not seen it cause any trouble. If it fails then back in noce_try_cmove_arith we goto end_seq_and_fail which ends the sequence and throws it away (and cancels if-conversion down that path), so it should be safe. OK, I was working from the assumption that memoization ought not fail, but it seems that was a bad assumption on my part. So given noce_try_cmove_arith can change the last insn and make it non-recognizable, this code seems reasonable. So I think the only outstanding issues are: 1. Investigate moving rather than re-emitting insns. I'll look into that, but what is the machinery by which one moves insns? Thanks, Kyrill 2.
Investigate a more efficient means to find set/use conflicts between the two blocks, possibly using resource.[ch] jeff
[PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode
Hi, Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER (m) so that it returns m in cases where there is no inner mode. This simplifies some of the calling code by removing the need to check for VOIDmode and allows calling it unconditionally. I also removed element_precision () as it was only called in one place and thought it neater to call GET_MODE_PRECISION explicitly. Parts 2-4 will include further tidy-ups and optimisations based on [1/N]. Good to go? Regards, David Sherwood. 2015-07-17 David Sherwood david.sherw...@arm.com gcc/ * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call GET_MODE_INNER unconditionally. * config/spu/spu.c (arith_immediate_p): Likewise. * config/i386/i386.c (ix86_build_signbit_mask): Likewise. New variable. * expmed.c (synth_mult): Remove check for VOIDmode result from GET_MODE_INNER. (expand_mult_const): Likewise. * fold-const.c (): Replace call to element_precision with call to GET_MODE_PRECISION. * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with m->name. (emit_mode_inner): Likewise. * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER result check. * machmode.h (GET_MODE_UNIT_SIZE): Simplify. (GET_MODE_UNIT_PRECISION): Likewise. * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally. * simplify-rtx.c (simplify_immed_subreg): Likewise. * stor-layout.c (bitwise_type_for_mode): Update assert. (element_precision): Remove. mode_inner1.patch Description: Binary data
Re: [PATCH 1/4][ARM][PR target/65697][5.1] Backport stronger barriers for __sync fetch-op builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:12, Matthew Wahab wrote: The __sync builtins are implemented using barriers that are too weak for ARMv8 targets; this has been fixed on trunk for the ARM back-end. Since GCC-5.1 is also generating the incorrect code, it should also be fixed. This patch backports the changes made to strengthen the barriers emitted for the __sync fetch-and-op/op-and-fetch builtins. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01410.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01235.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-06-29 Matthew Wahab matthew.wa...@arm.com PR target/65697 * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with final barrier. From 0c2f209f869aead3475fe491f08cf7640d2bc8fe Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon, 29 Jun 2015 16:03:34 + Subject: [PATCH 1/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport PR target/65697 * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with final barrier.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225132 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/ChangeLog Change-Id: I2074541794ecad8847ada04690cd9132a51b6404 --- gcc/config/arm/arm.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 614ff0d..f694e74 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -27822,6 +27822,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, rtx_code_label *label; rtx x; + bool is_armv8_sync = arm_arch8 && is_mm_sync (model); + bool use_acquire = TARGET_HAVE_LDACQ && !(is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model)); @@ -27830,6 +27832,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, !(is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model)); + /* For ARMv8, a load-acquire is too weak for __sync memory orders. Instead, + a full barrier is emitted after the store-release. */ + if (is_armv8_sync) +use_acquire = false; + /* Checks whether a barrier is needed and emits one accordingly. */ if (!(use_acquire || use_release)) arm_pre_atomic_barrier (model); @@ -27900,7 +27907,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label)); /* Checks whether a barrier is needed and emits one accordingly. */ - if (!(use_acquire || use_release)) + if (is_armv8_sync + || !(use_acquire || use_release)) arm_post_atomic_barrier (model); } -- 1.9.1
Re: [PATCH 3/4][ARM][PR target/65697][5.1] Add tests for __sync_builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:17, Matthew Wahab wrote: This patch backports the tests added for code generated by the ARM back-end for the __sync builtins. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01412.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01237.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew gcc/testsuite 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-06-29 Matthew Wahab matthew.wa...@arm.com PR target/65697 * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New. From d1a53325eb47a5da55a8267deb5fbe168f7db4de Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon, 29 Jun 2015 16:12:12 + Subject: [PATCH 3/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport PR target/65697 * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225134 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/ChangeLog Change-Id: I16c02786765bbbfbb287fba863ba27fb6a56ddc5 --- gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c | 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c | 8 4 files changed, 38 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c new file mode 100644 index 000..f96c81a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-comp-swap.x" + +/* { dg-final { scan-assembler-times "ldrex" 2 } } */ +/* { dg-final { scan-assembler-times "stlex" 2 } } */ +/* { dg-final { scan-assembler-times "dmb" 2 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c new file mode 100644 index 000..8d6659b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-op-acquire.x" + +/* { dg-final { scan-assembler-times "ldrex" 1 } } */ +/* { dg-final { scan-assembler-times "stlex" 1 } } */ +/* { dg-final { scan-assembler-times "dmb" 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c new file mode 100644 index 000..a5ad3bd --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-op-full.x" + +/* { dg-final { scan-assembler-times "ldrex" 12 } } */ +/* { dg-final { scan-assembler-times "stlex" 12 } } */ +/* { dg-final { scan-assembler-times "dmb" 12 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c new file mode 100644 index 000..0d3be7b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v8a } */ + +#include "../aarch64/sync-op-release.x" + +/* { dg-final { scan-assembler-times "stl" 1 } } */ -- 1.9.1
RE: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode
Hi, Sorry, I forgot to mention I tested this on: aarch64 and aarch64_be - no regressions in gcc testsuite x86_64 - bootstrap build, no testsuite regressions arm-none-eabi - no regressions in gcc testsuite I will also make sure that I can do cross builds on a variety of different targets. Regards, David. -Original Message- From: David Sherwood [mailto:david.sherw...@arm.com] Sent: 27 July 2015 11:25 To: gcc-patches@gcc.gnu.org Subject: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode Hi, Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER (m) so that it returns m in cases where there is no inner mode. This simplifies some of the calling code by removing the need to check for VOIDmode and allows calling it unconditionally. I also removed element_precision () as it was only called in one place and thought it neater to call GET_MODE_PRECISION explicitly. Parts 2-4 will include further tidy-ups and optimisations based on [1/N]. Good to go? Regards, David Sherwood. 2015-07-17 David Sherwood david.sherw...@arm.com gcc/ * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call GET_MODE_INNER unconditionally. * config/spu/spu.c (arith_immediate_p): Likewise. * config/i386/i386.c (ix86_build_signbit_mask): Likewise. New variable. * expmed.c (synth_mult): Remove check for VOIDmode result from GET_MODE_INNER. (expand_mult_const): Likewise. * fold-const.c (): Replace call to element_precision with call to GET_MODE_PRECISION. * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with m->name. (emit_mode_inner): Likewise. * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER result check. * machmode.h (GET_MODE_UNIT_SIZE): Simplify. (GET_MODE_UNIT_PRECISION): Likewise. * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally. * simplify-rtx.c (simplify_immed_subreg): Likewise. * stor-layout.c (bitwise_type_for_mode): Update assert. (element_precision): Remove.
Re: [PATCH 2/4][ARM][PR target/65697][5.1] Backport stronger barriers for __sync,compare-and-swap builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:15, Matthew Wahab wrote: This patch backports the changes made to strengthen the barriers emitted for the __sync compare-and-swap builtins. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01411.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-06/msg01236.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-06-29 Matthew Wahab matthew.wa...@arm.com PR target/65697 * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with final barrier. From fdcde1aa0b852f2a01bb45115e28f694b0225fcf Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon, 29 Jun 2015 16:09:10 + Subject: [PATCH 2/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport PR target/65697 * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with final barrier.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225133 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/ChangeLog Change-Id: Ifab505d792d6227c7d2231813dfb2e7826f0f450 --- gcc/config/arm/arm.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f694e74..1e67a73 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -27757,6 +27757,8 @@ arm_split_compare_and_swap (rtx operands[]) scratch = operands[7]; mode = GET_MODE (mem); + bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s); + bool use_acquire = TARGET_HAVE_LDACQ && !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s) || is_mm_release (mod_s)); @@ -27765,6 +27767,11 @@ arm_split_compare_and_swap (rtx operands[]) !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s) || is_mm_acquire (mod_s)); + /* For ARMv8, the load-acquire is too weak for __sync memory orders. Instead, + a full barrier is emitted after the store-release. */ + if (is_armv8_sync) +use_acquire = false; + /* Checks whether a barrier is needed and emits one accordingly. */ if (!(use_acquire || use_release)) arm_pre_atomic_barrier (mod_s); @@ -27805,7 +27812,8 @@ arm_split_compare_and_swap (rtx operands[]) emit_label (label2); /* Checks whether a barrier is needed and emits one accordingly. */ - if (!(use_acquire || use_release)) + if (is_armv8_sync + || !(use_acquire || use_release)) arm_post_atomic_barrier (mod_s); if (is_mm_relaxed (mod_f)) -- 1.9.1
Re: [PATCH 4/4][ARM][PR target/65697][5.1] Fix tests for __sync_builtins.
Ping. Updated patch attached. Also, retested for arm-none-linux-gnueabihf with native bootstrap and make check and for arm-none-eabi with cross compiled make check. On 02/07/15 14:18, Matthew Wahab wrote: This patch backports fixes for the __sync builtin tests. The trunk patch submission is at https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00031.html The commit is at https://gcc.gnu.org/ml/gcc-cvs/2015-07/msg00025.html Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for the branch? Matthew gcc/testsuite 2015-07-02 Matthew Wahab matthew.wa...@arm.com Backport from trunk: 2015-07-01 Matthew Wahab matthew.wa...@arm.com * gcc.target/arm/armv8-sync-comp-swap.c: Replace 'do-require-effective-target' with 'dg-require-effective-target'. * gcc.target/arm/armv8-sync-op-full.c: Likewise. * gcc.target/arm/armv8-sync-op-release.c: Likewise. * gcc.target/arm/armv8-sync-op-acquire.c: Likewise. Also, replace 'stlex' with 'strex' as the expected output. From d058686fe1027927a5fdfbb81a83526e3f9b9d6d Mon Sep 17 00:00:00 2001 From: mwahab mwahab@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed, 1 Jul 2015 12:16:01 + Subject: [PATCH 4/4] 2015-07-01 Matthew Wahab matthew.wa...@arm.com Backport * gcc.target/arm/armv8-sync-comp-swap.c: Replace 'do-require-effective-target' with 'dg-require-effective-target'. * gcc.target/arm/armv8-sync-op-full.c: Likewise. * gcc.target/arm/armv8-sync-op-release.c: Likewise. * gcc.target/arm/armv8-sync-op-acquire.c: Likewise. Also, replace 'stlex' with 'strex' as the expected output. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225241 138bc75d-0d04-0410-961f-82ee72b054a4 Conflicts: gcc/testsuite/ChangeLog Change-Id: I19f2013f7bdd2dd035f36f0f7c9829cf6a86fb8e --- gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c | 2 +- gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 4 ++-- gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 2 +- gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c index f96c81a..0e95986 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c index 8d6659b..c448599 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c @@ -1,10 +1,10 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ #include "../aarch64/sync-op-acquire.x" /* { dg-final { scan-assembler-times "ldrex" 1 } } */ -/* { dg-final { scan-assembler-times "stlex" 1 } } */ +/* { dg-final { scan-assembler-times "strex" 1 } } */ /* { dg-final { scan-assembler-times "dmb" 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c index a5ad3bd..cce9e00 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c index 0d3be7b..502a266 100644 --- a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-require-effective-target arm_arch_v8a_ok } */ /* { dg-options "-O2" } */ /* { dg-add-options arm_arch_v8a } */ -- 1.9.1
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
On Mon, Jul 27, 2015 at 10:52:58AM +0100, pins...@gmail.com wrote: On Jul 27, 2015, at 2:26 AM, Jiong Wang jiong.w...@arm.com wrote: Andrew Pinski writes: On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote: James Greenhalgh writes: On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: Current IRA still uses both target macros in a few places. Tell IRA to use the order we defined rather than its own cost calculation. Allocate caller-saved registers first, then callee-saved. This is especially useful for LR/x30, as it's free to allocate and is pure caller-saved when used in a leaf function. Haven't noticed significant impact on benchmarks, but by grepping some keywords like Spilling, Push.*spill etc in the IRA rtl dump, the number is smaller. OK for trunk? OK, sorry for the delay. It might be mail client mangling, but please check that the trailing slashes line up in the version that gets committed. Thanks, James 2015-05-19 Jiong Wang jiong.w...@arm.com gcc/ PR 63521 * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define. (HONOR_REG_ALLOC_ORDER): Define. Patch reverted. I did not see a reason why this patch was reverted. Maybe I am missing an email or something. There are several execution regressions in the gcc testsuite; as far as I can see this patch exposed hidden bugs in those testcases, but there might be one other issue, so to be conservative I temporarily reverted this patch. If you are talking about: gcc.target/aarch64/aapcs64/func-ret-2.c execution etc., these test cases are too dependent on the original register allocation order and really can be safely ignored. Really these three tests should be moved or written in a more sane way. Yup, completely agreed - but the testcases do throw up something interesting. If we are allocating registers to hold 128-bit values, and we pick x7 as highest preference, we implicitly allocate x8 along with it.
I think we probably see the same thing if the first thing we do in a function is a structure copy through a back-end expanded movmem, which will likely begin with a 128-bit LDP using x7, x8. If the argument for this patch is that we prefer to allocate x7-x0 first, followed by x8, then we've potentially made a sub-optimal decision, our allocation order for 128-bit values is x7,x8,x5,x6 etc. My hunch is that we *might* get better code generation in this corner case out of some permutation of the allocation order for argument registers. I'm thinking something along the lines of {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... } I asked Jiong to take a look at that, and I agree with his decision to reduce the churn on trunk and just revert the patch until we've come to a conclusion based on some evidence - rather than just my hunch! I agree that it would be harmless on trunk from a testing point of view, but I think Jiong is right to revert the patch until we better understand the code-generation implications. Of course, it might be that I am completely wrong! If you've already taken a look at using a register allocation order like the example I gave and have something to share, I'd be happy to read your advice! Thanks, James
Re: [ARM] Correct spelling of references to ARMv6KZ
On 27/07/15 10:47, Matthew Wahab wrote: On 23/07/15 12:04, Kyrill Tkachov wrote: GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made visible by the command line option -march=armv6zk and by the predefined macro __ARM_ARCH_6ZK__. This patch corrects the spelling internally and adds -march=armv6kz. To preserve existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the architecture. Use of -march=arm6kz will need to wait for binutils to be updated,[..] diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c index c715bb7..7873606 100644 --- a/gcc/config/arm/driver-arm.c +++ b/gcc/config/arm/driver-arm.c @@ -35,6 +35,7 @@ static struct vendor_cpu arm_cpu_table[] = { {0xb02, armv6k, mpcore}, {0xb36, armv6j, arm1136j-s}, {0xb56, armv6t2, arm1156t2-s}, +{0xb76, armv6kz, arm1176jz-s}, {0xb76, armv6zk, arm1176jz-s}, {0xc05, armv7-a, cortex-a5}, {0xc07, armv7ve, cortex-a7}, This table is scanned from beginning to end, checking for the first field. You introduce a duplicate here, so the second form will never be reached. I'd suggest removing the wrong spelling from here, but the re-written march string will be passed to the assembler, so if the assembler is old and doesn't support the correct spelling we'll get errors. So it seems like in order to preserve backwards compatibility we don't want to put the correctly spelled entry here :( But definitely add a comment here mentioning the deliberate oversight. Respun patch attached. I've removed armv6kz entry from config/arm/driver-arm.c and replaced it with a comment for the armv6zk entry. Tested for arm-none-linux-gnueabihf with native bootstrap and make check. Matthew gcc/ 2015-07-27 Matthew Wahab matthew.wa...@arm.com * config/arm/arm-arches.def: Add armv6kz. Replace 6ZK with 6KZ and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ. 
* config/arm/arm-c.c (arm_cpu_builtins): Emit __ARM_ARCH_6ZK__ for armv6kz targets. * config/arm/arm-cores.def: Replace 6ZK with 6KZ. * config/arm/arm-protos.h (FL_ARCH6KZ): New. (FL_FOR_ARCH6ZK): Remove. (FL_FOR_ARCH6KZ): New. (arm_arch6zk): New declaration. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm.c (arm_arch6kz): New. (arm_option_override): Set arm_arch6kz. * config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ. * config/arm/driver-arm.c: Add comment to armv6zk entry. * doc/invoke.texi: Replace armv6zk with armv6kz. This is ok for trunk. Thanks, Kyrill
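The shadowing problem Kyrill points out, a linear first-match scan over the table, can be seen in a toy model (names and values here are illustrative, not GCC's actual structure):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy stand-in for the vendor/cpu table in driver-arm.c.  */
struct cpu_entry { const char *part; const char *arch; };

static const struct cpu_entry table[] = {
  { "0xb76", "armv6kz" },   /* hypothetical duplicate key...          */
  { "0xb76", "armv6zk" },   /* ...so this entry can never be reached  */
  { "0xc05", "armv7-a" },
  { NULL, NULL }
};

static const char *lookup (const char *part)
{
  for (const struct cpu_entry *p = table; p->part; p++)
    if (strcmp (p->part, part) == 0)
      return p->arch;   /* first match wins */
  return NULL;
}
```

Because the scan stops at the first matching part number, the second "0xb76" row is dead code; that is why the respun patch keeps only one spelling in the table and documents the deliberate omission in a comment.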
[PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly
I noticed that the code we generate for a simple gimple_assign_rhs1 (stmt) is quite bad as we have two checking pieces. The implementation is now static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); return (tree *) ((char *) gs + off); } (ugly in itself considering that we just verified we have a gassign which has an operand vector as member...). gimple_op incurs two additional memory references to compute gimple_statement_structure and gimple_ops_offset_, plus the checking bits. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a <const gassign *> (gs)->op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like const gassign *ass = GIMPLE_CHECK2 <const gassign *> (gs); return ass->op[1]; for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?). I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess. I just changed gimple_assign_rhs1 for now. So this is a RFC. I understand in the end the whole of GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon, and this simple proof-of-concept patch already grows the unoptimized text size of gimple-match.c from 836080 to 836359 bytes (too bad) and shrinks the optimized size from 585731 to 193190 bytes (yay!) on x86_64 using trunk.
Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c. .optimizes shows all calls remaining as bb 89: _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47); which is tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs) { bb 2: gimple_check_failed (gs_1(D), /space/rguenther/tramp3d/trunk/gcc/gimple.h[0], 2381, gimple_assign_rhs1[0], 6, 0); } so eventually we just do a better job with profile estimation. I think inliner-wise we are even as we get rid of the gimple_op () inline but instead get the GIMPLE_CHECK2 inline. So - any comments? Thanks, Richard. 2015-07-27 Richard Biener rguent...@suse.de * gimple.h (GIMPLE_CHECK2): New inline template function. (gassign::code_): New constant static member. (is_a_helperconst gassign *): Add. (gimple_assign_rhs1): Use GIMPLE_CHECK2 and a cheaper way to access operand 1. * gimple.c (gassign::code_): Define. Index: gcc/gimple.h === --- gcc/gimple.h(revision 226240) +++ gcc/gimple.h(working copy) @@ -51,6 +51,16 @@ extern void gimple_check_failed (const_g gimple_check_failed (__gs, __FILE__, __LINE__, __FUNCTION__, \ (CODE), ERROR_MARK); \ } while (0) +template typename T +static inline T +GIMPLE_CHECK2(const_gimple gs, const char *file = __builtin_FILE (), + int line = __builtin_LINE (), const char *fun = __builtin_FUNCTION ()) +{ + T ret = dyn_cast T (gs); + if (ret) +gimple_check_failed (gs, file, line, fun, T()-code_, ERROR_MARK); + return ret; +} #else /* not ENABLE_GIMPLE_CHECKING */ #define gcc_gimple_checking_assert(EXPR) ((void)(0 (EXPR))) #define GIMPLE_CHECK(GS, CODE) (void)0 @@ -832,6 +842,7 @@ struct GTY((tag(GSS_WITH_OPS))) struct GTY((tag(GSS_WITH_MEM_OPS))) gassign : public gimple_statement_with_memory_ops { + static const enum gimple_code code_ = GIMPLE_ASSIGN; /* no additional fields; this uses the layout for GSS_WITH_MEM_OPS. 
*/ }; @@ -864,6 +875,14 @@ is_a_helper gassign *::test (gimple gs template template inline bool +is_a_helper const gassign *::test (const_gimple gs) +{ + return gs-code == GIMPLE_ASSIGN; +} + +template +template +inline bool is_a_helper gbind *::test (gimple gs) { return gs-code == GIMPLE_BIND; @@ -2359,8 +2378,8 @@ gimple_assign_set_lhs (gimple gs, tree l static inline tree gimple_assign_rhs1 (const_gimple gs) { - GIMPLE_CHECK (gs, GIMPLE_ASSIGN); - return gimple_op (gs, 1); + const gassign *ass = GIMPLE_CHECK2const gassign * (gs); + return ass-op[1]; } Index: gcc/gimple.c === --- gcc/gimple.c(revision 226240) +++ gcc/gimple.c(working copy) @@ -89,6 +89,10 @@ static const char * const gimple_alloc_k
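Outside of GCC's class hierarchy, the checked-downcast idiom the RFC introduces, a dyn_cast whose failure path reports the caller's source location through __builtin_FILE/__builtin_LINE default arguments, can be sketched like this; every class and function name below is invented for illustration:

```cpp
#include <cassert>
#include <cstdio>

// Toy stand-ins for GCC's gimple / gassign statement classes.
struct stmt { int code; };
struct assign : stmt { enum { code_ = 1 }; };

// A checked downcast in the spirit of GIMPLE_CHECK2: the default
// arguments capture the *caller's* file and line, so a failure is
// reported where the bad access sits, not inside this helper.
template <typename T>
static const T *
check_cast (const stmt *s,
            const char *file = __builtin_FILE (),
            int line = __builtin_LINE ())
{
  if (s->code != T::code_)
    {
      std::fprintf (stderr, "%s:%d: statement code %d, expected %d\n",
                    file, line, s->code, (int) T::code_);
      return nullptr;   // a real implementation would abort here
    }
  return static_cast<const T *> (s);
}
```

On success the caller gets a pointer of the derived type and can touch its fields directly, which is exactly what lets the rewritten gimple_assign_rhs1 read op[1] without a second round of checking.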
Re: [PATCHv2] [fixincludes] Ignore .DS_Store junk files when running make check
On 7/27/15, Andreas Schwab sch...@linux-m68k.org wrote: Eric Gallager eg...@gwmail.gwu.edu writes: Okay, I tried embedding ! -name 'CVS/' ! -name '.svn/' into the find invocation. -name does an exact match, so you don't need the slash. Andreas. Okay, attached a new version of the patch; make check for fixincludes still passes with it. Could someone commit it for me? I don't have commit access. It's still pretty trivial, so I shouldn't need a copyright assignment for it, should I? Eric. fixincludes/check.tpl | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fixincludes/check.tpl b/fixincludes/check.tpl index 0d1f444..ffd2b66 100644 --- a/fixincludes/check.tpl +++ b/fixincludes/check.tpl @@ -141,9 +141,8 @@ echo $exitok` cd $TESTBASE -find * -type f -print | \ -fgrep -v 'CVS/' | \ -fgrep -v '.svn/' > ${TESTDIR}/LIST +find * -type f ! -name .DS_Store ! -name CVS ! -name .svn -print \ + > ${TESTDIR}/LIST exitok=` exec < ${TESTDIR}/LIST
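The effect of the revised find invocation, exact-name pruning instead of piping through fgrep, can be checked in isolation; the scratch file names below are illustrative:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir sub
touch keep.h sub/keep2.h sub/.DS_Store sub/CVS
# Exact-name exclusions, as in the patch; no trailing slash needed.
find * -type f ! -name .DS_Store ! -name CVS ! -name .svn -print | sort
# prints keep.h and sub/keep2.h; the junk files are skipped
```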
Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing
On Mon, Jul 27, 2015 at 4:45 AM, Jonathan Wakely jwak...@redhat.com wrote: On 26/07/15 05:20 -0700, Tim Shen wrote: @@ -389,7 +391,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif } - void + _StringT _M_add_collating_element(const _StringT __s) { auto __st = _M_traits.lookup_collatename(__s.data(), Changing this return type is an ABI change. When compiled with older versions of GCC this function will not return anything, so code compiled with newer versions that links to that older code will read garbage off the stack if linked to the old version. (This isn't a problem for _M_expression_term because the return type is part of the mangled name for function templates). One solution would be to rename the function, so that code compiled with the new GCC will use the new function, not link to the old version that doesn't return anything. Done by s/_M_add_collating_element/_M_add_collate_element/. -- Regards, Tim Shen commit 4a3b4ab285b0ecabb706a776551dabf5a37059aa Author: Tim Shen tims...@google.com Date: Sun Jul 26 04:37:45 2015 -0700 PR libstdc++/67015 * include/bits/regex_compiler.h (_Compiler::_M_expression_term, _BracketMatcher::_M_add_collating_element): Change signature to make checking the and of bracket expression easier. * include/bits/regex_compiler.tcc (_Compiler::_M_expression_term): Treat '-' as a valid literal if it's at the end of bracket expression. * testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc: New testcases. diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h index 4472116..0cb0c04 100644 --- a/libstdc++-v3/include/bits/regex_compiler.h +++ b/libstdc++-v3/include/bits/regex_compiler.h @@ -116,8 +116,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION void _M_insert_bracket_matcher(bool __neg); + // Returns true if successfully matched one term and should continue. + // Returns false if the compiler should move on. 
templatebool __icase, bool __collate - void + bool _M_expression_term(pairbool, _CharT __last_char, _BracketMatcher_TraitsT, __icase, __collate __matcher); @@ -389,8 +391,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif } - void - _M_add_collating_element(const _StringT __s) + _StringT + _M_add_collate_element(const _StringT __s) { auto __st = _M_traits.lookup_collatename(__s.data(), __s.data() + __s.size()); @@ -400,6 +402,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef _GLIBCXX_DEBUG _M_is_ready = false; #endif + return __st; } void diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc b/libstdc++-v3/include/bits/regex_compiler.tcc index 33d7118..9a62311 100644 --- a/libstdc++-v3/include/bits/regex_compiler.tcc +++ b/libstdc++-v3/include/bits/regex_compiler.tcc @@ -424,8 +424,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __last_char.first = true; __last_char.second = _M_value[0]; } - while (!_M_match_token(_ScannerT::_S_token_bracket_end)) - _M_expression_term(__last_char, __matcher); + while (_M_expression_term(__last_char, __matcher)); __matcher._M_ready(); _M_stack.push(_StateSeqT( *_M_nfa, @@ -434,21 +433,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION templatetypename _TraitsT templatebool __icase, bool __collate -void +bool _Compiler_TraitsT:: _M_expression_term(pairbool, _CharT __last_char, _BracketMatcher_TraitsT, __icase, __collate __matcher) { + if (_M_match_token(_ScannerT::_S_token_bracket_end)) + return false; + if (_M_match_token(_ScannerT::_S_token_collsymbol)) - __matcher._M_add_collating_element(_M_value); + { + auto __symbol = __matcher._M_add_collate_element(_M_value); + if (__symbol.size() == 1) + { + __last_char.first = true; + __last_char.second = __symbol[0]; + } + } else if (_M_match_token(_ScannerT::_S_token_equiv_class_name)) __matcher._M_add_equivalence_class(_M_value); else if (_M_match_token(_ScannerT::_S_token_char_class_name)) __matcher._M_add_character_class(_M_value, false); - // POSIX doesn't permit '-' as a start-range char (say [a-z--0]), - // 
except when the '-' is the first character in the bracket expression - // ([--0]). ECMAScript treats all '-' after a range as a normal character. - // Also see above, where _M_expression_term gets called. + // POSIX doesn't allow '-' as a start-range char (say [a-z--0]), + // except when the '-' is the first or last character in the bracket + // expression ([--0]). ECMAScript treats all '-' after a range as a
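The rule under discussion, that '-' is a literal when it is the first or last character of a bracket expression and a range delimiter otherwise, can be exercised with a small helper (this uses std::regex's default ECMAScript grammar, which already treats a leading or trailing '-' as a literal):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if the single character c matches the bracket
// expression given in re.
static bool matches (const std::string &re, char c)
{
  return std::regex_match (std::string (1, c), std::regex (re));
}
```

For example, "[a-c-]" matches '-' and 'b' but not 'z', while in "[a-c]" the '-' only delimits the a-to-c range and does not match itself.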
Re: [PATCH] warn for unsafe calls to __builtin_return_address
So, my suggestion would be to warn for any call with a nonzero value. The current documentation says that you should only use nonzero values for debug purposes. A warning would help yes, how many people read the manual after all :-) Thank you both for the feedback. Attached is a simplified patch to issue a warning for all builtin_xxx_address calls with any non-zero argument. Martin gcc/ChangeLog 2015-07-27 Martin Sebor mse...@redhat.com * c-family/c.opt (-Wbuiltin-address): New warning option. * doc/invoke.texi (Wbuiltin-address): Document it. * doc/extend.texi (__builtin_frame_addrress, __builtin_return_addrress): Clarify possible effects of calling the functions with non-zero arguments and mention -Wbuiltin-address. * builtins.c (expand_builtin_frame_address): Handle -Wbuiltin-address. gcc/testsuite/ChangeLog 2015-07-27 Martin Sebor mse...@redhat.com * g++.dg/Wbuiltin-address-in-Wall.C: New test. * g++.dg/Wbuiltin-address.C: New test. * g++.dg/Wno-builtin-address.C: New test. * gcc.dg/Wbuiltin-address-in-Wall.c: New test. * gcc.dg/Wbuiltin-address.c: New test. * gcc.dg/Wno-builtin-address.c: New test. diff --git a/gcc/builtins.c b/gcc/builtins.c index e8fe3db..48379f8 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -4564,34 +4564,38 @@ expand_builtin_frame_address (tree fndecl, tree exp) { /* The argument must be a nonnegative integer constant. It counts the number of frames to scan up the stack. - The value is the return address saved in that frame. */ + The value is either the frame pointer value or the return + address saved in that frame. */ if (call_expr_nargs (exp) == 0) /* Warning about missing arg was already issued. */ return const0_rtx; else if (! 
tree_fits_uhwi_p (CALL_EXPR_ARG (exp, 0))) { - if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS) - error (invalid argument to %__builtin_frame_address%); - else - error (invalid argument to %__builtin_return_address%); + error (invalid argument to %qD, fndecl); return const0_rtx; } else { - rtx tem - = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), - tree_to_uhwi (CALL_EXPR_ARG (exp, 0))); + /* Number of frames to scan up the stack. */ + const unsigned HOST_WIDE_INT count = tree_to_uhwi (CALL_EXPR_ARG (exp, 0)); + + rtx tem = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), count); /* Some ports cannot access arbitrary stack frames. */ if (tem == NULL) { - if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS) - warning (0, unsupported argument to %__builtin_frame_address%); - else - warning (0, unsupported argument to %__builtin_return_address%); + warning (0, invalid argument to %qD, fndecl); return const0_rtx; } + if (0 count) + { + /* Warn since no effort is made to ensure that any frame + beyond the current one exists or can be safely reached. */ + warning (OPT_Wbuiltin_address, calling %qD with + a non-zero argument is unsafe, fndecl); + } + /* For __builtin_frame_address, return what we've got. 
*/ if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS) return tem; diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 285952e..bdff624 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -295,6 +295,10 @@ Wbool-compare C ObjC C++ ObjC++ Var(warn_bool_compare) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn about boolean expression compared with an integer value different from true/false +Wbuiltin-address +C ObjC C++ ObjC++ Var(warn_builtin_address) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) +Warn when __builtin_frame_address or __builtin_return_address is used unsafely + Wbuiltin-macro-redefined C ObjC C++ ObjC++ CPP(warn_builtin_macro_redefined) CppReason(CPP_W_BUILTIN_MACRO_REDEFINED) Var(cpp_warn_builtin_macro_redefined) Init(1) Warning Warn when a built-in preprocessor macro is undefined or redefined diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 9ad2b68..1b4596c 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8562,8 +8562,11 @@ to determine if the top of the stack has been reached. Additional post-processing of the returned value may be needed, see @code{__builtin_extract_return_addr}. -This function should only be used with a nonzero argument for debugging -purposes. +Calling this function with a nonzero argument can have unpredictable +effects, including crashing the calling program. As a result, calls +that are considered unsafe are diagnosed when the @option{-Wbuiltin-address} +option is in effect. Such calls are typically only useful in debugging +situations. @end deftypefn @deftypefn {Built-in Function} {void *} __builtin_extract_return_addr (void *@var{addr}) @@ -8601,8 +8604,11 @@ any function other than the current one; in such cases, or when the top of the stack has been reached, this
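A minimal illustration of the distinction the warning draws (the wrapper name here is made up): level 0 asks for the current frame's return address, which is always available, while nonzero levels walk caller frames that may not exist or be safely reachable.

```c
#include <assert.h>

/* Level 0 is the only argument the documentation blesses outside of
   debugging, and the only one the proposed -Wbuiltin-address warning
   stays quiet about.  noinline keeps the frame real.  */
__attribute__ ((noinline)) void *
current_return_address (void)
{
  return __builtin_return_address (0);
}

/* By contrast, __builtin_return_address (1) or higher scans up the
   stack with no guarantee the frames exist; that is the unsafe case
   the patch diagnoses.  */
```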
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On Mon, Jul 27, 2015 at 10:14:41AM -0600, Jeff Law wrote: On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. So my some what more intermediate goal was to have less files including target specific headers, eventually getting to the point we might be able to build a large part of the compiler independent of the target. Of course I think in the end really switchable targets would be nice. Trev Trevor, maybe a quick note on the motivation would help here... jeff
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On Mon, Jul 27, 2015 at 09:05:08PM +0100, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 07/27/2015 03:17 AM, Richard Biener wrote: On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok? With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually). Not sure -- it doesn't remove any conditionally compiled code... One could easily argue that it's just another step on the path towards a switchable target -- which in and of itself is a reasonable design goal. Trevor, maybe a quick note on the motivation would help here... I think at least we should use data hooks rather than function hooks, since this value should be a constant within a subtarget. It should only change for target_reinit. I agree in principle, but I wasn't sure where all I might need to change the values of the hooks, and of course I wondered if there might be some crazy target where that's not good enough. Alternatively we could have a new target_globals structure that is initialised with the result of calling the hook. If we do that though, it might make sense to consolidate the hooks rather than have one for every value. E.g. having one function for UNITS_PER_WORD, one for POINTER_SIZE, one for Pmode, etc., would lead to some very verbose target code. so something like struct target_types { unsigned long pointer_size; ... }; const target_types &targetm.get_type_data () ? that seems pretty reasonable, and I wouldn't expect too many ordering issues, but who knows.
It's too bad nobody has taken on the big job of turning targetm into a class so we can hope for some devirtualization help from the compiler. thanks! Trev Perhaps the main problem with these approaches is ensuring that the value is set up early enough. Thanks, Richard
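Trevor's consolidated "data hook" idea, one indirect call returning a struct of per-target constants instead of one call per value, could look roughly like this; all of the names below are hypothetical, not actual GCC interfaces:

```c
#include <assert.h>

/* All per-target type constants fetched through a single hook.  Hot
   code can cache the pointer once and pay a field load per access
   instead of an indirect call per value.  */
struct target_type_data
{
  unsigned int pointer_size;    /* in bits */
  unsigned int units_per_word;
};

struct target_hooks
{
  const struct target_type_data *(*get_type_data) (void);
};

/* What a hypothetical LP64 target's implementation might provide.  */
static const struct target_type_data lp64_data = { 64, 8 };

static const struct target_type_data *
lp64_get_type_data (void)
{
  return &lp64_data;
}

static const struct target_hooks targetm = { lp64_get_type_data };
```

A switchable target would swap the struct pointer (or the hook) at target_reinit time, which is the ordering concern Richard raises at the end of the thread.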
Re: [PATCH 4/4] define ASM_OUTPUT_LABEL to the name of a function
On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote: On Sat, Jul 25, 2015 at 4:37 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h, config/ia64/ia64-protos.h, config/ia64/ia64.c, config/ia64/ia64.h, config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h, config/rs6000/rs6000.c, config/rs6000/xcoff.h, config/spu/spu.h, config/visium/visium.h, defaults.h: Define ASM_OUTPUT_LABEL to the name of a function. * output.h (default_output_label): New prototype. * varasm.c (default_output_label): New function. * vmsdbgout.c: Include tm_p.h. * xcoffout.c: Likewise. Just a general remark - the GCC output machinery is known to be slow, adding indirect calls might be not the very best idea without refactoring some of it. ah, I wasn't aware of that. On the other hand some of these hooks seem to generally be big so the call over head might not matter that much. I suppose if this is something we really care about we might want to consider pushing the libgas project farther so we can avoid all this text formatting all together. Did you do any performance measurements for artificial testcases exercising the specific bits you change? no since I wasn't aware it was a concern. I can try to do that in the next couple days. thanks! Trev Richard. 
--- gcc/config/arc/arc.h | 3 +-- gcc/config/bfin/bfin.h| 5 + gcc/config/frv/frv.h | 6 +- gcc/config/ia64/ia64-protos.h | 1 + gcc/config/ia64/ia64.c| 11 +++ gcc/config/ia64/ia64.h| 8 +--- gcc/config/lm32/lm32.h| 3 +-- gcc/config/mep/mep.h | 8 +--- gcc/config/mmix/mmix.h| 3 +-- gcc/config/pa/pa-protos.h | 1 + gcc/config/pa/pa.c| 12 gcc/config/pa/pa.h| 9 + gcc/config/rs6000/rs6000-protos.h | 1 + gcc/config/rs6000/rs6000.c| 8 gcc/config/rs6000/xcoff.h | 3 +-- gcc/config/spu/spu.h | 3 +-- gcc/config/visium/visium.h| 3 +-- gcc/defaults.h| 6 +- gcc/output.h | 3 +++ gcc/varasm.c | 9 + gcc/vmsdbgout.c | 1 + gcc/xcoffout.c| 1 + 22 files changed, 60 insertions(+), 48 deletions(-) diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index d98cce1..d3747b9 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -1245,8 +1245,7 @@ do { \ /* This is how to output the definition of a user-level label named NAME, such as the label on a static function or variable NAME. */ -#define ASM_OUTPUT_LABEL(FILE, NAME) \ -do { assemble_name (FILE, NAME); fputs (:\n, FILE); } while (0) +#define ASM_OUTPUT_LABEL default_output_label #define ASM_NAME_P(NAME) ( NAME[0]=='*') diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h index 26ba7c2..08906aa 100644 --- a/gcc/config/bfin/bfin.h +++ b/gcc/config/bfin/bfin.h @@ -1044,10 +1044,7 @@ typedef enum directives { ASM_OUTPUT_LABEL(FILE, NAME); \ } while (0) -#define ASM_OUTPUT_LABEL(FILE, NAME)\ - do { assemble_name (FILE, NAME);\ -fputs (:\n,FILE);\ - } while (0) +#define ASM_OUTPUT_LABEL default_output_label #define ASM_OUTPUT_LABELREF(FILE,NAME) \ do { fprintf (FILE, _%s, NAME); \ diff --git a/gcc/config/frv/frv.h b/gcc/config/frv/frv.h index b0d66fd..1d25974 100644 --- a/gcc/config/frv/frv.h +++ b/gcc/config/frv/frv.h @@ -1668,11 +1668,7 @@ do { \ `assemble_name (STREAM, NAME)' to output the name itself; before and after that, output the additional assembler syntax for defining the name, and a newline. 
*/ -#define ASM_OUTPUT_LABEL(STREAM, NAME) \ -do { \ - assemble_name (STREAM, NAME); \ - fputs (:\n, STREAM); \ -} while (0) +#define ASM_OUTPUT_LABEL default_output_label /* Globalizing directive for a label. */ #define GLOBAL_ASM_OP \t.globl diff --git a/gcc/config/ia64/ia64-protos.h b/gcc/config/ia64/ia64-protos.h index 29fc714..8e540e4 100644 --- a/gcc/config/ia64/ia64-protos.h +++ b/gcc/config/ia64/ia64-protos.h @@ -72,6 +72,7 @@ extern rtx ia64_expand_builtin (tree, rtx, rtx, machine_mode, int); extern rtx ia64_va_arg (tree, tree); #endif /*
Re: [PATCH][RFC][match.pd] optimize (X % C) == N when C is power of 2
On Fri, 24 Jul 2015, Jeff Law wrote: On 07/24/2015 03:44 AM, Ramana Radhakrishnan wrote: In expr.c, with TER you can detect such patterns, in this case when expanding the comparison, but perhaps we want a *.pd file that would have rules that would be only GIMPLE and only enabled in a special pass right before (or very close to) expansion, that would perform such instruction selection. Wild idea, but could it be considered to have target-specific match.pd files that can be included in the main match.pd? That way, targets would get the benefit of getting the target-specific folding they benefit from at the very beginning of compilation without stepping on other targets toes. The downside is preventing duplication, potentially reducing generic improvements and a maintenance headache for gimple optimizers. So how about wedding the two ideas that have sprouted out of this discussion. Specifically having a pass apply a target specific match.pd, but only do so at the end of the gimple optimization pipeline? The design goal would (of course) be to change representations in ways that allow the gimple-rtl expanders to generate more efficient code for the target. It avoids introducing the target bits early in the gimple pipeline, but still gives a clean way for targets to rewrite gimple for the benefit of gimple-rtl expansion. I think it also aligns with the idea of pushing back RTL expansion and expose some target specifics after another GIMPLE lowering phase. I'm also thinking of addressing-mode selection and register promotion. So at least if we think of that target specific match.pd pass as containing all RTL expansion tricks done with TER only then it should be quite simple to make it work. Richard.
Re: [ARM] Optimize compare against smin/umin
Hi Michael,

On 26/07/15 23:54, Michael Collison wrote: Here is an updated patch that addresses the issues you mentioned:

2015-07-24 Michael Collison michael.colli...@linaro.org
	   Ramana Radhakrishnan ramana.radhakrish...@arm.com

	* gcc/config/arm/arm.md (*arm_smin_cmp): New pattern.
	(*arm_umin_cmp): Likewise.
	* gcc.target/arm/mincmp.c: Test min compare idiom.

Just "New test." would be a better ChangeLog entry for this.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0be70a8..361c292 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3455,6 +3455,44 @@
    (set_attr "type" "multiple,multiple")]
 )
 
+;; t = (s/u)min (x, y)
+;; cc = cmp (t, z)
+;; is the same as
+;; cmp x, z
+;; cmpge(u) y, z
+
+(define_insn_and_split "*arm_smin_cmp"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	 (smin:SI (match_operand:SI 0 "s_register_operand" "r")
+		  (match_operand:SI 1 "s_register_operand" "r"))
+	 (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (reg:CC CC_REGNUM)
+		   (compare:CC (match_dup 1) (match_dup 2))))]
+)
+
+(define_insn_and_split "*arm_umin_cmp"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	 (umin:SI (match_operand:SI 0 "s_register_operand" "r")
+		  (match_operand:SI 1 "s_register_operand" "r"))
+	 (match_operand:SI 2 "s_register_operand" "r")))]
+  "TARGET_32BIT"
+  "#"
+  "&& reload_completed"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC (match_dup 0) (match_dup 2)))
+   (cond_exec (geu:CC (reg:CC CC_REGNUM) (const_int 0))
+	      (set (reg:CC CC_REGNUM)
+		   (compare:CC (match_dup 1) (match_dup 2))))]
+)
+
 (define_expand "umaxsi3"
   [(parallel [
     (set (match_operand:SI 0 "s_register_operand" "")

diff --git a/gcc/testsuite/gcc.target/arm/mincmp.c b/gcc/testsuite/gcc.target/arm/mincmp.c
new file mode 100644
index 000..2a55c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mincmp.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm32 } */
+
+#define min(x, y) ((x) <= (y)) ? (x) : (y)
+
+unsigned int foo (unsigned int i, unsigned int x ,unsigned int y)

Comma after "unsigned int x" in the wrong place.

+{
+  return i < (min (x, y));
+}
+
+int bar (int i, int x, int y)
+{
+  return i < (min (x, y));
+}

Can you please use GNU style for the testcase. New line after return type, function name and arguments on their own line.

Ok with these changes.

Thanks,
Kyrill
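The equivalence the splitter above relies on can be sanity-checked in plain C. The following is an illustrative sketch, not GCC code: it exercises the identity min (x, y) >= z <=> (x >= z && y >= z), which is why a single "cmp t, z" on t = smin (x, y) can be split into "cmp x, z" followed by a conditional "cmpge y, z".

```c
/* Illustrative check (plain C, not GCC code) of the identity behind
   *arm_smin_cmp:  min (x, y) >= z  is equivalent to  x >= z && y >= z.  */
static int
min_ge_direct (int x, int y, int z)
{
  int t = x < y ? x : y;    /* t = smin (x, y) */
  return t >= z;            /* cmp t, z */
}

static int
min_ge_split (int x, int y, int z)
{
  return x >= z && y >= z;  /* cmp x, z; cmpge y, z */
}
```

The unsigned (umin/cmpgeu) case is analogous, with unsigned comparisons throughout.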
Re: OMP. More constification
On Sun, Jul 26, 2015 at 11:01:09AM -0400, Nathan Sidwell wrote: I found some more missing consts. The size, kind, var and function arrays emitted by omp-low are read only, but are not so marked. This patch

First of all, the hostaddrs array is going to be written by the library side for GOMP_MAP_FIRSTPRIVATE, so it can't be const and shouldn't be const on the compiler-emitted side either. Similarly for the use_device_ptr clause on GOMP_target_data* handling. The size array is const from the library side, sure, but on the compiler side it is only conditionally static (static only if all the values written to it are INTEGER_CSTs), so I am not sure we want to make it use a const-qualified type unconditionally, rather than only changing it to the const-qualified type if it is TREE_STATIC at the end. The kinds array, I'm afraid, might very well soon follow the size array model. Jakub
Re: [PATCH][RFC][match.pd] optimize (X % C) == N when C is power of 2
On 25/07/15 03:19, Segher Boessenkool wrote: On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote: This transformation folds (X % C) == N into (X & ((1 << (size - 1)) | (C - 1))) == N for constants C and N where N is positive and C is a power of 2. For N = 0 you can transform it to ((unsigned)X % C) == N, and for 0 < N < C you can transform it to X > 0 && ((unsigned)X % C) == N (or X >= 0), and for -C < N < 0 it is X < 0 && ((unsigned)X % C) == N + C (or X <= 0), and for other N it is 0. For N not a constant, well, do you really care? :-) (That second case might eventually fold to your original expression). Yeah, these avoid the potentially expensive mask, but introduce more operations, which I believe may not be desirable at this stage. Unless these transformations are ok for match.pd I'll try to implement this transformation at RTL expansion time. Thanks, Kyrill Segher
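The fold under discussion can be sanity-checked in plain C. Below is an illustrative brute-force check, not the match.pd code itself, assuming 32-bit two's-complement int (so "size - 1" is 31): the mask keeps the sign bit plus the low bits of the remainder, and for a constant N > 0 the masked compare agrees with the remainder compare for every X.

```c
/* Illustrative check (not GCC code): for C a power of 2 and constant
   N > 0, on 32-bit two's-complement int,
     (X % C) == N   is equivalent to   (X & ((1 << 31) | (C - 1))) == N.
   For X >= 0 the low bits are exactly X % C; for X < 0 the sign bit in
   the mask makes the masked value too large to equal N.  */
static int
mod_eq_naive (int x, int c, int n)
{
  return x % c == n;
}

static int
mod_eq_masked (int x, int c, int n)
{
  unsigned mask = (1u << 31) | ((unsigned) c - 1);
  return ((unsigned) x & mask) == (unsigned) n;
}
```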
[PATCH, PING] PR debug/53927: fix value for DW_AT_static_link
On 07/20/2015 09:39 AM, Pierre-Marie de Rodat wrote: This patch fixes the static link description in DWARF to comply with the specification. In order to do so, it appends a field to all FRAME objects to hold the frame base address (DW_AT_frame_base) so that the nested subprograms can directly reference this field. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53927 for the context (in particular why we need this additional field in FRAME objects). Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01629.html. -- Pierre-Marie de Rodat
[PATCH, PING] Track indirect calls for call site information in debug info.
On 07/20/2015 02:45 PM, Pierre-Marie de Rodat wrote: On PowerPC targets with -mlongcall, most subprogram calls are turned into indirect calls: the call target is read from a register even though it is compile-time known. This makes it difficult for machine code static analysis engines to recover the callee information. The attached patch is an attempt to help such engines, generating DW_AT_abstract_origin attributes for all DW_TAG_GNU_call_site we are interested in. Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01641.html. -- Pierre-Marie de Rodat
Re: [PATCH] Use single shared memory block pool for all pool allocators
On Sun, Jul 26, 2015 at 9:09 PM, pins...@gmail.com wrote: On Jul 26, 2015, at 11:50 AM, Andi Kleen a...@firstfloor.org wrote: Mikhail Maltsev malts...@gmail.com writes: Hi, all! Recently I did some profiling of GCC to find hotspots and areas of possible performance improvement among them. glibc malloc(3) is one of (perhaps known)

I've been compiling gcc with tcmalloc to do a similar speedup. It would be interesting to compare that to your patch. Another useful optimization is to adjust the allocation size to be >= 2MB. Then modern Linux kernels can often give you a large page, which cuts down TLB overhead. I did similar changes some time ago for the garbage collector.

Unless you are running with 64k pages, which I do all the time on my armv8 system. This can be a host-configurable value of course.

But first of all (without looking at the patch but just reading the description) this sounds like a good idea. Maybe still allow pools to use their own backing if the object size is larger than the block size of the caching pool?

Thanks, Richard.

Thanks, Andrew

BTW I saw big differences in larger LTO builds. -Andi -- a...@linux.intel.com -- Speaking for myself only
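For readers following the thread, the idea being discussed — several fixed-size object pools drawing their backing memory from one shared cache of equally-sized blocks, so a block released by any pool can be recycled by every other pool instead of going back to malloc — can be sketched as below. All names and sizes here are invented for illustration; this is not GCC's actual pool-allocator interface.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: object pools carve their storage out of ONE shared
   cache of same-sized blocks.  Invented names, not GCC's interface.  */

#define BLOCK_SIZE 4096

struct block { struct block *next; };

static struct block *free_blocks;  /* the shared block cache */

static void *
block_alloc (void)
{
  if (free_blocks)
    {
      struct block *b = free_blocks;
      free_blocks = b->next;
      return b;
    }
  return malloc (BLOCK_SIZE);
}

static void
block_free (void *b)
{
  /* Returned blocks go back to the shared cache, not to malloc.  */
  ((struct block *) b)->next = free_blocks;
  free_blocks = (struct block *) b;
}

typedef struct object_pool
{
  size_t obj_size;        /* rounded up so an object can hold a pointer */
  char *cursor, *limit;   /* unused tail of the current block */
  void *free_list;        /* objects returned to this pool */
} object_pool;

void
pool_init (object_pool *p, size_t size)
{
  p->obj_size = size < sizeof (void *) ? sizeof (void *) : size;
  p->cursor = p->limit = NULL;
  p->free_list = NULL;
}

void *
pool_alloc (object_pool *p)
{
  if (p->free_list)
    {
      /* Pop a previously freed object; its first bytes hold the link.  */
      void *o = p->free_list;
      memcpy (&p->free_list, o, sizeof (void *));
      return o;
    }
  if (p->cursor == NULL || (size_t) (p->limit - p->cursor) < p->obj_size)
    {
      p->cursor = block_alloc ();
      p->limit = p->cursor + BLOCK_SIZE;
    }
  void *o = p->cursor;
  p->cursor += p->obj_size;
  return o;
}

void
pool_free (object_pool *p, void *o)
{
  memcpy (o, &p->free_list, sizeof (void *));
  p->free_list = o;
}
```

A full version would also hand exhausted blocks back via block_free when a pool is released, and — per Richard's comment — fall back to a private backing when the object size exceeds BLOCK_SIZE.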
Re: [PATCH 0/9] start converting POINTER_SIZE to a hook
On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, $subject. patches individually bootstrapped + regtested on x86_64-linux-gnu, and run through config-list.mk with more patches removing usage of the macro. Ok?

With POINTER_SIZE now being expensive (target hook) you might consider moving most users to use pointer_sized_int_node or some other global derived from POINTER_SIZE. Which of course raises the question of why we are hookizing this... if you'd want a truly switchable target you'd have to switch all global trees as well (or hookize them individually).

Richard.

Trev

Trevor Saunders (9):
  remove POINTER_SIZE_UNITS macro
  add pointer_size target hook
  target.h: change to use targetm.pointer_size instead of POINTER_SIZE
  varasm.c: switch from POINTER_SIZE to targetm.pointer_size ()
  ubsan.c: switch from POINTER_SIZE to targetm.pointer_size ()
  tree-chkp.c: switch to targetm.pointer_size ()
  stor-layout.c: switch to targetm.pointer_size ()
  tree.c: switch to targetm.pointer_size ()
  emit-rtl.c: switch to targetm.pointer_size ()

 gcc/c-family/c-cppbuiltin.c |  2 +-
 gcc/defaults.h              |  3 ---
 gcc/doc/tm.texi             |  7 +++
 gcc/doc/tm.texi.in          |  2 ++
 gcc/dwarf2asm.c             |  4 ++--
 gcc/emit-rtl.c              |  5 +++--
 gcc/lto/lto-object.c        |  3 ++-
 gcc/stor-layout.c           |  9 +
 gcc/target.def              |  8 
 gcc/target.h                |  8 
 gcc/targhooks.c             |  8 
 gcc/targhooks.h             |  1 +
 gcc/tree-chkp.c             | 14 --
 gcc/tree.c                  |  3 ++-
 gcc/ubsan.c                 |  3 ++-
 gcc/varasm.c                | 12 ++--
 16 files changed, 65 insertions(+), 27 deletions(-)
--
2.4.0
Re: [PATCH 4/9] varasm.c: switch from POINTER_SIZE to targetm.pointer_size ()
On Mon, Jul 27, 2015 at 5:10 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org

gcc/ChangeLog:

2015-07-26  Trevor Saunders  tbsaunde+...@tbsaunde.org

	* varasm.c (assemble_addr_to_section): Call targetm.pointer_size ().
	(dump_tm_clone_pairs): Likewise.
---
 gcc/varasm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 8cb2ec9..238ae39 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1543,8 +1543,8 @@ void
 assemble_addr_to_section (rtx symbol, section *sec)
 {
   switch_to_section (sec);
-  assemble_align (POINTER_SIZE);
-  assemble_integer (symbol, pointer_size_units (), POINTER_SIZE, 1);
+  assemble_align (targetm.pointer_size ());
+  assemble_integer (symbol, pointer_size_units (), targetm.pointer_size (), 1);

That'll invoke the pointer size hook twice. Refactoring assemble_integer to avoid this would be nice. Also caching the values in places like this.

Richard.

 }
 
 /* Return the numbered .ctors.N (if CONSTRUCTOR_P) or .dtors.N (if
@@ -5870,14 +5870,14 @@ dump_tm_clone_pairs (vec<tm_alias_pair> tm_alias_pairs)
       if (!switched)
 	{
 	  switch_to_section (targetm.asm_out.tm_clone_table_section ());
-	  assemble_align (POINTER_SIZE);
+	  assemble_align (targetm.pointer_size ());
 	  switched = true;
 	}
       assemble_integer (XEXP (DECL_RTL (src), 0),
-			pointer_size_units (), POINTER_SIZE, 1);
+			pointer_size_units (), targetm.pointer_size (), 1);
       assemble_integer (XEXP (DECL_RTL (dst), 0),
-			pointer_size_units (), POINTER_SIZE, 1);
+			pointer_size_units (), targetm.pointer_size (), 1);
 }
--
2.4.0
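Richard's point — that the rewritten code now invokes the hook twice where one call would do — is the classic hoist-the-call refactor. A hypothetical stand-alone illustration follows; the counter and function names are invented for the example (a plain function stands in for the targetm.pointer_size () hook):

```c
/* Invented stand-in for a target hook; the counter shows how many
   times the hook is invoked.  */
static int hook_calls;

static int
pointer_size (void)
{
  hook_calls++;
  return 64;
}

/* As in the patch: each use of the value invokes the hook again.  */
static void
emit_addr_uncached (int *align, int *size)
{
  *align = pointer_size ();
  *size = pointer_size ();
}

/* As the review suggests: call the hook once and reuse the result.  */
static void
emit_addr_cached (int *align, int *size)
{
  int psize = pointer_size ();
  *align = psize;
  *size = psize;
}
```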
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
Andrew Pinski writes: On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote: James Greenhalgh writes: On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: Current IRA still uses both target macros in a few places. Tell IRA to use the order we defined rather than its own cost calculation. Allocate caller-saved registers first, then callee-saved. This is especially useful for LR/x30, as it's free to allocate and is purely caller-saved when used in a leaf function. Haven't noticed significant impact on benchmarks, but by grepping some keywords like Spilling, Push.*spill etc. in the IRA RTL dump, the number is smaller. OK for trunk? OK, sorry for the delay. It might be mail client mangling, but please check that the trailing slashes line up in the version that gets committed. Thanks, James

2015-05-19 Jiong Wang jiong.w...@arm.com

gcc/
	PR 63521
	* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
	(HONOR_REG_ALLOC_ORDER): Define.

Patch reverted. I did not see a reason why this patch was reverted. Maybe I am missing an email or something. There are several execution regressions in the gcc testsuite; as far as I can see this patch exposed hidden bugs in those testcases, but there might be another issue, so to be conservative I temporarily reverted this patch. Thanks, Andrew -- Regards, Jiong
Re: [ARM] Correct spelling of references to ARMv6KZ
On 23/07/15 12:04, Kyrill Tkachov wrote: GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made visible by the command line option -march=armv6zk and by the predefined macro __ARM_ARCH_6ZK__. This patch corrects the spelling internally and adds -march=armv6kz. To preserve existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the architecture. Use of -march=armv6kz will need to wait for binutils to be updated,[..]

diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index c715bb7..7873606 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -35,6 +35,7 @@ static struct vendor_cpu arm_cpu_table[] = {
     {0xb02, "armv6k", "mpcore"},
     {0xb36, "armv6j", "arm1136j-s"},
     {0xb56, "armv6t2", "arm1156t2-s"},
+    {0xb76, "armv6kz", "arm1176jz-s"},
     {0xb76, "armv6zk", "arm1176jz-s"},
     {0xc05, "armv7-a", "cortex-a5"},
     {0xc07, "armv7ve", "cortex-a7"},

This table is scanned from beginning to end, checking the first field. You introduce a duplicate here, so the second form will never be reached. I'd suggest removing the wrong spelling from here, but the re-written march string will be passed to the assembler, so if the assembler is old and doesn't support the correct spelling we'll get errors. So it seems like in order to preserve backwards compatibility we don't want to put the correctly spelled entry here :( But definitely add a comment here mentioning the deliberate oversight.

Respun patch attached. I've removed the armv6kz entry from config/arm/driver-arm.c and replaced it with a comment for the armv6zk entry. Tested for arm-none-linux-gnueabihf with native bootstrap and make check.

Matthew

gcc/
2015-07-27  Matthew Wahab  matthew.wa...@arm.com

	* config/arm/arm-arches.def: Add armv6kz. Replace 6ZK with 6KZ
	and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
	* config/arm/arm-c.c (arm_cpu_builtins): Emit __ARM_ARCH_6ZK__
	for armv6kz targets.
	* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
	* config/arm/arm-protos.h (FL_ARCH6KZ): New.
	(FL_FOR_ARCH6ZK): Remove.
	(FL_FOR_ARCH6KZ): New.
	(arm_arch6kz): New declaration.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.c (arm_arch6kz): New.
	(arm_option_override): Set arm_arch6kz.
	* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
	* config/arm/driver-arm.c: Add comment to armv6zk entry.
	* doc/invoke.texi: Replace armv6zk with armv6kz.

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..3dafaa5 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -44,7 +44,8 @@
 ARM_ARCH("armv6", arm1136js, 6, FL_CO_PROC | FL_FOR_ARCH6)
 ARM_ARCH("armv6j", arm1136js, 6J, FL_CO_PROC | FL_FOR_ARCH6J)
 ARM_ARCH("armv6k", mpcore, 6K, FL_CO_PROC | FL_FOR_ARCH6K)
 ARM_ARCH("armv6z", arm1176jzs, 6Z, FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH("armv6zk", arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
+ARM_ARCH("armv6kz", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
+ARM_ARCH("armv6zk", arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
 ARM_ARCH("armv6t2", arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH("armv6-m", cortexm1, 6M, FL_FOR_ARCH6M)
 ARM_ARCH("armv6s-m", cortexm1, 6M, FL_FOR_ARCH6M)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 297995b..9bf3973 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -167,6 +167,11 @@ arm_cpu_builtins (struct cpp_reader* pfile, int flags)
     }
   if (arm_arch_iwmmxt2)
     builtin_define ("__IWMMXT2__");
+  /* ARMv6KZ was originally identified as the misspelled __ARM_ARCH_6ZK__.  To
+     preserve the existing behaviour, the misspelled feature macro must still
+     be defined.  */
+  if (arm_arch6kz)
+    builtin_define ("__ARM_ARCH_6ZK__");
   if (TARGET_AAPCS_BASED)
     {
       if (arm_pcs_default == ARM_PCS_AAPCS_VFP)

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..9d47fcf 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,8 +125,8 @@
 ARM_CORE("arm1026ej-s", arm1026ejs, arm1026ejs, 5TEJ, FL_LDSCHED, 9e)
 /* V6 Architecture Processors */
 ARM_CORE("arm1136j-s", arm1136js, arm1136js, 6J, FL_LDSCHED, 9e)
 ARM_CORE("arm1136jf-s", arm1136jfs, arm1136jfs, 6J, FL_LDSCHED | FL_VFPV2, 9e)
-ARM_CORE("arm1176jz-s", arm1176jzs, arm1176jzs, 6ZK, FL_LDSCHED, 9e)
-ARM_CORE("arm1176jzf-s", arm1176jzfs, arm1176jzfs, 6ZK, FL_LDSCHED | FL_VFPV2, 9e)
+ARM_CORE("arm1176jz-s", arm1176jzs, arm1176jzs, 6KZ, FL_LDSCHED, 9e)
+ARM_CORE("arm1176jzf-s", arm1176jzfs, arm1176jzfs, 6KZ, FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("mpcorenovfp", mpcorenovfp, mpcorenovfp, 6K, FL_LDSCHED, 9e)
 ARM_CORE("mpcore", mpcore, mpcore,
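The review point about the driver table — first match wins, so the duplicate 0xb76 entry is dead — can be demonstrated with a hypothetical reduction of the driver-arm.c lookup. The struct and scan mirror the shape of the code under review; the table contents below reproduce the pre-respin state being criticized, not the committed patch:

```c
#include <stddef.h>

/* Reduced sketch of the driver-arm.c lookup: the table is scanned from
   the start and the FIRST entry whose part number matches wins, so a
   second entry with the same part number (0xb76) is unreachable.  */
struct vendor_cpu
{
  unsigned part;
  const char *arch;
  const char *core;
};

static const struct vendor_cpu arm_cpu_table[] = {
  {0xb76, "armv6zk", "arm1176jz-s"},   /* kept so old assemblers work */
  {0xb76, "armv6kz", "arm1176jz-s"},   /* unreachable duplicate */
  {0, NULL, NULL}
};

static const char *
lookup_arch (unsigned part)
{
  for (const struct vendor_cpu *p = arm_cpu_table; p->arch; p++)
    if (p->part == part)
      return p->arch;
  return NULL;
}
```

This is exactly why the respun patch drops the armv6kz entry and documents the deliberate misspelling with a comment instead.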
Re: [V850] Hookize LIBCALL_VALUE
Hi Anatoliy, OK for trunk?

2015-07-26 Anatoly Sokolov ae...@post.ru

	* config/v850/v850.h (LIBCALL_VALUE): Remove macro.
	* config/v850/v850.md (RV_REGNUM): New constant.
	* config/v850/v850.c (v850_libcall_value): New function.
	(v850_function_value_regno_p, v850_function_value): Use RV_REGNUM.
	(TARGET_LIBCALL_VALUE): Define.

Approved - please apply.

Cheers
Nick
Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
On Jul 27, 2015, at 2:26 AM, Jiong Wang jiong.w...@arm.com wrote: Andrew Pinski writes: On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang jiong.w...@arm.com wrote: James Greenhalgh writes: On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote: Current IRA still uses both target macros in a few places. Tell IRA to use the order we defined rather than its own cost calculation. Allocate caller-saved registers first, then callee-saved. This is especially useful for LR/x30, as it's free to allocate and is purely caller-saved when used in a leaf function. Haven't noticed significant impact on benchmarks, but by grepping some keywords like Spilling, Push.*spill etc. in the IRA RTL dump, the number is smaller. OK for trunk? OK, sorry for the delay. It might be mail client mangling, but please check that the trailing slashes line up in the version that gets committed. Thanks, James

2015-05-19 Jiong Wang jiong.w...@arm.com

gcc/
	PR 63521
	* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
	(HONOR_REG_ALLOC_ORDER): Define.

Patch reverted. I did not see a reason why this patch was reverted. Maybe I am missing an email or something. There are several execution regressions in the gcc testsuite; as far as I can see this patch exposed hidden bugs in those testcases, but there might be another issue, so to be conservative I temporarily reverted this patch.

If you are talking about: gcc.target/aarch64/aapcs64/func-ret-2.c execution etc., these test cases are too dependent on the original register allocation order and really can be safely ignored. Really these three tests should be moved or rewritten in a more sane way. Thanks, Andrew

Thanks, Andrew -- Regards, Jiong