Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-09-15 Thread Felipe Magno de Almeida
On Fri, Sep 16, 2016 at 2:42 AM, Marc Glisse  wrote:
> On Thu, 15 Sep 2016, Felipe Magno de Almeida wrote:
>
> +   || sizeof(uint32_t) == sizeof(void*)
> +|| sizeof(uint16_t) == sizeof(void*),
>
> Indentation is off?
>
>> Call _M_extract_* functions family through temporary int objects
>
>
> Would it make sense to use a template type instead of int for this
> parameter? Or possibly have a typedef that defaults to int (what POSIX
> requires). The hard case would be a libc that uses bitfields for the fields
> of struct tm (that could save some space), but I don't think anyone does
> that.

I've tried both approaches. Templates were causing problems with undefined
instantiations, because they were also being used as ints in other
_M_extract functions through a temporary integer. And typedefs caused the
same problem of having to use a temporary value of the right type; for
example, _M_extract_wday_or_month could not have the same type (on AVR they
do), so I'd have to use a temporary anyway.

This was the least intrusive way.
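
For reference, a minimal standalone sketch of the pattern the patch uses
(the member type below only mimics AVR-Libc; it is not the real struct tm):

#include <cstdint>

struct tm_like { int8_t tm_wday; };     // AVR-Libc uses non-int member types

// _M_extract_name-style interface: writes the parsed value through an int&.
void extract_name (int& member) { member = 3; }

void parse (tm_like* tm)
{
  // extract_name (tm->tm_wday);        // ill-formed: cannot bind int& to int8_t
  int mem = tm->tm_wday;                // temporary int object
  extract_name (mem);
  tm->tm_wday = mem;                    // write the result back
}

int main ()
{
  tm_like t{};
  parse (&t);
  return t.tm_wday == 3 ? 0 : 1;
}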

>> float pointing
>
> floating point?

:D. Yes.

> A ChangeLog entry would also help.

OK.

> --
> Marc Glisse

Regards,
-- 
Felipe Magno de Almeida


Re: [PATCH] Fix neg overflow expansion (PR middle-end/77594)

2016-09-15 Thread Richard Biener
On September 16, 2016 12:48:04 AM GMT+02:00, Jakub Jelinek wrote:
>Hi!
>
>As mentioned in the PR, during expand_arith_overflow expansion
>expand_neg_overflow falls through into expand_addsub_overflow, so
>expands
>roughly the same code twice and keeps perhaps the less efficient
>version
>around.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK (really obvious).

Thanks,
Richard.

>2016-09-16  Jakub Jelinek  
>   Eric Botcazou  
>
>   PR middle-end/77594
>   * internal-fn.c (expand_arith_overflow) : Don't fall
>   through into expand_addsub_overflow after expand_neg_overflow.
>
>   * gcc.target/i386/pr77594.c: New test.
>
>--- gcc/internal-fn.c.jj   2016-08-29 12:17:14.0 +0200
>+++ gcc/internal-fn.c  2016-09-15 09:42:09.693980531 +0200
>@@ -1833,7 +1833,10 @@ expand_arith_overflow (enum tree_code co
>   {
>   case MINUS_EXPR:
> if (integer_zerop (arg0) && !unsr_p)
>-  expand_neg_overflow (loc, lhs, arg1, false);
>+  {
>+expand_neg_overflow (loc, lhs, arg1, false);
>+return;
>+  }
> /* FALLTHRU */
>   case PLUS_EXPR:
> expand_addsub_overflow (loc, code, lhs, arg0, arg1,
>--- gcc/testsuite/gcc.target/i386/pr77594.c.jj 2016-09-15 09:44:51.879910346 +0200
>+++ gcc/testsuite/gcc.target/i386/pr77594.c    2016-09-15 09:44:43.0 +0200
>@@ -0,0 +1,11 @@
>+/* PR middle-end/77594 */
>+/* { dg-do compile } */
>+/* { dg-options "-O0" } */
>+
>+int
>+foo (int a, int *b)
>+{
>+  return __builtin_sub_overflow (0, a, b);
>+}
>+
>+/* { dg-final { scan-assembler-times "\tjn?o\t" 1 } } */
>
>   Jakub




Re: Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-09-15 Thread Marc Glisse

On Thu, 15 Sep 2016, Felipe Magno de Almeida wrote:

+   || sizeof(uint32_t) == sizeof(void*)
+|| sizeof(uint16_t) == sizeof(void*),

Indentation is off?


Call _M_extract_* functions family through temporary int objects


Would it make sense to use a template type instead of int for this 
parameter? Or possibly have a typedef that defaults to int (what POSIX 
requires). The hard case would be a libc that uses bitfields for the 
fields of struct tm (that could save some space), but I don't think anyone 
does that.



float pointing


floating point?

A ChangeLog entry would also help.

--
Marc Glisse


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Yuan, Pengfei
> > Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
> > equivalent to my patch.
> 
> Yes.  This means it's easy to experiment with other values than zero.  
> Basically
> early inlining is supposed to remove abstraction penalty to
> 
>  a) reduce FDO instrumentation overhead
>  b) get more realistic size estimates for the inliner
> 
> a) was particularly important back in time for tramp3d, reducing
> profiling runtime
> 1000-fold.  b) is generally important.
> 
> PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
> get abstraction removed but IIRC we increased it quite a bit to also get more
> early optimization (to get more accurate inliner estimates).
> 
> What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
> sense for FDO but reducing it to zero is probably a bit much.
> 
> Can you do your measurements with values between zero and the current
> default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
> What's the
> value that crosses the boundary of diminishing returns regarding to code-size
> improvements for you?
> 
> Richard.

Here are the results:

Param  Size (GCC5)         Time (GCC5)  Time (GCC7)
0      44686265 (-8.26%)   58.772s      66.332s
1      45692793 (-6.19%)   40.684s      39.220s
2      45556185 (-6.47%)   35.292s      34.328s
3      46251049 (-5.05%)   28.820s      27.136s
4      47028873 (-3.45%)   24.616s      22.200s
5      47495641 (-2.49%)   20.160s      17.800s
6      47520153 (-2.44%)   16.444s      15.656s
14     48708873            5.620s       5.556s

Param: value of PARAM_EARLY_INLINING_INSNS
Size:  code size (.text) of optimized libxul.so
Time:  execution time of instrumented tramp3d (-n 25)

To balance the size reduction of the optimized binary against the speed
penalty of the instrumented binary, I set param=6 as the baseline and compare:

Param  Size score  Time score  Total
0      3.39        -3.57       -0.18
1      2.54        -2.47        0.07
2      2.65        -2.15        0.50
3      2.07        -1.75        0.32
4      1.41        -1.50       -0.09
5      1.02        -1.23       -0.21
6      1.00        -1.00        0.00
14     0.00        -0.34       -0.34
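
(The scores appear to be the size reduction relative to param=14 normalized
to the param=6 value, and minus the GCC 5 instrumented time normalized to
the param=6 time; a small sketch that reproduces the columns above:)

#include <cmath>
#include <cstdio>

int main ()
{
  const int    param[]     = { 0, 1, 2, 3, 4, 5, 6, 14 };
  const double reduction[] = { 8.26, 6.19, 6.47, 5.05, 3.45, 2.49, 2.44, 0.0 };
  const double time5[]     = { 58.772, 40.684, 35.292, 28.820,
                               24.616, 20.160, 16.444, 5.620 };

  for (int i = 0; i < 8; i++)
    {
      // Normalize to the param=6 baseline and round to two decimals.
      double size_score = std::round (reduction[i] / 2.44 * 100) / 100;
      double time_score = std::round (-time5[i] / 16.444 * 100) / 100;
      std::printf ("%-5d %5.2f  %6.2f  %6.2f\n",
                   param[i], size_score, time_score, size_score + time_score);
    }
  return 0;
}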

Therefore, I think param=2 is the best choice.

Is the attached patch OK?

Regards,
Yuan, Pengfei


gcc/ChangeLog
* opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
when FDO is enabled.


diff --git a/gcc/opts.c b/gcc/opts.c
index 39c190d..b59c700 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
 opts->x_param_values, opts_set->x_param_values);
 }

+  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
+  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
+  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
+maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
+  opts->x_param_values, opts_set->x_param_values);
+
   if (opts->x_flag_lto)
 {
 #ifdef ENABLE_LTO
   opts->x_flag_generate_lto = 1;



[PATCH] have __builtin_object_size handle POINTER_PLUS with non-const offset

2016-09-15 Thread Martin Sebor

__builtin_object_size fails for POINTER_PLUS expressions involving
non-constant offsets into objects of known size, causing GCC to fail
to detect (and add instrumentation to prevent) buffer overflow in
some basic cases such as the following:

  void f (unsigned i)
  {
char d [3];
memcpy (d + i, "abcdef", 5);
  }

Since the size of the destination object is known, the call to memcpy
is guaranteed to write past the end of it regardless of the value of
the offset.

The attached patch enhances __builtin_object_size to handle this case
by returning the size of the whole object as the maximum and the size
of the object minus T_MAX for the type of the offset T as the minimum.

The patch also adds handling of ranges even though only very few cases
benefit from it because the VRP pass runs after the object size pass.
The one case that does appear to profit is when the value of the offset
is constrained by its type, as in

  char a [1000];

  unsigned g (unsigned char i)
  {
char *p = a + i;
return __builtin_object_size (p, 2);
  }

Here get_range_info () lets __builtin_object_size determine that
the minimum number of bytes between (a + i) and (a + sizeof a) is
sizeof a - UCHAR_MAX.
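
In other words, with the patch the example above is expected to fold roughly
like this (a sketch of the intended results, not a testcase from the patch):

char a[1000];

__SIZE_TYPE__ max_remaining (unsigned char i)
{
  return __builtin_object_size (a + i, 0);   /* whole object: 1000 */
}

__SIZE_TYPE__ min_remaining (unsigned char i)
{
  /* minimum: sizeof a minus the offset type's maximum, 1000 - 255 == 745 */
  return __builtin_object_size (a + i, 2);
}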

The patch results in 64 more checking calls in a Binutils build than
before.

Martin

PS What would be a good way to arrange for the VRP pass to run before
the object size pass so that the latter can benefit more from range
information?  As an experiment I added another instance of the VRP
pass before the object size pass in passes.def and that worked, but
I suspect that running VRP a third time isn't optimal.
PR middle-end/77608 - missing protection on trivially detectable runtime
	buffer overflow

gcc/testsuite/ChangeLog:
	PR middle-end/77608
	* gcc.dg/builtin-object-size-18.c: New test.

gcc/ChangeLog:
	PR middle-end/77608
	* tree-object-size.c (plus_stmt_object_size): Handle non-constant
	offsets.

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-18.c b/gcc/testsuite/gcc.dg/builtin-object-size-18.c
new file mode 100644
index 000..e8e2269
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-18.c
@@ -0,0 +1,76 @@
+/* PR 77608 - missing protection on trivially detectable runtime buffer
+   overflow */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define concat(a, b)   a ## b
+#define CAT(a, b)  concat (a, b)
+
+/* Create a symbol name unique to each test and object size type.  */
+#define SYM(type)  CAT (CAT (CAT (failure_on_line_, __LINE__), _type_), type)
+
+/* References to the following undefined symbol which is unique for each
+   test case are expected to be eliminated.  */
+#define TEST_FAILURE(type)			\
+  do {		\
+extern void SYM (type)(void);		\
+SYM (type)();\
+  } while (0)
+
+#define bos(obj, type) __builtin_object_size (obj, type)
+#define size(obj, n) ((size_t)n == X ? sizeof *obj : (size_t)n)
+
+/* For convenience.  Substitute for 'sizeof object' in test cases where
+   the size can vary from target to target.  */
+#define X  (size_t)0xdeadbeef
+
+#define test(expect, type, obj)			\
+  do {		\
+if (bos (obj, type)	!= size (obj, expect))	\
+  TEST_FAILURE (type);			\
+  } while (0)
+
+/* Verify that each of the expressions __builtin_object_size(OBJ, n)
+   is folded into Rn.  */
+#define fold(r0, r1, r2, r3, obj)		\
+  do {		\
+test (r0, 0, (obj));			\
+test (r1, 1, (obj));			\
+test (r2, 2, (obj));			\
+test (r3, 3, (obj));			\
+  } while (0)
+
+/* Verify that __builtin_object_size(xBUF + OFF, n) is folded into Rn
+   for the specified OFF and each of the Rn values.  */
+#define FOLD(off, r0, r1, r2, r3)		\
+  do {		\
+fold (r0, r1, r2, r3, gbuf + off);		\
+sink (gbuf);\
+fold (r0, r1, r2, r3, lbuf + off);		\
+sink (lbuf);\
+fold (r0, r1, r2, r3, abuf + off);		\
+sink (abuf);\
+  } while (0)
+
+#define UCHAR_MAX (__SCHAR_MAX__  * 2 + 1)
+#define BUFSIZE   (UCHAR_MAX * 2)
+
+typedef __SIZE_TYPE__ size_t;
+
+void sink (void*, ...);
+
+char gbuf[BUFSIZE];
+
+void test_pointer_plus (char ci, short si, int i, long li)
+{
+  char lbuf[BUFSIZE];
+
+  char *abuf = __builtin_malloc (BUFSIZE);
+
+  FOLD (ci, BUFSIZE, BUFSIZE, BUFSIZE - UCHAR_MAX, BUFSIZE - UCHAR_MAX);
+  FOLD (si, BUFSIZE, BUFSIZE, 0, 0);
+  FOLD ( i, BUFSIZE, BUFSIZE, 0, 0);
+  FOLD (li, BUFSIZE, BUFSIZE, 0, 0);
+}
+
+/* { dg-final { scan-tree-dump-not "failure_on_line" "optimized" } } */
diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 1317ad7..9e84156 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -837,21 +839,49 @@ plus_stmt_object_size (struct object_size_info *osi, tree var, gimple *stmt)
   if (object_sizes[object_size_type][varno] == unknown[object_size_type])
 return false;
 
+  bool estimated = false;
+
+  /* If the OFFSET isn't constant try to determine its range and use
+ either the lower or the upper bound of the range in its place.  */
+  if 

Fix compilation errors with libstdc++v3 for AVR target and allow --enable-libstdcxx

2016-09-15 Thread Felipe Magno de Almeida
Hello,

I've fixed a few compilation errors with libstdc++v3 with AVR-Libc and
was even able to use boost.spirit x3 with small patches to boost as
well.

Attached is the patch.

Please don't hesitate to request modifications for upstream inclusion.

Right now std::cout is not working (and just include'ing <iostream>
makes the final executable 100k bigger); however, being able to use all
the C++11 and C++14 goodies from the STL is already really great.
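
For example, ordinary C++11/C++14 library code along these lines is the kind
of thing that becomes usable on AVR with these patches (a small illustration
of my own, not part of the patch set):

#include <algorithm>
#include <array>
#include <tuple>

int main ()
{
  std::array<int, 4> a{{ 4, 1, 3, 2 }};
  std::sort (a.begin (), a.end ());

  auto t = std::make_tuple (a.front (), a.back ());
  return std::get<0> (t) == 1 && std::get<1> (t) == 4 ? 0 : 1;
}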

Kind regards,
-- 
Felipe Magno de Almeida
From 53952b274a4e4f4578bf4eb2332a7aa07b26a23b Mon Sep 17 00:00:00 2001
From: Felipe Magno de Almeida 
Date: Thu, 15 Sep 2016 15:54:50 -0300
Subject: [PATCH 1/3] Add #ifdef case for 16 bits in cow-stdexcept.cc

Added an #ifdef case for when void* is 16 bits wide so that the file
compiles for the AVR target.
---
 libstdc++-v3/src/c++11/cow-stdexcept.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/src/c++11/cow-stdexcept.cc b/libstdc++-v3/src/c++11/cow-stdexcept.cc
index 31a89df..5b7f9ba 100644
--- a/libstdc++-v3/src/c++11/cow-stdexcept.cc
+++ b/libstdc++-v3/src/c++11/cow-stdexcept.cc
@@ -208,6 +208,8 @@ extern void* _ZGTtnaX (size_t sz) __attribute__((weak));
 extern void _ZGTtdlPv (void* ptr) __attribute__((weak));
 extern uint8_t _ITM_RU1(const uint8_t *p)
   ITM_REGPARM __attribute__((weak));
+extern uint16_t _ITM_RU2(const uint16_t *p)
+  ITM_REGPARM __attribute__((weak));
 extern uint32_t _ITM_RU4(const uint32_t *p)
   ITM_REGPARM __attribute__((weak));
 extern uint64_t _ITM_RU8(const uint64_t *p)
@@ -272,12 +274,15 @@ _txnal_cow_string_C1_for_exceptions(void* that, const char* s,
 static void* txnal_read_ptr(void* const * ptr)
 {
   static_assert(sizeof(uint64_t) == sizeof(void*)
-		|| sizeof(uint32_t) == sizeof(void*),
-		"Pointers must be 32 bits or 64 bits wide");
+		|| sizeof(uint32_t) == sizeof(void*)
+|| sizeof(uint16_t) == sizeof(void*),
+		"Pointers must be 16 bits, 32 bits or 64 bits wide");
 #if __UINTPTR_MAX__ == __UINT64_MAX__
   return (void*)_ITM_RU8((const uint64_t*)ptr);
-#else
+#elif __UINTPTR_MAX__ == __UINT32_MAX__
   return (void*)_ITM_RU4((const uint32_t*)ptr);
+#else
+  return (void*)_ITM_RU2((const uint16_t*)ptr);
 #endif
 }
 
-- 
2.9.3


From 24d0e86a560fc109298b8a530f04e2129ff613c6 Mon Sep 17 00:00:00 2001
From: Felipe Magno de Almeida 
Date: Thu, 15 Sep 2016 18:52:57 -0300
Subject: [PATCH 2/3] Use temporary int objects to access struct tm members

Call the _M_extract_* family of functions through temporary int
objects, so that the struct tm members, which have incompatible
(non-int) types in AVR-Libc, are not passed directly where an int
lvalue is required.

This fixes compilation errors with AVR-Libc while compiling libstdc++
for AVR target.
---
 libstdc++-v3/include/bits/locale_facets_nonio.tcc | 44 ---
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 1a4f9a0..cc9d2df 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -659,30 +659,38 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
 		  // Abbreviated weekday name [tm_wday]
 		  const char_type*  __days1[7];
 		  __tp._M_days_abbreviated(__days1);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days1,
+  __mem = __tm->tm_wday;
+		  __beg = _M_extract_name(__beg, __end, __mem, __days1,
 	  7, __io, __tmperr);
+  __tm->tm_wday = __mem;
 		  break;
 		case 'A':
 		  // Weekday name [tm_wday].
 		  const char_type*  __days2[7];
 		  __tp._M_days(__days2);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_wday, __days2,
+  __mem = __tm->tm_wday;
+		  __beg = _M_extract_name(__beg, __end, __mem, __days2,
 	  7, __io, __tmperr);
+  __tm->tm_wday = __mem;
 		  break;
 		case 'h':
 		case 'b':
 		  // Abbreviated month name [tm_mon]
 		  const char_type*  __months1[12];
 		  __tp._M_months_abbreviated(__months1);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_mon, 
+  __mem = __tm->tm_mon;
+		  __beg = _M_extract_name(__beg, __end, __mem, 
 	  __months1, 12, __io, __tmperr);
+  __tm->tm_mon = __mem;
 		  break;
 		case 'B':
 		  // Month name [tm_mon].
 		  const char_type*  __months2[12];
 		  __tp._M_months(__months2);
-		  __beg = _M_extract_name(__beg, __end, __tm->tm_mon, 
+  __mem = __tm->tm_mon;
+		  __beg = _M_extract_name(__beg, __end, __mem, 
 	  __months2, 12, __io, __tmperr);
+  __tm->tm_mon = __mem;
 		  break;
 		case 'c':
 		  // Default time and date representation.
@@ -693,18 +701,22 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
 		  break;
 		case 'd':
 		  // Day [01, 31]. [tm_mday]
-		  __beg = _M_extract_num(__beg, __end, __tm->tm_mday, 1, 31, 2,
+  __mem = __tm->tm_mday;
+		  __beg = 

Re: [patch,gomp4] Fix PR74600

2016-09-15 Thread Cesar Philippidis
On 09/09/2016 03:34 PM, Cesar Philippidis wrote:
> By design, the libgomp OpenACC runtime prohibits data clauses with
> aliased addresses from being used in the same construct. E.g., the user
> is not allowed to specify
> 
>   #pragma acc parallel copy(var[0:10]) copy(pvar[0:10])
> 
> where pvar is a pointer to var or if those subarrays overlap. To a
> certain extent, this is what is happening in PR74600. The fortran FE
> implements internal subprograms as nested functions. In nested
> functions, the stack variables in the parent function are shared with
> the nested function. This stack sharing is handled by tree-nested.c
> immediately after gimplification but before omp lowering. Depending on
> if the function is nested or not, tree-nested.c may introduce a FRAME
> and CHAIN struct to represent the parent function's stack frame. Because
> FRAME overlays the parent function's stack frame, the runtime will
> trigger a similar error to the duplicate data clause error from the
> example above. This happens because tree-nested.c adds an implicit PCOPY
> clause for the FRAME and CHAIN variables.
> 
> This patch replaces those PCOPY clauses with PRIVATE clauses. The
> OpenACC spec does not mention how to handle data clauses in internal
> subprograms, but it's clear that all of the variables specified in
> the data clauses shouldn't be handled altogether as a single monolithic
> unit, as that would violate the CREATE/COPYIN/COPYOUT semantics that the
> user can specify. However, this may break a program when a subprogram
> uses a variable in the main program that has a) not been passed to the
> subprogram or b) is not present via some data construct. This solution does
> work for variables with explicit data clauses because those variables
> end up being remapped, and therefore bypass those FRAME and CHAIN structs.
> 
> Jakub, does OpenMP face a similar problem? If so, what do you think
> about this solution? It would have to be modified for OpenMP targets though.

Alexander, do you have any thoughts on the OpenMP side of this issue?

Thanks,
Cesar

2016-09-09  Cesar Philippidis  

	PR fortran/74600

	gcc/
	* tree-nested.c (convert_nonlocal_reference_stmt): Create a private
	clause for the CHAIN stack frame.
	(convert_local_reference_stmt): Create a private clause for the
	FRAME stack frame.
	(convert_gimple_call): Look for OMP_CLAUSE_PRIVATE when scanning for
	FRAME and CHAIN data clauses.

	libgomp/
	* testsuite/libgomp.oacc-fortran/pr74600.f90: New test.


diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 984fa81..03706ed 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1433,10 +1433,15 @@ convert_nonlocal_reference_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	{
 	  tree c, decl;
 	  decl = get_chain_decl (info);
-	  c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_MAP);
+	  if (is_gimple_omp_oacc (stmt))
+	c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_PRIVATE);
+	  else
+	{
+	  c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_MAP);
+	  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
+	  OMP_CLAUSE_SIZE (c) = DECL_SIZE_UNIT (decl);
+	}
 	  OMP_CLAUSE_DECL (c) = decl;
-	  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TO);
-	  OMP_CLAUSE_SIZE (c) = DECL_SIZE_UNIT (decl);
 	  OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (stmt);
	  gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt), c);
 	}
@@ -2064,10 +2069,15 @@ convert_local_reference_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	{
 	  tree c;
 	  (void) get_frame_type (info);
-	  c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_MAP);
+	  if (is_gimple_omp_oacc (stmt))
+	c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_PRIVATE);
+	  else
+	{
+	  c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_MAP);
+	  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
+	  OMP_CLAUSE_SIZE (c) = DECL_SIZE_UNIT (info->frame_decl);
+	}
 	  OMP_CLAUSE_DECL (c) = info->frame_decl;
-	  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_TOFROM);
-	  OMP_CLAUSE_SIZE (c) = DECL_SIZE_UNIT (info->frame_decl);
 	  OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (stmt);
	  gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt), c);
 	}
@@ -2538,9 +2548,10 @@ convert_gimple_call (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	  for (c = gimple_omp_target_clauses (stmt);
 	   c;
 	   c = OMP_CLAUSE_CHAIN (c))
-	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-		&& OMP_CLAUSE_DECL (c) == decl)
-	  break;
+	if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		 || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+		  && OMP_CLAUSE_DECL (c) == decl)
+		break;
 	  if (c == NULL)
 	{
 	  c = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_MAP);
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr74600.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr74600.f90
new file mode 100644
index 000..3b3a0fd
--- /dev/null
+++ 

Fortran, committed: ICE out of memory on displaced implicit character (pr 69963)

2016-09-15 Thread Louis Krupp
Fixed in revision 240168.



[committed] fix-it hints can't contain newlines (yet)

2016-09-15 Thread David Malcolm
I hope to implement newline support within fix-it hints at some point,
but currently it's not supported, and leads to misleading diagnostic
output, so for now, fail gracefully.

Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r240169.

gcc/ChangeLog:
* diagnostic-show-locus.c
(selftest::test_fixit_insert_containing_newline): New function.
(selftest::test_fixit_replace_containing_newline): New function.
(selftest::diagnostic_show_locus_c_tests): Call the above.

libcpp/ChangeLog:
* include/line-map.h (class rich_location): Note that newlines
aren't supported in fix-it text.
* line-map.c (rich_location::add_fixit_insert_before): Reject
attempts to add fix-its containing newlines.
(rich_location::add_fixit_replace): Likewise.
---
 gcc/diagnostic-show-locus.c | 83 +
 libcpp/include/line-map.h   |  2 ++
 libcpp/line-map.c   | 13 +++
 3 files changed, 98 insertions(+)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 331eb92..f6bfcc9 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -2160,6 +2160,87 @@ test_fixit_consolidation (const line_table_case &case_)
   }
 }
 
+/* Insertion fix-it hint: adding a "break;" on a line by itself.
+   This will fail, as newlines aren't yet supported.  */
+
+static void
+test_fixit_insert_containing_newline (const line_table_case &case_)
+{
+  /* Create a tempfile and write some text to it.
+ .0111.
+ .1234567890123456.  */
+  const char *old_content = ("case 'a':\n" /* line 1. */
+"  x = a;\n"  /* line 2. */
+"case 'b':\n" /* line 3. */
+"  x = b;\n");/* line 4. */
+
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
+  line_table_test ltt (case_);
+  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 3);
+
+  /* Add a "break;" on a line by itself before line 3 i.e. before
+ column 1 of line 3. */
+  location_t case_start = linemap_position_for_column (line_table, 5);
+  location_t case_finish = linemap_position_for_column (line_table, 13);
+  location_t case_loc = make_location (case_start, case_start, case_finish);
+  rich_location richloc (line_table, case_loc);
+  location_t line_start = linemap_position_for_column (line_table, 1);
+  richloc.add_fixit_insert_before (line_start, "  break;\n");
+
+  /* Newlines are not yet supported within fix-it hints, so
+ the fix-it should not be displayed.  */
+  ASSERT_TRUE (richloc.seen_impossible_fixit_p ());
+
+  if (case_finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
+return;
+
+  test_diagnostic_context dc;
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+   " case 'b':\n"
+   " ^\n",
+   pp_formatted_text (dc.printer));
+}
+
+/* Replacement fix-it hint containing a newline.
+   This will fail, as newlines aren't yet supported.  */
+
+static void
+test_fixit_replace_containing_newline (const line_table_case &case_)
+{
+  /* Create a tempfile and write some text to it.
+.0.
+.1234567890123.  */
+  const char *old_content = "foo = bar ();\n";
+
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
+  line_table_test ltt (case_);
+  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+
+  /* Replace the " = " with "\n  = ", as if we were reformatting an
+ overly long line.  */
+  location_t start = linemap_position_for_column (line_table, 4);
+  location_t finish = linemap_position_for_column (line_table, 6);
+  location_t loc = linemap_position_for_column (line_table, 13);
+  rich_location richloc (line_table, loc);
+  source_range range = source_range::from_locations (start, finish);
+  richloc.add_fixit_replace (range, "\n =");
+
+  /* Newlines are not yet supported within fix-it hints, so
+ the fix-it should not be displayed.  */
+  ASSERT_TRUE (richloc.seen_impossible_fixit_p ());
+
+  if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
+return;
+
+  test_diagnostic_context dc;
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  ASSERT_STREQ ("\n"
+   " foo = bar ();\n"
+   " ^\n",
+   pp_formatted_text (dc.printer));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -2176,6 +2257,8 @@ diagnostic_show_locus_c_tests ()
   for_each_line_table_case (test_diagnostic_show_locus_one_liner);
   for_each_line_table_case (test_diagnostic_show_locus_fixit_lines);
   for_each_line_table_case (test_fixit_consolidation);
+  for_each_line_table_case (test_fixit_insert_containing_newline);
+  for_each_line_table_case (test_fixit_replace_containing_newline);
 }
 
 } // namespace selftest
diff --git a/libcpp/include/line-map.h 

Re: [PATCH] Don't clobber dominator info in the combiner (PR target/77526)

2016-09-15 Thread Jakub Jelinek
On Thu, Sep 15, 2016 at 06:20:11PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 16, 2016 at 12:50:44AM +0200, Jakub Jelinek wrote:
> > As mentioned in the PR, combiner sometimes calls
> > purge_all_dead_edges or purge_dead_edges that can invalidate the dominator
> > info if it is computed.  Other passes like CSE in that case free the
> > dominance info, this patch does the same in the combiner.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Okay, but does it need the same for the post-dominators?  Oh, those
> never exist longer than a single pass?

Yeah, indeed, passes that compute post-dominators are required to free them
as well, and combine doesn't.

Jakub


Re: [PATCH] Don't clobber dominator info in the combiner (PR target/77526)

2016-09-15 Thread Segher Boessenkool
Hi Jakub,

On Fri, Sep 16, 2016 at 12:50:44AM +0200, Jakub Jelinek wrote:
> As mentioned in the PR, combiner sometimes calls
> purge_all_dead_edges or purge_dead_edges that can invalidate the dominator
> info if it is computed.  Other passes like CSE in that case free the
> dominance info, this patch does the same in the combiner.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Okay, but does it need the same for the post-dominators?  Oh, those
never exist longer than a single pass?

Thanks,


Segher


[PATCH PR other/77609] Let the assembler choose ELF section types for miscellaneous named sections

2016-09-15 Thread Roland McGrath
This fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77609 (which I've
just filed).

OK for trunk?

I'm not sure if this kind of fix is appropriate for gcc-6-branch or not,
but I'd like to backport it there too if it is acceptable.


Thanks,
Roland


gcc/
2016-09-15  Roland McGrath

PR other/77609
* varasm.c (default_section_type_flags): Set SECTION_NOTYPE for
any section for which we don't know a specific type it should have,
regardless of name.  Previously this was done only for the exact
names ".init_array", ".fini_array", and ".preinit_array".

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 00a9b30..0d7ea38 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6210,15 +6210,20 @@ default_section_type_flags (tree decl, const char *name, int reloc)
   || strncmp (name, ".gnu.linkonce.tb.", 17) == 0)
 flags |= SECTION_TLS | SECTION_BSS;

-  /* These three sections have special ELF types.  They are neither
- SHT_PROGBITS nor SHT_NOBITS, so when changing sections we don't
- want to print a section type (@progbits or @nobits).  If someone
- is silly enough to emit code or TLS variables to one of these
- sections, then don't handle them specially.  */
-  if (!(flags & (SECTION_CODE | SECTION_BSS | SECTION_TLS))
-  && (strcmp (name, ".init_array") == 0
- || strcmp (name, ".fini_array") == 0
- || strcmp (name, ".preinit_array") == 0))
+  /* Various sections have special ELF types that the assembler will
+ assign by default based on the name.  They are neither SHT_PROGBITS
+ nor SHT_NOBITS, so when changing sections we don't want to print a
+ section type (@progbits or @nobits).  Rather than duplicating the
+ assembler's knowledge of what those special name patterns are, just
+ let the assembler choose the type if we don't know a specific
+ reason to set it to something other than the default.  SHT_PROGBITS
+ is the default for sections whose name is not specially known to
+ the assembler, so it does no harm to leave the choice to the
+ assembler when @progbits is the best thing we know to use.  If
+ someone is silly enough to emit code or TLS variables to one of
+ these sections, then don't handle them specially.  */
+  if (!(flags & (SECTION_CODE | SECTION_BSS | SECTION_TLS | SECTION_ENTSIZE))
+  && !(HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE)))
 flags |= SECTION_NOTYPE;

   return flags;


[PATCH] Don't clobber dominator info in the combiner (PR target/77526)

2016-09-15 Thread Jakub Jelinek
Hi!

As mentioned in the PR, combiner sometimes calls
purge_all_dead_edges or purge_dead_edges that can invalidate the dominator
info if it is computed.  Other passes like CSE in that case free the
dominance info, this patch does the same in the combiner.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-16  Jakub Jelinek  

PR target/77526
* combine.c (rest_of_handle_combine): If any edges have been purged,
free dominators if available.

* gcc.target/i386/pr77526.c: New test.

--- gcc/combine.c.jj2016-08-12 17:33:46.0 +0200
+++ gcc/combine.c   2016-09-15 16:12:26.154982064 +0200
@@ -14393,6 +14393,8 @@ rest_of_handle_combine (void)
  instructions.  */
   if (rebuild_jump_labels_after_combine)
 {
+  if (dom_info_available_p (CDI_DOMINATORS))
+   free_dominance_info (CDI_DOMINATORS);
   timevar_push (TV_JUMP);
   rebuild_jump_labels (get_insns ());
   cleanup_cfg (0);
--- gcc/testsuite/gcc.target/i386/pr77526.c.jj  2016-09-15 16:13:37.149105476 +0200
+++ gcc/testsuite/gcc.target/i386/pr77526.c     2016-09-15 16:13:13.0 +0200
@@ -0,0 +1,13 @@
+/* PR target/77526 */
+/* { dg-do compile { target int128 } } */
/* { dg-options "-Os -fno-forward-propagate -fno-gcse -fno-rerun-cse-after-loop -mstringop-strategy=byte_loop -Wno-psabi" } */
+
+typedef char U __attribute__((vector_size(64)));
+typedef __int128 V __attribute__((vector_size(64)));
+
+V
+foo (int a, int b, __int128 c, U u)
+{
+  u = (u >> (u & 7)) | (u << -(u & 7));
+  return a + b + c + (V)u;
+}

Jakub


[PATCH] Fix neg overflow expansion (PR middle-end/77594)

2016-09-15 Thread Jakub Jelinek
Hi!

As mentioned in the PR, during expand_arith_overflow expansion
expand_neg_overflow falls through into expand_addsub_overflow, so expands
roughly the same code twice and keeps perhaps the less efficient version
around.
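
For reference, the case being expanded boils down to a plain negation
overflow check; a minimal sketch of what the new testcase exercises:

/* 0 - a overflows exactly when a == INT_MIN, so the expansion should emit
   a single overflow check (the "jn?o" the test scans for), not two.  */
#include <limits.h>

int neg_overflows (int a)
{
  int r;
  return __builtin_sub_overflow (0, a, &r);   /* 1 only for a == INT_MIN */
}

/* e.g. neg_overflows (INT_MIN) == 1, neg_overflows (42) == 0 */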

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-09-16  Jakub Jelinek  
Eric Botcazou  

PR middle-end/77594
* internal-fn.c (expand_arith_overflow) : Don't fall
through into expand_addsub_overflow after expand_neg_overflow.

* gcc.target/i386/pr77594.c: New test.

--- gcc/internal-fn.c.jj2016-08-29 12:17:14.0 +0200
+++ gcc/internal-fn.c   2016-09-15 09:42:09.693980531 +0200
@@ -1833,7 +1833,10 @@ expand_arith_overflow (enum tree_code co
{
case MINUS_EXPR:
  if (integer_zerop (arg0) && !unsr_p)
-   expand_neg_overflow (loc, lhs, arg1, false);
+   {
+ expand_neg_overflow (loc, lhs, arg1, false);
+ return;
+   }
  /* FALLTHRU */
case PLUS_EXPR:
  expand_addsub_overflow (loc, code, lhs, arg0, arg1,
--- gcc/testsuite/gcc.target/i386/pr77594.c.jj  2016-09-15 09:44:51.879910346 +0200
+++ gcc/testsuite/gcc.target/i386/pr77594.c     2016-09-15 09:44:43.0 +0200
@@ -0,0 +1,11 @@
+/* PR middle-end/77594 */
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+int
+foo (int a, int *b)
+{
+  return __builtin_sub_overflow (0, a, b);
+}
+
+/* { dg-final { scan-assembler-times "\tjn?o\t" 1 } } */

Jakub


Re: C/C++ PATCH to implement -Wbool-operation (PR c/77490)

2016-09-15 Thread Eric Gallager
Hi, I have some comments/questions in-line:

On 9/15/16, Marek Polacek  wrote:
> Now that the C++ FE boolean in/decrement changes are in, I can move
> forwards with this patch which implements a new warning named -Wbool-operation
> (better names?) which warns about nonsensical code such as ~bool, ~(i == 10), 
> or
> bool-- (in C).
>
> It could also warn for other operations such as * or / or <<, but I wasn't
> sure about those, so this isn't included in this patch.
>
> Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?
>
> 2016-09-15  Marek Polacek  
>
>   PR c/77490
>   * c.opt (Wbool-operation): New.
>
>   * c-typeck.c (build_unary_op): Warn about bit not on expressions that
>   have boolean value.  Warn about ++/-- on booleans.
>
>   * typeck.c (cp_build_unary_op): Warn about bit not on expressions that
>   have boolean value.
>
>   * doc/invoke.texi: Document -Wbool-operation.
>
>   * c-c++-common/Wbool-operation-1.c: New test.
>   * gcc.dg/Wall.c: Use -Wno-bool-operation.
>   * gcc.dg/Wbool-operation-1.c: New test.
>
> diff --git gcc/c-family/c.opt gcc/c-family/c.opt
> index c55c7c3..fb6f2d1 100644
> --- gcc/c-family/c.opt
> +++ gcc/c-family/c.opt
> @@ -315,6 +315,10 @@ Wbool-compare
>  C ObjC C++ ObjC++ Var(warn_bool_compare) Warning LangEnabledBy(C ObjC C++
> ObjC++,Wall)
>  Warn about boolean expression compared with an integer value different from
> true/false.
>
> +Wbool-operation
> +C ObjC C++ ObjC++ Var(warn_bool_op) Warning LangEnabledBy(C ObjC C++
> ObjC++,Wall)
> +Warn about certain operations on boolean expressions.
> +
>  Wframe-address
>  C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++
> ObjC++,Wall)
>  Warn when __builtin_frame_address or __builtin_return_address is used
> unsafely.
> diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
> index 4dec397..c455d22 100644
> --- gcc/c/c-typeck.c
> +++ gcc/c/c-typeck.c
> @@ -4196,6 +4196,22 @@ build_unary_op (location_t location, enum tree_code
> code, tree xarg,
> || (typecode == VECTOR_TYPE
> && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg
>   {
> +   tree e = arg;
> +
> +   /* Warn if the expression has boolean value.  */
> +   while (TREE_CODE (e) == COMPOUND_EXPR)
> + e = TREE_OPERAND (e, 1);
> +
> +   if ((TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE
> +|| truth_value_p (TREE_CODE (e)))
> +   && warning_at (location, OPT_Wbool_operation,
> +  "%<~%> on a boolean expression"))
> + {
> +   gcc_rich_location richloc (location);
> +   richloc.add_fixit_insert_before (location, "!");
> +   inform_at_rich_loc (, "did you mean to use logical "
> +   "not?");
> + }
> if (!noconvert)
>   arg = default_conversion (arg);
>   }
> @@ -4306,6 +4322,16 @@ build_unary_op (location_t location, enum tree_code
> code, tree xarg,
>   "decrement of enumeration value is invalid in C++");
>   }
>
> +  if (TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE)
> + {
> +   if (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR)
> + warning_at (location, OPT_Wbool_operation,
> + "increment of a boolean expression");
> +   else
> + warning_at (location, OPT_Wbool_operation,
> + "decrement of a boolean expression");
> + }
> +
>/* Ensure the argument is fully folded inside any SAVE_EXPR.  */
>arg = c_fully_fold (arg, false, NULL);
>
> diff --git gcc/cp/typeck.c gcc/cp/typeck.c
> index c53a85a..9fc1a4b 100644
> --- gcc/cp/typeck.c
> +++ gcc/cp/typeck.c
> @@ -5853,7 +5853,16 @@ cp_build_unary_op (enum tree_code code, tree xarg,
> bool noconvert,
>  arg, true)))
>   errstring = _("wrong type argument to bit-complement");
>else if (!noconvert && CP_INTEGRAL_TYPE_P (TREE_TYPE (arg)))
> - arg = cp_perform_integral_promotions (arg, complain);
> + {
> +   /* Warn if the expression has boolean value.  */
> +   location_t location = EXPR_LOC_OR_LOC (arg, input_location);
> +   if ((TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE
> +|| truth_value_p (TREE_CODE (arg)))
> +   && warning_at (location, OPT_Wbool_operation,
> +  "%<~%> on a boolean expression"))
> + inform (location, "did you mean to use logical not (%)?");
> +   arg = cp_perform_integral_promotions (arg, complain);
> + }
>break;
>
>  case ABS_EXPR:
> diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
> index 8eb5eff..976e137 100644
> --- gcc/doc/invoke.texi
> +++ gcc/doc/invoke.texi
> @@ -256,8 +256,8 @@ Objective-C and Objective-C++ Dialects}.
>  -pedantic-errors @gol
>  -w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
>  -Wno-aggressive-loop-optimizations 

Re: Debug algorithms

2016-09-15 Thread François Dumont

On 14/09/2016 12:33, Jonathan Wakely wrote:

On 06/09/16 22:26 +0200, François Dumont wrote:

Hi

   Any final decision regarding this patch ?

   Note that the __std_a namespace is optional; I can remove this change
from the patch if you want.


Can you describe what difference that would make?


Then the library will be internally exposed to the alternative mode. At the
moment, when any mode is activated, it also impacts the algos used, for
example, in the implementation of the containers. When it is parallel mode I
think it is on purpose. When it is debug mode it is useless because the
containers already have a debug layer doing all the checks. But those checks
are not expensive, so it is not a problem to use std:: directly.



Would it avoid all the changes to Parallel Mode?


No, changes to parallel mode come from the fact that with this patch 
algos now have 2 available modes. If an algo is specialized in one mode 
it has to be in the other too to avoid complex namespace management 
depending on each algo. So I had to add a number of algos to parallel 
mode even if they do not have any parallel implementation.



Given that Parallel Mode is hardly used
by anyone, and will be superseded by the C++17 parallel algorithms, I
don't think we should be spending much time making huge changes to it.


I even had another patch to do some cleanup of parallel mode after 
this one. I might keep it for myself then. Won't there be anything 
reusable for the C++17 version?




The patch is very large, but a lot of it seems to have nothing to do
with the actual change e.g.

-  template
+  template
void
-sort(_RAIter, _RAIter, _Compare);
+sort(_RAIter, _RAIter, _Comp);

This seems to be some unrelated cleanup that just makes the patch
larger.


Yes, as I spent a lot of time working on it I got tired of the 
inconsistent algo declarations in each mode and in algorithmfwd.h, and 
decided to generalize one version. I can try to revert some of those 
changes.


But most of the changes are rather a reorganization of algos in the headers 
to avoid toggling _GLIBCXX_BEGIN_XXX_NAMESPACE and 
_GLIBCXX_END_XXX_NAMESPACE too often. I even wanted to move the non-std 
helper algos into the __details namespace.


There are also a number of std:: qualifications added to iterator_traits 
usages, as we are not always using it from the std namespace directly but 
sometimes from std::__cxx1998.




The __std_a namespace is just used for indirection to refer to either
std::foo or std::_GLIBCXX_STD_A::foo, is that right?


Yes, I took this approach from the parallel mode __gnu_sequential.


So no function
templates in normal mode change mangled name, because they're still in
namespace std?

Sorry, I don't get your point here.


This is a pre-existing problem, but I hate the fact that depending on
the mode, _GLIBCXX_STD_A is either the top-level namespace, or a
nested one. That means it can't be used to form a fully-qualified
name. e.g. _GLIBCXX_STD_A::copy is either std::copy or
__cxx1998::copy. Why isn't the latter std::__cxx1998::copy instead?
It would be cleaner if ::_GLIBCXX_STD_A::copy was valid, i.e. the
macro expands to std or std::__cxx1998. For comparison,
 does this:

 #if _GLIBCXX_USE_STD_SPEC_FUNCS
 # define _GLIBCXX_MATH_NS ::std
 #elif defined(_GLIBCXX_TR1_CMATH)
 namespace tr1
 {
 # define _GLIBCXX_MATH_NS ::std::tr1

This means _GLIBCXX_MATH_NS::foo is always valid in any scope.


Yes, I agree. I even wondered why we were not using ::std, but I see that 
it is done with _GLIBCXX_MATH_NS, so we could do the same. With this 
patch _GLIBCXX_STD_A is now only used to make the usage of the 
normal algo explicit. The parallel mode cleanup I had prepared also removes 
a number of usages of _GLIBCXX_STD_A, replacing them with __gnu_sequential.
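
To illustrate the point about fully-qualified expansions (the names below
are made up for the example; they are not libstdc++'s):

namespace std_like { namespace impl { inline int copy_tag () { return 1; } } }

#define NS_NESTED    impl                  // like _GLIBCXX_STD_A in debug mode
#define NS_QUALIFIED ::std_like::impl      // always names one specific scope

namespace client {
  // int a = ::NS_NESTED::copy_tag ();     // ill-formed: there is no ::impl
  int b = NS_QUALIFIED::copy_tag ();       // valid in any scope
}

int main () { return client::b == 1 ? 0 : 1; }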





It's not clear to me from reading the patch when I should use __std_a
and when I should use std. If I forget to use __std_a (and I *will*
forget, and other people probably will too) what happens? We get
slightly slower code in Debug Mode? Anything else?

Nothing else.


But if you get rid of __std_a then maybe the questions above don't
need to be answered anyway.


Some other feedback:

I don't like the name __cxx1998_a. The __cxx1998 name is historical,
but it contains C++11 features too. For a new namespace I'd prefer a
name that isn't misleading. The "cxx1998" part is untrue, and the
important part is "a" which doesn't tell you anything.

I wanted to rename __cxx1998 into __normal and so have a __normal_a.


Does the earlier unwrapping of debug iterators lose any safety, or
will all the same checks always get run?  e.g. if you call an algo in
debug mode today and it increments past the end then the iterator
aborts. If we unwrap the debug iterator and invoke a normal algo,
will we lose that check?
No, we unwrap debug iterators only if we have been able to check that we 
won't go past the end. We could re-wrap the iterator in a proxy 
checking only for the past-the-end problem, but it would be overkill.



Assuming the debug algo 

Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Bernd Edlinger
On 09/15/16 21:54, Jason Merrill wrote:
> On Thu, Sep 15, 2016 at 1:42 PM, Joseph Myers  wrote:
>> On Thu, 15 Sep 2016, Bernd Edlinger wrote:
>>
>>> So level 1 could be enabled with -Wall and level 2 could be enabled
>>> with -pedantic and/or -Wpedantic.
>>
>> But this warning has absolutely nothing to do with things that are
>> prohibited by, or undefined in, ISO standards, and so it would be
>> completely inappropriate for -pedantic to affect it in any way.
>
> Agreed.  We already pass several warning flags to GCC bootstrap, we
> can add another.
>

Yes.

Maybe -Wextra could also enable the stricter warning.

That option is already enabled in the bootstrap and elsewhere.


Bernd.


Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Jason Merrill
On Thu, Sep 15, 2016 at 1:42 PM, Joseph Myers  wrote:
> On Thu, 15 Sep 2016, Bernd Edlinger wrote:
>
>> So level 1 could be enabled with -Wall and level 2 could be enabled
>> with -pedantic and/or -Wpedantic.
>
> But this warning has absolutely nothing to do with things that are
> prohibited by, or undefined in, ISO standards, and so it would be
> completely inappropriate for -pedantic to affect it in any way.

Agreed.  We already pass several warning flags to GCC bootstrap, we
can add another.

Jason


Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-15 Thread Jason Merrill
OK, one more:

On Thu, Sep 15, 2016 at 8:12 AM, Christophe Lyon
 wrote:
> On 15 September 2016 at 11:26, Rainer Orth  
> wrote:
>> Hi Jason,
>>
 I confirm no change on arm* either.
 On aarch64, gen-attrs-[25]1.C, and devirt-33 now work.
 name-clash11.C still fails on both targets
>>>
>>> Ah, I needed to remove the limit on field alignment as well.  This
>>> seems to fix things.
>>
>> The failures are gone on Solaris/SPARC (sparc-sun-solaris2.12) now, with
>> one exception (when run as C test):
>>
>> 32-bit:
>>
>> FAIL: c-c++-common/pr52181.c  -Wc++-compat   (test for bogus messages, line 11)
>> FAIL: c-c++-common/pr52181.c  -Wc++-compat  (test for excess errors)
>>
>
> Hi Jason,
>
> Same on arm* if that's easier for you.
>
> Christophe
>
>
>> This is new with you last patch.
>>
>> Excess errors:
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/pr52181.c:6:1: 
>> warning: requested alignment 16 is larger than 8 [-Wattributes]
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/pr52181.c:8:1: 
>> warning: requested alignment 16 is larger than 8 [-Wattributes]
>>
>> Rainer
>>
>> --
>> -
>> Rainer Orth, Center for Biotechnology, Bielefeld University
commit 1a1beb3ab1279f83d411e96d7b5abbf7875c7956
Author: Jason Merrill 
Date:   Thu Sep 15 12:30:03 2016 -0400

* c-common.c (check_cxx_fundamental_alignment_constraints): Check
DECL_EXTERNAL.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 57b6671..fc25686 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -7870,7 +7870,7 @@ check_cxx_fundamental_alignment_constraints (tree node,
 
   if (VAR_P (node))
 {
-  if (TREE_STATIC (node))
+  if (TREE_STATIC (node) || DECL_EXTERNAL (node))
/* For file scope variables and static members, the target supports
   alignments that are at most MAX_OFILE_ALIGNMENT.  */
max_align = MAX_OFILE_ALIGNMENT;


Re: C/C++ PATCH for c/77423 (bogus warning with -Wlogical-not-parentheses)

2016-09-15 Thread Jeff Law

On 09/05/2016 08:28 AM, Marek Polacek wrote:

test.c:10:8: note: add parentheses around left hand side expression to
silence this warning
r += !a == ~b;
 ^~
 ( )

this will not fix it, but make  it worse.
I think a better warning would be
warning: ~ on boolean value, did you mean ! ?


Could you please open a PR?  I'll take care of it.

Still not sure about other operations.  I guess no one would
object to warning on bool1 % bool2, but should we warn for
bool1 + bool2?
Wouldn't the desire for a warning largely depend on the type of the 
result?  So I'll assume you're referring to a boolean result :-)


bool1 + bool2 does have meaning though, even when the result is a bool. 
You have to be leery of both having a true value as that causes an overflow.


Jeff


Re: [PATCH] C: fixits for modernizing structure member designators

2016-09-15 Thread David Malcolm
On Thu, 2016-09-15 at 11:27 -0600, Jeff Law wrote:
> On 08/30/2016 08:38 AM, David Malcolm wrote:
> > This patch adds fix-it hints to our warning for old-style structure
> > member designators, showing how to modernize them to C99 form.
> > 
> > For example:
> > 
> > modernize-named-inits.c:19:5: warning: obsolete use of designated
> > initializer with ‘:’ [-Wpedantic]
> >   foo: 1,
> >  ^
> >   .  =
> > 
> > In conjunction with the not-yet-in-trunk -fdiagnostics-generate
> > -patch,
> > this can generate patches like this:
> > 
> > --- modernize-named-inits.c
> > +++ modernize-named-inits.c
> > @@ -16,7 +16,7 @@
> >  /* Old-style named initializers.  */
> > 
> >  struct foo old_style_f = {
> > - foo: 1,
> > - bar: 2,
> > + .foo= 1,
> > + .bar= 2,
> >  };
> > 
> > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> > 
> > gcc/c/ChangeLog:
> > * c-parser.c (c_parser_initelt): Provide fix-it hints for
> > modernizing old-style structure member designator to C99 style.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.dg/modernize-named-inits.c: New test case.
> OK.
> jeff

Thanks.  Marek expressed some concern about how the fix-it hint is
printed:
  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02112.html
I've been looking at rewriting how fixits get printed, to print the
affected range of the affected lines, which would address his concern,
but this work isn't finished yet.

Should that work be a blocker for committing this modernize-named-inits.c
fix-it patch, or is the patch good enough as-is?
(for example, -fdiagnostics-generate-patch does a good job on it as-is).

Dave


Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Joseph Myers
On Thu, 15 Sep 2016, Bernd Edlinger wrote:

> So level 1 could be enabled with -Wall and level 2 could be enabled
> with -pedantic and/or -Wpedantic.

But this warning has absolutely nothing to do with things that are 
prohibited by, or undefined in, ISO standards, and so it would be 
completely inappropriate for -pedantic to affect it in any way.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] C: fixits for modernizing structure member designators

2016-09-15 Thread Jeff Law

On 08/30/2016 08:38 AM, David Malcolm wrote:

This patch adds fix-it hints to our warning for old-style structure
member designators, showing how to modernize them to C99 form.

For example:

modernize-named-inits.c:19:5: warning: obsolete use of designated initializer 
with ‘:’ [-Wpedantic]
  foo: 1,
 ^
  .  =

In conjunction with the not-yet-in-trunk -fdiagnostics-generate-patch,
this can generate patches like this:

--- modernize-named-inits.c
+++ modernize-named-inits.c
@@ -16,7 +16,7 @@
 /* Old-style named initializers.  */

 struct foo old_style_f = {
- foo: 1,
- bar: 2,
+ .foo= 1,
+ .bar= 2,
 };

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c/ChangeLog:
* c-parser.c (c_parser_initelt): Provide fix-it hints for
modernizing old-style structure member designator to C99 style.

gcc/testsuite/ChangeLog:
* gcc.dg/modernize-named-inits.c: New test case.

OK.
jeff



Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-15 Thread Trevor Saunders
On Thu, Sep 15, 2016 at 10:27:56AM -0600, Jeff Law wrote:
> On 09/15/2016 10:10 AM, Trevor Saunders wrote:
> > On Thu, Sep 15, 2016 at 12:52:59PM +0200, Bernd Schmidt wrote:
> > > On 09/14/2016 09:21 PM, tbsaunde+...@tbsaunde.org wrote:
> > > 
> > > > Basically $subject.  First change variable's type to rtx_insn * where 
> > > > possible.
> > > > Then change the functions and fixup callers where it is still necessary 
> > > > to
> > > > cast.
> > > 
> > > #2, #4 and #8 look good and can be applied if they work independently of 
> > > the
> > > others.
> > 
> > at most #2 should depend on #1 so it should be fine and I can check on
> > the others.
> > 
> > > Less certain about some of the others which introduce additional casts.
> > 
> > Yeah, it's somewhat unfortunate, though one way or another we will need
> > to add casts; I think the question is just how many we will accept and
> > where.
> In my mind the casts represent the "bounds" of how far we've taken this
> work.  They occur at the boundaries where we haven't converted something
> from an "rtx" to an "rtx_insn *" and allow us to do the work piecemeal
> rather than all-at-once.
> 
> What I don't have a sense of is when we'll be able to push rtx_insn * far
> enough that we're able to start removing casts.

Well, this series does remove a few, though I'm sure it is a net
addition.

> And that might be a good way to prioritize the next batch of work.  Pick a
> cast, remove it and deal with the fallout.  :-)

Yeah, that's more or less what I did here.  Each of the next_X_insn
methods took an rtx and cast it to rtx_insn *, so I changed it to take an
rtx_insn * and dropped the cast.  Then I fixed up the fallout and moved
the conversion of variable types to the beginning of the series since it
is less objectionable.

> > 
> > > Maybe LABEL_REF_LABEL needs converting first, and reorg.c has a few
> > > variables that might have to be made rtx_insn * in patch #7 to avoid 
> > > casts.
> > 
> > LABEL_REF_LABEL might be doable; it's a good idea I'll look into.  The
> > reorg.c stuff around target_label is rather complicated, unfortunately.
> > In the end I of course agree the variables should be rtx_insn *.
> > However, currently things are assigned to that variable that are not
> > insns.  So we need to break the variable up, but it's involved in a lot
> > of code I don't think I know well enough to really refactor.  For
> > example it looks like target_label can hold a value between iterations
> > of the loop, I suspect that would be a bug, but I'm not really sure.
> I can probably help with reorg.  Hell, you might even be referring to my
> code!

heh, that'd be useful.  Though I'm not really sure how important it is
to clean up code specific to targets with delay slots.

Trev

> jeff


Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped

2016-09-15 Thread Jeff Law

On 09/14/2016 04:11 PM, Segher Boessenkool wrote:

On Wed, Sep 14, 2016 at 01:33:04PM -0600, Jeff Law wrote:

On 09/14/2016 01:03 PM, Segher Boessenkool wrote:

If you think about it, conceptually we want the return insn to make the
callee saved registers "used" so that DCE, regrename and friends don't
muck with them.  The fact that we don't is as much never having to care
all that much until recently.


(There is no return insn at those exits; these are exits *without*
successor block, not the exit block).

Ugh.  Anywhere we could attach this stuff in the insn chain?  If not,
the DF side of this problem gets uglier.


I put the USEs at the start of that noreturn basic block.
Just naked USEs of the REG?  For some reason I was uneasy about this, 
but I can't recall why, maybe I just latched onto 
CALL_INSN_FUNCTION_USAGE and wanted to use the same model.


Seems like we should just go with the naked USE of the REGs.



While it is just a reachability problem, I don't think we need to solve
it if we mark anything that was separately shrink wrapped as live at all
the exit points.


Agreed, but why does it work if not separately shrink-wrapping anything?
And why does it break on things that are *not* separately wrapped *anywhere*?

That I don't know...  It's a hell of a mystery.

Jeff



Re: [PATCH] Improve string::clear() performance

2016-09-15 Thread Cong Wang
On Thu, Sep 15, 2016 at 2:08 AM, Jonathan Wakely  wrote:
> On 14/09/16 10:41 -0700, Cong Wang wrote:
>>
>> For long term, I think gcc should have something as simple as
>> 'Signed-off-by' for Linux kernel, otherwise too much work for first-time
>> contributors like me. We all want to save time on this, don't we? ;)
>
>
> Signed-off-by wouldn't help. The copyright assignment is done so that
> the FSF owns the copyright in the majority of the GCC code. If I add
> a Signed-off-by to your patch that doesn't have any effect on your
> copyright of the code.
>
> It certainly does add a hurdle for contributors to GCC, but it's not a
> policy that is likely to change.
>
> Anyway, I'll fix this one way or another ASAP. Thanks again.

Thanks for your kind explanation.

I think I already finished the copyright thing after exchanging emails with
ass...@gnu.org. So copyright is not a blocker for consideration of
my patch any more.


Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Bernd Edlinger
On 09/15/16 18:23, Jeff Law wrote:
> On 09/15/2016 10:00 AM, Bernd Edlinger wrote:
>> On 09/15/16 17:44, Jeff Law wrote:
>>> On 09/14/2016 12:11 PM, Jason Merrill wrote:

 I think we could have both, where the weaker form is part of -Wall and
 people can explicitly select the stronger form.
>>> That's been a fairly standard way to handle this kind of thing.  It
>>> works for me.
>>>
>>> Jeff
>>
>> The warning could for instance be more aggressive when -pedantic is in
>> effect?
> I wouldn't do it on -pedantic.  We've usually used levels or -Wfoo
> -Wfoo-bar kinds of schemes.

It would be kind of good to enable the extended warning level on
the gcc bootstrap, where we have -Wall and -pedantic and more.

So level 1 could be enabled with -Wall and level 2 could be enabled
with -pedantic and/or -Wpedantic.


Bernd.


Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate components

2016-09-15 Thread Jeff Law

On 09/14/2016 07:38 AM, Segher Boessenkool wrote:

On Mon, Sep 12, 2016 at 12:02:50PM -0600, Jeff Law wrote:

As a final optimisation, if a block needs a prologue and its immediate
dominator has the block as a post-dominator, the dominator gets the
prologue as well.

So why not just put it in the idom and not in the dominated block?


That's what it does :-)

Then I must have mis-parsed.  Thanks for clarifying.


"As a final optimisation, if a block needs a prologue and its immediate
dominator has the block as a post-dominator, ***that immediate dominator***
gets the prologue as well."

That is clearer I hope :-)

It is :-)




Hmm, then explain again why DCE is mucking up?  I don't immediately see
how EPILOGUE_BEG notes come into play with DCE.  It seems to rely on the
DF data and AFAICT DF only cares about the EPILOGUE_BEG note in
can_move_insns_across which shouldn't be used by DCE.


The register restore *is* dead code, but we need to have the same CFI
for all convergent paths.
OK.   I think I was conflating multiple issues.  So we need to keep the 
restore alive so that we have the same CFI across those paths, even 
though it appears dead on one or more paths.


I think this points us back to what you were experimenting with to 
address the regrename problems -- specifically creating "uses" at those 
key points.  That solves the DCE problem as well as one of the regrename 
problems, right?




Whether or not an edge needs a prologue or epilogue is a function not
just of the state at the head or tail of the edge, but instead is a
function of global dataflow propagation?  Thus we can't disqualify until
after we've done the dataflow propagation?  Right?


We can figure out before we decide what blocks need what components, what
edges can not get a prologue or epilogue for which components.  This
complicates the selection algorithm a whole lot, for not much gain that
I have seen so far, so I just give up in the cases that end up "bad".
OK.  I'll drop it :-)  It was more a mental exercise in understanding 
then something I think needed to change.


Jeff


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
Jakub Jelinek 
>
> Those are the generic definitions, all targets that care about performance
> obviously should replace them with assembly code.

No, that's exactly my point, it is not true that it is always best to write 
assembly
code. For example there is absolutely no benefit in writing an optimized 
mempcpy. 
At best it is as fast as memcpy, and therefore expanding mempcpy into
memcpy (p, q, n) + n would have the same performance. In actual use memcpy
will then be slightly faster due to lower I-cache pressure.
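
(For illustration only -- a minimal sketch of the expansion being argued for,
based purely on the standard definition of mempcpy; not code from any patch:)

#include <string.h>

/* mempcpy (d, s, n) is defined to return d + n after copying, so a
   compiler could expand it directly into the memcpy form.  */
void *
expand_mempcpy (void *d, const void *s, size_t n)
{
  return (char *) memcpy (d, s, n) + n;
}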

Wilco



Re: [PATCH] Allow FP to be used as a call-saved registe

2016-09-15 Thread Jeff Law

On 09/13/2016 05:10 AM, Tamar Christina wrote:

Hi Jeff,


On 12/09/16 18:16, Jeff Law wrote:

On 09/05/2016 08:59 AM, Tamar Christina wrote:

Hi All,

This patch allows the FP register to be used as a call-saved
register when -fomit-frame-pointer is used.

The change is done in such a way that the defaults do not change.
To use the FP register both -fomit-frame-pointer and
-fcall-saved- need to be used.

Regression ran on aarch64-none-linux-gnu and no regressions.
Bootstrapped and ran regressions on `x86_64` and no regressions.

A new test fp_free_1 was added to test functionality.

Ok for trunk?

Thanks,
Tamar

PS. I don't have commit rights so if OK can someone apply the patch
for me.

gcc/
2016-09-01  Tamar Christina  

* gcc/reginfo.c (fix_register): Allow FP to be set if
-fomit-frame-pointer.

I'm a little surprised you need this.  Most ports allow use of FP as a
call-saved register with -fomit-frame-pointer.

I think this is because on most architectures the FP is not in the fixed
registers list. But the AArch64 ABI (I believe) currently
mandates that it is. With the option of:

- It may permit the frame pointer register to be used as a
general-purpose callee-saved register, but provide a platform-specific
mechanism for external agents to reliably detect this condition

- It may elect not to maintain a frame chain and to use the frame
pointer register as a general-purpose callee-saved register.
So those don't seem to me to imply that the frame pointer needs to be a 
fixed register.   So the first thing I'd do is fix the aarch64 port to 
not do that and see what fallout there is and how to fix it.


Most ports simply don't mark the frame pointer as fixed.

I am a bit curious about how you're going to solve the "external agents 
to reliably detect this condition" :-)





Also note the documentation explicitly forbids using -fcall-saved for
the stack or frame pointer.


Ah, yes, hadn't noticed that before. Isn't it a bit too strict a
restriction? In general, if you have -fomit-frame-pointer then shouldn't
it be safe for the FP to be used
with -fcall-saved? Since it's probably a no-op on most ports that
support -fomit-frame-pointer anyway?
It might be.  In general I don't think the -fcall-whatever options are 
used that much anymore and I don't think anyone has seriously looked at 
their documentation in a long time.


Regardless, I still think the first step is to "unfix" the frame pointer 
hard register on the aarch64 port.


jeff



Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-15 Thread Jeff Law

On 09/15/2016 10:10 AM, Trevor Saunders wrote:

On Thu, Sep 15, 2016 at 12:52:59PM +0200, Bernd Schmidt wrote:

On 09/14/2016 09:21 PM, tbsaunde+...@tbsaunde.org wrote:


Basically $subject.  First change variable's type to rtx_insn * where possible.
Then change the functions and fixup callers where it is still necessary to
cast.


#2, #4 and #8 look good and can be applied if they work independently of the
others.


at most #2 should depend on #1 so it should be fine and I can check on
the others.


Less certain about some of the others which introduce additional casts.


yeah, it's somewhat unfortunate, though one way or another we will need
to add casts; I think the question is just how many we will accept and
where.
In my mind the casts represent the "bounds" of how far we've taken this 
work.  They occur at the boundaries where we haven't converted something 
from an "rtx" to an "rtx_insn *" and allow us to do the work piecemeal 
rather than all-at-once.


What I don't have a sense of is when we'll be able to push rtx_insn * 
far enough that we're able to start removing casts.


And that might be a good way to prioritize the next batch of work.  Pick 
a cast, remove it and deal with the fallout.  :-)






Maybe LABEL_REF_LABEL needs converting first, and reorg.c has a few
variables that might have to be made rtx_insn * in patch #7 to avoid casts.


LABEL_REF_LABEL might be doable, it's a good idea, I'll look into it.  The
reorg.c stuff around target_label is rather complicated unfortunately.
In the end I of course agree the variables should be rtx_insn *.
However currently things are assigned to that variable that are not
insns.  So we need to break the variable up, but it's involved in a lot
of code that I don't think I know well enough to really refactor.  For
example it looks like target_label can hold a value between iterations
of the loop, I suspect that would be a bug, but I'm not really sure.
I can probably help with reorg.  Hell, you might even be referring to my 
code!

jeff


Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Jeff Law

On 09/15/2016 10:00 AM, Bernd Edlinger wrote:

On 09/15/16 17:44, Jeff Law wrote:

On 09/14/2016 12:11 PM, Jason Merrill wrote:


I think we could have both, where the weaker form is part of -Wall and
people can explicitly select the stronger form.

That's been a fairly standard way to handle this kind of thing.  It
works for me.

Jeff


The warning could for instance be more aggressive when -pedantic is in
effect?
I wouldn't do it on -pedantic.  We've usually used levels or -Wfoo 
-Wfoo-bar kinds of schemes.


jeff


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Richard Biener
On September 15, 2016 5:52:34 PM GMT+02:00, Jeff Law  wrote:
>On 09/14/2016 02:24 AM, Richard Biener wrote:
>> On Tue, Sep 13, 2016 at 6:15 PM, Jeff Law  wrote:
>>> On 09/13/2016 02:41 AM, Jakub Jelinek wrote:

 On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:
>
> This patch adds an optimized route to the fpclassify builtin
> for floating point numbers which are similar to IEEE-754 in
>format.
>
> The goal is to make it faster by:
> 1. Trying to determine the most common case first
>(e.g. the float is a Normal number) and then the
>rest. The amount of code generated at -O2 are
>about the same +/- 1 instruction, but the code
>is much better.
> 2. Using integer operation in the optimized path.


 Is it generally preferable to use integer operations for this
>instead
 of floating point operations?  I mean various targets have quite
>high
 costs
 of moving data in between the general purpose and floating point
>register
 file, often it has to go through memory etc.
>>>
>>> Bit testing/twiddling is obviously a trade-off for a non-addressable
>object.
>>> I don't think there's any reasonable way to always generate the most
>>> efficient code as it's going to depend on (for example) register
>allocation
>>> behavior.
>>>
>>> So what we're stuck doing is relying on the target costing bits to
>guide
>>> this kind of thing.
>>
>> I think the reason for this patch is to provide a general optimized
>> integer version.
>And just to be clear, that's fine with me.  While there are cases where
>
>bit twiddling hurts, I think bit twiddling is generally better.
>
>
>> I think it asks for a FP (class) propagation pass somewhere (maybe as
>part of
>> complex lowering which already has a similar "coarse" lattice -- not
>that I like
>> its implementation very much) and doing the "lowering" there.
>Not a bad idea -- I wonder how much a coarse tracking of the
>exceptional 
>cases would allow later optimization.

I guess it really depends on the ability to set ffast-math flags on individual 
stmts (or at least built-in calls).

Richard.

>>
>> Not something that should block this patch though.
>Agreed.
>
>jeff




Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-15 Thread Trevor Saunders
On Thu, Sep 15, 2016 at 12:52:59PM +0200, Bernd Schmidt wrote:
> On 09/14/2016 09:21 PM, tbsaunde+...@tbsaunde.org wrote:
> 
> > Basically $subject.  First change variable's type to rtx_insn * where 
> > possible.
> > Then change the functions and fixup callers where it is still necessary to
> > cast.
> 
> #2, #4 and #8 look good and can be applied if they work independently of the
> others.

at most #2 should depend on #1 so it should be fine and I can check on
the others.

> Less certain about some of the others which introduce additional casts.

yeah, it's somewhat unfortunate, though one way or another we will need
to add casts; I think the question is just how many we will accept and
where.

> Maybe LABEL_REF_LABEL needs converting first, and reorg.c has a few
> variables that might have to be made rtx_insn * in patch #7 to avoid casts.

LABEL_REF_LABEL might be doable, it's a good idea, I'll look into it.  The
reorg.c stuff around target_label is rather complicated unfortunately.
In the end I of course agree the variables should be rtx_insn *.
However currently things are assigned to that variable that are not
insns.  So we need to break the variable up, but it's involved in a lot
of code that I don't think I know well enough to really refactor.  For
example it looks like target_label can hold a value between iterations
of the loop, I suspect that would be a bug, but I'm not really sure.

Trev

> 
> 
> Bernd


Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Bernd Edlinger
On 09/15/16 17:44, Jeff Law wrote:
> On 09/14/2016 12:11 PM, Jason Merrill wrote:
>>
>> I think we could have both, where the weaker form is part of -Wall and
>> people can explicitly select the stronger form.
> That's been a fairly standard way to handle this kind of thing.  It
> works for me.
>
> Jeff

The warning could for instance be more aggressive when -pedantic is in
effect?


Bernd.


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Jeff Law

On 09/14/2016 02:24 AM, Richard Biener wrote:

On Tue, Sep 13, 2016 at 6:15 PM, Jeff Law  wrote:

On 09/13/2016 02:41 AM, Jakub Jelinek wrote:


On Mon, Sep 12, 2016 at 04:19:32PM +, Tamar Christina wrote:


This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.

The goal is to make it faster by:
1. Trying to determine the most common case first
   (e.g. the float is a Normal number) and then the
   rest. The amount of code generated at -O2 are
   about the same +/- 1 instruction, but the code
   is much better.
2. Using integer operation in the optimized path.



Is it generally preferable to use integer operations for this instead
of floating point operations?  I mean various targets have quite high
costs
of moving data in between the general purpose and floating point register
file, often it has to go through memory etc.


Bit testing/twiddling is obviously a trade-off for a non-addressable object.
I don't think there's any reasonable way to always generate the most
efficient code as it's going to depend on (for example) register allocation
behavior.

So what we're stuck doing is relying on the target costing bits to guide
this kind of thing.


I think the reason for this patch is to provide a general optimized
integer version.
And just to be clear, that's fine with me.  While there are cases where 
bit twiddling hurts, I think bit twiddling is generally better.
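
(As a concrete illustration of the integer path -- a sketch assuming an
IEEE-754 binary32 float only, not taken from the patch itself:)

#include <stdint.h>
#include <string.h>

/* Test the most common case (a normal number) using only integer
   operations on the bit pattern of the float.  */
static int
float_is_normal (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);      /* reinterpret without aliasing issues */
  uint32_t exp = (bits >> 23) & 0xff;   /* biased exponent field */
  return exp != 0 && exp != 0xff;       /* 0: zero/subnormal, 0xff: inf/NaN */
}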




I think it asks for a FP (class) propagation pass somewhere (maybe as part of
complex lowering which already has a similar "coarse" lattice -- not that I like
its implementation very much) and doing the "lowering" there.
Not a bad idea -- I wonder how much a coarse tracking of the exceptional 
cases would allow later optimization.




Not something that should block this patch though.

Agreed.

jeff


Re: [TREE-SSA-CCP] Issue warning when folding condition

2016-09-15 Thread Jeff Law

On 09/14/2016 03:00 AM, Richard Biener wrote:

On Tue, 13 Sep 2016, kugan wrote:


Hi Richard,


On 19/08/16 18:00, Richard Biener wrote:

On Fri, 19 Aug 2016, Kugan Vivekanandarajah wrote:


On 19 August 2016 at 12:09, Kugan Vivekanandarajah
 wrote:

The testcase pr33738.C for warning fails with early-vrp patch. The
reason is, with early-vrp ccp2 is folding the comparison that used to
be folded in simplify_stmt_for_jump_threading. Since early-vrp does
not perform jump-threading is not optimized there.

Attached patch adds this warning to tree-ssa-ccp.c. We might also run
into some other similar issues in the future.


Sorry, I attached the wrong patch (with typo). Here is the correct one.


I think emitting this warning from GIMPLE optimizations is fundamentally
flawed and the warning should be removed there and put next to
the cases we alrady handle in c/c-common.c:shorten_compare (or in
FE specific code).  I see no reason why only VRP or CCP would
do the simplification for -fstrict-enums enums (thus it seems to be
missing from the generic comparison folders).


But, If I understand this correctly, I think we will not be able to fold all
the cases we handle in FE. Therefore we will not be able to warn there. For
very simple cases yes, but not for others.


Sure.  But I do not see an issue with that.  With GIMPLE level warnings
you have the general issue that the warning may be only exposed by
optimization (like cross-unit inlining with LTO) and thus would be
considered a false positive.

I must have missed a piece of this conversation.

If someone is going to add more stuff to shorten_compare, please break 
that function into two pieces.  One which is non-destructive to the 
original trees and just emits warnings, the other which has the 
optimization.


There may well be code & work duplication, but that separation is one of 
the steps necessary to move the shortening bits out of the front-end and 
into match.pd.


jeff



Re: [PATCH, updated] Add a warning for suspicious use of conditional expressions in boolean context

2016-09-15 Thread Jeff Law

On 09/14/2016 12:11 PM, Jason Merrill wrote:


I think we could have both, where the weaker form is part of -Wall and
people can explicitly select the stronger form.
That's been a fairly standard way to handle this kind of thing.  It 
works for me.


Jeff


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Jakub Jelinek
On Thu, Sep 15, 2016 at 03:27:52PM +, Wilco Dijkstra wrote:
> That's best done GCC as a general optimization as currently mempcpy is not 
> handled efficiently (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140),
> and it avoids having to repeat this for every C library out there...
> 
> glibc/string/mempcpy.c:
> 
> void *
> MEMPCPY (void *dest, const void *src, size_t len)
> {
>   return memcpy (dest, src, len) + len;
> }
> 
> And glibc/string/stpcpy.c:
> 
> char *
> STPCPY (char *dest, const char *src)
> {
>   size_t len = strlen (src);
>   return memcpy (dest, src, len + 1) + len;
> }

Those are the generic definitions, all targets that care about performance
obviously should replace them with assembly code.

Jakub


C/C++ PATCH to implement -Wbool-operation (PR c/77490)

2016-09-15 Thread Marek Polacek
Now that the C++ FE boolean in/decrement changes are in, I can move forwards
with this patch which implements a new warning named -Wbool-operation (better
names?) which warns about nonsensical code such as ~bool, ~(i == 10), or
bool-- (in C).

It could also warn for other operations such as * or / or <<, but I wasn't
sure about those, so this isn't included in this patch. 
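
(For reference, the kind of code the new warning is aimed at -- an
illustrative example, not one of the patch's testcases:)

#include <stdbool.h>

int
f (bool b, int i)
{
  if (~b)           /* warns: probably meant !b */
    return 1;
  if (~(i == 10))   /* warns: probably meant !(i == 10) */
    return 2;
  b--;              /* warns in C: decrement of a boolean */
  return b;
}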

Bootstrapped/regtested on x86_64-linux and ppc64-linux, ok for trunk?

2016-09-15  Marek Polacek  

PR c/77490
* c.opt (Wbool-operation): New.

* c-typeck.c (build_unary_op): Warn about bit not on expressions that
have boolean value.  Warn about ++/-- on booleans.

* typeck.c (cp_build_unary_op): Warn about bit not on expressions that
have boolean value.

* doc/invoke.texi: Document -Wbool-operation.

* c-c++-common/Wbool-operation-1.c: New test.
* gcc.dg/Wall.c: Use -Wno-bool-operation.
* gcc.dg/Wbool-operation-1.c: New test.

diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index c55c7c3..fb6f2d1 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -315,6 +315,10 @@ Wbool-compare
 C ObjC C++ ObjC++ Var(warn_bool_compare) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn about boolean expression compared with an integer value different from 
true/false.
 
+Wbool-operation
+C ObjC C++ ObjC++ Var(warn_bool_op) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
+Warn about certain operations on boolean expressions.
+
 Wframe-address
 C ObjC C++ ObjC++ Var(warn_frame_address) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn when __builtin_frame_address or __builtin_return_address is used unsafely.
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 4dec397..c455d22 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -4196,6 +4196,22 @@ build_unary_op (location_t location, enum tree_code 
code, tree xarg,
  || (typecode == VECTOR_TYPE
  && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))))
{
+ tree e = arg;
+
+ /* Warn if the expression has boolean value.  */
+ while (TREE_CODE (e) == COMPOUND_EXPR)
+   e = TREE_OPERAND (e, 1);
+
+ if ((TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE
+  || truth_value_p (TREE_CODE (e)))
+ && warning_at (location, OPT_Wbool_operation,
+"%<~%> on a boolean expression"))
+   {
+ gcc_rich_location richloc (location);
+ richloc.add_fixit_insert_before (location, "!");
+ inform_at_rich_loc (&richloc, "did you mean to use logical "
+ "not?");
+   }
  if (!noconvert)
arg = default_conversion (arg);
}
@@ -4306,6 +4322,16 @@ build_unary_op (location_t location, enum tree_code 
code, tree xarg,
"decrement of enumeration value is invalid in C++");
}
 
+  if (TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE)
+   {
+ if (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR)
+   warning_at (location, OPT_Wbool_operation,
+   "increment of a boolean expression");
+ else
+   warning_at (location, OPT_Wbool_operation,
+   "decrement of a boolean expression");
+   }
+
   /* Ensure the argument is fully folded inside any SAVE_EXPR.  */
   arg = c_fully_fold (arg, false, NULL);
 
diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index c53a85a..9fc1a4b 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -5853,7 +5853,16 @@ cp_build_unary_op (enum tree_code code, tree xarg, bool 
noconvert,
   arg, true)))
errstring = _("wrong type argument to bit-complement");
   else if (!noconvert && CP_INTEGRAL_TYPE_P (TREE_TYPE (arg)))
-   arg = cp_perform_integral_promotions (arg, complain);
+   {
+ /* Warn if the expression has boolean value.  */
+ location_t location = EXPR_LOC_OR_LOC (arg, input_location);
+ if ((TREE_CODE (TREE_TYPE (arg)) == BOOLEAN_TYPE
+  || truth_value_p (TREE_CODE (arg)))
+ && warning_at (location, OPT_Wbool_operation,
+"%<~%> on a boolean expression"))
+   inform (location, "did you mean to use logical not (%<!%>)?");
+ arg = cp_perform_integral_promotions (arg, complain);
+   }
   break;
 
 case ABS_EXPR:
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 8eb5eff..976e137 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -256,8 +256,8 @@ Objective-C and Objective-C++ Dialects}.
 -pedantic-errors @gol
 -w  -Wextra  -Wall  -Waddress  -Waggregate-return  @gol
 -Wno-aggressive-loop-optimizations -Warray-bounds -Warray-bounds=@var{n} @gol
--Wno-attributes -Wbool-compare -Wno-builtin-macro-redefined @gol
--Wc90-c99-compat -Wc99-c11-compat @gol
+-Wno-attributes -Wbool-compare -Wbool-operation @gol

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
From: Jakub Jelinek 
> On Thu, Sep 15, 2016 at 02:55:48PM +, Wilco Dijkstra wrote:
>> stpcpy is not conceptually the same, but for mempcpy, yes. By default
>> it's converted into memcpy in the GLIBC headers and the generic 
>> implementation.
>> 
>> stpcpy uses strlen and memcpy which is generally the most efficient version
>> (it even beat several assembler implementations).
>
> ??  I certainly see something completely different, at least on the arches
> I've looked at.

glibc/string/string.h contains:

#if defined __USE_GNU && defined __OPTIMIZE__ \
&& defined __extern_always_inline && __GNUC_PREREQ (3,2)
# if !defined _FORCE_INLINES && !defined _HAVE_STRING_ARCH_mempcpy

#define mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)
#define __mempcpy(dest, src, n) __mempcpy_inline (dest, src, n)

__extern_always_inline void *
__mempcpy_inline (void *__restrict __dest,
  const void *__restrict __src, size_t __n)
{
  return (char *) memcpy (__dest, __src, __n) + __n;
}

That's best done GCC as a general optimization as currently mempcpy is not 
handled efficiently (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70140),
and it avoids having to repeat this for every C library out there...

glibc/string/mempcpy.c:

void *
MEMPCPY (void *dest, const void *src, size_t len)
{
  return memcpy (dest, src, len) + len;
}

And glibc/string/stpcpy.c:

char *
STPCPY (char *dest, const char *src)
{
  size_t len = strlen (src);
  return memcpy (dest, src, len + 1) + len;
}

This means that without having to write any assembly code, by default 
mempcpy, stpcpy etc are as efficient as possible (memcpy and strlen are 
optimized well on all targets, that's not true for mempcpy, stpcpy and similar
functions, and to make matters worse, the generic code used to be very 
inefficient).

Wilco



Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
Jakub Jelinek wrote:
On Thu, Sep 15, 2016 at 03:16:52PM +0100, Szabolcs Nagy wrote:
> > 
> > from libc point of view, rawmemchr is a rarely used
> > nonstandard function that should be optimized for size.
> > (glibc does not do this now, but it should in my opinion.)
>
> rawmemchr with 0 is to strlen conceptually like stpcpy is to strcpy.
> Are you arguing that glibc should implement strcpy using stpcpy, or vice
> versa?

stpcpy is not conceptually the same, but for mempcpy, yes. By default
it's converted into memcpy in the GLIBC headers and the generic implementation.

stpcpy uses strlen and memcpy which is generally the most efficient version
(it even beat several assembler implementations).

> rawmemchr is certainly not rarely used, strchr (p, 0) is optimized to
>__rawmemchr by the glibc header macros, so it is very frequently used.

That's pretty much its only use, so not an argument for rawmemchr really 
but for optimizing strchr (p, 0) better.

Wilco





Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Jakub Jelinek
On Thu, Sep 15, 2016 at 02:55:48PM +, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
> On Thu, Sep 15, 2016 at 03:16:52PM +0100, Szabolcs Nagy wrote:
> > > 
> > > from libc point of view, rawmemchr is a rarely used
> > > nonstandard function that should be optimized for size.
> > > (glibc does not do this now, but it should in my opinion.)
> >
> > rawmemchr with 0 is to strlen conceptually like stpcpy is to strcpy.
> > Are you arguing that glibc should implement strcpy using stpcpy, or vice
> > versa?
> 
> stpcpy is not conceptually the same, but for mempcpy, yes. By default
> it's converted into memcpy in the GLIBC headers and the generic 
> implementation.
> 
> stpcpy uses strlen and memcpy which is generally the most efficient version
> (it even beat several assembler implementations).

??  I certainly see something completely different, at least on the arches
I've looked at.

Jakub


[patch,gomp4] add support for fortran common blocks

2016-09-15 Thread Cesar Philippidis
Currently gfortran largely lacks support for fortran common blocks in
OpenACC. The notable exception is acc declare link which does support
common block arguments to some extent. This patch does two things:

 1) Adds support for common blocks in the appropriate OpenACC data
clauses.

 2) Privatizes the underlying common block struct during gimplification.
It also teaches the gimplifier how to defer the expansion of
DECL_VALUE_EXPR for common block decls until omp lowering.

The first item allows common block names to be listed in data
clauses. Such names need to be surrounded by slashes. E.g.

  common /BLOCK/ a, b, c

  !$acc enter data copyin(/BLOCK/)

Note that common block names are treated in a similar manner to OpenMP
common block arguments; gfc_match_omp_map_clauses expands the common
block names to individual data clauses for each variable in the common
block.

The second item updates how common blocks behave on the accelerator.
Using the BLOCK example from above, if an OpenACC offloading region only
utilized, say, variable 'b', the gimplifier will now only transfer and
remap 'b' on the accelerator. The actual common block struct will have a
private clause. Without this patch, both the common block struct and the
individual variable were transferred to the accelerator separately, and
that would result in duplicate data mapping errors at runtime.

The second item also defers the expansion of DECL_VALUE_EXPR because
otherwise the privatized common block data would be used instead of one
that was explicitly or implicitly transferred to the accelerator.

This patch has been committed to gomp-4_0-branch.

Cesar
2016-09-15  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_map_clause): Add new common_blocks argument.
	Propagate it to gfc_match_omp_variable_list.
	(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clauses.

	gcc/
	* gimplify.c (oacc_default_clause): Privatize fortran common blocks.
	(omp_notice_variable): Defer the expansion of DECL_VALUE_EXPR for
	common block decls.

	gcc/testsuite/
	* gfortran.dg/goacc/common-block-1.f90: New test.
	* gfortran.dg/goacc/common-block-2.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/common-block-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/common-block-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/common-block-3.f90: New test.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 83c6419..92b9afe 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -633,10 +633,11 @@ cleanup:
mapping.  */
 
 static bool
-gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
+gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
+			  bool common_blocks)
 {
   gfc_omp_namelist **head = NULL;
-  if (gfc_match_omp_variable_list ("", list, false, NULL, &head, true)
+  if (gfc_match_omp_variable_list ("", list, common_blocks, NULL, &head, true)
   == MATCH_YES)
 {
   gfc_omp_namelist *n;
@@ -772,7 +773,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_COPY)
 	  && gfc_match ("copy ( ") == MATCH_YES
	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_TOFROM))
+	   OMP_MAP_FORCE_TOFROM, openacc))
 	continue;
 	  if (mask & OMP_CLAUSE_COPYIN)
 	{
@@ -780,7 +781,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 		{
 		  if (gfc_match ("copyin ( ") == MATCH_YES
		  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-		   OMP_MAP_FORCE_TO))
+		   OMP_MAP_FORCE_TO, true))
 		continue;
 		}
 	  else if (gfc_match_omp_variable_list ("copyin (",
@@ -791,7 +792,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_COPYOUT)
 	  && gfc_match ("copyout ( ") == MATCH_YES
	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_FROM))
+	   OMP_MAP_FORCE_FROM, true))
 	continue;
 	  if ((mask & OMP_CLAUSE_COPYPRIVATE)
 	  && gfc_match_omp_variable_list ("copyprivate (",
@@ -801,14 +802,14 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_CREATE)
 	  && gfc_match ("create ( ") == MATCH_YES
	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FORCE_ALLOC))
+	   OMP_MAP_FORCE_ALLOC, true))
 	continue;
 	  break;
 	case 'd':
 	  if ((mask & OMP_CLAUSE_DELETE)
 	  && gfc_match ("delete ( ") == MATCH_YES
	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_DELETE))
+	   OMP_MAP_DELETE, true))
 	continue;
 	  if ((mask & OMP_CLAUSE_DEFAULT)
 	  && c->default_sharing == OMP_DEFAULT_UNKNOWN)
@@ -861,12 +862,12 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  if ((mask & OMP_CLAUSE_OACC_DEVICE)
 	  && gfc_match ("device ( ") == MATCH_YES
	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   

Re: [RFC][IPA-VRP] Early VRP Implementation

2016-09-15 Thread Jeff Law

On 09/14/2016 11:55 PM, Richard Biener wrote:

On September 14, 2016 11:36:16 PM GMT+02:00, Jan Hubicka  wrote:

+  /* Visit PHI stmts and discover any new VRs possible.  */
+  gimple_stmt_iterator gsi;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+   !gsi_end_p (gpi); gsi_next (&gpi))
+{
+  gphi *phi = gpi.phi ();
+  tree lhs = PHI_RESULT (phi);
+  value_range vr_result = VR_INITIALIZER;
+  if (! has_unvisited_preds
+  && stmt_interesting_for_vrp (phi)
+ && stmt_visit_phi_node_in_dom_p (phi))
+   extract_range_from_phi_node (phi, &vr_result, true);
+  else
+   set_value_range_to_varying (&vr_result);
+  update_value_range (lhs, &vr_result);
+}

due to a bug in IRA you need to make sure to un-set BB_VISITED after
early-vrp is finished again.

How IRA bugs affects early passes?


IRA bogusly relies on BB_VISITED being cleared at pass start.
Seems like IRA ought to be fixed to clear BB_VISITED on every block as 
part of its initialization.


Jeff



Re: [PATCH, GCC/LRA] Teach LRA to not use same register value for multiple output operands of an insn

2016-09-15 Thread Thomas Preudhomme

Hi Vladimir & release managers,

The patch applies cleanly to gcc-6-branch. Ok to backport?

Best regards,

Thomas

On 14/07/16 17:25, Vladimir Makarov wrote:

On 07/08/2016 11:07 AM, Thomas Preudhomme wrote:

Hi,

While investigating the root cause of a testsuite regression for the
ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
seems to also affect trunk. The bug manifests itself as an ICE in cselib due to
a parallel insn with two SET to the same register. When processing the second
SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt ==
0) fails because the field was already set when processing the first SET. The
root cause seems to be a register allocation issue in lra-constraints.

When considering an output operand with matching input operand(s),
match_reload does a number of checks to see if it can reuse the first matching
input operand register value or if a new unique value should be given to the
output operand. The current check ignores the case of multiple output operands
(as in neon_vtrn_insn insn pattern in config/arm/arm.md). This can lead
to cases where multiple output operands share a same register value when the
first matching input operand has the same value as another output operand,
leading to the ICE in cselib. This patch changes match_reload to get
information about other output operands and check whether this case is met or
not.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-07-01  Thomas Preud'homme  

 * lra-constraints.c (match_reload): Pass information about other
 output operands.  Create new unique register value if matching input
 operand shares same register value as output operand being considered.
 (curr_insn_transform): Record output operands already processed.


Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode
as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu
and testsuite results does not regress for these and for arm-none-eabi
targeting Cortex-A8.

Sorry, it took some time to think about the problem.  It is a nontrivial problem
with a lot of possible scenarios in the general case.  The patch is safe in any case
(it cannot create wrong code if LRA w/o the patch does not create wrong
code).

Is this ok for trunk?




Yes.  Thank you, Thomas.



Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Jakub Jelinek
On Thu, Sep 15, 2016 at 03:16:52PM +0100, Szabolcs Nagy wrote:
> On 15/09/16 14:49, Jakub Jelinek wrote:
> > On Thu, Sep 15, 2016 at 01:38:45PM +, Wilco Dijkstra wrote:
> >> __rawmemchr is not the fastest on any target I tried, including x86, so 
> >> GLIBC's
> >> current default behaviour of always using __rawmemchr is inefficient. GCC 
> >> doesn't
> >> support __rawmemchr at all, so the strlen optimization can't optimize it.
> > 
> > Why?  It knows you are targeting glibc, and if it sees rawmemchr (or
> > __rawmemchr) in the headers as well, it can emit that.
> > As for speed, there is no inherent reason why rawmemchr should be slower
> > than strlen, on the contrary.  So, if on some target rawmemchr is slower
> > than strlen, most likely it has never been implemented there properly, or
> > somebody improved strlen without bothering to improve rawmemchr at the same
> > time.
> > 
> 
> from libc point of view, rawmemchr is a rarely used
> nonstandard function that should be optimized for size.
> (glibc does not do this now, but it should in my opinion.)

rawmemchr with 0 is to strlen conceptually like stpcpy is to strcpy.
Are you arguing that glibc should implement strcpy using stpcpy, or vice
versa?
rawmemchr is certainly not rarely used, strchr (p, 0) is optimized to
__rawmemchr by the glibc header macros, so it is very frequently used.
tree-ssa-strlen.c should be taught to handle these.
I'm not asking musl to implement rawmemchr, but glibc already has it and
should provide an efficient implementation of it (if it doesn't, I really
haven't seen any benchmark results that would say it isn't).
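
(For reference, the semantics under discussion -- a naive sketch, not
glibc's implementation: rawmemchr is memchr without a length bound, on
the caller's promise that the byte actually occurs in the buffer.)

void *
rawmemchr_ref (const void *s, int c)
{
  const unsigned char *p = s;
  while (*p != (unsigned char) c)
    p++;
  return (void *) p;
}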

> libc should not implement n variants of similar functions,
> it is a maintenance problem and it makes the code bloated.
> 
> (glibc even has asm implementations, which history tells
> is a bad idea: string functions had bugs, performance
> regressions and other problems because the asm is not
> future proof and it is hard to guarantee consistent
> behaviour across targets e.g. memchr on x86_64 is broken
> right now BZ 19387)

I disagree with that.  For some routines like string ops, writing them in
assembly is highly desirable.

Jakub


Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-09-15 Thread Andrew Burgess
* Jakub Jelinek  [2016-09-14 15:07:56 +0200]:

> On Wed, Sep 14, 2016 at 02:00:48PM +0100, Andrew Burgess wrote:
> > In an attempt to get this patch merged (as I still think that its
> > correct) I've investigated, and documented a little more about how I
> > think things currently work.  I'm sure most people reading this will
> > already know this, but hopefully, if my understanding is wrong someone
> > can point it out.
> 
> I wonder if user_defined_section_attribute instead shouldn't be moved
> into struct function and be handled as a per-function flag then.

That would certainly solve the problem I'm trying to address.  But I
wonder, how is that different to looking for a section attribute on
the function DECL?

Thanks,
Andrew




Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Szabolcs Nagy
On 15/09/16 14:49, Jakub Jelinek wrote:
> On Thu, Sep 15, 2016 at 01:38:45PM +, Wilco Dijkstra wrote:
>> __rawmemchr is not the fastest on any target I tried, including x86, so 
>> GLIBC's
>> current default behaviour of always using __rawmemchr is inefficient. GCC 
>> doesn't
>> support __rawmemchr at all, so the strlen optimization can't optimize it.
> 
> Why?  It knows you are targeting glibc, and if it sees rawmemchr (or
> __rawmemchr) in the headers as well, it can emit that.
> As for speed, there is no inherent reason why rawmemchr should be slower
> than strlen, on the contrary.  So, if on some target rawmemchr is slower
> than strlen, most likely it has never been implemented there properly, or
> somebody improved strlen without bothering to improve rawmemchr at the same
> time.
> 

from libc point of view, rawmemchr is a rarely used
nonstandard function that should be optimized for size.
(glibc does not do this now, but it should in my opinion.)

in case of static linking strlen is most likely linked
into your binary already, but rawmemchr is not, so
emitting calls to it increases code size.

>> strchr is significantly slower than strlen, so generating that is a bad idea 
>> too.
>>
>> So the only reasonable optimization is to always emit a + strlen (a).
> 
> I disagree.  Another, on glibc more reasonable, optimization is to make sure
> rawmemchr is fast enough.
> 

libc should not implement n variants of similar functions,
it is a maintenance problem and it makes the code bloated.

(glibc even has asm implementations, which history tells
is a bad idea: string functions had bugs, performance
regressions and other problems because the asm is not
future proof and it is hard to guarantee consistent
behaviour across targets e.g. memchr on x86_64 is broken
right now BZ 19387)



Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Bernd Schmidt

On 09/15/2016 03:38 PM, Wilco Dijkstra wrote:

__rawmemchr is not the fastest on any target I tried, including x86,


Interesting. Care to share your test program? I just looked at the libc 
sources and strlen/rawmemchr are practically identical code so I'd 
expect any difference to be lost in the noise. Of course there might be 
inlines interfering with the comparison.



So the only reasonable optimization is to always emit a + strlen (a).


Not sure about "only reasonable" but on the whole I'd agree that it's 
reasonable and we shouldn't let the perfect be the enemy of the good 
here. I'm sure we can come up with lots of different ways to do this but 
let's just pick one and if the one Wilco submitted looks decent let's 
just put it in.


Out of curiosity, is there real-world code that this is intended to 
optimize?



Bernd


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Jakub Jelinek
On Thu, Sep 15, 2016 at 01:38:45PM +, Wilco Dijkstra wrote:
> __rawmemchr is not the fastest on any target I tried, including x86, so 
> GLIBC's
> current default behaviour of always using __rawmemchr is inefficient. GCC 
> doesn't
> support __rawmemchr at all, so the strlen optimization can't optimize it.

Why?  It knows you are targeting glibc, and if it sees rawmemchr (or
__rawmemchr) in the headers as well, it can emit that.
As for speed, there is no inherent reason why rawmemchr should be slower
than strlen, on the contrary.  So, if on some target rawmemchr is slower
than strlen, most likely it has never been implemented there properly, or
somebody improved strlen without bothering to improve rawmemchr at the same
time.

> strchr is significantly slower than strlen, so generating that is a bad idea 
> too.
> 
> So the only reasonable optimization is to always emit a + strlen (a).

I disagree.  Another, on glibc more reasonable, optimization is to make sure
rawmemchr is fast enough.

Jakub


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Wilco Dijkstra
Richard Biener wrote:
> On Wed, Sep 14, 2016 at 3:45 PM, Jakub Jelinek  wrote:
> > On Wed, Sep 14, 2016 at 03:41:33PM +0200, Richard Biener wrote:
> >> > We've seen several different proposals for where/how to do this 
> >> > simplification, why did you
> >> > say strlenopt is best? It would be an unconditional strchr (a, 0) -> a + 
> >> > strlen (a) rewrite,
> >> > ie. completely unrelated to what strlenopt does. We do all the other 
> >> > simplifications based
> >> > on constant arguments in builtins.c and gimple-fold.c, why is strchr (s, 
> >> > 0) different?
>>>
> >> I was thinking about the case where strlen opt already knows strlen
> >> (a).  But sure, gimple-fold.c
> >> works as well.
> >
> > I think for the middle-end, using strchr (a, 0) as canonical instead of a + 
> > strlen (a)
> > is better, and at expansion time we can decide what to use (a + strlen (a)
> > if you'd expand strlen inline, rather than as a function call, or
> > __rawmemchr (which if libc is sane should be fastest), or strchr, or a + 
> > strlen (a)).

__rawmemchr is not the fastest on any target I tried, including x86, so GLIBC's
current default behaviour of always using __rawmemchr is inefficient. GCC 
doesn't
support __rawmemchr at all, so the strlen optimization can't optimize it.

strchr is significantly slower than strlen, so generating that is a bad idea 
too.

So the only reasonable optimization is to always emit a + strlen (a).
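
(For illustration, the source-level equivalence behind the proposed
transformation -- a sketch, not the implementation:)

#include <string.h>

/* strchr (s, 0) returns a pointer to the terminating NUL,
   which is by definition s + strlen (s).  */
char *
strchr_nul (char *s)
{
  return s + strlen (s);   /* same result as strchr (s, 0) */
}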

> OTOH that then argues for doing it in strlenopt because that knows
> whether we maybe
> already computed strlen (a) (which might have other uses than adding to a).

strlenopt can already change strchr (a, 0) into strlen (a) + a when strlen has 
been
called before it, so that part is already done. However it doesn't optimize a 
strlen after 
a strchr, so if the expansion is done late you'd end up with a redundant strlen.

That means the expansion would need to happen either before or very early in 
strlenopt
(likely an extra pass at init time to avoid upsetting the strlen optimization - 
we want to
treat any strchr as a real strlen).

Wilco



[PATCH] Fix late dwarf generated early from optimized out globals

2016-09-15 Thread Richard Biener

This addresses sth I needed to address with the early LTO debug patches
(you might now figure I'm piecemeal merging stuff from that patch).

When the cgraph code optimizes out a global we call the late_global_decl
debug hook to eventually add a DW_AT_const_value to its DIE (we don't
really expect a location as that will be invalid after optimizing out
and will be pruned).

With the early LTO debug patches I have introduced an early_dwarf_finished
flag (mainly for consistency checking), and I figured I can use that to
detect the call to the late hook during the early phase and provide
the following cleaned-up variant that avoids creating locations which
require later pruning (which doesn't work with emitting the early DIEs).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I verified it does the correct thing for a unit like

static const int i = 2;

(but ISTR we do have at least one testcase in the testsuite as well).

Will commit if testing finishes successfully.

Richard.

2016-09-15  Richard Biener  

* dwarf2out.c (early_dwarf_finished): New global.
(set_early_dwarf::set_early_dwarf): Assert early_dwarf_finished
is false.
(dwarf2out_early_finish): Set early_dwarf_finished at the end.
(gen_variable_die): When being invoked late during the early
debug phase do not add locations but only const value attributes.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 240153)
+++ gcc/dwarf2out.c (working copy)
@@ -2711,9 +2711,14 @@ die_node;
 
 /* Set to TRUE while dwarf2out_early_global_decl is running.  */
 static bool early_dwarf;
+static bool early_dwarf_finished;
 struct set_early_dwarf {
   bool saved;
-  set_early_dwarf () : saved(early_dwarf) { early_dwarf = true; }
+  set_early_dwarf () : saved(early_dwarf)
+{
+  gcc_assert (! early_dwarf_finished);
+  early_dwarf = true;
+}
   ~set_early_dwarf () { early_dwarf = saved; }
 };
 
@@ -21464,8 +21469,17 @@ gen_variable_die (tree decl, tree origin
   if (early_dwarf)
add_pubname (decl_or_origin, var_die);
   else
-   add_location_or_const_value_attribute (var_die, decl_or_origin,
-  decl == NULL);
+   {
+ /* We get called during the early debug phase via the symtab
+code invoking late_global_decl for symbols that are optimized
+out.  When the early phase is not finished, do not add
+locations.  */
+ if (! early_dwarf_finished)
+   tree_add_const_value_attribute_for_decl (var_die, decl_or_origin);
+ else
+   add_location_or_const_value_attribute (var_die, decl_or_origin,
+  decl == NULL);
+   }
 }
   else
 tree_add_const_value_attribute_for_decl (var_die, decl_or_origin);
@@ -28163,6 +28177,9 @@ dwarf2out_early_finish (void)
}
 }
   deferred_asm_name = NULL;
+
+  /* The early debug phase is now finished.  */
+  early_dwarf_finished = true;
 }
 
 /* Reset all state within dwarf2out.c so that we can rerun the compiler


Re: [PATCH] Partially improve scalability of the unwinder (PR libgcc/71744)

2016-09-15 Thread Ian Lance Taylor
Jakub Jelinek  writes:

> 2016-09-15  Jakub Jelinek  
>
>   PR libgcc/71744
>   * unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
>   is not the primary registry and atomics are available.
>   (any_objects_registered): New variable.
>   (__register_frame_info_bases, __register_frame_info_table_bases):
>   Atomically store 1 to any_objects_registered after registering first
>   unwind info.
>   (_Unwind_Find_FDE): Return early if any_objects_registered is 0.

This is OK.

> +#ifdef ATOMIC_FDE_FAST_PATH
> +  /* For targets where unwind info is usually not registered through these
> + APIs anymore, avoid taking a global lock.  */
> +  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
> +   __ATOMIC_ACQUIRE), 1))
> +return NULL;
> +#endif
> +
>init_object_mutex_once ();
>__gthread_mutex_lock (&object_mutex);

I doubt it matters, but I don't think you need to use __ATOMIC_ACQUIRE
in the atomic_load_n.  You could use __ATOMIC_RELAXED.  Acquiring the
mutex is going to enforce cross-thread sequential consistency anyhow.
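
(The suggested relaxation would look like the following -- a sketch only,
reusing the flag name and structure from the quoted patch:)

#ifdef ATOMIC_FDE_FAST_PATH
  /* A relaxed load suffices here; the mutex acquisition below already
     provides the necessary cross-thread ordering.  */
  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
					  __ATOMIC_RELAXED), 1))
    return NULL;
#endif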

Thanks.

Ian


Ping Re: Define TS 18661-1 CR_DECIMAL_DIG in

2016-09-15 Thread Joseph Myers
Ping.  This patch 
 is pending 
review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Joseph Myers
On Thu, 15 Sep 2016, Tamar Christina wrote:

> a rather large cost in complexity. Also wouldn't this be problematic 
> for other functions as well such as expand_builtin_signbit?

expand_builtin_signbit computes a word number and the bit position in that 
word.  It has no problem with 128-bit types on 32-bit systems where the 
largest integer mode supported for scalar variables is DImode.
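
(A sketch of that word/bit-position approach, under the assumption of a
value stored as four little-endian 32-bit words with the sign in the MSB
of the top word -- purely illustrative, not the expander's code:)

#include <stdint.h>
#include <string.h>

/* Extract the sign bit of a 128-bit value without any 128-bit
   integer operations: pick the word and the bit within it.  */
static int
signbit_from_words (const void *val)
{
  uint32_t w[4];
  memcpy (w, val, sizeof w);
  return (w[3] >> 31) & 1;   /* word number 3, bit position 31 */
}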

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Joseph Myers
On Thu, 15 Sep 2016, Wilco Dijkstra wrote:

> Yes, if there are targets which don't implement TImode operations then 
> surely they should be automatically split into DImode operations before 
> or during Expand?

The operations generally don't exist if the mode fails the 
scalar_mode_supported_p hook.  I don't know whether there are sufficient 
TImode operations for the bitwise operations you need here, even in the 
case where it fails that hook (and so you can't declare variables with 
that mode) - it's arithmetic, and the ABI support needed for argument 
passing, that are harder to do by splitting into smaller modes (and that 
GCC generally only handles in libgcc for 2-word operands, not for 4-word 
operands).
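
(Bitwise operations are indeed easy to split -- an illustrative sketch of
a 128-bit mask test done with two 64-bit halves, assuming a simple lo/hi
pair; as noted above, arithmetic and argument passing are the hard parts:)

#include <stdint.h>

struct u128 { uint64_t lo, hi; };

/* (a & mask) == 0, computed without any 128-bit operations.  */
static int
u128_mask_is_zero (struct u128 a, struct u128 mask)
{
  return ((a.lo & mask.lo) | (a.hi & mask.hi)) == 0;
}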

> So for now it would seem best to keep the boolean false for quad formats 
> on 32-bit targets.

This is a function of command-line options, not the format, so it can't go 
in the table.  The table should describe the format properties only.

Does the expansion work, in fact, for __float128 on 32-bit x86, given the 
boolean set to true (other relevant cases include 128-bit long double on 
32-bit s390 and 32-bit sparc with appropriate options to make long double 
128-bit)?  If it does, it may be OK to use modes that fail the 
scalar_mode_supported_p hook.  If something doesn't work in that case, the 
right way to avoid an expansion is not to set the boolean to false in the 
table of formats, it's to loop over supported integer modes seeing if 
there is one wide enough that also passes the scalar_mode_supported_p 
hook.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix PR77544

2016-09-15 Thread Richard Biener

This fixes an endless recursion through fold-const.c associate code.
While the testcase is fixed if we "properly" split the TREE_CONSTANT
~(unsigned int) (302806 >> 0) into *conp and *minus_lit this runs
into endless recursion during Ada bootstrap as the condition

  /* Only do something if we found more than two objects.  
Otherwise,
 nothing has changed and we risk infinite recursion.  */
  if (ok
  && (2 < ((var0 != 0) + (var1 != 0)
   + (con0 != 0) + (con1 != 0)
   + (lit0 != 0) + (lit1 != 0)
   + (minus_lit0 != 0) + (minus_lit1 != 0))))

is really too simple (when you have var0 and from the above con1 and
minus_lit1 you combine back to the original tree but from three
components).  I suspect this is a latent issue on the path where
we split PLUS/MINUS_EXPRs in split_tree as well (didn't try to come
up with a testcase, sth like x + ( + 1) would probably do).

So rather than trying to convince myself about a "better" condition
for the above, the following simply restricts the "unfolding" trick
we apply to ~X to the non-TREE_CONSTANT case.  (a condition might
be that we need from any of two vars, two cons or two lit/minus-lit
from the different sources)
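
(For reference, the identity behind the "unfolding": for any integer X,
~X == -X - 1, so undoing the fold splits a constant like
~(long)(302806U >> 0) into a negated operand plus a literal 1 recorded in
*minus_litp, and re-association can then rebuild the original tree from
those pieces, which is where the recursion comes from.)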

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-09-15  Richard Biener  

PR middle-end/77544
* fold-const.c (split_tree): Do not split constant ~X.

* c-c++-common/torture/pr77544.c: New testcase.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 240153)
--- gcc/fold-const.c(working copy)
*** split_tree (location_t loc, tree in, tre
*** 837,851 
  var = negate_expr (var);
}
  }
else if (TREE_CODE (in) == BIT_NOT_EXPR
   && code == PLUS_EXPR)
  {
!   /* -X - 1 is folded to ~X, undo that here.  */
*minus_litp = build_one_cst (TREE_TYPE (in));
var = negate_expr (TREE_OPERAND (in, 0));
  }
-   else if (TREE_CONSTANT (in))
- *conp = in;
else
  var = in;
  
--- 837,852 
  var = negate_expr (var);
}
  }
+   else if (TREE_CONSTANT (in))
+ *conp = in;
else if (TREE_CODE (in) == BIT_NOT_EXPR
   && code == PLUS_EXPR)
  {
!   /* -X - 1 is folded to ~X, undo that here.  Do _not_ do this
!  when IN is constant.  */
*minus_litp = build_one_cst (TREE_TYPE (in));
var = negate_expr (TREE_OPERAND (in, 0));
  }
else
  var = in;
  
Index: gcc/testsuite/c-c++-common/torture/pr77544.c
===
*** gcc/testsuite/c-c++-common/torture/pr77544.c(revision 0)
--- gcc/testsuite/c-c++-common/torture/pr77544.c(working copy)
***
*** 0 
--- 1,7 
+ /* { dg-do compile } */
+ 
+ struct {
+   long a : 17;
+ } b;
+ int c, d;
+ void e() { b.a = d + c + ~(long)(302806U >> 0); }


Re: [PING] set libfunc entry for sdivmod_optab to NULL in optabs.def

2016-09-15 Thread Richard Biener
On Thu, Sep 15, 2016 at 1:21 PM, Prathamesh Kulkarni
 wrote:
> On 15 September 2016 at 16:31, Richard Sandiford
>  wrote:
>> Prathamesh Kulkarni  writes:
>>> On 15 September 2016 at 04:21, Richard Sandiford
>>>  wrote:
 Richard Sandiford  writes:
> Prathamesh Kulkarni  writes:
>> Hi,
>> I would like to ping the following patch:
>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html
>>
>> While implementing divmod transform:
>> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
>>
>> I ran into an  issue with optab_libfunc().
>> It appears optab_libfunc (sdivmod_optab, mode) returns
>> bogus libfunc for unsupported modes, for instance
>> on x86_64, optab_libfunc (sdivmod_optab, DImode) returns
>> a libfunc with name "__divmoddi4", even though such a libfunc
>> does not exist in libgcc. This happens because in optabs.def
>> the libfunc entry for sdivmod_optab has gen_int_libfunc,
>> and call to optab_libfunc (sdivmo_optab, DImode) lazily
>> creates a bogus libfunc "__divmoddi4" by calling gen_int_libfunc().
>>
>> To work around this issue I set libfunc entry for sdivmod_optab to NULL
>> and verified that optab_libfunc (sdivmod_optab, DImode) returns NULL_RTX
>> instead of a bogus libfunc if it's not overriden by the target.
>>
>> Bootstrapped and tested on ppc64le-linux-gnu, x86_64-linux-gnu.
>> Cross tested on arm*-*-*, aarch64*-*-*.
>> OK for trunk ?
>
> I'm not a maintainer for this area, but:

 ...in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
 you said that c6x follows the return-by-pointer convention.
 I'm no c6x expert, but from a quick look, I think its divrem
 function returns a div/mod pair in A4/A5, which matches the
 ARM convention of returning both results by value.

 Does anyone know if the optab function registered by the SPU
 backend is ever called directly?

 You mention that this is latent as far as expand_twoval_binop_libfunc
 is concerned.  AIUI expand_twoval_binop_libfunc implements the ARM/c6x
 convention and expects the two values to be returned as a pair.
 It then extracts one half of the pair and discards the other.
 So my worry is that we're leaving the udivmod entry intact even though
 the standard __udivmoddi4 doesn't do what expand_twoval_binop_libfunc
 expects.

 Would it make sense to set both entries to null and treat __udivmoddi4
 as a non-optab function?  ARM and c6x could then continue to register
 their current optab functions and a non-null optab function would
 indicate a return value pair.
>>> AFAIU, there are only three targets (c6x, spu, arm) that override
>>> optab_libfunc for udivmod_optab for following modes:
>>> ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, SImode, "__c6xabi_divremu");
>>> ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, DImode, 
>>> "__c6xabi_divremull");
>>> ./arm/arm.c:  set_optab_libfunc (udivmod_optab, DImode, "__aeabi_uldivmod");
>>> ./arm/arm.c:  set_optab_libfunc (udivmod_optab, SImode, "__aeabi_uidivmod");
>>> ./spu/spu.c:  set_optab_libfunc (udivmod_optab, DImode, "__udivmoddi4");
>>> ./spu/spu.c:  set_optab_libfunc (udivmod_optab, TImode, "__udivmodti4");
>>>
>>> Out of these only the arm, and c6x have target-specific divmod libfuncs 
>>> which
>>> return  pair, while spu merely makes it point to the
>>> standard functions.
>>>
>>> So we could set libfunc entry for udivmod_optab to NULL, thus dropping
>>> support for generic
>>> divmod functions (__udivmoddi4, __udivmodti4). For targets that
>>> require standard divmod libfuncs like __udivmoddi4,
>>> they could explicitly override  optab_libfunc and set it to
>>> __udivmoddi4, just as spu does.
>>>
>>> However this implies that non-null optab function doesn't necessarily
>>> follow arm/c6x convention.
>>> (i686-gcc for instance generates call to libgcc routines
>>> __udivdi3/__umoddi3 for DImode division/mod operations
>>> and could profit from divmod transform by calling __udivmoddi4).
>>
>> What I meant was that we shouldn't treat udivmoddi4 as an optab function
>> at all, but handle it with some on-the-side mechanism.  That seems like
>> quite a natural fit if we handle the fused div/mod operation as an
>> internal function during gimple.
> Ah right, thanks for pointing out. So if optab function for [us]divmod_optab
> is defined, then it must follow the arm/c6x convention ?
>>
>> I think the current SPU code is wrong, but it looks like a latent bug.
>> (Like I say, does the udivmodti4 function that it registers ever
>> actually get used?  It seems unlikely.)
>>
>> In that scenario no other targets should do what SPU does.
> I am testing the following patch which sets libfunc 

Re: C/C++ PATCH to implement -Wpointer-compare warning (PR c++/64767)

2016-09-15 Thread Marek Polacek
On Wed, Sep 14, 2016 at 12:07:49AM -0400, Jason Merrill wrote:
> On Sat, Sep 10, 2016 at 10:58 AM, Marek Polacek  wrote:
> > Spurred by the recent  findings, I 
> > decided to
> > implement a warning that warns when a pointer is compared with a zero 
> > character
> > literal (constant), because this isn't likely to be intended.  So e.g.
> >
> >   ptr == L'\0'
> >
> > is probably wrong and should've been written as
> >
> >   ptr[0] == L'\0'
> >
> > Jonathan pointed out that this would actually be invalid C++11 since pointer
> > conversions are only allowed for integer literals, not char literals.
> 
> Ah, indeed.  And if we fix that, we get an error rather than a
> warning.  Maybe let's handle this by wrapping character literals in a
> redundant NOP_EXPR?

So I've tried.  Wrapping character literals in a NOP_EXPR would make us
error on that comparison (good), but it breaks -Wchar-subscripts, which
could be solved by adding STRIP_NOPS, but unfortunately it breaks even
-Wmemset-transposed-args: '\0' would become (char) '\0', which is not a
literal zero anymore.  And if I do something like
+   if (TREE_CODE (arg2) == NOP_EXPR
+   && TREE_TYPE (arg2) == TREE_TYPE (TREE_OPERAND (arg2, 0)))
+ arg2 = TREE_OPERAND (arg2, 0);
then (int) 0 would become a literal zero and we'd warn when not appropriate.

We should also error for e.g. void *p = '\0'; but the problem with the
NOP_EXPR cast is that convert_for_init strips nops, so we wouldn't reject
this code.

So it seems we may need some CHARACTER_CST or somesuch?

Note that we should also take boolean-literals into account.
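
For concreteness, here is a hedged sketch of the kinds of expressions under
discussion; this is illustrative test input only, not part of the patch:

/* Illustrative only: the comparisons the proposed warning is about.  In C
   these compile (a zero character constant is also a null pointer
   constant), which is why a warning is wanted; in C++11 the conversions
   are ill-formed, which is why an error is wanted there.  */
int
examples (const char *p)
{
  void *q = '\0';        /* the initialization case mentioned above  */
  int r = (q == p);
  r += (p == '\0');      /* pointer compared against a char literal  */
  r += (p == L'\0');     /* likewise for a wide-character literal    */
  r += (p[0] == '\0');   /* almost certainly what was meant          */
  return r;
}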

Marek


Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-15 Thread Christophe Lyon
On 15 September 2016 at 11:26, Rainer Orth  
wrote:
> Hi Jason,
>
>>> I confirm no change on arm* either.
>>> On aarch64, gen-attrs-[25]1.C, and devirt-33 now work.
>>> name-clash11.C still fails on both targets
>>
>> Ah, I needed to remove the limit on field alignment as well.  This
>> seems to fix things.
>
> The failures are gone on Solaris/SPARC (sparc-sun-solaris2.12) now, with
> one exception (when run as C test):
>
> 32-bit:
>
> FAIL: c-c++-common/pr52181.c  -Wc++-compat   (test for bogus messages, line 
> 11)
> FAIL: c-c++-common/pr52181.c  -Wc++-compat  (test for excess errors)
>

Hi Jason,

Same on arm* if that's easier for you.

Christophe


> This is new with your last patch.
>
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/pr52181.c:6:1: 
> warning: requested alignment 16 is larger than 8 [-Wattributes]
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/pr52181.c:8:1: 
> warning: requested alignment 16 is larger than 8 [-Wattributes]
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PING] set libfunc entry for sdivmod_optab to NULL in optabs.def

2016-09-15 Thread Prathamesh Kulkarni
On 15 September 2016 at 16:31, Richard Sandiford
 wrote:
> Prathamesh Kulkarni  writes:
>> On 15 September 2016 at 04:21, Richard Sandiford
>>  wrote:
>>> Richard Sandiford  writes:
 Prathamesh Kulkarni  writes:
> Hi,
> I would like to ping the following patch:
> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html
>
> While implementing divmod transform:
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
>
> I ran into an  issue with optab_libfunc().
> It appears optab_libfunc (sdivmod_optab, mode) returns
> bogus libfunc for unsupported modes, for instance
> on x86_64, optab_libfunc (sdivmod_optab, DImode) returns
> a libfunc with name "__divmoddi4", even though such a libfunc
> does not exist in libgcc. This happens because in optabs.def
> the libfunc entry for sdivmod_optab has gen_int_libfunc,
> and call to optab_libfunc (sdivmod_optab, DImode) lazily
> creates a bogus libfunc "__divmoddi4" by calling gen_int_libfunc().
>
> To work around this issue I set libfunc entry for sdivmod_optab to NULL
> and verified that optab_libfunc (sdivmod_optab, DImode) returns NULL_RTX
> instead of a bogus libfunc if it's not overridden by the target.
>
> Bootstrapped and tested on ppc64le-linux-gnu, x86_64-linux-gnu.
> Cross tested on arm*-*-*, aarch64*-*-*.
> OK for trunk ?

 I'm not a maintainer for this area, but:
>>>
>>> ...in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
>>> you said that c6x follows the return-by-pointer convention.
>>> I'm no c6x expert, but from a quick look, I think its divrem
>>> function returns a div/mod pair in A4/A5, which matches the
>>> ARM convention of returning both results by value.
>>>
>>> Does anyone know if the optab function registered by the SPU
>>> backend is ever called directly?
>>>
>>> You mention that this is latent as far as expand_twoval_binop_libfunc
>>> is concerned.  AIUI expand_twoval_binop_libfunc implements the ARM/c6x
>>> convention and expects the two values to be returned as a pair.
>>> It then extracts one half of the pair and discards the other.
>>> So my worry is that we're leaving the udivmod entry intact even though
>>> the standard __udivmoddi4 doesn't do what expand_twoval_binop_libfunc
>>> expects.
>>>
>>> Would it make sense to set both entries to null and treat __udivmoddi4
>>> as a non-optab function?  ARM and c6x could then continue to register
>>> their current optab functions and a non-null optab function would
>>> indicate a return value pair.
>> AFAIU, there are only three targets (c6x, spu, arm) that override
>> optab_libfunc for udivmod_optab for following modes:
>> ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, SImode, "__c6xabi_divremu");
>> ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, DImode, 
>> "__c6xabi_divremull");
>> ./arm/arm.c:  set_optab_libfunc (udivmod_optab, DImode, "__aeabi_uldivmod");
>> ./arm/arm.c:  set_optab_libfunc (udivmod_optab, SImode, "__aeabi_uidivmod");
>> ./spu/spu.c:  set_optab_libfunc (udivmod_optab, DImode, "__udivmoddi4");
>> ./spu/spu.c:  set_optab_libfunc (udivmod_optab, TImode, "__udivmodti4");
>>
>> Out of these only the arm, and c6x have target-specific divmod libfuncs which
>> return  pair, while spu merely makes it point to the
>> standard functions.
>>
>> So we could set libfunc entry for udivmod_optab to NULL, thus dropping
>> support for generic
>> divmod functions (__udivmoddi4, __udivmodti4). For targets that
>> require standard divmod libfuncs like __udivmoddi4,
>> they could explicitly override  optab_libfunc and set it to
>> __udivmoddi4, just as spu does.
>>
>> However this implies that non-null optab function doesn't necessarily
>> follow arm/c6x convention.
>> (i686-gcc for instance generates call to libgcc routines
>> __udivdi3/__umoddi3 for DImode division/mod operations
>> and could profit from divmod transform by calling __udivmoddi4).
>
> What I meant was that we shouldn't treat udivmoddi4 as an optab function
> at all, but handle it with some on-the-side mechanism.  That seems like
> quite a natural fit if we handle the fused div/mod operation as an
> internal function during gimple.
Ah right, thanks for pointing that out. So if the optab function for [us]divmod_optab
is defined, then it must follow the arm/c6x convention?
>
> I think the current SPU code is wrong, but it looks like a latent bug.
> (Like I say, does the udivmodti4 function that it registers ever
> actually get used?  It seems unlikely.)
>
> In that scenario no other targets should do what SPU does.
I am testing the following patch, which sets the libfunc entries for both
sdivmod_optab and udivmod_optab to NULL.
This won't change the current (broken) behavior for the SPU port since it
explicitly overrides optab_libfunc for udivmod_optab
and sets it to

Re: [PING] set libfunc entry for sdivmod_optab to NULL in optabs.def

2016-09-15 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> On 15 September 2016 at 04:21, Richard Sandiford
>  wrote:
>> Richard Sandiford  writes:
>>> Prathamesh Kulkarni  writes:
 Hi,
 I would like to ping the following patch:
 https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html

 While implementing divmod transform:
 https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html

 I ran into an  issue with optab_libfunc().
 It appears optab_libfunc (sdivmod_optab, mode) returns
 bogus libfunc for unsupported modes, for instance
 on x86_64, optab_libfunc (sdivmod_optab, DImode) returns
 a libfunc with name "__divmoddi4", even though such a libfunc
 does not exist in libgcc. This happens because in optabs.def
 the libfunc entry for sdivmod_optab has gen_int_libfunc,
 and call to optab_libfunc (sdivmod_optab, DImode) lazily
 creates a bogus libfunc "__divmoddi4" by calling gen_int_libfunc().

 To work around this issue I set libfunc entry for sdivmod_optab to NULL
 and verified that optab_libfunc (sdivmod_optab, DImode) returns NULL_RTX
 instead of a bogus libfunc if it's not overridden by the target.

 Bootstrapped and tested on ppc64le-linux-gnu, x86_64-linux-gnu.
 Cross tested on arm*-*-*, aarch64*-*-*.
 OK for trunk ?
>>>
>>> I'm not a maintainer for this area, but:
>>
>> ...in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
>> you said that c6x follows the return-by-pointer convention.
>> I'm no c6x expert, but from a quick look, I think its divrem
>> function returns a div/mod pair in A4/A5, which matches the
>> ARM convention of returning both results by value.
>>
>> Does anyone know if the optab function registered by the SPU
>> backend is ever called directly?
>>
>> You mention that this is latent as far as expand_twoval_binop_libfunc
>> is concerned.  AIUI expand_twoval_binop_libfunc implements the ARM/c6x
>> convention and expects the two values to be returned as a pair.
>> It then extracts one half of the pair and discards the other.
>> So my worry is that we're leaving the udivmod entry intact even though
>> the standard __udivmoddi4 doesn't do what expand_twoval_binop_libfunc
>> expects.
>>
>> Would it make sense to set both entries to null and treat __udivmoddi4
>> as a non-optab function?  ARM and c6x could then continue to register
>> their current optab functions and a non-null optab function would
>> indicate a return value pair.
> AFAIU, there are only three targets (c6x, spu, arm) that override
> optab_libfunc for udivmod_optab for following modes:
> ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, SImode, "__c6xabi_divremu");
> ./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, DImode, "__c6xabi_divremull");
> ./arm/arm.c:  set_optab_libfunc (udivmod_optab, DImode, "__aeabi_uldivmod");
> ./arm/arm.c:  set_optab_libfunc (udivmod_optab, SImode, "__aeabi_uidivmod");
> ./spu/spu.c:  set_optab_libfunc (udivmod_optab, DImode, "__udivmoddi4");
> ./spu/spu.c:  set_optab_libfunc (udivmod_optab, TImode, "__udivmodti4");
>
> Out of these only the arm, and c6x have target-specific divmod libfuncs which
> return  pair, while spu merely makes it point to the
> standard functions.
>
> So we could set libfunc entry for udivmod_optab to NULL, thus dropping
> support for generic
> divmod functions (__udivmoddi4, __udivmodti4). For targets that
> require standard divmod libfuncs like __udivmoddi4,
> they could explicitly override  optab_libfunc and set it to
> __udivmoddi4, just as spu does.
>
> However this implies that non-null optab function doesn't necessarily
> follow arm/c6x convention.
> (i686-gcc for instance generates call to libgcc routines
> __udivdi3/__umoddi3 for DImode division/mod operations
> and could profit from divmod transform by calling __udivmoddi4).

What I meant was that we shouldn't treat udivmoddi4 as an optab function
at all, but handle it with some on-the-side mechanism.  That seems like
quite a natural fit if we handle the fused div/mod operation as an
internal function during gimple.

I think the current SPU code is wrong, but it looks like a latent bug.
(Like I say, does the udivmodti4 function that it registers ever
actually get used?  It seems unlikely.)

In that scenario no other targets should do what SPU does.

Thanks,
Richard


Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Wilco Dijkstra
Tamar Christina wrote:
> On 13/09/16 13:43, Joseph Myers wrote:
> > On Tue, 13 Sep 2016, Tamar Christina wrote:
>>
> >> On 12/09/16 23:33, Joseph Myers wrote:
> >>> Why is this boolean false for ieee_quad_format, mips_quad_format and
> >>> ieee_half_format?  They should meet your description (even if the x86 /
> >>> m68k "extended" formats don't because of the leading mantissa bit being
> >>> set for infinities).
> >>>
> >> Ah, I played it a bit too safe there. I will change this and do some
> >> re-testing and update the patch.
> > It occurred to me that there might be an issue with your approach of
> > overlaying the floating-point value with a single integer, when the quad
> > formats are used on 32-bit systems where TImode isn't fully supported as a
> > scalar mode.  However, if that's an issue the answer isn't to mark the
> > formats as non-IEEE, it's to support ORing together the relevant parts of
> > multiple words when determining whether the mantissa is nonzero (or some
> > equivalent logic).
> >
> I have been trying to reproduce this on the architectures I have access to
> but have been unable to so far. In practice if this does happen though 
> isn't it the fault of the system for advertising partial TImode support and 
> support of IEEE types?
>
> It seems to me that in order for me to be able to do this fpclassify 
> would incur a rather large costs in complexity. Also wouldn't this be 
> problematic 
> for other functions as well such as expand_builtin_signbit?

Yes, if there are targets which don't implement TImode operations then surely
they should be automatically split into DImode operations before or during
Expand?  GCC's implementation of types larger than the register int type is
generally extremely poor as it is missing such an expansion (practically all
compilers do this), so this would improve things significantly.

So for now it would seem best to keep the boolean false for quad formats on
32-bit targets.
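
For reference, a hedged sketch of the word-wise alternative Joseph suggested,
written for IEEE binary128 on a 32-bit target; the word order and mask are
illustrative and would have to match the target's endianness and format:

#include <stdint.h>

/* Illustrative only: decide whether the 112-bit mantissa of an IEEE
   binary128 value is all zero by ORing the words that hold it, instead
   of forming a single TImode integer.  w[3] is assumed to be the most
   significant word (sign bit, 15 exponent bits, top 16 mantissa bits).  */
static int
quad_mantissa_is_zero (const uint32_t w[4])
{
  uint32_t m = w[0] | w[1] | w[2] | (w[3] & 0x0000ffff);
  return m == 0;
}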

Wilco



Re: [PATCH 0/8] make next_*_insn take rtx_insn * arguments

2016-09-15 Thread Bernd Schmidt

On 09/14/2016 09:21 PM, tbsaunde+...@tbsaunde.org wrote:


Basically $subject.  First change variable's type to rtx_insn * where possible.
Then change the functions and fixup callers where it is still necessary to
cast.


#2, #4 and #8 look good and can be applied if they work independently of 
the others.


Less certain about some of the others which introduce additional casts. 
Maybe LABEL_REF_LABEL needs converting first, and reorg.c has a few 
variables that might have to be made rtx_insn * in patch #7 to avoid casts.



Bernd


Re: Patch ping

2016-09-15 Thread Bernd Schmidt

On 09/14/2016 11:47 PM, Jakub Jelinek wrote:



http://gcc.gnu.org/ml/gcc-patches/2016-09/msg00088.html
  - PR77425 - fix UB in sd_iterator_cond


Ok.


Bernd



Re: [PING] set libfunc entry for sdivmod_optab to NULL in optabs.def

2016-09-15 Thread Prathamesh Kulkarni
On 15 September 2016 at 04:21, Richard Sandiford
 wrote:
> Richard Sandiford  writes:
>> Prathamesh Kulkarni  writes:
>>> Hi,
>>> I would like to ping the following patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01015.html
>>>
>>> While implementing divmod transform:
>>> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
>>>
>>> I ran into an  issue with optab_libfunc().
>>> It appears optab_libfunc (sdivmod_optab, mode) returns
>>> bogus libfunc for unsupported modes, for instance
>>> on x86_64, optab_libfunc (sdivmod_optab, DImode) returns
>>> a libfunc with name "__divmoddi4", even though such a libfunc
>>> does not exist in libgcc. This happens because in optabs.def
>>> the libfunc entry for sdivmod_optab has gen_int_libfunc,
>>> and call to optab_libfunc (sdivmod_optab, DImode) lazily
>>> creates a bogus libfunc "__divmoddi4" by calling gen_int_libfunc().
>>>
>>> To work around this issue I set libfunc entry for sdivmod_optab to NULL
>>> and verified that optab_libfunc (sdivmod_optab, DImode) returns NULL_RTX
>>> instead of a bogus libfunc if it's not overridden by the target.
>>>
>>> Bootstrapped and tested on ppc64le-linux-gnu, x86_64-linux-gnu.
>>> Cross tested on arm*-*-*, aarch64*-*-*.
>>> OK for trunk ?
>>
>> I'm not a maintainer for this area, but:
>
> ...in https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01757.html
> you said that c6x follows the return-by-pointer convention.
> I'm no c6x expert, but from a quick look, I think its divrem
> function returns a div/mod pair in A4/A5, which matches the
> ARM convention of returning both results by value.
>
> Does anyone know if the optab function registered by the SPU
> backend is ever called directly?
>
> You mention that this is latent as far as expand_twoval_binop_libfunc
> is concerned.  AIUI expand_twoval_binop_libfunc implements the ARM/c6x
> convention and expects the two values to be returned as a pair.
> It then extracts one half of the pair and discards the other.
> So my worry is that we're leaving the udivmod entry intact even though
> the standard __udivmoddi4 doesn't do what expand_twoval_binop_libfunc
> expects.
>
> Would it make sense to set both entries to null and treat __udivmoddi4
> as a non-optab function?  ARM and c6x could then continue to register
> their current optab functions and a non-null optab function would
> indicate a return value pair.
AFAIU, there are only three targets (c6x, spu, arm) that override
optab_libfunc for udivmod_optab for following modes:
./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, SImode, "__c6xabi_divremu");
./c6x/c6x.c:  set_optab_libfunc (udivmod_optab, DImode, "__c6xabi_divremull");
./arm/arm.c:  set_optab_libfunc (udivmod_optab, DImode, "__aeabi_uldivmod");
./arm/arm.c:  set_optab_libfunc (udivmod_optab, SImode, "__aeabi_uidivmod");
./spu/spu.c:  set_optab_libfunc (udivmod_optab, DImode, "__udivmoddi4");
./spu/spu.c:  set_optab_libfunc (udivmod_optab, TImode, "__udivmodti4");

Out of these, only arm and c6x have target-specific divmod libfuncs which
return a pair, while spu merely makes it point to the
standard functions.

So we could set libfunc entry for udivmod_optab to NULL, thus dropping
support for generic
divmod functions (__udivmoddi4, __udivmodti4). For targets that
require standard divmod libfuncs like __udivmoddi4,
they could explicitly override  optab_libfunc and set it to
__udivmoddi4, just as spu does.
However, this implies that a non-null optab function doesn't necessarily
follow the arm/c6x convention.
(i686-gcc for instance generates calls to the libgcc routines
__udivdi3/__umoddi3 for DImode division/mod operations
and could profit from the divmod transform by calling __udivmoddi4.)
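
To make the contrast explicit, a hedged sketch of the two conventions follows.
The prototypes are illustrative only; the pair-returning routines actually
hand both values back in registers, the struct is just a way to picture that
contract.

/* libgcc-style convention: the quotient is returned by value and the
   remainder is stored through a pointer.  */
unsigned long long __udivmoddi4 (unsigned long long n, unsigned long long d,
                                 unsigned long long *rem);

/* arm/c6x-style convention: both results come back by value as a pair,
   which is what expand_twoval_binop_libfunc expects to extract from.  */
struct divmod_pair { unsigned long long quot; unsigned long long rem; };
struct divmod_pair __aeabi_uldivmod (unsigned long long n,
                                     unsigned long long d);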

However, the issue with expand_twoval_binop_libfunc() still remains, and
I am not sure what to do about that.
As a temporary hack, we could punt if the divmod function's name is
"__udivmoddi4":

--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -2083,7 +2083,7 @@ expand_twoval_binop_libfunc (optab binoptab, rtx
op0, rtx op1,

   mode = GET_MODE (op0);
   libfunc = optab_libfunc (binoptab, mode);
-  if (!libfunc)
+  if (!libfunc || !strcmp (XSTR (libfunc, 0), "__udivmoddi4"))
 return false;

which is admittedly quite ugly :/

Thanks,
Prathamesh
>
> Sorry if this has already been covered.
>
> Thanks,
> Richard


Re: [PATCH PR77503]Record reduction code for CONST_COND_REDUCTION at analysis stage and use it at transform

2016-09-15 Thread Richard Biener
On Thu, Sep 15, 2016 at 12:07 PM, Bin Cheng  wrote:
> Hi,
> This patch fixes PR77503.  Root cause is loop peeling changes the initial 
> value for reduction PHI node, resulting in different statement for 
> vect_transform_stmt to vect_analyze_stmt.  Similar issue stands for loop 
> control IV, we record the original information in 
> LOOP_PHI_EVOLUTION_BASE_UNCHANGED for that.  This patch follows the same 
> method by recording reduction code for CONST_COND_REDUCTION at analysis stage 
> and use the information at transform stage.  The only difference is we need 
> record it in stmt_vec_info because there might be multiple reductions.  
> Unfortunately this requires additional memory for each statement, I didn't 
> see easy way out.  Maybe it's possible to improve vectorizer so it 
> caches/reuses information from analysis stage to transform stage?
> Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK if no regression?

Ok.

As for improving stmt_info size and caching info from analysis phase
-- yes, ideally stmt_info would have
most of its contents discriminated on STMT_VINFO_TYPE using a union of
type specific fields.  Note that
this kind of refactoring would be way easier than trying to make it a
class using inheritance (you'd need to
defer vinfo creation until analysis figured out the info or add an
indirection to type specific data).
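
A purely illustrative sketch of that kind of discrimination; the names below
are made up and are not the real stmt_vec_info layout:

/* Hypothetical layout, for illustration only.  The discriminator (the
   analogue of STMT_VINFO_TYPE) selects which union member is meaningful,
   so type-specific fields such as a recorded reduction code no longer
   cost space in every stmt_info.  */
enum sketch_stmt_kind { SK_REDUCTION, SK_LOAD_STORE, SK_OTHER };

struct sketch_stmt_info
{
  enum sketch_stmt_kind type;     /* discriminator */
  union
  {
    struct { int reduc_code; } reduction;     /* valid for SK_REDUCTION  */
    struct { int misalignment; } load_store;  /* valid for SK_LOAD_STORE */
  } u;
};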

stmt_info isn't very well packed either so the general answer for now
is -- we don't care about its size.

As for re-using data from analysis phase -- yes!  That we share the
"head" of all vectorizable_* routines
for both analysis and transform phase was a bad design decision --
ideally we'd have vectorizable_*
routines w/o transform and then vectorize_* routines which only do the
transform based on data recorded
during analysis phase.

Both refactorings are very welcome (esp. the latter which eventually
means adding many more fields to
stmt_info).

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-09-07  Bin Cheng  
>
> PR tree-optimization/77503
> * tree-vect-loop.c (vectorizable_reduction): Record reduction
> code for CONST_COND_REDUCTION at analysis stage and use it at
> transform stage.
> * tree-vectorizer.h (struct _stmt_vec_info): New field.
> (STMT_VINFO_VEC_CONST_COND_REDUC_CODE): New macro.
> * tree-vect-stmts.c (new_stmt_vec_info): Initialize above new
> field.
>
> gcc/testsuite/ChangeLog
> 2016-09-07  Bin Cheng  
>
> PR tree-optimization/77503
> * gcc.dg/vect/pr77503.c: New test.


[PATCH] Editorial fixes to libstdc++ debug mode docs

2016-09-15 Thread Jonathan Wakely

* doc/xml/manual/debug_mode.xml: Minor editorial fixes.
* doc/html/*: Regenerate.

Committed to trunk.

commit d2fadc3e0b0d69c834c398f79d53e38d900c5951
Author: Jonathan Wakely 
Date:   Thu Sep 15 11:29:29 2016 +0100

Editorial fixes to libstdc++ debug mode docs

* doc/xml/manual/debug_mode.xml: Minor editorial fixes.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/debug_mode.xml 
b/libstdc++-v3/doc/xml/manual/debug_mode.xml
index 8b5380d..ff2f1cc 100644
--- a/libstdc++-v3/doc/xml/manual/debug_mode.xml
+++ b/libstdc++-v3/doc/xml/manual/debug_mode.xml
@@ -800,8 +800,7 @@ test02()
 
Use implementation-specific properties of 
anonymous
 namespaces. 
-See http://www.w3.org/1999/xlink; 
xlink:href="http://gcc.gnu.org/ml/libstdc++/2003-08/msg4.html;> this post
-
+See http://www.w3.org/1999/xlink; 
xlink:href="http://gcc.gnu.org/ml/libstdc++/2003-08/msg4.html;>this 
post.
 This method fails the correctness 
criteria.
 
   Extension: allow reopening on 
namespaces: This would
@@ -817,8 +816,8 @@ test02()
 recompilation requirement, because we would only be able to
 support option (1) or (2).
 
-  Extension: use link name: This option 
involves
-complicated re-naming between debug-mode and release-mode
+  Extension: use link name: This option
+involves complicated re-naming between debug-mode and release-mode
 components at compile time, and then a g++ extension called 
 link name  to recover the original names at link time. There
 are two drawbacks to this approach. One, it's very verbose,
@@ -827,8 +826,8 @@ test02()
 functions taking no arguments in mixed-mode settings resulting in
 equivalent link names,  vector::push_back()  being
 one example.
-See http://www.w3.org/1999/xlink; 
xlink:href="http://gcc.gnu.org/ml/libstdc++/2003-08/msg00177.html;>link
-name 
+See http://www.w3.org/1999/xlink; 
xlink:href="http://gcc.gnu.org/ml/libstdc++/2003-08/msg00177.html;>proof-of-concept
 using link
+name. 
 
 
 Other options may exist for implementing the debug mode, many of


Re: [PATCH] Fix PR64078

2016-09-15 Thread Tom de Vries

On 31/08/16 07:42, Tom de Vries wrote:

On 30/08/16 11:38, Bernd Edlinger wrote:

On 08/30/16 10:21, Tom de Vries wrote:

On 29/08/16 18:43, Bernd Edlinger wrote:

Thanks!

Actually my patch missed to fix one combination: -m32 with -fpic

make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c --tool_opts
'-m32 -fpic'"

FAIL: c-c++-common/ubsan/object-size-9.c   -O2  execution test
FAIL: c-c++-common/ubsan/object-size-9.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test

The problem here is that the functions f2 and f3 access a stack-
based object out of bounds and that is inlined in main and
therefore smashes the return address of main in this case.

A possible fix could look like follows:

Index: object-size-9.c
===
--- object-size-9.c(revision 239794)
+++ object-size-9.c(working copy)
@@ -93,5 +93,9 @@
  #endif
f4 (12);
f5 (12);
+#ifdef __cplusplus
+  /* Stack may be smashed by f2/f3 above.  */
+  __builtin_exit (0);
+#endif
return 0;
  }


Do you think that this should be fixed too?


I think it should be fixed. Ideally, we'd prevent the out-of-bounds
writes to have harmful effects, but I'm not sure how to enforce that.

This works for me:
...
diff --git a/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
b/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
index 46f1fb9..fec920d 100644
--- a/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
+++ b/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
@@ -31,6 +31,7 @@ static struct C
  f2 (int i)
  {
struct C x;
+  struct C x2;
x.d[i] = 'z';
return x;
  }
@@ -45,6 +46,7 @@ static struct C
  f3 (int i)
  {
struct C x;
+  struct C x2;
char *p = x.d;
p += i;
*p = 'z';
...

But I have no idea how stable this solution is.



At least x2 could not be optimized away, as it is not a POD,
but there is no guarantee that x2 comes after x on the stack.
Another possibility, which seems to work as well:


Index: gcc/testsuite/c-c++-common/ubsan/object-size-9.c
===
--- gcc/testsuite/c-c++-common/ubsan/object-size-9.c(revision 239794)
+++ gcc/testsuite/c-c++-common/ubsan/object-size-9.c(working copy)
@@ -30,7 +30,7 @@ f1 (struct T x, int i)
  static struct C
  f2 (int i)
  {
-  struct C x;
+  struct C x __attribute__ ((aligned(16)));
x.d[i] = 'z';
return x;
  }
@@ -44,7 +44,7 @@ f2 (int i)
  static struct C
  f3 (int i)
  {
-  struct C x;
+  struct C x __attribute ((aligned(16)));
char *p = x.d;
p += i;
*p = 'z';



Works for me.


OK for trunk, 5 & 6 branch?

Thanks,
- Tom

Fix object-size-9.c with -fpic

2016-09-15  Bernd Edlinger  
	Tom de Vries  

	PR testsuite/77411
	* c-c++-common/ubsan/object-size-9.c (f2, f3): Declare struct C variable
	with __attribute__((aligned(16))).

---
 gcc/testsuite/c-c++-common/ubsan/object-size-9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/ubsan/object-size-9.c b/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
index 46f1fb9..4cd8529 100644
--- a/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
+++ b/gcc/testsuite/c-c++-common/ubsan/object-size-9.c
@@ -30,7 +30,7 @@ f1 (struct T x, int i)
 static struct C
 f2 (int i)
 {
-  struct C x;
+  struct C x __attribute__((aligned(16)));
   x.d[i] = 'z';
   return x;
 }
@@ -44,7 +44,7 @@ f2 (int i)
 static struct C
 f3 (int i)
 {
-  struct C x;
+  struct C x __attribute__((aligned(16)));
   char *p = x.d;
   p += i;
   *p = 'z';


Re: C++ PATCH to forbid use of bool with the ++ operator

2016-09-15 Thread Jonathan Wakely

On 14/09/16 17:28 +0200, Marek Polacek wrote:

diff --git gcc/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc 
gcc/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc
index 9893293..c939c22 100644
--- gcc/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc
+++ gcc/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc
@@ -16,6 +16,7 @@
// .
//
// { dg-do run { xfail *-*-* } }
+// { dg-options "-Wno-deprecated" }

#include 
#include 


I'm replacing this part with a fix to avoid incrementing bool in the
testsuite, because otherwise generate_unique::build() only ever
returns true. This way it still doesn't return unique values, but at
least it alternates between true and false, and is valid for C++17.

Tested x86_64-linux, committed to trunk.


commit 4137ef6dbe7915f61c0451e10e7d91376421804c
Author: Jonathan Wakely 
Date:   Thu Sep 15 10:48:26 2016 +0100

Fix testsuite to not increment bool

	* testsuite/23_containers/vector/debug/insert6_neg.cc: Remove
	-Wno-deprecated.
	* testsuite/util/debug/checks.h (generate_unique): Specialize.

diff --git a/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc b/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc
index c939c22..9893293 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/debug/insert6_neg.cc
@@ -16,7 +16,6 @@
 // .
 //
 // { dg-do run { xfail *-*-* } }
-// { dg-options "-Wno-deprecated" }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/util/debug/checks.h b/libstdc++-v3/testsuite/util/debug/checks.h
index 5a40608..872fa1a 100644
--- a/libstdc++-v3/testsuite/util/debug/checks.h
+++ b/libstdc++-v3/testsuite/util/debug/checks.h
@@ -53,6 +53,19 @@ namespace __gnu_test
   }
 };
 
+  template<>
+struct generate_unique
+{
+  typedef bool value_type;
+
+  value_type build()
+  {
+	static value_type _S_;
+	_S_ = !_S_;
+	return _S_;
+  }
+};
+
   template
 struct generate_unique >
 {


[PATCH PR77503]Record reduction code for CONST_COND_REDUCTION at analysis stage and use it at transform

2016-09-15 Thread Bin Cheng
Hi,
This patch fixes PR77503.  Root cause is loop peeling changes the initial value 
for reduction PHI node, resulting in different statement for 
vect_transform_stmt to vect_analyze_stmt.  Similar issue stands for loop 
control IV, we record the original information in 
LOOP_PHI_EVOLUTION_BASE_UNCHANGED for that.  This patch follows the same method 
by recording reduction code for CONST_COND_REDUCTION at analysis stage and use 
the information at transform stage.  The only difference is we need record it 
in stmt_vec_info because there might be multiple reductions.  Unfortunately 
this requires additional memory for each statement, I didn't see easy way out.  
Maybe it's possible to improve vectorizer so it caches/reuses information from 
analysis stage to transform stage?
Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK if no regression?

Thanks,
bin

2016-09-07  Bin Cheng  

PR tree-optimization/77503
* tree-vect-loop.c (vectorizable_reduction): Record reduction
code for CONST_COND_REDUCTION at analysis stage and use it at
transform stage.
* tree-vectorizer.h (struct _stmt_vec_info): New field.
(STMT_VINFO_VEC_CONST_COND_REDUC_CODE): New macro.
* tree-vect-stmts.c (new_stmt_vec_info): Initialize above new
field.

gcc/testsuite/ChangeLog
2016-09-07  Bin Cheng  

PR tree-optimization/77503
* gcc.dg/vect/pr77503.c: New test.diff --git a/gcc/testsuite/gcc.dg/vect/pr77503.c 
b/gcc/testsuite/gcc.dg/vect/pr77503.c
new file mode 100644
index 000..609e7fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr77503.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-skip-if "need vect_max_reduc support" { ! vect_max_reduc } } */
+
+extern void d(void);
+void a() {
+  char *b;
+  char c = 0;
+  for (; b < (char *)a; b++) {
+if (*b)
+  c = 1;
+*b = 0;
+  }
+  if (c)
+d();
+}
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fa06505..58f3456 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5438,7 +5438,7 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
   tree def0, def1, tem, op1 = NULL_TREE;
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
-  tree cond_reduc_val = NULL_TREE, const_cond_cmp = NULL_TREE;
+  tree cond_reduc_val = NULL_TREE;
 
   /* In case of reduction chain we switch to the first stmt in the chain, but
  we don't update STMT_INFO, since only the last stmt is marked as reduction
@@ -5645,7 +5645,19 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
= INTEGER_INDUC_COND_REDUCTION;
}
 
-  if (cond_reduc_dt == vect_constant_def)
+  /* Loop peeling modifies initial value of reduction PHI, which
+makes the reduction stmt to be transformed different to the
+original stmt analyzed.  We need to record reduction code for
+CONST_COND_REDUCTION type reduction at analyzing stage, thus
+it can be used directly at transform stage.  */
+  if (STMT_VINFO_VEC_CONST_COND_REDUC_CODE (stmt_info) == MAX_EXPR
+ || STMT_VINFO_VEC_CONST_COND_REDUC_CODE (stmt_info) == MIN_EXPR)
+   {
+ /* Also set the reduction type to CONST_COND_REDUCTION.  */
+ gcc_assert (cond_reduc_dt == vect_constant_def);
+ STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) = CONST_COND_REDUCTION;
+   }
+  else if (cond_reduc_dt == vect_constant_def)
{
  enum vect_def_type cond_initial_dt;
  gimple *def_stmt = SSA_NAME_DEF_STMT (ops[reduc_index]);
@@ -5667,7 +5679,9 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
dump_printf_loc (MSG_NOTE, vect_location,
 "condition expression based on "
 "compile time constant.\n");
- const_cond_cmp = e;
+ /* Record reduction code at analysis stage.  */
+ STMT_VINFO_VEC_CONST_COND_REDUC_CODE (stmt_info)
+   = integer_onep (e) ? MAX_EXPR : MIN_EXPR;
  STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
= CONST_COND_REDUCTION;
}
@@ -5821,10 +5835,8 @@ vectorizable_reduction (gimple *stmt, 
gimple_stmt_iterator *gsi,
 we want to base our reduction around.  */
   if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == CONST_COND_REDUCTION)
{
- gcc_assert (const_cond_cmp != NULL_TREE);
- gcc_assert (integer_onep (const_cond_cmp)
- || integer_zerop (const_cond_cmp));
- orig_code = integer_onep (const_cond_cmp) ? MAX_EXPR : MIN_EXPR;
+ orig_code = STMT_VINFO_VEC_CONST_COND_REDUC_CODE (stmt_info);
+ gcc_assert (orig_code == MAX_EXPR || 

[RFC] Speed-up -fprofile-update=atomic

2016-09-15 Thread Martin Liška
On 09/07/2016 02:09 PM, Richard Biener wrote:
> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška  wrote:
>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek  
>>> wrote:
 On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>> I'd prefer to make updates atomic in multi-threaded applications.
>> The best proxy we have for that is -pthread.
>>
>> Is it slower, most definitely, but odds are we're giving folks
>> garbage data otherwise, which in many ways is even worse.
>
> It will likely be catastrophically slower in some cases.
>
> Catastrophically as in too slow to be usable.
>
> An atomic instruction is a lot more expensive than a single
 increment. Also
> they sometimes are really slow depending on the state of the machine.

 Can't we just have thread-local copies of all the counters (perhaps
 using
 __thread pointer as base) and just atomically merge at thread
 termination?
>>>
>>> I suggested that as well but of course it'll have its own class of issues 
>>> (short lived threads, so we need to somehow re-use counters from terminated 
>>> threads, large number of threads and thus using too much memory for the 
>>> counters)
>>>
>>> Richard.
>>
>> Hello.
>>
>> I've got written the approach on my TODO list, let's see whether it would be 
>> doable in a reasonable amount of time.
>>
>> I've just finished some measurements to illustrate slow-down of 
>> -fprofile-update=atomic approach.
>> All numbers are: no profile, -fprofile-generate, -fprofile-generate 
>> -fprofile-update=atomic
>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>
>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not 
>> having much experience with default option
>> selection, but these numbers can probably help.
>>
>> Thoughts?
> 
> Look at the generated code for an instrumented simple loop and see that for
> the non-atomic updates we happily apply store-motion to the counter update
> and thus we only get one counter update per loop exit rather than one per
> loop iteration.  Now see what happens for the atomic case (I suspect you
> get one per iteration).
> 
> I'll bet this accounts for most of the slowdown.
> 
> Back in time ICC which had atomic counter updates (but using function
> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
> didn't have early inlining -- removing abstraction helps reducing the number
> of counters significantly).
> 
> Richard.

Hi.

During Cauldron I discussed with Richi approaches for speeding up arc
profile counter updates. My first attempt is to utilize TLS storage, where
every function accumulates its arc counters; these are eventually added
(using atomic operations) to the global counters at the very end of the function.
Currently I rely on target support for TLS, so the question is whether to make
that a requirement for -fprofile-update=atomic, or to add a new option value
like -fprofile-update=atomic-tls?
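
A hedged sketch of that scheme, in plain C rather than the GIMPLE the patch
actually generates; the names and the fixed counter count are placeholders:

#define N_ARCS 64   /* placeholder for the per-function counter count */

static long long gcov_arc_counters[N_ARCS];             /* shared     */
static __thread long long local_arc_counters[N_ARCS];   /* per thread */

static inline void
count_arc (int arc)
{
  local_arc_counters[arc]++;     /* plain increment, no atomics needed */
}

static void
flush_arc_counters (void)        /* emitted at the very end of the function */
{
  for (int i = 0; i < N_ARCS; i++)
    if (local_arc_counters[i])
      {
        __atomic_fetch_add (&gcov_arc_counters[i], local_arc_counters[i],
                            __ATOMIC_RELAXED);
        local_arc_counters[i] = 0;
      }
}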

Running the patch on tramp3d, compared to previous numbers, it takes 88s to 
finish.
Time shrinks to 50%, compared to the current implementation.

Thoughts?
Martin

> 
>> Martin
>>
>>>
  Jakub
>>>
>>>
>>

>From 91b5342c422950b32d1ba7d616bda418c7993a84 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 15 Sep 2016 09:49:41 +0200
Subject: [PATCH] Improve implementation of -fprofile-update=atomic

---
 gcc/coverage.c | 99 ++
 gcc/coverage.h |  6 ++--
 gcc/profile.c  |  2 ++
 gcc/tree-profile.c | 45 +
 4 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index a6a888a..1e0052f 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -48,6 +48,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "params.h"
 #include "auto-profile.h"
+#include "varasm.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree-vrp.h"
+#include "tree-ssanames.h"
 
 #include "gcov-io.c"
 
@@ -93,6 +98,9 @@ static GTY(()) tree fn_v_ctrs[GCOV_COUNTERS];   /* counter variables.  */
 static unsigned fn_n_ctrs[GCOV_COUNTERS]; /* Counters allocated.  */
 static unsigned fn_b_ctrs[GCOV_COUNTERS]; /* Allocation base.  */
 
+/* Thread-local storage variable of ARCS counters.  */
+static GTY(()) tree arcs_tls_ctr;
+
 /* Coverage info VAR_DECL and function info type nodes.  */
 static GTY(()) tree gcov_info_var;
 static GTY(()) tree gcov_fn_info_type;
@@ -127,7 +135,8 @@ static const char *const ctr_names[GCOV_COUNTERS] = {
 
 /* Forward declarations.  */
 static void read_counts_file (void);
-static tree build_var (tree, tree, int);
+static tree build_var (tree fn_decl, tree type, int counter,
+		   bool is_tls = false);
 

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Yuan, Pengfei
> Yes.  This means it's easy to experiment with other values than zero.  
> Basically
> early inlining is supposed to remove abstraction penalty to
> 
>  a) reduce FDO instrumentation overhead
>  b) get more realistic size estimates for the inliner
> 
> a) was particularly important back in time for tramp3d, reducing
> profiling runtime
> 1000-fold.  b) is generally important.
> 
> PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
> get abstraction removed but IIRC we increased it quite a bit to also get more
> early optimization (to get more accurate inliner estimates).
> 
> What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
> sense for FDO but reducing it to zero is probably a bit much.
> 
> Can you do your measurements with values between zero and the current
> default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
> What's the
> value that crosses the boundary of diminishing returns regarding to code-size
> improvements for you?
> 
> Richard.
> 

I see. This procedure may take some time since profile data can not be reused.

Yuan, Pengfei



Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2016-09-15 Thread Rainer Orth
Hi Jason,

>> I confirm no change on arm* either.
>> On aarch64, gen-attrs-[25]1.C, and devirt-33 now work.
>> name-clash11.C still fails on both targets
>
> Ah, I needed to remove the limit on field alignment as well.  This
> seems to fix things.

The failures are gone on Solaris/SPARC (sparc-sun-solaris2.12) now, with
one exception (when run as C test):

32-bit:

FAIL: c-c++-common/pr52181.c  -Wc++-compat   (test for bogus messages, line 11)
FAIL: c-c++-common/pr52181.c  -Wc++-compat  (test for excess errors)

This is new with your last patch.

Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/pr52181.c:6:1: warning: 
requested alignment 16 is larger than 8 [-Wattributes]
/vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/pr52181.c:8:1: warning: 
requested alignment 16 is larger than 8 [-Wattributes]

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix gcc.dg/fold-reassoc-2.c

2016-09-15 Thread Richard Biener

It's supposed to test for the transform in two cases.

Applied to trunk.

Richard.

2016-09-15  Richard Biener  

* gcc.dg/fold-reassoc-2.c: Fix dump scan.

Index: gcc/testsuite/gcc.dg/fold-reassoc-2.c
===
--- gcc/testsuite/gcc.dg/fold-reassoc-2.c   (revision 240153)
+++ gcc/testsuite/gcc.dg/fold-reassoc-2.c   (working copy)
@@ -10,4 +10,4 @@ int bar (int i)
   return (i + 2) + ~i;
 }
 
-/* { dg-final { scan-tree-dump "return 1;" "original" } } */
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "original" } } */


Re: [PATCH 1/8] change a few rtx_insn * to rtx_jump_insn *

2016-09-15 Thread Trevor Saunders
On Thu, Sep 15, 2016 at 10:28:19AM +0200, Richard Biener wrote:
> On Thu, Sep 15, 2016 at 6:31 AM, Trevor Saunders  
> wrote:
> > On Thu, Sep 15, 2016 at 01:04:04PM +0900, Kaz Kojima wrote:
> >> tbsaunde+...@tbsaunde.org wrote:
> >> > @@ -2201,8 +2201,7 @@ fix_crossing_unconditional_branches (void)
> >> > {
> >> >   if (!BARRIER_P (cur_insn))
> >> > BLOCK_FOR_INSN (cur_insn) = cur_bb;
> >> > - if (JUMP_P (cur_insn))
> >> > -   jump_insn = cur_insn;
> >> > + jump_insn = dyn_cast (cur_insn);
> >> > }
> >>
>> This hunk results in several new failures for tree-profile tests on SH.
> >> If the line "if (JUMP_P (cur_insn))" is restored, those failures
> >> go away.
> >
> > That's interesting because dyn_cast should include that check.  What is
> > the error?
> 
> maybe jump_insn is non-NULL before?  in which case you set it to NULL
> if ! JUMP_P while before we didn't.

Oh wow, I missed that this assignment was in a loop, not an if :(
Makes sense now and easy enough to fix up.
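
For the record, a minimal sketch of the guarded form (assuming the variable
has the rtx_jump_insn * type introduced by the patch):

/* Only overwrite jump_insn when cur_insn really is a jump; a bare
   assignment from dyn_cast would reset it to NULL on every non-jump
   insn seen later in the loop, unlike the original JUMP_P check.  */
if (rtx_jump_insn *j = dyn_cast <rtx_jump_insn *> (cur_insn))
  jump_insn = j;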

Trev

> 
> Richard.
> 
> > Thanks!
> >
> > Trev
> >
> >>
> >> Regards,
> >>   kaz


Re: [PATCH] Improve string::clear() performance

2016-09-15 Thread Jonathan Wakely

On 14/09/16 10:41 -0700, Cong Wang wrote:

For long term, I think gcc should have something as simple as
'Signed-off-by' for Linux kernel, otherwise too much work for first-time
contributors like me. We all want to save time on this, don't we? ;)


Signed-off-by wouldn't help. The copyright assignment is done so that
the FSF owns the copyright in the majority of the GCC code. If I add
a Signed-off-by to your patch that doesn't have any effect on your
copyright of the code.

It certainly does add a hurdle for contributors to GCC, but it's not a
policy that is likely to change.

Anyway, I'll fix this one way or another ASAP. Thanks again.



Re: debug container mutex association

2016-09-15 Thread Jonathan Wakely

On 14/09/16 22:17 +0200, François Dumont wrote:

On 14/09/2016 11:00, Jonathan Wakely wrote:

On 13/09/16 22:37 +0200, François Dumont wrote:

Hi

   When I proposed to change std::hash for pointers my plan was to
use this approach for the debug containers. So here is the patch
leveraging this technique to avoid going through _Hash_impl to
hash the address and get the mutex index from it. I think it is obvious
that the new access is faster, so I didn't write a performance test to
demonstrate it.


Is this actually a bottleneck that needs to be made faster?

I know it's nice if Debug Mode isn't very slow, but you've already
made it much better, and performance is not the primary goal of Debug
Mode.


Sure, but my approach is that if I can make something faster then I
just try. I figured that making this mode faster would allow it to be
used even more widely.




diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver

index 9b5bb23..c9a253e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1947,6 +1947,21 @@ GLIBCXX_3.4.23 {
   _ZNSsC[12]ERKSs[jmy]RKSaIcE;
   _ZNSbIwSt11char_traitsIwESaIwEEC[12]ERKS2_mRKS1_;

+# __gnu_debug::_Safe_sequence_base
+ _ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEm;


The 'm' here should be [jmy] because std::size_t mangles differently
on different targets.


+ _ZN11__gnu_debug19_Safe_sequence_base22_M_revalidate_singularEm;
+_ZN11__gnu_debug19_Safe_sequence_base12_M_get_mutexEm;
+_ZN11__gnu_debug19_Safe_sequence_base7_M_swapERS0_m;
+
+# __gnu_debug::_Safe_iterator_base
+ _ZN11__gnu_debug19_Safe_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEbm;
+_ZN11__gnu_debug19_Safe_iterator_base9_M_detachEm;
+
+# __gnu_debug::_Safe_unordered_container_base and 
_Safe_local_iterator_base

+ _ZN11__gnu_debug30_Safe_unordered_container_base7_M_swapERS0_m;
+ 
_ZN11__gnu_debug25_Safe_local_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEbm;
+_ZN11__gnu_debug25_Safe_local_iterator_base9_M_detachEm;
+
} GLIBCXX_3.4.22;


It's unfortunate to have to export and maintain new symbols when we
don't offer the same level of ABI guarantee for Debug Mode  :-(


Yes, I have been told that Debug ABI compatibility was not guaranteed,
but at the same time if we don't do so, abi-check fails. Couldn't
we have a special section in gnu.ver for unversioned symbols like
those?


We could, but making them unversioned wouldn't really be useful - we
still need to keep them in the library forever.


@@ -174,6 +191,18 @@ namespace __gnu_debug
   const _Safe_iterator_base* __last)
 { return __first->_M_can_compare(*__last); }

+  // Compute power of 2 not greater than __n.
+  template
+struct _Lowest_power_of_two
+{
+  static const size_t __val
+= _Lowest_power_of_two< (__n >> 1) >::__val + 1;
+};
+
+  template<>
+struct _Lowest_power_of_two<1>
+{ static const size_t __val = 1; };


The lowest power of two not greater than 1 is 2^0, but this gives the
result 1.

Similarly, the lowest power of two not greater than 2 is 2^1, i.e. 1.


Indeed, when writing the test case I decided to return 1 rather than 0
for the _Lowest_power_of_two<1> specialization. It gave a better
container instance to mutex mapping. But I forgot to change its name.




And the name is misleading, it doesn't compute a power of two, it
computes the base-2 logarithm (then adds one), so it's a compile-time
version of floor(log2(n))+1.
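
As a quick check of that formula, purely for exposition (the asserts below
are illustrative, not part of the patch):

// floor(log2(n)) + 1 for a few power-of-two alignments:
static_assert (__builtin_ctz (1)  + 1 == 1, "alignof == 1");
static_assert (__builtin_ctz (8)  + 1 == 4, "alignof == 8");
static_assert (__builtin_ctz (16) + 1 == 5, "alignof == 16");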


@@ -123,6 +125,36 @@ namespace __gnu_debug
 template
   void
   _M_transfer_from_if(_Safe_sequence& __from, _Predicate __pred);
+
+  static _GLIBCXX_CONSTEXPR std::size_t
+  _S_alignment() _GLIBCXX_NOEXCEPT
+  { return _Lowest_power_of_two<__alignof(_Sequence)>::__val; }


This function is called "alignment" but it returns log2(alignof(T))+1


Yes, I wasn't proud about the naming.



Does it need to be constexpr? Is that for std::array?


No, it is not used in std::array context. I just considered that it 
could be constexpr.


OK. Sometimes that's useful, I agree. But it's never used in a
constexpr context, and it's only called by the library directly, so
there's no advantage here (it just makes it harder to write because it
can only use code that's valid in a constant expression).

Anyway, you can write it without a template, and still be constexpr:



You can get the same result without _Lowest_power_of_two:

static _GLIBCXX_CONSTEXPR size_t
_S_alignbits() _GLIBCXX_NOEXCEPT
{ return __builtin_ctz(__alignof(_Sequence)) + 1; }

But I'm not convinced the +1 is correct, see below.


@@ -46,15 +47,30 @@ namespace
 /** Returns different instances of __mutex depending on the 
passed address
  *  in order to limit contention without breaking current 
library binary

  *  compatibility. */
+  const size_t mask = 0xf;
+
 __gnu_cxx::__mutex&
-  get_safe_base_mutex(void* __address)
+  get_mutex(size_t index)
 {
-const size_t mask = 0xf;
   

Re: [PATCH] Optimise the fpclassify builtin to perform integer operations when possible

2016-09-15 Thread Tamar Christina



On 13/09/16 13:43, Joseph Myers wrote:

On Tue, 13 Sep 2016, Tamar Christina wrote:



On 12/09/16 23:33, Joseph Myers wrote:

Why is this boolean false for ieee_quad_format, mips_quad_format and
ieee_half_format?  They should meet your description (even if the x86 /
m68k "extended" formats don't because of the leading mantissa bit being
set for infinities).


Ah, I played it a bit too safe there. I will change this and do some
re-testing and update the patch.

It occurred to me that there might be an issue with your approach of
overlaying the floating-point value with a single integer, when the quad
formats are used on 32-bit systems where TImode isn't fully supported as a
scalar mode.  However, if that's an issue the answer isn't to mark the
formats as non-IEEE, it's to support ORing together the relevant parts of
multiple words when determining whether the mantissa is nonzero (or some
equivalent logic).


I have been trying to reproduce this on the architectures I have access to
but have been unable to so far. In practice, if this does happen, isn't it
the fault of the system for advertising partial TImode support and support
of IEEE types?

It seems to me that in order for me to be able to do this, fpclassify would
incur a rather large cost in complexity. Also, wouldn't this be problematic
for other functions as well, such as expand_builtin_signbit?



Re: [PATCH 1/8] change a few rtx_insn * to rtx_jump_insn *

2016-09-15 Thread Richard Biener
On Thu, Sep 15, 2016 at 6:31 AM, Trevor Saunders  wrote:
> On Thu, Sep 15, 2016 at 01:04:04PM +0900, Kaz Kojima wrote:
>> tbsaunde+...@tbsaunde.org wrote:
>> > @@ -2201,8 +2201,7 @@ fix_crossing_unconditional_branches (void)
>> > {
>> >   if (!BARRIER_P (cur_insn))
>> > BLOCK_FOR_INSN (cur_insn) = cur_bb;
>> > - if (JUMP_P (cur_insn))
>> > -   jump_insn = cur_insn;
>> > + jump_insn = dyn_cast (cur_insn);
>> > }
>>
>> This hunk results in several new failures for tree-profile tests on SH.
>> If the line "if (JUMP_P (cur_insn))" is restored, those failures
>> go away.
>
> That's interesting because dyn_cast should include that check.  What is
> the error?

maybe jump_insn is non-NULL before?  in which case you set it to NULL
if ! JUMP_P while before we didn't.

Richard.

> Thanks!
>
> Trev
>
>>
>> Regards,
>>   kaz


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Richard Biener
On Thu, Sep 15, 2016 at 2:21 AM, Yuan, Pengfei  wrote:
>> I think the approach is reasonable though it might still have
>> interesting effects on
>> optimization for very small growth.  So for further experimenting it
>> would be nice
>
> Test it on SPEC CPU with FDO enabled?
>
>> to have a separate PARAM_EARLY_FDO_INLINING_INSNS or maybe simply
>> adjust the PARAM_EARLY_INLINING_INSNS default accordingly when FDO is
>> enabled?
>
> Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
> equivalent to my patch.

Yes.  This means it's easy to experiment with other values than zero.  Basically
early inlining is supposed to remove abstraction penalty to

 a) reduce FDO instrumentation overhead
 b) get more realistic size estimates for the inliner

a) was particularly important back in time for tramp3d, reducing
profiling runtime
1000-fold.  b) is generally important.

PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
get abstraction removed but IIRC we increased it quite a bit to also get more
early optimization (to get more accurate inliner estimates).

What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
sense for FDO but reducing it to zero is probably a bit much.

Can you do your measurements with values between zero and the current
default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
What's the
value that crosses the boundary of diminishing returns regarding to code-size
improvements for you?

Richard.

> Regards,
> Yuan, Pengfei
>
>> I'll let Honza also double-check the condition detecting FDO (it looks
>> like we should
>> have some abstraction for that).
>>
>> Thanks,
>> Richard.
>


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Jakub Jelinek
On Thu, Sep 15, 2016 at 09:52:34AM +0200, Richard Biener wrote:
> On Wed, Sep 14, 2016 at 3:45 PM, Jakub Jelinek  wrote:
> > On Wed, Sep 14, 2016 at 03:41:33PM +0200, Richard Biener wrote:
> >> > We've seen several different proposals for where/how to do this 
> >> > simplification, why did you
> >> > say strlenopt is best? It would be an unconditional strchr (a, 0) -> a + 
> >> > strlen (a) rewrite,
> >> > ie. completely unrelated to what strlenopt does. We do all the other 
> >> > simplifications based
> >> > on constant arguments in builtins.c and gimple-fold.c, why is strchr (s, 
> >> > 0) different?
> >>
> >> I was thinking about the case where strlen opt already knows strlen
> >> (a).  But sure, gimple-fold.c
> >> works as well.
> >
> > I think for the middle-end, using strchr (a, 0) as canonical instead of a + 
> > strlen (a)
> > is better, and at expansion time we can decide what to use (a + strlen (a)
> > if you'd expand strlen inline, rather than as a function call, or
> > __rawmemchr (which if libc is sane should be fastest), or strchr, or a + 
> > strlen (a)).
> 
> OTOH that then argues for doing it in strlenopt because that knows
> whether we maybe
> already computed strlen (a) (which might have other uses than adding to a).

Sure.

Jakub


Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-09-15 Thread Richard Biener
On Wed, Sep 14, 2016 at 3:45 PM, Jakub Jelinek  wrote:
> On Wed, Sep 14, 2016 at 03:41:33PM +0200, Richard Biener wrote:
>> > We've seen several different proposals for where/how to do this 
>> > simplification, why did you
>> > say strlenopt is best? It would be an unconditional strchr (a, 0) -> a + 
>> > strlen (a) rewrite,
>> > ie. completely unrelated to what strlenopt does. We do all the other 
>> > simplifications based
>> > on constant arguments in builtins.c and gimple-fold.c, why is strchr (s, 
>> > 0) different?
>>
>> I was thinking about the case where strlen opt already knows strlen
>> (a).  But sure, gimple-fold.c
>> works as well.
>
> I think for the middle-end, using strchr (a, 0) as canonical instead of a + 
> strlen (a)
> is better, and at expansion time we can decide what to use (a + strlen (a)
> if you'd expand strlen inline, rather than as a function call, or
> __rawmemchr (which if libc is sane should be fastest), or strchr, or a + 
> strlen (a)).

OTOH that then argues for doing it in strlenopt because that knows
whether we maybe
already computed strlen (a) (which might have other uses than adding to a).
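
A hedged illustration of the equivalence under discussion (assuming s points
to a NUL-terminated string):

#include <string.h>

/* Both helpers yield the address of the terminating NUL, so either form
   can serve as the canonical one; which is cheaper to emit is the
   question being discussed.  */
static char *end_via_strchr (char *s) { return strchr (s, 0); }
static char *end_via_strlen (char *s) { return s + strlen (s); }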

Richard.

> Jakub


Re: [PATCH][2/n] Change dw2_asm_output_offset to allow assembling extra offset

2016-09-15 Thread Richard Biener
On Wed, Aug 19, 2015 at 4:25 PM, Richard Biener  wrote:
>
> This is needed so that we can output references to $early-debug-symbol +
> constant offset where $early-debug-symbol is the beginning of a
> .debug_info section containing early debug info from the compile-stage.
> Constant offsets are always fine for any object formats I know, I
> tested ia64-linux apart from x86_64-linux.  I have access to darwin at
> home (x86_64), so can try there as well.
>
> The question is whether we want to assemble "+" HOST_WIDE_INT_PRINT_DEC
> directly for the non-ASM_OUTPUT_DWARF_OFFSET case as well as opposed
> to building a PLUS (we do build a SYMBOL_REF already).
>
> I've also refrained from changing all existing callers to
> dw2_asm_output_offset to pass a 0 offset argument to avoid the
> overloading - please tell me if you prefer that.  The LTO support
> adds a single call here:
>
> @@ -9064,8 +9248,12 @@ output_die (dw_die_ref die)
>             size = DWARF2_ADDR_SIZE;
>           else
>             size = DWARF_OFFSET_SIZE;
> -         dw2_asm_output_offset (size, sym, debug_info_section, "%s",
> -                                name);
> +         if (AT_ref (a)->with_offset)
> +           dw2_asm_output_offset (size, sym, AT_ref (a)->die_offset,
> +                                  debug_info_section, "%s", name);
> +         else
> +           dw2_asm_output_offset (size, sym, debug_info_section, "%s",
> +                                  name);
>         }
>
> (ignore that ->with_offset, it can hopefully go if die_offset is zero
> for other cases - just checking with an assert right now).
>
> Bootstrap & regtest pending on x86_64-unknown-linux-gnu.  The patch
> is an effective no-op currently (the offset != 0 condition is always
> false for now).
>
> Ok?
>
> CCing affected target maintainers - I've verified this on ia64
> by just generating assembly for a simple testcase with -g and
> changing one of the generated section-relative offsets in the
> suggested way (adding "+4"), passing through the assembler and
> inspecting the resulting relocations.

Given no further comments and my plan to go ahead with the early LTO debug
merge, I am currently re-testing this nicely split-out part and plan to commit
it once testing succeeds.  This makes testing the remaining parts a lot easier,
as the tm.texi dance vanishes.

Richard.

> Thanks,
> Richard.
>
> 2015-08-19  Richard Biener  
>
> * dwarf2asm.h (dw2_asm_output_offset): Add overload with
> extra offset argument.
> * dwarf2asm.c (dw2_asm_output_offset): Implement that.
> * doc/tm.texi.in (ASM_OUTPUT_DWARF_OFFSET): Adjust documentation
> to reflect new offset parameter.
> * doc/tm.texi: Regenerate.
> * config/darwin.h (ASM_OUTPUT_DWARF_OFFSET): Adjust.
> * config/darwin-protos.h (darwin_asm_output_dwarf_delta): Add
> offset argument.
> (darwin_asm_output_dwarf_offset): Likewise.
> * config/darwin.c (darwin_asm_output_dwarf_delta): Add offset
> argument.
> (darwin_asm_output_dwarf_offset): Pass offset argument through.
> * config/ia64/ia64.h (ASM_OUTPUT_DWARF_OFFSET): Adjust.
> * config/i386/cygming.h (ASM_OUTPUT_DWARF_OFFSET): Likewise.
>
> Index: gcc/dwarf2asm.c
> ===
> --- gcc/dwarf2asm.c (revision 226856)
> +++ gcc/dwarf2asm.c (working copy)
> @@ -33,6 +33,8 @@ along with GCC; see the file COPYING3.
>  #include "dwarf2asm.h"
>  #include "dwarf2.h"
>  #include "tm_p.h"
> +#include "function.h"
> +#include "emit-rtl.h"
>
>
>  /* Output an unaligned integer with the given value and size.  Prefer not
> @@ -190,12 +192,39 @@ dw2_asm_output_offset (int size, const c
>va_start (ap, comment);
>
>  #ifdef ASM_OUTPUT_DWARF_OFFSET
> -  ASM_OUTPUT_DWARF_OFFSET (asm_out_file, size, label, base);
> +  ASM_OUTPUT_DWARF_OFFSET (asm_out_file, size, label, 0, base);
>  #else
>dw2_assemble_integer (size, gen_rtx_SYMBOL_REF (Pmode, label));
>  #endif
>
>if (flag_debug_asm && comment)
> +{
> +  fprintf (asm_out_file, "\t%s ", ASM_COMMENT_START);
> +  vfprintf (asm_out_file, comment, ap);
> +}
> +  fputc ('\n', asm_out_file);
> +
> +  va_end (ap);
> +}
> +
> +void
> +dw2_asm_output_offset (int size, const char *label, HOST_WIDE_INT offset,
> +  section *base ATTRIBUTE_UNUSED,
> +  const char *comment, ...)
> +{
> +  va_list ap;
> +
> +  va_start (ap, comment);
> +
> +#ifdef ASM_OUTPUT_DWARF_OFFSET
> +  ASM_OUTPUT_DWARF_OFFSET (asm_out_file, size, label, offset, base);
> +#else
> +  dw2_assemble_integer (size, gen_rtx_PLUS (Pmode,
> +   gen_rtx_SYMBOL_REF (Pmode, label),
> +   gen_int_mode (offset, Pmode)));
> 
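
To make the expected output concrete, here is a small illustrative sketch (not part of the patch; the helper name and directive choice are made up) of emitting a section-relative reference with a constant addend, i.e. the "label+offset" form the new offset parameter enables:

#include <stdio.h>

/* Hypothetical helper only: prints a DWARF offset reference to LABEL,
   optionally with a constant addend, which is what the new
   dw2_asm_output_offset overload is meant to produce.  */
static void
emit_dwarf_offset_sketch (FILE *out, int size, const char *label,
                          long offset)
{
  fprintf (out, "\t.%dbyte\t%s", size, label);
  if (offset != 0)
    fprintf (out, "+%ld", offset);
  fputc ('\n', out);
}

int
main (void)
{
  /* A reference 4 bytes past the start of another unit's .debug_info,
     as in the manual "+4" experiment described above.  */
  emit_dwarf_offset_sketch (stdout, 4, ".Ldebug_info0", 4);
  return 0;
}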

[PATCH] Partially improve scalability of the unwinder (PR libgcc/71744)

2016-09-15 Thread Jakub Jelinek
Hi!

These days, on the many targets that use dl_iterate_phdr to find .eh_frame_hdr,
most programs never use the old-style EH registry, yet we still lock a global
mutex and unlock it soon afterwards only to find that out.

This patch adds a fast path: it first performs an atomic load of a flag that
records whether anything has ever been registered in the old-style registry,
and if the flag says no, it avoids the locking.
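
As a rough sketch of the pattern (illustrative only; the names, the pthread
mutex and the C11 atomics stand in for the gthread/__atomic primitives the
real code uses):

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Sketch of the fast path only, not the libgcc code: a write-once flag
   recording whether anything was ever registered lets lookups skip the
   global lock entirely in the common "nothing registered" case.  */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int anything_registered;  /* set once, never cleared */
static void *registry_head;

void
register_entry_sketch (void *entry)
{
  pthread_mutex_lock (&registry_lock);
  registry_head = entry;
  if (!atomic_load_explicit (&anything_registered, memory_order_relaxed))
    atomic_store_explicit (&anything_registered, 1, memory_order_release);
  pthread_mutex_unlock (&registry_lock);
}

void *
lookup_entry_sketch (void)
{
  /* Fast path: if nothing was ever registered, avoid the mutex.  */
  if (!atomic_load_explicit (&anything_registered, memory_order_acquire))
    return NULL;
  pthread_mutex_lock (&registry_lock);
  void *found = registry_head;
  pthread_mutex_unlock (&registry_lock);
  return found;
}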

Of course, there is further locking inside dl_iterate_phdr itself; removing
that will need agreement and cooperation between the glibc and GCC projects,
so this isn't a full solution, but a path towards it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-09-15  Jakub Jelinek  

PR libgcc/71744
* unwind-dw2-fde.c (ATOMIC_FDE_FAST_PATH): Define if __register_frame*
is not the primary registry and atomics are available.
(any_objects_registered): New variable.
(__register_frame_info_bases, __register_frame_info_table_bases):
Atomically store 1 to any_objects_registered after registering first
unwind info.
(_Unwind_Find_FDE): Return early if any_objects_registered is 0.

--- libgcc/unwind-dw2-fde.c.jj  2016-01-04 15:14:09.0 +0100
+++ libgcc/unwind-dw2-fde.c 2016-07-07 13:25:39.248845579 +0200
@@ -35,6 +35,11 @@ see the files COPYING3 and COPYING.RUNTI
 #include "unwind-pe.h"
 #include "unwind-dw2-fde.h"
 #include "gthr.h"
+#else
+#if (defined(__GTHREAD_MUTEX_INIT) || defined(__GTHREAD_MUTEX_INIT_FUNCTION)) \
+&& defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
+#define ATOMIC_FDE_FAST_PATH 1
+#endif
 #endif
 
 /* The unseen_objects list contains objects that have been registered
@@ -43,6 +48,9 @@ see the files COPYING3 and COPYING.RUNTI
by decreasing value of pc_begin.  */
 static struct object *unseen_objects;
 static struct object *seen_objects;
+#ifdef ATOMIC_FDE_FAST_PATH
+static int any_objects_registered;
+#endif
 
 #ifdef __GTHREAD_MUTEX_INIT
 static __gthread_mutex_t object_mutex = __GTHREAD_MUTEX_INIT;
@@ -96,6 +104,10 @@ __register_frame_info_bases (const void
 
   ob->next = unseen_objects;
   unseen_objects = ob;
+#ifdef ATOMIC_FDE_FAST_PATH
+  if (!any_objects_registered)
+    __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELEASE);
+#endif
 
   __gthread_mutex_unlock (&object_mutex);
 }
@@ -140,6 +152,10 @@ __register_frame_info_table_bases (void
 
   ob->next = unseen_objects;
   unseen_objects = ob;
+#ifdef ATOMIC_FDE_FAST_PATH
+  if (!any_objects_registered)
+    __atomic_store_n (&any_objects_registered, 1, __ATOMIC_RELEASE);
+#endif
 
   __gthread_mutex_unlock (&object_mutex);
 }
@@ -1001,6 +1017,14 @@ _Unwind_Find_FDE (void *pc, struct dwarf
   struct object *ob;
   const fde *f = NULL;
 
+#ifdef ATOMIC_FDE_FAST_PATH
+  /* For targets where unwind info is usually not registered through these
+ APIs anymore, avoid taking a global lock.  */
+  if (__builtin_expect (!__atomic_load_n (&any_objects_registered,
+                                          __ATOMIC_ACQUIRE), 1))
+return NULL;
+#endif
+
   init_object_mutex_once ();
   __gthread_mutex_lock (&object_mutex);
 

Jakub


[PATCH] Fix PR77514

2016-09-15 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-09-15  Richard Biener  

PR tree-optimization/77514
* tree-ssa-pre.c (create_expression_by_pieces): Handle a garbage-only
forced_stmts sequence.

* gcc.dg/torture/pr77514.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 240133)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -2879,7 +2879,21 @@ create_expression_by_pieces (basic_block
   gimple_seq_discard (forced_stmts);
   return folded;
 }
-
+  /* Likewise if we simplified to sth not queued for insertion.  */
+  bool found = false;
+  gsi = gsi_start (forced_stmts);
+  for (; !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  tree forcedname = gimple_get_lhs (stmt);
+  if (forcedname == folded)
+   found = true;
+}
+  if (! found)
+{
+  gimple_seq_discard (forced_stmts);
+  return folded;
+}
   gcc_assert (TREE_CODE (folded) == SSA_NAME);
 
   /* If we have any intermediate expressions to the value sets, add them
Index: gcc/testsuite/gcc.dg/torture/pr77514.c
===
--- gcc/testsuite/gcc.dg/torture/pr77514.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr77514.c  (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+
+void
+m1 (char l0, char e8, int hw)
+{
+  char *rs = &e8;
+
+yu:
+  l0 = 1;
+  while (l0 != 0)
+{
+  l0 = -l0;
+  l0 += (*rs ^ (l0 &= 1));
+}
+  for (;;)
+{
+  if (hw != 0)
+   goto yu;
+      rs = &l0;
+}
+}