[PATCH] diagnose unsupported uses of hardware register variables (PR 88000)

2018-11-13 Thread Martin Sebor

In PR 88000 the reporter expects to be able to use an explicit
register local variable in a context where it isn't supported,
i.e., for something other than an input or output of an asm
statement: namely, to pass it as an argument to a user-defined
function.  GCC emits unexpected object code in this case, which
the reporter thought was a GCC bug.

Since explicit register global variables are supported in these
contexts, using the same kind of local variable seems like an easy
mistake to make.  To help users avoid it the attached patch adds
a warning that points it out.

Tested on x86_64-linux.  A number of i386 tests use explicit
register variables in calls to GCC library functions like
__builtin_fabsq.  I prune the warnings from those but if using
explicit register vars in those functions is meant to be supported
despite what the manual says the patch will need tweaking (as will
the manual).

Martin
PR c/88000 - Different local vars regs order may produce different code

gcc/c/ChangeLog:

	PR c/88000
	* c-typeck.c (convert_arguments):  Call warn_hw_reg_arg to diagnose
	unsupported uses of hardware register variables.

gcc/c-family/ChangeLog:

	PR c/88000
	* c.opt (-Wasm-register-var): New option.
	* c-common.h (warn_hw_reg_arg): Declare.
	* c-warn.c (warn_hw_reg_arg): Define.

gcc/cp/ChangeLog:

	PR c/88000
	* call.c (build_over_call): Call warn_hw_reg_arg.

gcc/testsuite/ChangeLog:

	PR c/88000
	* c-c++-common/Wasm-register.c: New test.
	* g++.dg/ext/Wasm-register.C: Same.
	* gcc.dg/torture/pr71762-3.c: Prune excess diagnostics.
	* gcc.target/i386/attr-returns_twice-1.c: Same.
	* gcc.target/i386/avx512dq-abs-copysign-1.c: Same.
	* gcc.target/i386/avx512dq-concatv2si-1.c: Same.
	* gcc.target/i386/avx512vl-abs-copysign-1.c: Same.
	* gcc.target/i386/avx512vl-abs-copysign-2.c: Same.
	* gcc.target/i386/avx512vl-concatv2si-1.c: Same.

gcc/ChangeLog:

	PR c/88000
	* doc/invoke.texi (-Wasm-register-var): Document.

Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c	(revision 266086)
+++ gcc/c/c-typeck.c	(working copy)
@@ -3543,6 +3543,9 @@ convert_arguments (location_t loc, vec
 	/* Convert `short' and `char' to full-size `int'.  */
 	parmval = default_conversion (val);
 
+  /* Diagnose uses of local variables declared asm register.  */
+  warn_hw_reg_arg (fundecl, parmnum + 1, val);
+
   (*values)[parmnum] = parmval;
   if (parmval == error_mark_node)
 	error_args = true;
Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt	(revision 266086)
+++ gcc/c-family/c.opt	(working copy)
@@ -338,6 +338,10 @@ Warray-bounds=
 LangEnabledBy(C ObjC C++ LTO ObjC++,Wall,1,0)
 ; in common.opt
 
+Wasm-register-var
+C ObjC C++ ObjC++ Var(warn_asm_register_var) Warning Init(1)
+Warn for unsupported uses of variables declared asm register.
+
 Wassign-intercept
 ObjC ObjC++ Var(warn_assign_intercept) Warning
 Warn whenever an Objective-C assignment is being intercepted by the garbage collector.
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h	(revision 266086)
+++ gcc/c-family/c-common.h	(working copy)
@@ -1321,6 +1321,7 @@ extern bool diagnose_mismatched_attributes (tree,
 extern tree do_warn_duplicated_branches_r (tree *, int *, void *);
 extern void warn_for_multistatement_macros (location_t, location_t,
 	location_t, enum rid);
+extern void warn_hw_reg_arg (tree, int, tree);
 
 /* In c-attribs.c.  */
 extern bool attribute_takes_identifier_p (const_tree);
Index: gcc/c-family/c-warn.c
===
--- gcc/c-family/c-warn.c	(revision 266086)
+++ gcc/c-family/c-warn.c	(working copy)
@@ -2609,3 +2609,39 @@ warn_for_multistatement_macros (location_t body_lo
 inform (guard_loc, "some parts of macro expansion are not guarded by "
 	"this %qs clause", guard_tinfo_to_string (keyword));
 }
+
+
+/* Diagnose unsupported use of explicit hardware register variable ARG
+   as an argument ARGNO to function FNDECL.  */
+
+void
+warn_hw_reg_arg (tree fndecl, int argno, tree arg)
+{
+  if (!fndecl)
+return;
+
+  /* Avoid diagnosing GCC intrinsics with no library fallbacks.  */
+  if (fndecl_built_in_p (fndecl)
+  && DECL_IS_BUILTIN (fndecl)
+  && !c_decl_implicit (fndecl)
+  && !DECL_ASSEMBLER_NAME_SET_P (fndecl))
+return;
+
+  /* Also avoid diagnosing always_inline functions since those are
+ often used to implement vectorization intrinsics that make use
+ of hardware register variables.  */
+  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (fndecl)))
+return;
+
+  /* Diagnose uses of local variables declared asm register.  */
+  STRIP_ANY_LOCATION_WRAPPER (arg);
+  if (VAR_P (arg)
+  && !TREE_STATIC (arg)
+  && DECL_HARD_REGISTER (arg)
+  && warning_at (input_location, OPT_Wasm_register_var,
+		 "unsupported u

Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Alan Modra
On Tue, Nov 13, 2018 at 05:17:41AM -0600, Segher Boessenkool wrote:
> On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
> > OK, fair enough.  Another option is to just disable -many when gcc is
> > in development, like we enable checking.
> 
> That is a good plan for GCC 9 at least.

Here's the patch.  Bootstrapped etc. powerpc64le-linux with resultant
fail of clone2 test as already noted.  On top of
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00924.html so needs
to be hand edited if applying without that patch.  I'm going to be
away for a few days without email access, which means I probably won't
be seeing any replies until Monday.

* config/rs6000/rs6000.h (ASM_OPT_ANY): Define.
(ASM_CPU_SPEC): Conditionally add -many.
* config/rs6000/aix61.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Likewise.
* testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c: Don't use
power mnemonics.

diff --git a/gcc/config/rs6000/aix61.h b/gcc/config/rs6000/aix61.h
index 353e5d6cfeb..809c5d8d599 100644
--- a/gcc/config/rs6000/aix61.h
+++ b/gcc/config/rs6000/aix61.h
@@ -91,8 +91,8 @@ do { \
 %{mcpu=630: -m620} \
 %{mcpu=970: -m970} \
 %{mcpu=G5: -m970} \
-%{mvsx: %{!mcpu*: -mpwr6}} \
--many"
+%{mvsx: %{!mcpu*: -mpwr6}}" \
+ASM_OPT_ANY
 
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpwr4"
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 2398ed64baa..319bd2dc013 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -89,8 +89,8 @@ do { \
maltivec: -m970; \
maix64|mpowerpc64: -mppc64; \
: %(asm_default)}; \
-  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
--many"
+  :%eMissing -mcpu option in ASM_SPEC_CPU?\n}" \
+ASM_OPT_ANY
 
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpwr4"
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index d75137cf8f5..613d16add69 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -72,6 +72,12 @@
 #define PPC405_ERRATUM77 0
 #endif
 
+#if CHECKING_P
+#define ASM_OPT_ANY ""
+#else
+#define ASM_OPT_ANY " -many"
+#endif
+
 /* Common ASM definitions used by ASM_SPEC among the various targets for
handling -mcpu=xxx switches.  There is a parallel list in driver-rs6000.c to
provide the default assembler options if the user uses -mcpu=native, so if
@@ -137,8 +143,8 @@
mvsx: -mpower7; \
mpowerpc64: -mppc64;: %(asm_default)}; \
   :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
-%{mvsx: -mvsx -maltivec; maltivec: -maltivec} \
--many"
+%{mvsx: -mvsx -maltivec; maltivec: -maltivec}" \
+ASM_OPT_ANY
 
 #define CPP_DEFAULT_SPEC ""
 
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
index 14908dba690..eea7f6ffc2e 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
@@ -45,14 +45,14 @@ __asm__ ("\t.globl\t" #NAME "_asm\n\t" \
 #NAME "_asm:\n\t"  \
 "lis 11,gparms@ha\n\t" \
 "la 11,gparms@l(11)\n\t"   \
-"st 3,0(11)\n\t"   \
-"st 4,4(11)\n\t"   \
-"st 5,8(11)\n\t"   \
-"st 6,12(11)\n\t"  \
-"st 7,16(11)\n\t"  \
-"st 8,20(11)\n\t"  \
-"st 9,24(11)\n\t"  \
-"st 10,28(11)\n\t" \
+"stw 3,0(11)\n\t"  \
+"stw 4,4(11)\n\t"  \
+"stw 5,8(11)\n\t"  \
+"stw 6,12(11)\n\t" \
+"stw 7,16(11)\n\t" \
+"stw 8,20(11)\n\t" \
+"stw 9,24(11)\n\t" \
+"stw 10,28(11)\n\t"\
 "stfd 1,32(11)\n\t"\
 "stfd 2,40(11)\n\t"\
 "stfd 3,48(11)\n\t"\

-- 
Alan Modra
Australia Development Lab, IBM
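
For readers who want to see the mechanism in isolation, here is a minimal standalone sketch of how a conditionally defined macro can append an option to a spec string through adjacent string-literal concatenation. It is not part of the patch: the fallback definition of CHECKING_P, the shortened ASM_CPU_SPEC_SKETCH string, and the helper functions are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for GCC's CHECKING_P; this sketch defaults to a
// checking (development) build when nothing else defines it.
#ifndef CHECKING_P
#define CHECKING_P 1
#endif

// Same shape as the patch: empty in checking builds, " -many" otherwise.
#if CHECKING_P
#define ASM_OPT_ANY ""
#else
#define ASM_OPT_ANY " -many"
#endif

// Shortened, illustrative spec.  Adjacent string literals are concatenated
// by the compiler, so ASM_OPT_ANY simply extends the spec string.
#define ASM_CPU_SPEC_SKETCH "%{mcpu=970: -m970}" ASM_OPT_ANY

const char *asm_cpu_spec_sketch() { return ASM_CPU_SPEC_SKETCH; }

// Returns true when the spec string ends with " -many".
bool passes_many_sketch() {
  const char *spec = asm_cpu_spec_sketch();
  const char *suffix = " -many";
  const std::size_t n = std::strlen(spec);
  const std::size_t m = std::strlen(suffix);
  assert(n > 0);
  return n >= m && std::strcmp(spec + (n - m), suffix) == 0;
}
```

With the default above, passes_many_sketch() returns false; building the sketch with CHECKING_P defined to 0 would make it return true.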


Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Peter Bergner
On 11/13/18 6:06 PM, Eric Botcazou wrote:
>>  PR rtl-optimization/87507
>>  * lower-subreg.c (operand_for_simple_move_operator): New function.
>>  (simple_move): Strip simple operators.
>>  (find_pseudo_copy): Likewise.
>>  (resolve_operand_for_simple_move_operator): New function.
>>  (resolve_simple_move): Strip simple operators and swap operands.
>>
>> gcc/testsuite/
>>  PR rtl-optimization/87507
>>  * gcc.target/powerpc/pr87507.c: New test.
>>  * gcc.target/powerpc/pr68805.c: Update expected results.
> 
> OK with the s/simple/swap/ change suggested by Richard.

Ok, I used operand_for_swap_move_operator like Richard suggested and also
did a similar change for resolve_operand_for_swap_move_operator to keep
things consistent.  Now committed.  Thanks!

Peter





Re: [C++ Patch] Fix two grokdeclarator locations

2018-11-13 Thread Jason Merrill

On 11/12/18 6:39 AM, Paolo Carlini wrote:

Hi again,

On 08/11/18 10:26, Paolo Carlini wrote:

Hi,

two additional grokdeclarator locations that we can easily fix by 
using declarator->id_loc. Slightly more interesting, testing revealed 
a latent issue in the make_id_declarator uses: 
cp_parser_member_declaration wasn't setting declarator->id_loc, thus I 
decided to add a location_t parameter to make_id_declarator itself and 
adjust all the callers. Tested x86_64-linux.


PS: In my local tree I have the cp_parser_objc_class_ivars change using 
token->location instead of UNKNOWN_LOCATION, thus all the 
make_id_declarator calls should be completely fine location-wise.


Great, I was going to ask about that.  Can I see that patch, then?

Jason


Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Eric Botcazou
>   PR rtl-optimization/87507
>   * lower-subreg.c (operand_for_simple_move_operator): New function.
>   (simple_move): Strip simple operators.
>   (find_pseudo_copy): Likewise.
>   (resolve_operand_for_simple_move_operator): New function.
>   (resolve_simple_move): Strip simple operators and swap operands.
> 
> gcc/testsuite/
>   PR rtl-optimization/87507
>   * gcc.target/powerpc/pr87507.c: New test.
>   * gcc.target/powerpc/pr68805.c: Update expected results.

OK with the s/simple/swap/ change suggested by Richard.

-- 
Eric Botcazou


[committed] Fix debug stmt handling in omp-simd-clone.c (PR tree-optimization/87898)

2018-11-13 Thread Jakub Jelinek
Hi!

Gimple debug binds (both normal and source) need to have a decl as the first
operand.  The simd clone adjustment code would in some cases transform that
into an ARRAY_REF.  The following patch removes such debug stmts.
In the future we could have a VAR_DECL with DECL_ABSTRACT_ORIGIN pointing to
the PARM_DECL that isn't there and debug stmts for it at the start say
for the array[0], but before that is really worth doing we need to think
what we are going to do with debug info in vectorized loops.

So for now just this patch.  Bootstrapped/regtested on x86_64-linux and
i686-linux, committed to trunk.

2018-11-13  Jakub Jelinek  

PR tree-optimization/87898
* omp-simd-clone.c (ipa_simd_modify_stmt_ops): Formatting fix.
(ipa_simd_modify_function_body): Remove debug stmts where the first
argument was changed into a non-decl.

* gcc.dg/gomp/pr87898.c: New test.

--- gcc/omp-simd-clone.c.jj 2018-10-31 10:33:08.994660342 +0100
+++ gcc/omp-simd-clone.c 2018-11-13 16:47:35.249022270 +0100
@@ -834,11 +834,8 @@ ipa_simd_modify_stmt_ops (tree *tp, int
   struct ipa_parm_adjustment *cand = NULL;
   if (TREE_CODE (*tp) == PARM_DECL)
 cand = ipa_get_adjustment_candidate (&tp, NULL, info->adjustments, true);
-  else
-{
-  if (TYPE_P (*tp))
-   *walk_subtrees = 0;
-}
+  else if (TYPE_P (*tp))
+*walk_subtrees = 0;
 
   tree repl = NULL_TREE;
   if (cand)
@@ -1014,6 +1011,21 @@ ipa_simd_modify_function_body (struct cg
  if (info.modified)
{
  update_stmt (stmt);
+ /* If the above changed the var of a debug bind into something
+different, remove the debug stmt.  We could also for all the
+replaced parameters add VAR_DECLs for debug info purposes,
+add debug stmts for those to be the simd array accesses and
+replace debug stmt var operand with that var.  Debugging of
+vectorized loops doesn't work too well, so don't bother for
+now.  */
+ if ((gimple_debug_bind_p (stmt)
+  && !DECL_P (gimple_debug_bind_get_var (stmt)))
+ || (gimple_debug_source_bind_p (stmt)
+ && !DECL_P (gimple_debug_source_bind_get_var (stmt))))
+   {
+ gsi_remove (&gsi, true);
+ continue;
+   }
  if (maybe_clean_eh_stmt (stmt))
gimple_purge_dead_eh_edges (gimple_bb (stmt));
}
--- gcc/testsuite/gcc.dg/gomp/pr87898.c.jj  2018-11-13 16:49:13.621413907 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr87898.c 2018-11-13 16:48:33.584068511 +0100
@@ -0,0 +1,10 @@
+/* PR tree-optimization/87898 */
+/* { dg-do compile { target fgraphite } } */
+/* { dg-options "-O1 -floop-parallelize-all -fopenmp -ftree-parallelize-loops=2 -g" } */
+
+#pragma omp declare simd
+void
+foo (int x)
+{
+  x = 0;
+}

Jakub


Re: [PATCH] Remove redundant loop in unsynchronized_pool_resource code

2018-11-13 Thread Jonathan Wakely

On 13/11/18 23:19 +, Jonathan Wakely wrote:

On 13/11/18 22:59 +, Jonathan Wakely wrote:

* src/c++17/memory_resource.cc (bitset::find_first_unset()): Remove
unused function.
(bitset::get_first_unset()): Remove loop, if there are unset bits
then _M_next_word refers to the first one and there's no need to loop.
(_Pool::_Pool(size_t, size_t), _Pool::block_size()): Remove dead code.




   size_type get_first_unset() noexcept
   {
-  for (size_type i = _M_next_word; i < nwords(); ++i)
+  if (_M_next_word < nwords())
{
- const size_type n = std::__countr_one(_M_words[i]);
+ const size_type n = std::__countr_one(_M_words[_M_next_word]);
  if (n < bits_per_word)
{
  const word bit = word(1) << n;
- _M_words[i] |= bit;
- if (i == _M_next_word)
+ _M_words[_M_next_word] |= bit;
+ const size_t res = (_M_next_word * bits_per_word) + n;
+ if (n == (bits_per_word - 1))
update_next_word();
- return (i * bits_per_word) + n;
+ return res;
}
}
 return size_type(-1);


I'm not sure why, but this version seems to perform measurably worse.
I'll investigate, and maybe revert the change.


The attached patch restores the previous performance. I'll finish
testing it tomorrow.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index 605bdd53950..79c1665146d 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -335,17 +335,16 @@ namespace pmr
 
 size_type get_first_unset() noexcept
 {
-  if (_M_next_word < nwords())
+  const size_type wd = _M_next_word;
+  if (wd < nwords())
 	{
-	  const size_type n = std::__countr_one(_M_words[_M_next_word]);
+	  const size_type n = std::__countr_one(_M_words[wd]);
 	  if (n < bits_per_word)
 	{
 	  const word bit = word(1) << n;
-	  _M_words[_M_next_word] |= bit;
-	  const size_t res = (_M_next_word * bits_per_word) + n;
-	  if (n == (bits_per_word - 1))
-		update_next_word();
-	  return res;
+	  _M_words[wd] |= bit;
+	  update_next_word();
+	  return (wd * bits_per_word) + n;
 	}
 	}
   return size_type(-1);
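
For anyone who wants to experiment with this logic outside libstdc++, here is a minimal standalone sketch of the restored get_first_unset(). The type name bitset_sketch, the std::vector storage, and the hand-written countr_one and update_next_word helpers are assumptions made for illustration; the real class in memory_resource.cc manages its storage differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the pool bitset; not the libstdc++ implementation.
struct bitset_sketch {
  using word = std::uint64_t;
  using size_type = std::size_t;
  static constexpr size_type bits_per_word = 64;

  std::vector<word> words;
  size_type next_word = 0;  // index of the first word that may have a clear bit

  explicit bitset_sketch(size_type nwords) : words(nwords, 0) {
    assert(nwords > 0);
  }

  size_type nwords() const { return words.size(); }

  // Number of consecutive set bits starting from the least significant bit.
  static size_type countr_one(word w) {
    size_type n = 0;
    while (n < bits_per_word && ((w >> n) & 1u))
      ++n;
    return n;
  }

  // Advance next_word past any words that are completely full.
  void update_next_word() {
    while (next_word < nwords() && words[next_word] == ~word(0))
      ++next_word;
  }

  // Set the first clear bit and return its index, or size_type(-1) if full.
  size_type get_first_unset() {
    const size_type wd = next_word;
    if (wd < nwords()) {
      const size_type n = countr_one(words[wd]);
      if (n < bits_per_word) {
        words[wd] |= word(1) << n;
        update_next_word();
        return (wd * bits_per_word) + n;
      }
    }
    return size_type(-1);
  }
};
```

As in the patch, the word index is read once into a local before update_next_word() can change it, so the returned bit index always refers to the word that was actually modified.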


[PATCH] Fix bootstrap with GCC 4.1.2 (PR bootstrap/86739)

2018-11-13 Thread Jakub Jelinek
Hi!

As mentioned in the PR, with GCC before 4.3 one can't instantiate std::pair
where one or both of the template parameters are reference types, because
the std::pair constructor takes its arguments as references to the template
parameter types and the CWG resolution that fixed this hasn't been applied to
those compilers.

The following patch works around it by not returning
std::pair object, but instead a different class that
holds the two references and has conversion operator to std::pair.

If that conversion operator isn't acceptable, in the PR there is another
patch which adjusts the (so far) two spots which need to be changed in that
case.

Bootstrapped/regtested on x86_64-linux and i686-linux (using GCC 7 as
bootstrap compiler) and tested on the preprocessed source with GCC 4.1.
Ok for trunk?

2018-11-13  Jakub Jelinek  

PR bootstrap/86739
* hash-map.h (hash_map::iterator::reference_pair): New class.
(hash_map::iterator::operator*): Return it rather than std::pair.

--- gcc/hash-map.h.jj   2018-07-11 22:22:01.250836509 +0200
+++ gcc/hash-map.h  2018-11-13 20:45:38.463037081 +0100
@@ -223,10 +223,23 @@ public:
   return *this;
 }
 
-std::pair operator* ()
+/* Can't use std::pair here, because GCC before 4.3 don't handle
+   std::pair where template parameters are references well.
+   See PR86739.  */
+struct reference_pair {
+  const Key &first;
+  Value &second;
+
+  reference_pair (const Key &key, Value &value) : first (key), second (value) {}
+
+  template 
+  operator std::pair () const { return std::pair (first, second); }
+};
+
+reference_pair operator* ()
 {
   hash_entry &e = *m_iter;
-  return std::pair (e.m_key, e.m_value);
+  return reference_pair (e.m_key, e.m_value);
 }
 
 bool

Jakub
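
To illustrate the workaround outside of hash-map.h, here is a minimal standalone sketch of a proxy that holds two references and converts to std::pair on demand. The name reference_pair_sketch and the template parameter names K and V are assumptions for illustration; in the patch the struct is nested inside hash_map::iterator and uses the map's own Key and Value types.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Holds references to a key and a value without instantiating
// std::pair with reference template arguments in the return type.
template <typename Key, typename Value>
struct reference_pair_sketch {
  const Key &first;
  Value &second;

  reference_pair_sketch(const Key &key, Value &value)
      : first(key), second(value) {}

  // Conversion to whichever std::pair specialization the caller asks for.
  template <typename K, typename V>
  operator std::pair<K, V>() const {
    return std::pair<K, V>(first, second);
  }
};

// Example accessor in the style of operator*: returns the proxy by value.
inline reference_pair_sketch<int, std::string>
deref_sketch(const int &key, std::string &value) {
  assert(!value.empty());
  return reference_pair_sketch<int, std::string>(key, value);
}
```

Callers that only use .first and .second keep working unchanged, while code that stores the result in a std::pair goes through the conversion operator.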


Re: [PATCH] Improve handling of pool_options::largest_required_pool_block

2018-11-13 Thread Jonathan Wakely

On 13/11/18 22:59 +, Jonathan Wakely wrote:

@@ -898,9 +907,10 @@ namespace pmr
  {
auto p = std::lower_bound(std::begin(pool_sizes), std::end(pool_sizes),
  opts.largest_required_pool_block);
-if (int npools = p - std::begin(pool_sizes))
-  return npools;
-return 1;
+const int n = p - std::begin(pool_sizes);
+if (p == std::end(pool_sizes) || *p == opts.largest_required_pool_block)


This is wrong, it still chooses one pool too few when the block_size
matches an element of pool_sizes[].


+  return n;
+return n + 1;
  }


Fixed by this patch, tested x86_64-linux and committed to trunk.


commit 01882ab88871d2bcb34ad141505b96b317eefccb
Author: Jonathan Wakely 
Date:   Tue Nov 13 23:26:08 2018 +

Fix error when selecting number of memory pools

* src/c++17/memory_resource.cc (select_num_pools): Fix off-by-one
error when block_size is equal to one of the values in the array.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index cb91e5147ce..605bdd53950 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -892,7 +892,7 @@ namespace pmr
 auto p = std::lower_bound(std::begin(pool_sizes), std::end(pool_sizes),
 			  opts.largest_required_pool_block);
 const int n = p - std::begin(pool_sizes);
-if (p == std::end(pool_sizes) || *p == opts.largest_required_pool_block)
+if (p == std::end(pool_sizes))
   return n;
 return n + 1;
   }
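
A minimal standalone sketch of the corrected selection logic follows, for anyone who wants to check the boundary cases by hand. The array contents and the names pool_sizes_sketch and select_num_pools_sketch are assumptions for illustration; the real pool_sizes table in memory_resource.cc has different values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical, sorted table of pool block sizes.
const std::size_t pool_sizes_sketch[] = {8, 16, 32, 64};

// Number of pools needed so that the largest pool's block size is at least
// largest_required_pool_block (or all pools, if the request exceeds the table).
int select_num_pools_sketch(std::size_t largest_required_pool_block) {
  assert(largest_required_pool_block > 0);
  auto p = std::lower_bound(std::begin(pool_sizes_sketch),
                            std::end(pool_sizes_sketch),
                            largest_required_pool_block);
  const int n = static_cast<int>(p - std::begin(pool_sizes_sketch));
  if (p == std::end(pool_sizes_sketch))
    return n;      // request is larger than every pool: use them all
  return n + 1;    // include the pool that p points at
}
```

With this table, a request of exactly 16 selects two pools (8 and 16), which is the case the earlier version got wrong by returning one pool too few.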


[PATCH] Fix x86 bzhi/bextr iff zero_extract with zero size is undefined (PR rtl-optimization/87817)

2018-11-13 Thread Jakub Jelinek
Hi!

As mentioned in the PR, it is unclear if zero_extract is well defined
if the second argument is 0.  x86 intrinsics require bzhi and bextr to be
well defined in those cases (extraction of 0 bits results in 0), but
e.g. the combiner happily transforms that into << 64 >> 64, and
simplify-rtx.c, while it folds it into 0, might invoke UB in the compiler etc.

So, is (zero_extract x 0 y) well defined in the middle-end or not?
If it is well defined, then what about (sign_extract x 0 y), does it also
yield 0, undefined, something else?

If neither of those is well defined, the following patch provides a backend
workaround for that, by making sure the second argument of zero_extract is
never 0 on x86.

Bootstrapped/regtested on x86_64-linux and i686-linux.
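
As a reference point for the semantics the patterns are meant to model, here is a minimal scalar sketch of 64-bit bzhi written so that the zero-length case never reaches a shift by 64. The function name bzhi64_sketch is an assumption for illustration, and this is plain C++ rather than anything taken from i386.md.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of 64-bit BZHI: keep the low N bits of src, where N is the
// low 8 bits of index.  N == 0 yields 0 and N >= 64 yields src unchanged.
std::uint64_t bzhi64_sketch(std::uint64_t src, std::uint64_t index) {
  const unsigned n = static_cast<unsigned>(index & 255);
  if (n == 0)
    return 0;        // explicit zero-length case, mirrors the if_then_else
  if (n >= 64)
    return src;      // mirrors the umin against the mode's bit width
  assert(n < 64);    // the shift below is only reached for 1..63
  return src & ((std::uint64_t(1) << n) - 1);
}
```

The explicit n == 0 branch plays the role of the IF_THEN_ELSE in the patch: the extraction is only evaluated for a nonzero length.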

2018-11-13  Jakub Jelinek  

PR rtl-optimization/87817
* config/i386/i386.md (bmi2_bzhi_3, *bmi2_bzhi_3,
*bmi2_bzhi_3_1, *bmi2_bzhi_3_1_ccz): Use IF_THEN_ELSE
in the pattern to avoid triggering UB when operands[2] is zero.
(tbm_bextri_): New expander.  Renamed the old define_insn to ...
(*tbm_bextri_): ... this.

--- gcc/config/i386/i386.md.jj  2018-11-09 14:02:00.030267540 +0100
+++ gcc/config/i386/i386.md 2018-11-13 15:45:33.034870609 +0100
@@ -13614,12 +13614,15 @@ (define_insn "*bmi_blsr__ccz"
 (define_expand "bmi2_bzhi_3"
   [(parallel
 [(set (match_operand:SWI48 0 "register_operand")
- (zero_extract:SWI48
-   (match_operand:SWI48 1 "nonimmediate_operand")
-   (umin:SWI48
- (and:SWI48 (match_operand:SWI48 2 "register_operand")
-(const_int 255))
- (match_dup 3))
+ (if_then_else:SWI48
+   (ne:QI (and:SWI48 (match_operand:SWI48 2 "register_operand")
+ (const_int 255))
+  (const_int 0))
+   (zero_extract:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (umin:SWI48 (and:SWI48 (match_dup 2) (const_int 255))
+ (match_dup 3))
+ (const_int 0))
(const_int 0)))
  (clobber (reg:CC FLAGS_REG))])]
   "TARGET_BMI2"
@@ -13627,12 +13630,15 @@ (define_expand "bmi2_bzhi_3"
 
 (define_insn "*bmi2_bzhi_3"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-   (zero_extract:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "rm")
- (umin:SWI48
-   (and:SWI48 (match_operand:SWI48 2 "register_operand" "r")
-  (const_int 255))
-   (match_operand:SWI48 3 "const_int_operand" "n"))
+   (if_then_else:SWI48
+ (ne:QI (and:SWI48 (match_operand:SWI48 2 "register_operand" "r")
+   (const_int 255))
+(const_int 0))
+ (zero_extract:SWI48
+   (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+   (umin:SWI48 (and:SWI48 (match_dup 2) (const_int 255))
+   (match_operand:SWI48 3 "const_int_operand" "n"))
+   (const_int 0))
  (const_int 0)))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI2 && INTVAL (operands[3]) ==  * BITS_PER_UNIT"
@@ -13643,11 +13649,13 @@ (define_insn "*bmi2_bzhi_3"
 
 (define_insn "*bmi2_bzhi_3_1"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
-   (zero_extract:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "rm")
- (umin:SWI48
-   (zero_extend:SWI48 (match_operand:QI 2 "register_operand" "r"))
-   (match_operand:SWI48 3 "const_int_operand" "n"))
+   (if_then_else:SWI48
+ (ne:QI (match_operand:QI 2 "register_operand" "r") (const_int 0))
+ (zero_extract:SWI48
+   (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+   (umin:SWI48 (zero_extend:SWI48 (match_dup 2))
+   (match_operand:SWI48 3 "const_int_operand" "n"))
+   (const_int 0))
  (const_int 0)))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI2 && INTVAL (operands[3]) ==  * BITS_PER_UNIT"
@@ -13659,11 +13667,13 @@ (define_insn "*bmi2_bzhi_3_1"
 (define_insn "*bmi2_bzhi_3_1_ccz"
   [(set (reg:CCZ FLAGS_REG)
(compare:CCZ
- (zero_extract:SWI48
-   (match_operand:SWI48 1 "nonimmediate_operand" "rm")
-   (umin:SWI48
- (zero_extend:SWI48 (match_operand:QI 2 "register_operand" "r"))
- (match_operand:SWI48 3 "const_int_operand" "n"))
+ (if_then_else:SWI48
+   (ne:QI (match_operand:QI 2 "register_operand" "r") (const_int 0))
+   (zero_extract:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+ (umin:SWI48 (zero_extend:SWI48 (match_dup 2))
+ (match_operand:SWI48 3 "const_int_operand" "n"))
+ (const_int 0))
(const_int 0))
(const_int 0)))
(clobber (match_scratch:SWI48 0 "=r"))]
@@ -13696,7 +13706,28 @@ (define_insn "bmi2_pext_3"
(set_attr "mode" "")])
 
 ;; TBM instructions.
-(define_insn "

Re: [PATCH] Fix debug stmt handling in optimize_recip_sqrt (PR tree-optimization/87977)

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 10:19:10AM +0100, Jakub Jelinek wrote:
> > >  Though, in this particular case the sqrt call is
> > > optimized away, so it wouldn't make a difference.
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk, or
> > > should I do the gimple_build_assign + gsi_replace change?
> > 
> > I think that would be cleaner.
> 
> Ok, will do.  Thanks.

Here it is.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-11-13  Jakub Jelinek  

PR tree-optimization/87977
* tree-ssa-math-opts.c (optimize_recip_sqrt): Don't reuse division
stmt, build a new one and replace the old one with it.  Formatting fix.
Call release_ssa_name (x) if !has_other_use and !delete_div.
(pass_cse_reciprocals::execute): Before calling optimize_recip_sqrt
verify lhs of stmt is still def.

* gcc.dg/recip_sqrt_mult_1.c: Add -fcompare-debug to dg-options.
* gcc.dg/recip_sqrt_mult_2.c: Likewise.
* gcc.dg/recip_sqrt_mult_3.c: Likewise.
* gcc.dg/recip_sqrt_mult_4.c: Likewise.
* gcc.dg/recip_sqrt_mult_5.c: Likewise.

--- gcc/tree-ssa-math-opts.c.jj 2018-11-12 20:01:19.224793981 +0100
+++ gcc/tree-ssa-math-opts.c 2018-11-13 11:30:16.326203020 +0100
@@ -652,10 +652,14 @@ optimize_recip_sqrt (gimple_stmt_iterato
  print_gimple_stmt (dump_file, stmt, 0, TDF_NONE);
  fprintf (dump_file, "with new division\n");
}
-  gimple_assign_set_lhs (stmt, sqr_ssa_name);
-  gimple_assign_set_rhs2 (stmt, a);
+  stmt
+   = gimple_build_assign (sqr_ssa_name, gimple_assign_rhs_code (stmt),
+  gimple_assign_rhs1 (stmt), a);
+  gsi_insert_before (def_gsi, stmt, GSI_SAME_STMT);
+  gsi_remove (def_gsi, true);
+  *def_gsi = gsi_for_stmt (stmt);
   fold_stmt_inplace (def_gsi);
   update_stmt (stmt);
 
   if (dump_file)
print_gimple_stmt (dump_file, stmt, 0, TDF_NONE);
@@ -704,7 +707,7 @@ optimize_recip_sqrt (gimple_stmt_iterato
 
   gimple *new_stmt
= gimple_build_assign (x, MULT_EXPR,
-   orig_sqrt_ssa_name, sqr_ssa_name);
+  orig_sqrt_ssa_name, sqr_ssa_name);
   gsi_insert_after (def_gsi, new_stmt, GSI_NEW_STMT);
   update_stmt (stmt);
 }
@@ -715,6 +718,8 @@ optimize_recip_sqrt (gimple_stmt_iterato
   gsi_remove (&gsi2, true);
   release_defs (stmt);
 }
+  else
+release_ssa_name (x);
 }
 
 /* Look for floating-point divisions among DEF's uses, and try to
@@ -951,6 +956,7 @@ pass_cse_reciprocals::execute (function
  stmt = gsi_stmt (gsi);
  if (flag_unsafe_math_optimizations
  && is_gimple_assign (stmt)
+ && gimple_assign_lhs (stmt) == def
  && !stmt_can_throw_internal (cfun, stmt)
  && gimple_assign_rhs_code (stmt) == RDIV_EXPR)
optimize_recip_sqrt (&gsi, def);
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c.jj 2018-11-12 20:01:19.315792497 +0100
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c 2018-11-13 11:17:41.667541915 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-recip" } */
+/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
 
 double res, res2, tmp;
 void
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c.jj 2018-11-12 20:01:19.320792415 +0100
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c 2018-11-13 11:17:41.668541899 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-options "-Ofast -fdump-tree-optimized -fcompare-debug" } */
 
 float
 foo (float a)
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c.jj 2018-11-12 20:01:19.363791713 +0100
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c 2018-11-13 11:17:41.671541850 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-optimized" } */
+/* { dg-options "-Ofast -fdump-tree-optimized -fcompare-debug" } */
 
 double
 foo (double a)
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c.jj 2018-11-12 20:01:19.371791582 +0100
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c 2018-11-13 11:17:41.686541603 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-recip" } */
+/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
 
 /* The main path doesn't have any multiplications.
Avoid introducing them in the recip pass.  */
--- gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c.jj 2018-11-12 20:01:19.386791336 +0100
+++ gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c 2018-11-13 11:17:41.705541293 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -fdump-tree-recip" } */
+/* { dg-options "-Ofast -fdump-tree-recip -fcompare-debug" } */
 
 /* We want to do the recip_sqrt transformations here there is already
a multiplication on the main path.  */


Jakub


Re: [PATCH] Remove redundant loop in unsynchronized_pool_resource code

2018-11-13 Thread Jonathan Wakely

On 13/11/18 22:59 +, Jonathan Wakely wrote:

* src/c++17/memory_resource.cc (bitset::find_first_unset()): Remove
unused function.
(bitset::get_first_unset()): Remove loop, if there are unset bits
then _M_next_word refers to the first one and there's no need to loop.
(_Pool::_Pool(size_t, size_t), _Pool::block_size()): Remove dead code.




size_type get_first_unset() noexcept
{
-  for (size_type i = _M_next_word; i < nwords(); ++i)
+  if (_M_next_word < nwords())
{
- const size_type n = std::__countr_one(_M_words[i]);
+ const size_type n = std::__countr_one(_M_words[_M_next_word]);
  if (n < bits_per_word)
{
  const word bit = word(1) << n;
- _M_words[i] |= bit;
- if (i == _M_next_word)
+ _M_words[_M_next_word] |= bit;
+ const size_t res = (_M_next_word * bits_per_word) + n;
+ if (n == (bits_per_word - 1))
update_next_word();
- return (i * bits_per_word) + n;
+ return res;
}
}
  return size_type(-1);


I'm not sure why, but this version seems to perform measurably worse.
I'll investigate, and maybe revert the change.




Re: RFA: vectorizer patches 1/2 : WIDEN_MULT_PLUS support

2018-11-13 Thread Joern Wolfgang Rennecke



On 12/11/18 14:30, Richard Biener wrote:

I guess I already asked this question when WIDEN_MULT_PLUS_EXPR was
introduced - but isn't that fully contained within a DOT_PROD_EXPR?


I'm not sure what exactly you mean here.
A mailing list search to find that post was unsuccessful.  The earliest it
found for widen_sum_expr and richard and (biener or guenther) was from
October 2012, and WIDEN_SUM_EXPR already existed then, as was visible in the
context of the patch under discussion.

From the perspective of expressing scalar operations, or expressing reduction
operations on (originally) scalars in the vectorizer, WIDEN_SUM_EXPR is
redundant, as we could use DOT_PROD_EXPR with one constant input.
For describing vectorizing narrow vector operations, WIDEN_SUM_EXPR would be
more suitable, as we wouldn't need to bring in matrix operations, but the
current non-widening semantics suffer from being insufficiently precisely
defined in what they are actually summing up.
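
To make the redundancy argument concrete, here is a minimal scalar sketch in which a widening sum is expressed as a widening dot product with a vector of constant ones. The function names and the choice of 16-bit inputs with a 32-bit accumulator are assumptions for illustration; they model the arithmetic only, not the GIMPLE codes or any optab.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a widening dot product: acc + sum(a[i] * b[i]), with the
// narrow inputs widened to 32 bits before multiplying.
std::int32_t dot_prod_sketch(const std::vector<std::int16_t> &a,
                             const std::vector<std::int16_t> &b,
                             std::int32_t acc) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  return acc;
}

// Scalar model of a widening sum, written as a dot product with all ones.
std::int32_t widen_sum_sketch(const std::vector<std::int16_t> &a,
                              std::int32_t acc) {
  const std::vector<std::int16_t> ones(a.size(), 1);
  return dot_prod_sketch(a, ones, acc);
}
```

The sketch only covers the reduction-to-scalar view; it says nothing about how the partial sums are laid out in a vector result, which is the part described above as insufficiently defined.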


From the perspective of providing optabs, there are two aspects to consider:
First, if a dot_prod optab were to be used for expanding WIDEN_SUM_EXPR (by
that or any other name), a target that implements WIDEN_SUM but not DOT_PROD
in general would have to have a dot_prod expander that requires a vector of
constant ones for one of its operands.  And then the vectorizer, when it wants
to consider DOT_PROD_EXPR for two vector variables, would have to test the
operand predicates of the expander.  That sounds like a lot more trouble than
the little reduction in the set of optabs is worth.



Some comments on the patch.

+  tree vecotype
+= build_vector_type (otype, GET_MODE_NUNITS (TYPE_MODE (vecitype)));

TYPE_VECTOR_SUBPARTS (vecitype)

You want to pass in the half/full types and use get_vectype_for_scalar_type
which also makes sure the target supports the vector type.


WIDEN_MULT_PLUS is special on our target in that it creates double-sized
vectors.  To get a full-sized vector in the reduction loop, you have to have
a double-sized vector result.  We already make the reduction special in that
the result can vary in size, it's scalar for an ordinary DOT_PROD_EXPR, while
a vector for other operations.


I think you want to extend and re-use supportable_widening_operation
here anyways.


I see.  Other targets generally use even/odd or lo/hi vectorized widening
mult operations.  ia64/vect.md, s390/vector.md, spu/spu.md and
tilegx/tilegx.md have mult_lo / mult_even patterns, but no dot_prod pattern,
so if I go via an enhanced supportable_widening_operation, my patch should
become relevant to these platforms.

To make it cover our target as well, I'll have to make it able, if the optabs
indicate that, to specify a single WIDEN_MULT_PLUS_EXPR with a double-sized
vector result instead of a split into two halves.





Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 266008)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -10638,7 +10638,11 @@ vect_get_vector_types_for_stmt (stmt_vec
scalar_type);

if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
-   GET_MODE_SIZE (TYPE_MODE (nunits_vectype
+   GET_MODE_SIZE (TYPE_MODE (nunits_vectype)))
+  /* Reductions that use a widening reduction would show
+a mismatch but that's already been checked to be OK.  */
+  && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)
+
  return opt_result::failure_at (stmt,
"not vectorized: different sized vector "
"types in statement, %T and %T\n",

that change doesn't look good.


Would it become acceptable if I made it specific to WIDEN_MULT_PLUS_EXPR ?



Re: [PATCH] Fix incorrect assertion when deallocating big block

2018-11-13 Thread Jonathan Wakely

@@ -932,8 +946,7 @@ namespace pmr
  }

  void
-  __pool_resource::deallocate(void* p, size_t bytes [[maybe_unused]],
- size_t alignment [[maybe_unused]])
+  __pool_resource::deallocate(void* p, size_t bytes, size_t alignment)
  {
const auto it
  = std::lower_bound(_M_unpooled.begin(), _M_unpooled.end(), p);


This part of the change wasn't meant to be committed. Restored with
the attached patch.


commit 90d835c3a0e3fd73b9287ad3b636e13319f38973
Author: Jonathan Wakely 
Date:   Tue Nov 13 23:01:44 2018 +

Fix unused parameter warnings introduced in earlier patch

* src/c++17/memory_resource.cc (_Pool::deallocate): Restore
attributes to parameters that are only used in assertions.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index b553606f552..cb91e5147ce 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -940,7 +940,8 @@ namespace pmr
   }
 
   void
-  __pool_resource::deallocate(void* p, size_t bytes, size_t alignment)
+  __pool_resource::deallocate(void* p, size_t bytes [[maybe_unused]],
+			  size_t alignment [[maybe_unused]])
   {
 const auto it
   = std::lower_bound(_M_unpooled.begin(), _M_unpooled.end(), p);


[PATCH] Remove redundant loop in unsynchronized_pool_resource code

2018-11-13 Thread Jonathan Wakely

* src/c++17/memory_resource.cc (bitset::find_first_unset()): Remove
unused function.
(bitset::get_first_unset()): Remove loop; if there are unset bits
then _M_next_word refers to the first one and there's no need to loop.
(_Pool::_Pool(size_t, size_t), _Pool::block_size()): Remove dead code.

Tested x86_64-linux, committed to trunk.
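
The reasoning in the ChangeLog entry can be sketched with a simplified
stand-in for the bitset.  Everything here (mini_bitset, the std::vector
storage, the hand-written countr_one) is an assumption for illustration and
not the libstdc++ code.  The sketch also re-checks next_word whenever the
current word becomes full, which is a more conservative test than the
`n == bits_per_word - 1` comparison in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of a bitset that tracks free blocks.
struct mini_bitset
{
  using word = std::uint64_t;
  static constexpr std::size_t bits_per_word = 64;

  std::vector<word> words;
  // Invariant: index of the first word with an unset bit,
  // or words.size() if every bit is set.
  std::size_t next_word = 0;

  explicit mini_bitset(std::size_t nwords) : words(nwords, 0) { }

  // Number of consecutive set bits starting from the least significant bit.
  static std::size_t countr_one(word w)
  {
    std::size_t n = 0;
    while (n < bits_per_word && ((w >> n) & 1u))
      ++n;
    return n;
  }

  // Advance next_word past any words that are completely full.
  void update_next_word()
  {
    while (next_word < words.size() && words[next_word] == ~word(0))
      ++next_word;
  }

  // Set the first unset bit and return its index, or size_t(-1) if full.
  // No loop over the words is needed: the invariant says the first unset
  // bit, if any, lives in words[next_word].
  std::size_t get_first_unset()
  {
    if (next_word < words.size())
      {
        const std::size_t n = countr_one(words[next_word]);
        assert(n < bits_per_word); // guaranteed by the invariant
        words[next_word] |= word(1) << n;
        const std::size_t res = next_word * bits_per_word + n;
        if (words[next_word] == ~word(0))
          update_next_word();
        return res;
      }
    return std::size_t(-1);
  }
};
```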


commit b731b6ff6b9560377356d782ae641876ca410e0d
Author: Jonathan Wakely 
Date:   Tue Nov 13 14:43:14 2018 +

Remove redundant loop in unsynchronized_pool_resource code

* src/c++17/memory_resource.cc (bitset::find_first_unset()): Remove
unused function.
(bitset::get_first_unset()): Remove loop; if there are unset bits
then _M_next_word refers to the first one and there's no need to loop.
(_Pool::_Pool(size_t, size_t), _Pool::block_size()): Remove dead code.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index 691a2999de6..b553606f552 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -280,7 +280,7 @@ namespace pmr
 // Number of blocks
 size_t size() const noexcept { return _M_size; }
 
-// Number of unset bits
+// Number of free blocks (unset bits)
 size_t free() const noexcept
 {
   size_t n = 0;
@@ -289,7 +289,7 @@ namespace pmr
   return n;
 }
 
-// True if all bits are set
+// True if there are no free blocks (all bits are set)
 bool full() const noexcept
 {
   if (_M_next_word >= nwords())
@@ -303,7 +303,7 @@ namespace pmr
   return false;
 }
 
-// True if size() != 0 and no bits are set.
+// True if size() != 0 and all blocks are free (no bits are set).
 bool empty() const noexcept
 {
   if (nwords() == 0)
@@ -333,29 +333,19 @@ namespace pmr
   return _M_words[wd] & bit;
 }
 
-size_type find_first_unset() const noexcept
-{
-  for (size_type i = _M_next_word; i < nwords(); ++i)
-   {
- const size_type n = std::__countr_one(_M_words[i]);
- if (n < bits_per_word)
-   return (i * bits_per_word) + n;
-   }
-  return size_type(-1);
-}
-
 size_type get_first_unset() noexcept
 {
-  for (size_type i = _M_next_word; i < nwords(); ++i)
+  if (_M_next_word < nwords())
{
- const size_type n = std::__countr_one(_M_words[i]);
+ const size_type n = std::__countr_one(_M_words[_M_next_word]);
  if (n < bits_per_word)
{
  const word bit = word(1) << n;
- _M_words[i] |= bit;
- if (i == _M_next_word)
+ _M_words[_M_next_word] |= bit;
+ const size_t res = (_M_next_word * bits_per_word) + n;
+ if (n == (bits_per_word - 1))
update_next_word();
- return (i * bits_per_word) + n;
+ return res;
}
}
   return size_type(-1);
@@ -605,9 +595,7 @@ namespace pmr
 : _M_chunks(),
   _M_block_sz(__block_size),
   _M_blocks_per_chunk(__blocks_per_chunk)
-{
-  __glibcxx_assert(block_size() == __block_size);
-}
+{ }
 
 // Must call release(r) before destruction!
 ~_Pool() { __glibcxx_assert(_M_chunks.empty()); }
@@ -617,11 +605,7 @@ namespace pmr
 
 // Size of blocks in this pool
 size_t block_size() const noexcept
-#if POW2_BLKSZ
-{ return _S_min_block << _M_blksize_mul; }
-#else
 { return _M_block_sz; }
-#endif
 
 // Allocate a block if the pool is not full, otherwise return null.
 void* try_allocate() noexcept


[PATCH] Improve handling of pool_options::largest_required_pool_block

2018-11-13 Thread Jonathan Wakely

Make the munge_options function round the largest_required_pool_block
value to a multiple of the smallest pool size (currently 8 bytes) to
avoid pools with odd sizes.

Ensure there is a pool large enough for blocks of the requested size.
Previously when largest_required_pool_block was exactly equal to one of
the pool_sizes[] values there would be no pool of that size. This patch
increases _M_npools by one, so there is a pool at least as large as the
requested value. It also reduces the size of the largest pool to be no
larger than needed.

* src/c++17/memory_resource.cc (munge_options): Round up value of
largest_required_pool_block to multiple of smallest pool size. Round
excessively large values down to largest pool size.
(select_num_pools): Increase number of pools by one unless it exactly
matches requested largest_required_pool_block.
(__pool_resource::_M_alloc_pools()): Make largest pool size equal
largest_required_pool_block.
* testsuite/20_util/unsynchronized_pool_resource/options.cc: Check
that pool_options::largest_required_pool_block is set appropriately.

Tested x86_64-linux, committed to trunk.
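
A minimal sketch of the two calculations described above, assuming a
shortened pool_sizes[] table and a helper name (round_to_granularity) that
does not exist in the patch.  select_num_pools copies the decision logic
shown in the diff below; how the returned count maps onto actual pool sizes
is decided in _M_alloc_pools, so the sketch makes no claim about that part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Shortened, hypothetical table; the real pool_sizes[] has more entries.
constexpr std::size_t pool_sizes[] = { 8, 16, 24, 32, 48, 64 };

// Round n up to a multiple of the smallest pool size using a bit mask.
// This only works because the smallest pool size is a power of two.
inline std::size_t round_to_granularity(std::size_t n)
{
  constexpr std::size_t mask = pool_sizes[0] - 1;
  static_assert((pool_sizes[0] & mask) == 0,
                "smallest pool size must be a power of two");
  assert(n <= std::size_t(-1) - mask); // the sketch ignores wraparound
  return (n + mask) & ~mask;
}

// Same decision as select_num_pools in the diff: find the first table
// entry that is not less than the requested size, then add one unless
// that entry matches exactly or the request is past the end of the table.
inline int select_num_pools(std::size_t largest_required_pool_block)
{
  auto p = std::lower_bound(std::begin(pool_sizes), std::end(pool_sizes),
                            largest_required_pool_block);
  const int n = static_cast<int>(p - std::begin(pool_sizes));
  if (p == std::end(pool_sizes) || *p == largest_required_pool_block)
    return n;
  return n + 1;
}
```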

commit dbf2d43429e86dc2aa9b158520d154e5637e2f6e
Author: Jonathan Wakely 
Date:   Mon Nov 12 22:56:53 2018 +

Improve handling of pool_options::largest_required_pool_block

Make the munge_options function round the largest_required_pool_block
value to a multiple of the smallest pool size (currently 8 bytes) to
avoid pools with odd sizes.

Ensure there is a pool large enough for blocks of the requested size.
Previously when largest_required_pool_block was exactly equal to one of
the pool_sizes[] values there would be no pool of that size. This patch
increases _M_npools by one, so there is a pool at least as large as the
requested value. It also reduces the size of the largest pool to be no
larger than needed.

* src/c++17/memory_resource.cc (munge_options): Round up value of
largest_required_pool_block to multiple of smallest pool size. Round
excessively large values down to largest pool size.
(select_num_pools): Increase number of pools by one unless it exactly
matches requested largest_required_pool_block.
(__pool_resource::_M_alloc_pools()): Make largest pool size equal
largest_required_pool_block.
* testsuite/20_util/unsynchronized_pool_resource/options.cc: Check
that pool_options::largest_required_pool_block is set appropriately.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index 719cb9f1d29..691a2999de6 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -830,6 +830,19 @@ namespace pmr
 
   namespace {
 
+  constexpr size_t pool_sizes[] = {
+  8, 16, 24,
+  32, 48,
+  64, 80, 96, 112,
+  128, 192,
+  256, 320, 384, 448,
+  512, 768,
+  1024, 1536,
+  2048, 3072,
+  1<<12, 1<<13, 1<<14, 1<<15, 1<<16, 1<<17,
+  1<<20, 1<<21, 1<<22 // 4MB should be enough for anybody
+  };
+
   pool_options
   munge_options(pool_options opts)
   {
@@ -860,29 +873,25 @@ namespace pmr
   }
 else
   {
-   // TODO round to preferred granularity ?
+   // Round to preferred granularity
+   static_assert(std::__ispow2(pool_sizes[0]));
+   constexpr size_t mask = pool_sizes[0] - 1;
+   opts.largest_required_pool_block += mask;
+   opts.largest_required_pool_block &= ~mask;
   }
 
 if (opts.largest_required_pool_block < big_block::min)
   {
opts.largest_required_pool_block = big_block::min;
   }
+else if (opts.largest_required_pool_block > std::end(pool_sizes)[-1])
+  {
+   // Setting _M_opts to the largest pool allows users to query it:
+   opts.largest_required_pool_block = std::end(pool_sizes)[-1];
+  }
 return opts;
   }
 
-  const size_t pool_sizes[] = {
-  8, 16, 24,
-  32, 48,
-  64, 80, 96, 112,
-  128, 192,
-  256, 320, 384, 448,
-  512, 768,
-  1024, 1536,
-  2048, 3072,
-  1<<12, 1<<13, 1<<14, 1<<15, 1<<16, 1<<17,
-  1<<20, 1<<21, 1<<22 // 4MB should be enough for anybody
-  };
-
   inline int
   pool_index(size_t block_size, int npools)
   {
@@ -898,9 +907,10 @@ namespace pmr
   {
 auto p = std::lower_bound(std::begin(pool_sizes), std::end(pool_sizes),
  opts.largest_required_pool_block);
-if (int npools = p - std::begin(pool_sizes))
-  return npools;
-return 1;
+const int n = p - std::begin(pool_sizes);
+if (p == std::end(pool_sizes) || *p == opts.largest_required_pool_block)
+  return n;
+return n + 1;
   }
 
   } // namespace
@@ -971,7 +981,11 @@ namespace pmr
 _Pool* p = alloc.allocate(_M_npools);
 for (int i = 0; i < _M_npools; ++i)
   {
-

[PATCH] Fix incorrect assertion when deallocating big block

2018-11-13 Thread Jonathan Wakely

Since a big_block rounds up the size to a multiple of big_block::min it
is wrong to assert that the supplied number of bytes equals the
big_block's size(). Add big_block::alloc_size(size_t) to calculate the
allocated size consistently, and add comments to the code.

* src/c++17/memory_resource.cc (big_block): Improve comments.
(big_block::all_ones): Remove.
(big_block::big_block(size_t, size_t)): Use alloc_size.
(big_block::size()): Add comment, replace all_ones with equivalent
expression.
(big_block::align()): Shift value of correct type.
(big_block::alloc_size(size_t)): New function to round up size.
(__pool_resource::allocate(size_t, size_t)): Add comment.
(__pool_resource::deallocate(void*, size_t, size_t)): Likewise. Fix
incorrect assertion by using big_block::alloc_size(size_t).
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc: Add
more tests for unpooled allocations.

Tested x86_64-linux, committed to trunk.
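
The rounding that makes the old assertion wrong can be shown in isolation.
This is a sketch under assumptions: min_block is a made-up granularity of 64
standing in for big_block::min, and check_dealloc is a hypothetical helper
modelling the corrected assertion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical granularity standing in for big_block::min.
constexpr std::size_t min_block = 64;

// Round the requested size up to a multiple of min_block.  If the
// addition wraps past zero, return the maximum value instead.
constexpr std::size_t alloc_size(std::size_t bytes) noexcept
{
  const std::size_t s = bytes + min_block - 1u;
  return s < bytes ? std::size_t(-1) : (s & ~(min_block - 1u));
}

// Deallocation-side check: compare the stored size with the rounded size,
// not with the raw number of bytes the caller passes in.
inline void check_dealloc(std::size_t stored_size, std::size_t bytes)
{
  assert(stored_size == alloc_size(bytes));
}
```

With these values a request for 65 bytes is stored as 128, so asserting
`stored_size == bytes` at deallocation would fail even though the caller
passed the same size to both calls.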

commit 8f54af8a6ab6dae9f9736afd9790691a57d57428
Author: Jonathan Wakely 
Date:   Tue Nov 13 21:01:57 2018 +

Fix incorrect assertion when deallocating big block

Since a big_block rounds up the size to a multiple of big_block::min it
is wrong to assert that the supplied number of bytes equals the
big_block's size(). Add big_block::alloc_size(size_t) to calculate the
allocated size consistently, and add comments to the code.

* src/c++17/memory_resource.cc (big_block): Improve comments.
(big_block::all_ones): Remove.
(big_block::big_block(size_t, size_t)): Use alloc_size.
(big_block::size()): Add comment, replace all_ones with equivalent
expression.
(big_block::align()): Shift value of correct type.
(big_block::alloc_size(size_t)): New function to round up size.
(__pool_resource::allocate(size_t, size_t)): Add comment.
(__pool_resource::deallocate(void*, size_t, size_t)): Likewise. Fix
incorrect assertion by using big_block::alloc_size(size_t).
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc: Add
more tests for unpooled allocations.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index fdbbc914f2e..719cb9f1d29 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -537,22 +537,22 @@ namespace pmr
   // An oversized allocation that doesn't fit in a pool.
   struct big_block
   {
+// Alignment must be a power-of-two so we only need to use enough bits
+// to store the power, not the actual value:
 static constexpr unsigned _S_alignbits
-  = std::__log2p1((unsigned)numeric_limits::digits) - 1;
+  = std::__log2p1((unsigned)numeric_limits::digits - 1);
+// Use the remaining bits to store the size:
 static constexpr unsigned _S_sizebits
   = numeric_limits::digits - _S_alignbits;
 // The maximum value that can be stored in _S_size
-static constexpr size_t all_ones = (1ull << _S_sizebits) - 1u;
-// The minimum size of a big block
+static constexpr size_t all_ones = size_t(-1) >> _S_alignbits;
+// The minimum size of a big block (smaller sizes will be rounded up).
 static constexpr size_t min = 1u << _S_alignbits;
 
 big_block(size_t bytes, size_t alignment)
-: _M_size((bytes + min - 1u) >> _S_alignbits),
+: _M_size(alloc_size(bytes) >> _S_alignbits),
   _M_align_exp(std::__log2p1(alignment) - 1u)
-{
-  if (__builtin_expect(std::__countl_one(bytes) == _S_sizebits, false))
-   _M_size = all_ones;
-}
+{ }
 
 void* pointer = nullptr;
 size_t _M_size : numeric_limits::digits - _S_alignbits;
@@ -560,13 +560,26 @@ namespace pmr
 
 size_t size() const noexcept
 {
-  if (__builtin_expect(_M_size == all_ones, false))
+  // If all bits are set in _M_size it means the maximum possible size:
+  if (__builtin_expect(_M_size == (size_t(-1) >> _S_alignbits), false))
return (size_t)-1;
   else
return _M_size << _S_alignbits;
 }
 
-size_t align() const noexcept { return 1ul << _M_align_exp; }
+size_t align() const noexcept { return size_t(1) << _M_align_exp; }
+
+// Calculate size to be allocated instead of requested number of bytes.
+// The requested value will be rounded up to a multiple of big_block::min,
+// so the low _S_alignbits bits are all zero and don't need to be stored.
+static constexpr size_t alloc_size(size_t bytes) noexcept
+{
+  const size_t s = bytes + min - 1u;
+  if (__builtin_expect(s < bytes, false))
+   return size_t(-1); // addition wrapped past zero, return max value
+  else
+   return s & ~(min - 1u);
+}
 
 friend bool operator<(void* p, const big_block& b) noexcept
 { return less{}(p, b.pointer); }
@

[PATCH] Fix overflows in std::pmr::unsynchronized_pool_resource

2018-11-13 Thread Jonathan Wakely

* src/c++17/memory_resource.cc (bitset::full()): Handle edge case
for _M_next_word maximum value.
(bitset::get_first_unset(), bitset::set(size_type)): Use
update_next_word() to update _M_next_word.
(bitset::update_next_word()): New function, avoiding wraparound of
unsigned _M_next_word member.
(bitset::max_word_index()): New function.
(chunk::chunk(void*, uint32_t, void*, size_t)): Add assertion.
(chunk::max_bytes_per_chunk()): New function.
(pool::replenish(memory_resource*, const pool_options&)): Prevent
_M_blocks_per_chunk from exceeding max_blocks_per_chunk or from
causing chunk::max_bytes_per_chunk() to be exceeded.
* testsuite/20_util/unsynchronized_pool_resource/allocate-max-chunks.cc:
New test.

Tested x86_64-linux, committed to trunk.


commit 9e6c1cedf054342beb710163c5b01137dcf07d45
Author: Jonathan Wakely 
Date:   Mon Nov 12 22:50:36 2018 +

Fix overflows in std::pmr::unsynchronized_pool_resource

* src/c++17/memory_resource.cc (bitset::full()): Handle edge case
for _M_next_word maximum value.
(bitset::get_first_unset(), bitset::set(size_type)): Use
update_next_word() to update _M_next_word.
(bitset::update_next_word()): New function, avoiding wraparound of
unsigned _M_next_word member.
(bitset::max_word_index()): New function.
(chunk::chunk(void*, uint32_t, void*, size_t)): Add assertion.
(chunk::max_bytes_per_chunk()): New function.
(pool::replenish(memory_resource*, const pool_options&)): Prevent
_M_blocks_per_chunk from exceeding max_blocks_per_chunk or from
causing chunk::max_bytes_per_chunk() to be exceeded.
* testsuite/20_util/unsynchronized_pool_resource/allocate-max-chunks.cc:
New test.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc b/libstdc++-v3/src/c++17/memory_resource.cc
index 3595e255889..fdbbc914f2e 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -290,7 +290,18 @@ namespace pmr
 }
 
 // True if all bits are set
-bool full() const noexcept { return _M_next_word >= nwords(); }
+bool full() const noexcept
+{
+  if (_M_next_word >= nwords())
+   return true;
+  // For a bitset with size() > (max_blocks_per_chunk() - 64) we will
+  // have nwords() == (max_word_index() + 1) and so _M_next_word will
+  // never be equal to nwords().
+  // In that case, check if the last word is full:
+  if (_M_next_word == max_word_index())
+   return _M_words[_M_next_word] == word(-1);
+  return false;
+}
 
 // True if size() != 0 and no bits are set.
 bool empty() const noexcept
@@ -343,11 +354,7 @@ namespace pmr
  const word bit = word(1) << n;
  _M_words[i] |= bit;
  if (i == _M_next_word)
-   {
- while (_M_words[_M_next_word] == word(-1)
- && ++_M_next_word != nwords())
-   { }
-   }
+   update_next_word();
  return (i * bits_per_word) + n;
}
}
@@ -361,11 +368,7 @@ namespace pmr
   const word bit = word(1) << (n % bits_per_word);
   _M_words[wd] |= bit;
   if (wd == _M_next_word)
-   {
- while (_M_words[_M_next_word] == word(-1)
- && ++_M_next_word != nwords())
-   { }
-   }
+   update_next_word();
 }
 
 void clear(size_type n) noexcept
@@ -378,6 +381,18 @@ namespace pmr
_M_next_word = wd;
 }
 
+// Update _M_next_word to refer to the next word with an unset bit.
+// The size of the _M_next_word bit-field means it cannot represent
+// the maximum possible nwords() value. To avoid wraparound to zero
+// this function saturates _M_next_word at max_word_index().
+void update_next_word() noexcept
+{
+  size_t next = _M_next_word;
+  while (_M_words[next] == word(-1) && ++next < nwords())
+   { }
+  _M_next_word = std::min(next, max_word_index());
+}
+
 void swap(bitset& b) noexcept
 {
   std::swap(_M_words, b._M_words);
@@ -396,6 +411,10 @@ namespace pmr
 static constexpr size_t max_blocks_per_chunk() noexcept
 { return (1ull << _S_size_digits) - 1; }
 
+// Maximum value that can be stored in bitset::_M_next_word member (8191).
+static constexpr size_t max_word_index() noexcept
+{ return (max_blocks_per_chunk() + bits_per_word - 1) / bits_per_word; }
+
 word* data() const noexcept { return _M_words; }
 
   private:
@@ -425,7 +444,7 @@ namespace pmr
 : bitset(words, n),
   _M_bytes(bytes),
   _M_p(static_cast(p))
-{ }
+{ __glibcxx_assert(bytes <= chunk::max_bytes_per_chunk()); }
 
 chunk(chunk&& c) noexcept
 : bitset(std::move(c)), _M_bytes(c._M_bytes),

ping x3 [PATCH 0/3] [MSP430] Add methods to extract MCU data from file

2018-11-13 Thread Jozef Lawrynowicz

ping

On 29/10/2018 14:22, Jozef Lawrynowicz wrote:

The same as previous pings except I removed the patch which updates the
hard-coded device data - I'll commit that later as "obvious".

Ok for trunk?



The following series of patches extends MCU device data handling for the
msp430 target, allowing an external file to be read which describes the CPU
ISA and hardware multiply supported for different MCUs.
The current hard-coded solution means that new MCUs can only be supported by
updating GCC itself.

The first patch keeps the hard-coded data as the only way of reading MCU
data, but consolidates it in a single file. Extensions to the spec handling
in msp430.h mean that the hard-coded data is no longer needed in 't-msp430'
for multilib selection, or in the assembler. This is achieved by the driver,
which places the corresponding mcpu value for the MCU on its command line.

Some extensions to msp430.exp were necessary to ensure that full test
coverage is achieved when the testsuite is run using "make check".
As the tests for different MCUs result in different ISAs/memory
models being used, the hard-coded libgloss multilib directories on the
command line needed to be fixed up to allow the non-default "430" and
"large" multilibs to be tested.
The tests could be downgraded from link tests to assemble tests (the mips
testsuite does this), but then we would lose coverage that the spec strings
and multilib selection work as expected.

The second patch adds functionality to search the include paths specified
with -I for "devices.csv". If the file is found, and a device name has been
passed to the -mmcu option, then devices.csv is parsed, and the MCU data for
the given device is extracted.

The third patch adds functionality to search for devices.csv in both
the path specified by the environment variable "MSP430_GCC_INCLUDE_DIR", and
the directory "msp430-elf/include/devices" from the toolchain root. These
locations are searched if devices.csv is not found on an include path.
If devices.csv is found using one of these methods, the directory containing
devices.csv is also registered as an include path and linker library path.






Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Peter Bergner
On 11/13/18 4:09 PM, Vladimir Makarov wrote:
> On 11/13/2018 10:53 AM, Peter Bergner wrote:
>> With the above results, I think the patch is ready for review.
>> I'm attaching the latest updated patch below.
>>
>> Again, this passed bootstrap and regtesting on powerpc64le-linux with
>> no regressions.  Ok for mainline?
>>
> Ok, Peter.

Ok, this is committed now.  Thanks for the review and thank you
to everyone who helped debug and test the plethora of patches!!!

Peter





Re: [PATCH] RFC: C/C++: print help when a header can't be found

2018-11-13 Thread Jason Merrill
On Mon, Nov 12, 2018 at 4:01 PM Martin Sebor  wrote:
> On 11/11/2018 04:33 PM, David Malcolm wrote:
> > When gcc can't find a header file, it's a hard error that stops the build,
> > typically requiring the user to mess around with compile flags, Makefiles,
> > dependencies, and so forth.
> >
> > Often the exact search paths aren't obvious to the user.  Consider the
> > case where the include paths are injected via a tool such as pkg-config,
> > such as e.g.:
> >
> >   gcc $(pkg-config --cflags glib-2.0) demo.c
> >
> > This patch is an attempt at being more helpful for such cases.  Given that
> > the user can't proceed until the issue is resolved, I think it's reasonable
> > to default to telling the user as much as possible about what happened.
> > This patch lists all of the search paths, and any close matches (e.g. for
> > misspellings).
> >
> > Without the patch, the current behavior is:
> >
> > misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or directory
> > 1 | #include "test-header.hpp"
> >   |  ^
> > compilation terminated.
> >
> > With the patch, the user gets this output:
> >
> > misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or directory
> > 1 | #include "test-header.hpp"
> >   |  ^
> > misspelled-header-1.c:1:10: note: paths searched:
> > misspelled-header-1.c:1:10: note:  path: ''
> > misspelled-header-1.c:1:10: note:   not found: 'test-header.hpp'
> > misspelled-header-1.c:1:10: note:   close match: 'test-header.h'
> > 1 | #include "test-header.hpp"
> >   |  ^
> >   |  "test-header.h"
> > misspelled-header-1.c:1:10: note:  path: '/usr/include/glib-2.0' (via '-I')
> > misspelled-header-1.c:1:10: note:   not found: '/usr/include/glib-2.0/test-header.hpp'
> > misspelled-header-1.c:1:10: note:  path: '/usr/lib64/glib-2.0/include' (via '-I')
> > misspelled-header-1.c:1:10: note:   not found: '/usr/lib64/glib-2.0/include/test-header.hpp'
> > misspelled-header-1.c:1:10: note:  path: './include' (system directory)
> > misspelled-header-1.c:1:10: note:   not found: './include/test-header.hpp'
> > misspelled-header-1.c:1:10: note:  path: './include-fixed' (system directory)
> > misspelled-header-1.c:1:10: note:   not found: './include-fixed/test-header.hpp'
> > misspelled-header-1.c:1:10: note:  path: '/usr/local/include' (system directory)
> > misspelled-header-1.c:1:10: note:   not found: '/usr/local/include/test-header.hpp'
> > misspelled-header-1.c:1:10: note:  path: '/usr/include' (system directory)
> > misspelled-header-1.c:1:10: note:   not found: '/usr/include/test-header.hpp'
> > compilation terminated.
> >
> > showing the paths that were tried, and why (e.g. the -I paths injected by
> > the pkg-config invocation), and the .hpp vs .h issue (with a fix-it hint).
> >
> > It's verbose, but as I said above, the user can't proceed until they
> > resolve it, so I think being verbose is appropriate here.
> >
> > Thoughts?
>
> I think printing the directories and especially the near matches
> will be very helpful, especially for big projects with lots of -I
> options.
>
> The output could be made substantially shorter, less repetitive,
> and so easier to read -- basically cut in half -- by avoiding
> most of the duplication and collapsing two notes into one, e.g.
> like so:
>
>fatal error: test-header.hpp: No such file or directory
>1 | #include "test-header.hpp"
>  |  ^
>note: paths searched:
>note: -I '.'
>note:   close match: 'test-header.h'
>1 | #include "test-header.hpp"
>  |  ^
>  |  "test-header.h"
>note: -I '/usr/include/glib-2.0'
>note: -I '/usr/lib64/glib-2.0/include'
>note: -isystem './include'
>note: -isystem './include-fixed'
>note: -isystem '/usr/local/include'
>note: -isystem '/usr/include'
>
> or by printing the directories in sections:
>
>note: -I paths searched:
>note:   '.'
>note:   close match: 'test-header.h'
>1 | #include "test-header.hpp"
>  |  ^
>  |  "test-header.h"
>note:   '/usr/include/glib-2.0'
>note:   '/usr/lib64/glib-2.0/include'
>note: -isystem paths searched:
>note:   './include'
>note:   './include-fixed'
>note:   '/usr/local/include'
>note:   '/usr/include'

I agree that the "not found" lines are unnecessary; I don't feel
strongly about the other formatting.

Jason


Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Vladimir Makarov

On 11/13/2018 10:53 AM, Peter Bergner wrote:

On 11/13/18 9:01 AM, Renlin Li wrote:

I can confirm that your patch fixes all the ICEs I saw with the
arm-linux-gnueabihf toolchain!
There are some differences in the test results, because I am comparing the
latest results with an old baseline.

I haven't tested it on a bare-metal toolchain yet, but I will do so to
ensure all related issues are fixed.

Hi Renlin,

That's excellent news!  My guess on the testsuite results changes is that
they're probably caused by the combine changes/fixes that went in around
the same time as my patches.

If you want to disable the special copy handling, which should only help
things since it gives RA more freedom, you can apply the patch I mentioned
here:

   https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00379.html

which allows you to turn on and off the optimization with an option.


Jeff and Vlad,

With the above results, I think the patch is ready for review.
I'm attaching the latest updated patch below.

Again, this passed bootstrap and regtesting on powerpc64le-linux with
no regressions.  Ok for mainline?


Ok, Peter.


Additional point generation here is not important.  It is more important
for IRA.  Therefore more effort was spent on reducing their number and
spans in IRA than in LRA.


Thanks for working on the PR.  LRA/reload patches need a few iterations
as a rule.  So it is a normal situation.


gcc/
PR rtl-optimization/87899
* lra-lives.c (start_living): Update white space in comment.
(enum point_type): New.
(sparseset_contains_pseudos_p): New function.
(update_pseudo_point): Likewise.
(make_hard_regno_live): Use HARD_REGISTER_NUM_P macro.
(make_hard_regno_dead): Likewise.  Remove ignore_reg_for_conflicts
handling.  Move early exit after adding conflicts.
(mark_pseudo_live): Use HARD_REGISTER_NUM_P macro.  Add early exit
if regno is already live.  Remove all handling of program points.
(mark_pseudo_dead): Use HARD_REGISTER_NUM_P macro.  Add early exit
after adding conflicts.  Remove all handling of program points and
ignore_reg_for_conflicts.
(mark_regno_live): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_live.
(mark_regno_dead): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_dead.
(check_pseudos_live_through_calls): Use HARD_REGISTER_NUM_P macro.
(process_bb_lives): Use HARD_REGISTER_NUM_P and HARD_REGISTER_P macros.
Use new function update_pseudo_point.  Handle register copies by
removing the source register from the live set.  Handle INOUT operands.
Update to the next program point using the unused_set, dead_set and
start_dying sets.
(lra_create_live_ranges_1): Use HARD_REGISTER_NUM_P macro.





Re: [PATCH] C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-13 Thread Jason Merrill
On Mon, Nov 12, 2018 at 4:32 PM Martin Sebor  wrote:
> On 11/11/2018 02:02 PM, David Malcolm wrote:
> > On Sun, 2018-11-11 at 11:01 -0700, Martin Sebor wrote:
> >> On 11/10/2018 12:01 AM, Eric Gallager wrote:
> >>> On 11/9/18, David Malcolm  wrote:
>  This patch adds a fix-it hint to various pointer-vs-non-pointer
>  diagnostics, suggesting the addition of a leading '&' or '*'.
> 
>  For example, note the ampersand fix-it hint in the following:
> 
>  demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned int'}
>    to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
>  5 |   pthread_key_create(key, NULL);
>    |  ^~~
>    |  |
>    |  pthread_key_t {aka unsigned int}
>    |  &
> >>>
> >>> Having both the type and the fixit underneath the caret looks kind
> >>> of confusing
> >>
> >> I agree it's rather subtle.  Keeping the diagnostics separate from
> >> the suggested fix should avoid the confusion.
> >
> > FWIW, the fix-it hint is in a different color (assuming that gcc is
> > invoked in an environment that prints that...)
>
> I figured it would be, but I'm still not sure it's good design
> to be relying on color alone to distinguish between the problem
> and the suggested fix.  Especially when they are so close to one
> another and the fix is just a single character with no obvious
> relationship to the rest of the text on the screen.  In other
> warnings there's at least the "did you forget the '@'?" part
> to give a clue, even though even there the connection between
> the "did you forget" and the & several lines down wouldn't
> necessarily be immediately apparent.

Agreed, something along those lines would help to understand why the
compiler is throwing a random & into the diagnostic.

Jason


[PATCH v3] Add sinh(atanh(x)) and cosh(atanh(x)) optimizations

2018-11-13 Thread Giuliano Augusto Faulin Belinassi
Only do the optimization if flag_signed_zeros && !flag_finite_math_only
holds, as suggested in the previous iteration.

Before, the patch did the optimization even when -fno-signed-zeros and
-ffinite-math-only were set. This could generate badly incorrect
results for targets that do not support infinities or signed zeros.

I also updated the tests with the proper flags.
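
For reference, the identities behind the two rules can be checked
numerically outside the compiler.  This is a standalone sketch, not part of
the patch: the function names and the relative tolerance are assumptions,
and it should be built without -ffast-math so that infinities and signed
zeros behave as IEEE 754 specifies.

```cpp
#include <cassert>
#include <cmath>

// Direct evaluation through the library functions.
inline double sinh_atanh_direct(double x) { return std::sinh(std::atanh(x)); }
inline double cosh_atanh_direct(double x) { return std::cosh(std::atanh(x)); }

// Simplified forms used by the new match.pd rules:
//   sinh(atanh(x)) -> x / sqrt((1 - x) * (1 + x))
//   cosh(atanh(x)) -> 1 / sqrt((1 - x) * (1 + x))
inline double sinh_atanh_simplified(double x)
{ return x / std::sqrt((1.0 - x) * (1.0 + x)); }

inline double cosh_atanh_simplified(double x)
{ return 1.0 / std::sqrt((1.0 - x) * (1.0 + x)); }

// Relative comparison; the tolerance is an arbitrary choice for this sketch.
inline bool close(double a, double b, double rel = 1e-12)
{
  assert(rel > 0.0);
  return std::fabs(a - b) <= rel * std::fmax(std::fabs(a), std::fabs(b));
}
```

The edge cases are where the flags matter: at x = 1 both forms produce an
infinity, and at x = -0.0 the sinh form must return -0.0, which is why the
rules are guarded by flag_signed_zeros && !flag_finite_math_only.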

gcc/ChangeLog
2018-11-13  Giuliano Belinassi  

* match.pd (sinh (atanh (x))): New simplification rules.
(cosh (atanh (x))): Likewise.

gcc/testsuite/ChangeLog
2018-11-13  Giuliano Belinassi  

* gcc.dg/sinhatanh-1.c: New test.
* gcc.dg/sinhatanh-2.c: New test.

No tests in trunk seem to break because of this patch.
Index: gcc/match.pd
===
--- gcc/match.pd	(revision 266029)
+++ gcc/match.pd	(working copy)
@@ -4311,6 +4311,26 @@
   (rdiv { t_one; } (sqrts (plus (mult @0 @0) { t_one; })))
   (copysigns { t_zero; } @0))
 
+ /* Simplify sinh(atanh(x)) -> x / sqrt((1 - x)*(1 + x)). */
+ (for sinhs (SINH)
+  atanhs (ATANH)
+  sqrts (SQRT)
+  (simplify
+   (sinhs (atanhs:s @0))
+   (with { tree t_one = build_one_cst (type); }
+   (if (flag_signed_zeros && !flag_finite_math_only)
+    (rdiv @0 (sqrts (mult (minus { t_one; } @0) (plus { t_one; } @0))))))))
+
+ /* Simplify cosh(atanh(x)) -> 1 / sqrt((1 - x)*(1 + x)) */
+ (for coshs (COSH)
+  atanhs (ATANH)
+  sqrts (SQRT)
+  (simplify
+   (coshs (atanhs:s @0))
+   (with { tree t_one = build_real (type, dconst1); }
+   (if (flag_signed_zeros && !flag_finite_math_only)
+    (rdiv { t_one; } (sqrts (mult (minus { t_one; } @0) (plus { t_one; } @0))))))))
+
 /* cabs(x+0i) or cabs(0+xi) -> abs(x).  */
 (simplify
  (CABS (complex:C @0 real_zerop@1))
Index: gcc/testsuite/gcc.dg/sinhatanh-1.c
===
--- gcc/testsuite/gcc.dg/sinhatanh-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinhatanh-1.c	(working copy)
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -funsafe-math-optimizations -fsigned-zeros
+ -fno-finite-math-only -fno-associative-math
+ -fdump-tree-optimized" } */
+
+extern float sinhf (float);
+extern float coshf (float);
+extern float atanhf (float);
+extern float sqrtf (float);
+extern double sinh (double);
+extern double cosh (double);
+extern double atanh (double);
+extern double sqrt (double);
+extern long double sinhl (long double);
+extern long double coshl (long double);
+extern long double atanhl (long double);
+extern long double sqrtl (long double);
+
+double __attribute__ ((noinline))
+sinhatanh_ (double x)
+{
+return sinh (atanh (x));
+}
+
+double __attribute__ ((noinline))
+coshatanh_ (double x)
+{
+return cosh (atanh (x));
+}
+
+float __attribute__ ((noinline))
+sinhatanhf_(float x)
+{
+return sinhf (atanhf (x));
+}
+
+float __attribute__ ((noinline))
+coshatanhf_(float x)
+{
+return coshf (atanhf (x));
+}
+
+long double __attribute__ ((noinline))
+sinhatanhl_ (long double x)
+{
+return sinhl (atanhl (x));
+}
+
+long double __attribute__ ((noinline))
+coshatanhl_ (long double x)
+{
+return coshl (atanhl (x));
+}
+
+/* There must be no calls to sinh, cosh, or atanh */
+/* { dg-final { scan-tree-dump-not "sinh " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "cosh " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atanh " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "sinhf " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "coshf " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atanhf " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "sinhl " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "coshl " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "atanhl " "optimized" } } */
Index: gcc/testsuite/gcc.dg/sinhatanh-2.c
===
--- gcc/testsuite/gcc.dg/sinhatanh-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/sinhatanh-2.c	(working copy)
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -funsafe-math-optimizations -fsigned-zeros
+ -fno-finite-math-only -fno-associative-math
+ -fdump-tree-optimized" } */
+
+extern float sinhf (float);
+extern float coshf (float);
+extern float atanhf (float);
+extern float sqrtf (float);
+extern double sinh (double);
+extern double cosh (double);
+extern double atanh (double);
+extern double sqrt (double);
+extern long double sinhl (long double);
+extern long double coshl (long double);
+extern long double atanhl (long double);
+extern long double sqrtl (long double);
+
+float __attribute__ ((noinline))
+coshatanhf_(float x)
+{
+float atg = atanhf(x);
+return coshf(atg) + atg;
+}
+
+double __attribute__ ((noinline))
+cosatan_(double x)
+{
+double atg = atanh(x);
+return cosh(atg) + atg;
+}
+
+long double __attribute__ ((noinline))
+cosata

Re: Bug 52869 - [DR 1207] "this" not being allowed in noexcept clauses

2018-11-13 Thread Jason Merrill
On Tue, Nov 13, 2018 at 10:40 AM Marek Polacek  wrote:
> On Tue, Nov 13, 2018 at 11:49:55AM +0530, Umesh Kalappa wrote:
> > Hi All,
> >
> > the following patch fix the subjected issue
> >
> > Index: gcc/cp/parser.c
> > ===
> > --- gcc/cp/parser.c (revision 266026)
> > +++ gcc/cp/parser.c (working copy)
> > @@ -24615,6 +24615,8 @@
> >  {
> >tree expr;
> >cp_lexer_consume_token (parser->lexer);
> > +
> > +  inject_this_parameter (current_class_type, TYPE_UNQUALIFIED);
> >
> >if (cp_lexer_peek_token (parser->lexer)->type == CPP_OPEN_PAREN)
> > {
> >
> >
> > ok to commit along the testcase with changelog update ?
>
> Thanks for the patch.
>
> Please also include the testcase along with the patch (and I think it should
> also test noexcept in a template).  Please also include a ChangeLog entry
> in the patch submission.
>
> Can you describe how this patch has been tested?
>
> Further, wouldn't it be better to call inject_this_parameter inside the
> CPP_OPEN_PAREN block?  If noexcept doesn't have any expression, then it
> can't refer to "this".

Agreed, thanks.  You also need to restore the old
current_class_{ptr,ref} at the end of the noexcept-specifier.

Jason


Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-13 Thread Qing Zhao
Hi,

> On Nov 13, 2018, at 1:18 PM, Miroslav Benes  wrote:
> 
>> Attached is the patch for new -flive-patching=[inline-only-static | 
>> inline-clone] master option.
>> 
>> '-flive-patching=LEVEL'
>> Control GCC's optimizations to provide a safe compilation for
>> live-patching.  Provides multiple-level control on how many of the
>> optimizations are enabled by users' request.  The LEVEL argument
>> should be one of the following:
>> 
>> 'inline-only-static'
>> 
>>  Only enable inlining of static functions, disable all other
>>  ipa optimizations/analyses.  As a result, when patching a
>>  static routine, all its callers need to be patched as well.
>> 
>> 'inline-clone'
>> 
>>  Only enable inlining and all optimizations that internally
>>  create clone, for example, cloning, ipa-sra, partial inlining,
>>  etc.; disable all other ipa optimizations/analyses.  As a
>>  result, when patching a routine, all its callers and its
>>  clones' callers need to be patched as well.
> 
> Based on our previous discussion I assume that "clone" optimizations are 
> safe (for LP) and the others are not. Anyway I'd welcome a note mentioning 
> that disabled optimizations are dangerous for LP.

Actually, I don’t think that those disabled optimizations are “dangerous” for 
live-patching. One of the major reasons we disable them is that the compiler
currently does NOT provide a good way to compute the impacted function list
for those optimizations. Therefore, we disable them at this time. 

Many of them could be enabled too if the compiler can report the impacted 
function list accurately in the future.



> 
> I know it may be the same for you, but it is not for me as a GCC user. 
> "internally create clone" sounds very... well, internal. It does not 
> describe the option much for an ordinary user who has no knowledge about GCC 
> internals.
> 
> So could you rephrase it a bit, please?

I tried to make this clear. Please see the following:

'-flive-patching=LEVEL'
 Control GCC's optimizations to provide a safe compilation for
 live-patching.

 If the compiler's optimization uses a function's body or
 information extracted from its body to optimize/change another
 function, the latter is called an impacted function of the former.
 If a function is patched, its impacted functions should be patched
 too.

 The impacted functions are decided by the compiler's
 interprocedural optimizations.  For example, inlining a function
 into its caller, cloning a function and changing its caller to call
 this new clone, or extracting a function's pureness/constness
 information to optimize its direct or indirect callers, etc.

 Usually, the more IPA optimizations are enabled, the larger the
 number of impacted functions for each function.  In order to control
 the number of impacted functions and to compute the list of impacted
 functions easily, we provide control to partially enable IPA
 optimizations on two different levels.

 The LEVEL argument should be one of the following:

 'inline-only-static'

  Only enable inlining of static functions, disable all other
  interprocedural optimizations/analyses.  As a result, when
  patching a static routine, all its callers need to be patched
  as well.

 'inline-clone'

  Only enable inlining and cloning optimizations, which includes
  inlining, cloning, interprocedural scalar replacement of
  aggregates and partial inlining.  Disable all other
  interprocedural optimizations/analyses.  As a result, when
  patching a routine, all its callers and its clones' callers
  need to be patched as well.

 When -flive-patching is specified without any value, the default value
 is "inline-clone".

 This flag is disabled by default.


> 
>> When -flive-patching specified without any value, the default value
>> is "inline-clone".
>> 
>> This flag is disabled by default.
>> 
>> let me know your comments and suggestions on the implementation.
> 
> I compared it to Martin's patch and ipa-icf-variables is not covered in 
> yours (I may have missed something).

Yes, you are right. I added this into my patch.

I am attaching the new patch here.



flive-patching.patch
Description: Binary data


> 
> Thanks,
> Miroslav



[wwwdocs] Update C++ status post San Diego

2018-11-13 Thread Marek Polacek
This patch updates the current C++ status with proposals accepted
at the recent San Diego meeting.

Committed to CVS.

Index: projects/cxx-status.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx-status.html,v
retrieving revision 1.66
diff -u -r1.66 cxx-status.html
--- projects/cxx-status.html	13 Nov 2018 04:50:14 -0000	1.66
+++ projects/cxx-status.html	13 Nov 2018 20:42:20 -0000
@@ -106,7 +106,9 @@
 
Concepts 
   http://wg21.link/p0734r0";>P0734R0
-http://wg21.link/p0857r0";>P0857R0 
+http://wg21.link/p0857r0";>P0857R0
+   http://wg21.link/p1084r2";>P1084R2
+   http://wg21.link/p1414r2";>P1414R2
TS with -fconcepts 

 
@@ -246,7 +248,8 @@
 
 
Support for contract based programming in C++ 
-  http://wg21.link/p0542r5";>P0542R5
+  http://wg21.link/p0542r5";>P0542R5
+  http://wg21.link/p1289r1";>P1289R1
No 

 
@@ -256,6 +259,44 @@
9 

 
+
+   Signed integers are two's complement 
+  http://wg21.link/p1236r1";>P1236R1
+   No 
+   
+
+
+   char8_t 
+  http://wg21.link/p0482r6";>P0482R6
+   No 
+   
+
+
+   Immediate functions (consteval) 
+  http://wg21.link/p1073r3";>P1073R3
+   No 
+   
+
+
+   std::is_constant_evaluated 
+  http://wg21.link/p0595r2";>P0595R2
+   No 
+   
+
+
+   Nested inline namespaces 
+  http://wg21.link/p1094r2";>P1094R2
+   No 
+   
+
+
+   Relaxations of constexpr restrictions
+  http://wg21.link/p1002r1";>P1002R1
+  http://wg21.link/p1327r1";>P1327R1
+  http://wg21.link/p1330r0";>P1330R0
+   No 
+   
+
   
 
   C++17 Support in GCC


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Andi Kleen
On Tue, Nov 13, 2018 at 07:37:27PM +0100, Richard Biener wrote:
> I'd look at doing the instrumentation after var-tracking has run - that is 
> what computes the locations in the end. That means instrumenting on late RTL 
> after register allocation (and eventually with branch range restrictions in 
> place). Basically you'd instrument at the same time as generating debug info.

Ok that would be a full rewrite. I'll check if it's really a problem
first. I would prefer to stay on the GIMPLE level.

-Andi


Re: [PATCH v4] PR preprocessor/83173: Enhance -fdump-internal-locations output

2018-11-13 Thread David Malcolm
On Tue, 2018-11-13 at 14:54 -0500, Mike Gulick wrote:
> 2018-11-13  Mike Gulick  

[...]

>   * gcc/diagnostic-core.h (num_digits): Add extern definition.

FWIW you moved the decl to diagnostic.h, but didn't update the above
ChangeLog entry.

[...]

> diff --git a/libcpp/location-example.txt b/libcpp/location-
> example.txt
> index 14b5c2e284a..dc448b0493e 100644
> --- a/libcpp/location-example.txt
> +++ b/libcpp/location-example.txt

You're going to need to regenerate this file again; I touched many of
the same lines as your patch does, in r266085 (sorry).

Other than the nits above, this looks good to me (once you have your
contribution paperwork in place).

Thanks
Dave


[PATCH v4] PR preprocessor/83173: Enhance -fdump-internal-locations output

2018-11-13 Thread Mike Gulick
2018-11-13  Mike Gulick  

PR preprocessor/83173
* gcc/input.c (dump_location_info): Dump reason and
included_from fields from line_map_ordinary struct.  Fix
indentation when location > 5 digits.
* libcpp/location-example.txt: Update example
-fdump-internal-locations output.
* gcc/diagnostic-show-locus.c (num_digits, test_num_digits):
Move to gcc/diagnostic.c to allow it to be utilized by
gcc/input.c.
* gcc/diagnostic.c (num_digits, test_num_digits): Moved here.
(diagnostic_c_tests): Run test_num_digits.
* gcc/diagnostic-core.h (num_digits): Add extern definition.
---
 gcc/diagnostic-show-locus.c |  51 --
 gcc/diagnostic.c|  46 +
 gcc/diagnostic.h|   2 +
 gcc/input.c |  41 -
 libcpp/location-example.txt | 333 +---
 5 files changed, 281 insertions(+), 192 deletions(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index a42ff819512..08fe74a6136 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -819,56 +819,6 @@ fixit_cmp (const void *p_a, const void *p_b)
   return hint_a->get_start_loc () - hint_b->get_start_loc ();
 }
 
-/* Get the number of digits in the decimal representation
-   of VALUE.  */
-
-static int
-num_digits (int value)
-{
-  /* Perhaps simpler to use log10 for this, but doing it this way avoids
- using floating point.  */
-  gcc_assert (value >= 0);
-
-  if (value == 0)
-return 1;
-
-  int digits = 0;
-  while (value > 0)
-{
-  digits++;
-  value /= 10;
-}
-  return digits;
-}
-
-
-#if CHECKING_P
-
-/* Selftest for num_digits.  */
-
-static void
-test_num_digits ()
-{
-  ASSERT_EQ (1, num_digits (0));
-  ASSERT_EQ (1, num_digits (9));
-  ASSERT_EQ (2, num_digits (10));
-  ASSERT_EQ (2, num_digits (99));
-  ASSERT_EQ (3, num_digits (100));
-  ASSERT_EQ (3, num_digits (999));
-  ASSERT_EQ (4, num_digits (1000));
-  ASSERT_EQ (4, num_digits (9999));
-  ASSERT_EQ (5, num_digits (10000));
-  ASSERT_EQ (5, num_digits (99999));
-  ASSERT_EQ (6, num_digits (100000));
-  ASSERT_EQ (6, num_digits (999999));
-  ASSERT_EQ (7, num_digits (1000000));
-  ASSERT_EQ (7, num_digits (9999999));
-  ASSERT_EQ (8, num_digits (10000000));
-  ASSERT_EQ (8, num_digits (99999999));
-}
-
-#endif /* #if CHECKING_P */
-
 /* Implementation of class layout.  */
 
 /* Constructor for class layout.
@@ -3761,7 +3711,6 @@ void
 diagnostic_show_locus_c_tests ()
 {
   test_line_span ();
-  test_num_digits ();
 
   test_layout_range_for_single_point ();
   test_layout_range_for_single_line ();
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index a572c084aac..08d40b87e2c 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -1024,6 +1024,27 @@ diagnostic_report_diagnostic (diagnostic_context *context,
   return true;
 }
 
+/* Get the number of digits in the decimal representation of VALUE.  */
+
+int
+num_digits (int value)
+{
+  /* Perhaps simpler to use log10 for this, but doing it this way avoids
+ using floating point.  */
+  gcc_assert (value >= 0);
+
+  if (value == 0)
+return 1;
+
+  int digits = 0;
+  while (value > 0)
+{
+  digits++;
+  value /= 10;
+}
+  return digits;
+}
+
 /* Given a partial pathname as input, return another pathname that
shares no directory elements with the pathname of __FILE__.  This
is used by fancy_abort() to print `Internal compiler error in expr.c'
@@ -1774,6 +1795,29 @@ test_diagnostic_get_location_text ()
   progname = old_progname;
 }
 
+/* Selftest for num_digits.  */
+
+static void
+test_num_digits ()
+{
+  ASSERT_EQ (1, num_digits (0));
+  ASSERT_EQ (1, num_digits (9));
+  ASSERT_EQ (2, num_digits (10));
+  ASSERT_EQ (2, num_digits (99));
+  ASSERT_EQ (3, num_digits (100));
+  ASSERT_EQ (3, num_digits (999));
+  ASSERT_EQ (4, num_digits (1000));
+  ASSERT_EQ (4, num_digits (9999));
+  ASSERT_EQ (5, num_digits (10000));
+  ASSERT_EQ (5, num_digits (99999));
+  ASSERT_EQ (6, num_digits (100000));
+  ASSERT_EQ (6, num_digits (999999));
+  ASSERT_EQ (7, num_digits (1000000));
+  ASSERT_EQ (7, num_digits (9999999));
+  ASSERT_EQ (8, num_digits (10000000));
+  ASSERT_EQ (8, num_digits (99999999));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -1785,6 +1829,8 @@ diagnostic_c_tests ()
   test_print_parseable_fixits_remove ();
   test_print_parseable_fixits_replace ();
   test_diagnostic_get_location_text ();
+  test_num_digits ();
+
 }
 
 } // namespace selftest
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 3498a9ba7bb..a48fe3f9a97 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -401,5 +401,7 @@ extern char *file_name_as_prefix (diagnostic_context *, const char *);
 
 extern char *build_message_string (const char *, ...) ATTRIBUTE_PRINTF_1;
 
+/* Compute the number of digits in the decimal representation of an integer.  */
+extern int num_digits (int);
 
 #endif /* ! G

Re: RFC (branch prediction): PATCH to implement P0479R5, [[likely]] and [[unlikely]].

2018-11-13 Thread Jason Merrill
On Tue, Nov 13, 2018 at 9:20 AM Martin Liška  wrote:
>
> On 11/13/18 5:43 AM, Jason Merrill wrote:
> > [[likely]] and [[unlikely]] are equivalent to the GNU hot/cold attributes,
> > except that they can be applied to arbitrary statements as well as labels;
> > this is most likely to be useful for marking if/else branches as likely or
> > unlikely.  Conveniently, PREDICT_EXPR fits the bill nicely as a
> > representation.
> >
> > I also had to fix marking case labels as hot/cold, which didn't work before.
> > Which then required me to force __attribute ((fallthrough)) to apply to the
> > statement rather than the label.
> >
> > Tested x86_64-pc-linux-gnu.  Does this seem like a sane implementation
> > approach to people with more experience with PREDICT_EXPR?
>
> Hi.
>
> In general it makes sense to implement it the same way. The question is how
> close the hot/cold attribute should be to __builtin_expect.
>
> Let me present few examples and differences that I see:
>
> 1) ./xgcc -B. -O2 -fdump-tree-profile_estimate=/dev/stdout /tmp/test1.C
>
> ;; Function foo (_Z3foov, funcdef_no=0, decl_uid=2301, cgraph_uid=1, 
> symbol_order=3)
>
> Predictions for bb 2
>   first match heuristics: 90.00%
>   combined heuristics: 90.00%
>   __builtin_expect heuristics of edge 2->3: 90.00%
>
> As seen here __builtin_expect is stronger as it's first match heuristics and 
> has probability == 90%.
>
> ;; Function bar (_Z3barv, funcdef_no=1, decl_uid=2303, cgraph_uid=2, 
> symbol_order=4)
>
> Predictions for bb 2
>   DS theory heuristics: 74.48%
>   combined heuristics: 74.48%
>   opcode values nonequal (on trees) heuristics of edge 2->3: 34.00%
>   hot label heuristics of edge 2->3: 85.00%
>
> Here we combine hot label prediction with the opcode one, resulting in a quite 
> poor result of about 75%.
> So maybe cold/hot prediction can also happen as first match.

Makes sense.

> 2) ./xgcc -B. -O2 -fdump-tree-profile_estimate=/dev/stdout /tmp/test2.C
> ...
> foo ()
> {
> ...
>   switch (_3)  [3.33%], case 3:  [3.33%], case 42:  
> [3.33%], case 333:  [90.00%]>
>
> while:
>
> bar ()
> {
>   switch (a.1_1)  [25.00%], case 3:  [25.00%], case 42: 
>  [25.00%], case 333:  [25.00%]>
> ...
>
> Note that support for __builtin_expect was enhanced in this stage1. I can 
> definitely cover also situations when one uses
> hot/cold for labels. So definitely place for improvement.

Hmm, the gimplifier should be adding a PREDICT_EXPR for the case label
now, is it just ignored currently?

> 3) last example where one can use the attribute for function decl, resulting 
> in:
> __attribute__((hot, noinline))
> foo ()
> {
> ..
>
> Hope it's desired? If so I would cover that with a test-case in test-suite.

[[likely]] and [[unlikely]] don't apply to functions; I suppose I
should diagnose that.

> Jason can you please point to C++ specification of the attributes?

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0479r5.html

> Would you please consider an error diagnostics for situations written in 
> test4.C?

A warning seems appropriate.  You think the front end is the right
place for that?

Jason


Re: [PATCH v3 3/3] PR preprocessor/83173: Enhance -fdump-internal-locations output

2018-11-13 Thread Mike Gulick
On 11/12/18 7:56 PM, David Malcolm wrote:
> On Mon, 2018-11-12 at 21:13 +0000, Mike Gulick wrote:
>> On 11/2/18 5:04 PM, David Malcolm wrote:
>>> On Thu, 2018-11-01 at 11:56 -0400, Mike Gulick wrote:
 2017-10-31  Mike Gulick  

PR preprocessor/83173
* gcc/input.c (dump_location_info): Dump reason and
included_from fields from line_map_ordinary struct.  Fix
indentation when location > 5 digits.

* libcpp/location-example.txt: Update example
-fdump-internal-locations output.
 ---
  gcc/input.c |  49 +-
  libcpp/location-example.txt | 333 +-
 
 --
  2 files changed, 241 insertions(+), 141 deletions(-)
>>>
>>> Sorry about the belated response.  This is a nice enhancement; some
>>> nits below.
>>>
 diff --git a/gcc/input.c b/gcc/input.c
 index a94a010f353..f938a37f20e 100644
 --- a/gcc/input.c
 +++ b/gcc/input.c
 @@ -1075,6 +1075,17 @@ dump_labelled_location_range (FILE
 *stream,
fprintf (stream, "\n");
  }
  
 +#define NUM_DIGITS(x) ((x) >= 1000000000 ? 10 : \
 + (x) >= 100000000 ? 9 : \
 + (x) >= 10000000 ? 8 : \
 + (x) >= 1000000 ? 7 : \
 + (x) >= 100000 ? 6 : \
 + (x) >= 10000 ? 5 : \
 + (x) >= 1000 ? 4 : \
 + (x) >= 100 ? 3 : \
 + (x) >= 10 ? 2 : \
 + 1)
>>>
>>> diagnostic-show-locus.c has a function "num_digits" (currently
>>> static)
>>> and, fwiw, a unit test.  It would be good to share the
>>> implementation.
>>>
>>
>> I initially tried to use this function by just adding "extern int
>> num_digits(int);" into diagnostic-core.h, but that failed to link, so
>> it seems
>> like diagnostic-show-locus.c is not included in whatever library
>> input.c gets
>> linked with (I forget which library it was trying to link). 
> 
> Both input.o and diagnostic-show-locus.o are in OBJS-libcommon, so I'm
> not sure what went wrong.
> 

After looking at libcommon.a with nm, I realized that
diagnostic-show-locus.c wrapped everything inside an anonymous
namespace, so that was why the symbol wasn't visible.

>> Instead I moved
>> num_digits and its unit test to diagnostic.c, and added the extern
>> definition to
>> diagnostic-core.h.  That builds and tests successfully.  Does that
>> seem like a
>> reasonable way to do this?
> 
> Thanks.  That sounds good (maybe put the decl in diagnostic.h rather
> than diagnostic-core.h; the latter is used in lots of places, whereas
> the former is more about implementation details).
>

No problem.

  /* Write a visualization of the locations in the line_table to
 STREAM.  */
  
  void
 @@ -1104,6 +1115,35 @@ dump_location_info (FILE *stream)
   map->m_column_and_range_bits - map-
> m_range_bits);
fprintf (stream, "  range bits: %i\n",
   map->m_range_bits);
 +  const char * reason;
 +  switch (map->reason) {
 +  case LC_ENTER:
 +  reason = "LC_ENTER";
 +  break;
 +  case LC_LEAVE:
 +  reason = "LC_LEAVE";
 +  break;
 +  case LC_RENAME:
 +  reason = "LC_RENAME";
 +  break;
 +  case LC_RENAME_VERBATIM:
 +  reason = "LC_RENAME_VERBATIM";
 +  break;
 +  case LC_ENTER_MACRO:
 +  reason = "LC_RENAME_MACRO";
 +  break;
 +  default:
 +  reason = "Unknown";
 +  }
 +  fprintf (stream, "  reason: %d (%s)\n", map->reason,
 reason);
 +
 +  const line_map_ordinary *includer_map
 +  = linemap_included_from_linemap (line_table, map);
 +  fprintf (stream, "  included from map: %d\n",
 + includer_map ? int (includer_map - line_table-
> info_ordinary.maps)

 + : -1);
>>>
>>> I'm not a fan of "-1" here; it's a NULL pointer in the original
>>> data.
>>> How about "n/a" for that case?
>>>
>>
>> That's a good suggestion.  Thanks.
>>
 +  fprintf (stream, "  included from location: %d\n",
 + linemap_included_from (map));
>>>
>>> ...or merging it with this line, for something like:
>>>
>>>   included from location: 127 (in ordinary map 2)
>>>
>>> vs:
>>>
>>>   included from location: 0
>>>
>>> [...snip...]
>>>
>>> Other than that, this is OK for trunk, assuming your contributor
>>> paperwork is in place.
>>>
>>> Dave
>>>
>>
>> What is the preferred way to re-send this patch?  Should I re-send
>> the entire
>> patch series as v4, or just an updated version of this single patch?
> 
> The latter: just an updated version of the changed patch.  IIRC the
> rest is all approved.
>

Thanks, I'll reply with the updated patch.

>>
>> Also, I'm waiting on FSF for assignment paperwork.  I've re-pinged
>> them after
>> waiting a week.
> 
> Thanks.
> 
>> Thanks for the feedback and help.
>>
>> -Mike
> 


Re: [PATCH] avoid copying inlining attributes in attribute copy

2018-11-13 Thread Jeff Law
On 11/13/18 12:15 PM, Martin Sebor wrote:
> Enabling in Glibc the recent enhancement to detect attribute
> mismatches in alias declarations (PR 81824) revealed a use
> case where aliases are being declared for targets declared
> with the always_inline attribute.  Copying the always_inline
> attribute to the declaration of the alias would then trigger
> 
>   error: always_inline function might not be inlinable [-Werror=attributes]
> 
> due to the alias not being inlinable in some contexts where
> (presumably) the target would be.  To avoid the warning for
> this use case the attached patch excludes all attributes
> that affect inlining from being copied by attribute copy.
> 
> While testing this I also more thoroughly exercised attribute
> tls_model (which also came up during the Glibc deployment),
> and improved the diagnostics for the attribute to make their
> root cause easier to understand (printing "attribute ignored"
> alone isn't very informative without also explaining why).
> 
> Martin
> 
> PS While testing this I also opened bug 88010 where inlining
> attributes on aliases seem to be silently ignored in favor of
> those on their targets.  I'm wondering what the expected (or
> preferred) behavior is.
> 
> gcc-attr-copy-fix.diff
> 
> gcc/c-family/ChangeLog:
> 
>   * c-attribs.c (handle_copy_attribute): Exclude inlining attributes.
>   (handle_tls_model_attribute): Improve diagnostics.
>   (has_attribute): Fix a typo.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/attr-copy-5.c: New test.
>   * gcc.dg/tls/diag-6.c: Adjust expected diagnostics.
OK.
jeff


Re: [PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-13 Thread Dominique d'Humières
Revision r266072 breaks bootstrap on darwin:

In file included from ../../work/gcc/coretypes.h:430,
 from ../../work/gcc/tree-vect-data-refs.c:24:
../../work/gcc/poly-int.h: In instantiation of 'typename if_nonpoly::type maybe_lt(const Ca&, const poly_int_pod&) [with unsigned int 
N = 1; Ca = int; Cb = long long unsigned int; typename if_nonpoly::type = bool]':
../../work/gcc/tree-vect-data-refs.c:6338:13:   required from here
../../work/gcc/poly-int.h:1384:12: error: comparison of integer expressions of 
different signedness: 'const int' and 'const long long unsigned int' 
[-Werror=sign-compare]
 1384 |   return a < b.coeffs[0];
  |  ~~^~~
cc1plus: all warnings being treated as errors

TIA

Dominique

Re: [doc PATCH] Fix weakref description.

2018-11-13 Thread Michael Ploujnikov
On 2018-11-12 12:50 p.m., Michael Ploujnikov wrote:
> On 2018-11-02 1:59 p.m., Michael Ploujnikov wrote:
>> I came across this typo and also added a similar ld invocation for
>> illustration purposes as mentioned by Jakub on irc.
>>
> 
> After talking to Jakub about it, I went with different terminology.
> 
> 
> - Michael
> 

Installed as obvious.

- Michael



signature.asc
Description: OpenPGP digital signature


Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-13 Thread Miroslav Benes
On Tue, 13 Nov 2018, Qing Zhao wrote:

> Hi,

Hi, 
 
> Attached is the patch for new -flive-patching=[inline-only-static | 
> inline-clone] master option.
> 
> '-flive-patching=LEVEL'
>  Control GCC's optimizations to provide a safe compilation for
>  live-patching.  Provides multiple-level control on how many of the
>  optimizations are enabled by users' request.  The LEVEL argument
>  should be one of the following:
> 
>  'inline-only-static'
> 
>   Only enable inlining of static functions, disable all other
>   ipa optimizations/analyses.  As a result, when patching a
>   static routine, all its callers need to be patched as well.
> 
>  'inline-clone'
> 
>   Only enable inlining and all optimizations that internally
>   create clone, for example, cloning, ipa-sra, partial inlining,
>   etc.; disable all other ipa optimizations/analyses.  As a
>   result, when patching a routine, all its callers and its
>   clones' callers need to be patched as well.

Based on our previous discussion I assume that "clone" optimizations are 
safe (for LP) and the others are not. Anyway I'd welcome a note mentioning 
that disabled optimizations are dangerous for LP.

I know it may be the same for you, but it is not for me as a GCC user. 
"internally create clone" sounds very... well, internal. It does not 
describe the option much for an ordinary user who has no knowledge about GCC 
internals.

So could you rephrase it a bit, please?

>  When -flive-patching specified without any value, the default value
>  is "inline-clone".
> 
>  This flag is disabled by default.
> 
> let me know your comments and suggestions on the implementation.

I compared it to Martin's patch and ipa-icf-variables is not covered in 
yours (I may have missed something).

Thanks,
Miroslav


[PATCH] avoid copying inlining attributes in attribute copy

2018-11-13 Thread Martin Sebor

Enabling in Glibc the recent enhancement to detect attribute
mismatches in alias declarations (PR 81824) revealed a use
case where aliases are being declared for targets declared
with the always_inline attribute.  Copying the always_inline
attribute to the declaration of the alias would then trigger

  error: always_inline function might not be inlinable [-Werror=attributes]

due to the alias not being inlinable in some contexts where
(presumably) the target would be.  To avoid the warning for
this use case the attached patch excludes all attributes
that affect inlining from being copied by attribute copy.

While testing this I also more thoroughly exercised attribute
tls_model (which also came up during the Glibc deployment),
and improved the diagnostics for the attribute to make their
root cause easier to understand (printing "attribute ignored"
alone isn't very informative without also explaining why).

Martin

PS While testing this I also opened bug 88010 where inlining
attributes on aliases seem to be silently ignored in favor of
those on their targets.  I'm wondering what the expected (or
preferred) behavior is.
gcc/c-family/ChangeLog:

	* c-attribs.c (handle_copy_attribute): Exclude inlining attributes.
	(handle_tls_model_attribute): Improve diagnostics.
	(has_attribute): Fix a typo.

gcc/testsuite/ChangeLog:

	* gcc.dg/attr-copy-5.c: New test.
	* gcc.dg/tls/diag-6.c: Adjust expected diagnostics.


Index: gcc/c-family/c-attribs.c
===
--- gcc/c-family/c-attribs.c	(revision 266033)
+++ gcc/c-family/c-attribs.c	(working copy)
@@ -2239,13 +2239,15 @@ handle_copy_attribute (tree *node, tree name, tree
   /* Copy decl attributes from REF to DECL.  */
   for (tree at = attrs; at; at = TREE_CHAIN (at))
 	{
-	  /* Avoid copying attributes that affect a symbol linkage or
-	 visibility since those in all likelihood only apply to
-	 the target.
+	  /* Avoid copying attributes that affect a symbol linkage,
+	 inlining, or visibility since those in all likelihood
+	 only apply to the target.
 	 FIXME: make it possible to specify which attributes to
 	 copy or not to copy in the copy attribute itself.  */
 	  tree atname = get_attribute_name (at);
 	  if (is_attribute_p ("alias", atname)
+	  || is_attribute_p ("always_inline", atname)
+	  || is_attribute_p ("gnu_inline", atname)
 	  || is_attribute_p ("ifunc", atname)
 	  || is_attribute_p ("visibility", atname)
 	  || is_attribute_p ("weak", atname)
@@ -2458,17 +2460,26 @@ handle_tls_model_attribute (tree *node, tree name,
   tree decl = *node;
   enum tls_model kind;
 
-  if (!VAR_P (decl) || !DECL_THREAD_LOCAL_P (decl))
+  if (!VAR_P (decl))
 {
-  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  warning (OPT_Wattributes, "%qE attribute ignored because %qD "
+	   "is not a variable",
+	   name, decl);
   return NULL_TREE;
 }
 
+  if (!DECL_THREAD_LOCAL_P (decl))
+{
+  warning (OPT_Wattributes, "%qE attribute ignored because %qD does "
+	   "not have thread storage duration", name, decl);
+  return NULL_TREE;
+}
+
   kind = DECL_TLS_MODEL (decl);
   id = TREE_VALUE (args);
   if (TREE_CODE (id) != STRING_CST)
 {
-  error ("tls_model argument not a string");
+  error ("%qE argument not a string", name);
   return NULL_TREE;
 }
 
@@ -2481,7 +2492,9 @@ handle_tls_model_attribute (tree *node, tree name,
   else if (!strcmp (TREE_STRING_POINTER (id), "global-dynamic"))
 kind = TLS_MODEL_GLOBAL_DYNAMIC;
   else
-error ("tls_model argument must be one of \"local-exec\", \"initial-exec\", \"local-dynamic\" or \"global-dynamic\"");
+error ("%qE argument must be one of %qs, %qs, %qs, or %qs",
+	   name,
+	   "local-exec", "initial-exec", "local-dynamic", "global-dynamic");
 
   set_decl_tls_model (decl, kind);
   return NULL_TREE;
Index: gcc/testsuite/gcc.dg/attr-copy-5.c
===
--- gcc/testsuite/gcc.dg/attr-copy-5.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/attr-copy-5.c	(working copy)
@@ -0,0 +1,57 @@
+/* PR middle-end/81824 - Warn for missing attributes with function aliases
+   Verify that attributes always_inline, gnu_inline, and noinline aren't
+   copied.  Also verify that copying attribute tls_model to a non-thread
+   variable triggers a warning.
+   { dg-do compile }
+   { dg-options "-Wall" }
+   { dg-require-effective-target tls } */
+
+#define ATTR(...)   __attribute__ ((__VA_ARGS__))
+
+ATTR (always_inline, gnu_inline, noreturn) inline int
+finline_noret (void)
+{
+  __builtin_abort ();
+  /* Expect no -Wreturn-type.  */
+}
+
+int call_finline_noret (void)
+{
+  finline_noret ();
+  /* Expect no -Wreturn-type.  */
+}
+
+
+ATTR (copy (finline_noret)) int
+fnoret (void);
+
+int call_fnoret (void)
+{
+  fnoret ();
+  /* Expect no -Wreturn-type.  */
+}
+
+
+/* Verify that attribute always_inline on a

Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Peter Bergner
On 11/13/18 12:49 PM, Richard Henderson wrote:
> On 11/13/18 5:38 PM, Peter Bergner wrote:
>> On 11/13/18 2:53 AM, Eric Botcazou wrote:
>>> Superfluous semi-colon.  Given that the function returns an operand, its
>>> name is IMO misleading, so maybe [get_]operand_for_simple_move_operator.
>>
>> Fixed and renamed function to operand_for_simple_move_operator.
> 
> Would not operand_for_swap_move_operator be better?  This is not a "simple
> move", it is something that requires swapping the words of the operand.
> (Presumably one could think of other operators that generate a swap, and match
> them here.  I can't think of another one off the top of my head though.)

That's fine with me.

Peter




Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Richard Henderson
On 11/13/18 5:38 PM, Peter Bergner wrote:
> On 11/13/18 2:53 AM, Eric Botcazou wrote:
>>> +static rtx
>>> +simple_move_operator (rtx x)
>>> +{
>>> +  /* A word sized rotate of a register pair is equivalent to swapping
>>> + the registers in the register pair.  */
>>> +  if (GET_CODE (x) == ROTATE
>>> +  && GET_MODE (x) == twice_word_mode
>>> +  && simple_move_operand (XEXP (x, 0))
>>> +  && CONST_INT_P (XEXP (x, 1))
>>> +  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
>>> +return XEXP (x, 0);;
>>> +
>>> +  return NULL_RTX;
>>> +}
>>> +
>>
>> Superfluous semi-colon.  Given that the function returns an operand, its
>> name is IMO misleading, so maybe [get_]operand_for_simple_move_operator.
> 
> Fixed and renamed function to operand_for_simple_move_operator.

Would not operand_for_swap_move_operator be better?  This is not a "simple
move", it is something that requires swapping the words of the operand.
(Presumably one could think of other operators that generate a swap, and match
them here.  I can't think of another one off the top of my head though.)
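
To make the "rotate equals swap" point concrete, here is a small sketch in plain C (illustrative only; it treats a 64-bit integer as a pair of 32-bit "words", and the names rotl64 and swap_words are made up, not GCC functions):

```c
#include <stdint.h>

/* Rotate a 64-bit value left by N bits (N taken modulo 64).  */
static uint64_t
rotl64 (uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Swap the two 32-bit halves of a 64-bit value.  */
static uint64_t
swap_words (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) lo << 32) | hi;
}
```

Rotating by exactly one word width gives the same result as swap_words, which is why the operator in question is really a word swap rather than a plain move.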


r~


Re: record_ranges_from_incoming_edge: use value_range API for creating new range

2018-11-13 Thread Richard Biener
On November 13, 2018 5:40:59 PM GMT+01:00, Aldy Hernandez wrote:
>With your cleanups, the main raison d'etre for my patch goes away, but 
>here is the promised removal of ignore_equivs_equal_p.
>
>I think the == operator is a bit confusing, and equality intent should 
>be clearly specified.  I am providing the following for the derived 
>class (with no hidden default arguments):
>
>   bool equal_p (const value_range &, bool ignore_equivs) const;
>
>and providing the following for the base class:
>
>   bool equal_p (const value_range_base &) const;
>
>I am also removing access to both the == and the != operators.  It 
>should now be clear from the code whether the equivalence bitmap is 
>being taken into account or not.
>
>What do you think?

Sounds good. 

Richard. 

>Aldy



[PATCH] Add missing ZLIBINC to CFLAGS-optinfo-emit-json.o

2018-11-13 Thread David Malcolm
On Tue, 2018-11-13 at 17:58 +0000, Kyrill Tkachov wrote:
> Hi David,
> 
> On 09/11/18 21:00, Jeff Law wrote:
> > On 11/9/18 10:51 AM, David Malcolm wrote:
> > > One of the concerns noted at Cauldron about -fsave-optimization-record
> > > was the size of the output files.
> > > 
> > > This file implements compression of the -fsave-optimization-record
> > > output, using zlib.
> > > 
> > > I did some before/after testing of this patch, using SPEC 2017's
> > > 502.gcc_r with -O3, looking at the sizes of the generated
> > > FILENAME.opt-record.json[.gz] files.
> > > 
> > > The largest file was for insn-attrtab.c:
> > >   before:  171736285 bytes (164M)
> > >   after: 5304015 bytes (5.1M)
> > > 
> > > Smallest file was for vasprintf.c:
> > >   before:  30567 bytes
> > >   after:4485 bytes
> > > 
> > > Median file by size before was lambda-mat.c:
> > >   before:2266738 bytes (2.2M)
> > >   after:   75988 bytes (15K)
> > > 
> > > Total of all files in the benchmark:
> > >   before: 2041720713 bytes (1.9G)
> > >   after:66870770 bytes (63.8M)
> > > 
> > > ...so clearly compression is a big win in terms of file size, at the
> > > cost of making the files slightly more awkward to work with. [1]
> > > I also wonder if we want to support any pre-filtering of the output
> > > (FWIW roughly half of the biggest file seems to be "Adding assert for "
> > > messages from tree-vrp.c).
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> 
> So does this now add a dependency on zlib?
> I can't build GCC on my aarch64-none-linux machine after this patch
> due to a missing zlib.h.
> I see there's a zlib in the top-level GCC tree. Is that built/used
> during the GCC build itself?
> 
> Thanks,
> Kyrill

Sorry about that.  Does the following patch fix the build for you?

gcc/ChangeLog:
	* Makefile.in (CFLAGS-optinfo-emit-json.o): Add $(ZLIBINC).
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 16c9ed6..1e8a311 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2233,7 +2233,7 @@ s-bversion: BASE-VER
 	$(STAMP) s-bversion
 
 CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
-CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\"
+CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" $(ZLIBINC)
 
 pass-instances.def: $(srcdir)/passes.def $(PASSES_EXTRA) \
 	$(srcdir)/gen-pass-instances.awk
-- 
1.8.5.3



Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Peter Bergner
On 11/13/18 12:06 PM, Iain Sandoe wrote:
> As far as I expect, Darwin should be untouched by this - we have a separate 
> assembler (which doesn’t even respond to -many), so unless there’s some 
> higher level translation done (it’s not mentioned in any Darwin specs), we 
> should just carry on as before.

Ah, good then.

> When I do expect things to change is when multiple .machine directives are 
> included in asm sources.
> (probably) the old cctools assembler won’t deal with them properly

Usually when there are multiple .machine's being used, they should be used
with the ".machine push" and ".machine pop" directives so the temporary
.machine value doesn't corrupt the .machine value being used for the rest
of the file.  Like so.

.machine "power8"

...

.machine push
.machine "power9"

.machine pop

...

Hopefully the cctools supports that.

Peter



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Richard Biener
On November 13, 2018 7:09:15 PM GMT+01:00, Andi Kleen wrote:
>On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
>> > I even had an earlier version of this that instrumented
>> > assembler output of the compiler with PTWRITE in a separate script,
>> > and it worked fine too.
>> 
>> Apart from eventually messing up branch range restrictions I guess ;)
>
>You mean for LOOP? For everything else the assembler handles it I
>believe.
>
>> Did you gather any statistics on how many ptwrite instructions
>> that are generated by your patch are not covered by any
>> location range & expr?  
>
>Need to look into that. Any suggestions how to do it in the compiler?

I guess you need to do that in a dwarf decoder somehow. 

>I had some decode failures with the perf dwarf decoder,
>but I was usually blaming them on perf dwarf limitations. 
>
>> I assume ptwrite is writing from register
>> input only so you probably should avoid instrumenting writes
>> of constants (will require an extra register)?
>
>Hmm, I think those are needed unfortunately because someone
>might want to trace every update of something. With branch
>tracing it could be recreated theoretically but would 
>be a lot more work for the decoder.
>
>> How does the .text size behave say for cc1 when you enable
>> the various granularities of instrumentation?  How many
>> ptwrite instructions are there per 100 regular instructions?
>
>With locals tracing (worst case) I see ~23% of all instructions
>in cc1 be PTWRITE. Binary is ~27% bigger.

OK, I suppose it will get better when addressing some of my review comments. 

>> Can we get an updated patch based on my review?
>
>Yes, working on it, also addressing Martin's comments. Hopefully soon.
>> 
>> I still think we should eventually move the pass later
>
>It's after pass_sanopt now.
>
>> avoid instrumenting places we'll not have any meaningful locations
>in the debug info - if only to reduce required trace bandwidth.
>
>Can you suggest how to check that?

I'd look at doing the instrumentation after var-tracking has run - that is what 
computes the locations in the end. That means instrumenting on late RTL after 
register allocation (and eventually with branch range restrictions in place). 
Basically you'd instrument at the same time as generating debug info.

Richard. 

>-Andi



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Andi Kleen
On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
> > I even had an earlier version of this that instrumented
> > assembler output of the compiler with PTWRITE in a separate script,
> > and it worked fine too.
> 
> Apart from eventually messing up branch range restrictions I guess ;)

You mean for LOOP? For everything else the assembler handles it I
believe.

> Did you gather any statistics on how many ptwrite instructions
> that are generated by your patch are not covered by any
> location range & expr?  

Need to look into that. Any suggestions how to do it in the compiler?

I had some decode failures with the perf dwarf decoder,
but I was usually blaming them on perf dwarf limitations. 

> I assume ptwrite is writing from register
> input only so you probably should avoid instrumenting writes
> of constants (will require an extra register)?

Hmm, I think those are needed unfortunately because someone
might want to trace every update of something. With branch
tracing it could be recreated theoretically but would 
be a lot more work for the decoder.

> How does the .text size behave say for cc1 when you enable
> the various granularities of instrumentation?  How many
> ptwrite instructions are there per 100 regular instructions?

With locals tracing (worst case) I see ~23% of all instructions
in cc1 be PTWRITE. Binary is ~27% bigger.

> Can we get an updated patch based on my review?

Yes, working on it, also addressing Martin's comments. Hopefully soon.
> 
> I still think we should eventually move the pass later

It's after pass_sanopt now.

> avoid instrumenting places we'll not have any meaningful locations
> in the debug info - if only to reduce required trace bandwidth.

Can you suggest how to check that?

-Andi


Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Iain Sandoe
Hi Folks,

> On 13 Nov 2018, at 17:48, Peter Bergner  wrote:
> 
> On 11/13/18 5:17 AM, Segher Boessenkool wrote:
>> On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
>>> On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
>>>> On Nov 12, 2018, at 3:13 PM, Alan Modra wrote:
>>>> On darwin, we (darwin, as a platform decision) like all instructions
>>>> available from the assembler.
>>> 
>>> OK, fair enough.  Another option is to just disable -many when gcc is
>>> in development, like we enable checking.
>> 
>> That is a good plan for GCC 9 at least.
> 
> I like the plan too.  We can also continue to pass -many just for darwin
> if they really really think they need it.

As far as I expect, Darwin should be untouched by this - we have a separate 
assembler (which doesn’t even respond to -many), so unless there’s some higher 
level translation done (it’s not mentioned in any Darwin specs), we should just 
carry on as before.

Where I do expect things to change is when multiple .machine directives are 
included in asm sources.
The old cctools assembler (probably) won’t deal with them properly, and the 
LLVM-backend based version I have (from the 4.0.1 era) doesn’t deal with them 
either (this could be a general consideration for the other parts of the PPC 
toolchain).  Having said that, I haven’t experimented with .machine on later 
LLVM backend versions yet.

Thus, my current expectation is that this will be a NOP unless/until 
incompatible asm source changes are made.

Iain



RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions

2018-11-13 Thread Joseph Myers
On Tue, 13 Nov 2018, Tamar Christina wrote:

> Would restricting it to flag_unsafe_math_optimizations not be enough in 
> this case? Since if it's only done for unsafe math then you likely won't 
> care about a small loss in precision anyway?

We have what should be the right logic (modulo DFP issues) in 
convert_to_real_1, for converting (outertype)((innertype0)a+(innertype1)b) 
into ((newtype)a+(newtype)b).  I think the right thing to do is to move 
that logic, including the associated comments, from convert_to_real_1 to 
match.pd (not duplicate it or a subset in both places - make match.pd able 
to do everything that logic in convert.c does, so it's no longer needed in 
convert.c).  That logic knows both when the conversion is OK based on 
flag_unsafe_math_optimizations, and when it's OK based on 
real_can_shorten_arithmetic.

(Most of the other special case logic in convert_to_real_1 to optimize 
particular conversions would make sense to move as well - but for your 
present issue, only the PLUS_EXPR / MINUS_EXPR / MULT_EXPR / RDIV_EXPR 
logic should need to move.)
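
As a small illustration of the kind of narrowing being discussed (a sketch only: it uses float and double as stand-ins for the narrow and wide types, and the function names are made up), the wide-then-narrow form and the direct narrow form can be compared like this:

```c
/* Add two floats directly in single precision.  The volatile store
   forces the result to be rounded to float.  */
static float
add_narrow (float a, float b)
{
  volatile float r = a + b;
  return r;
}

/* Widen both operands to double, add, then narrow the result.  */
static float
add_via_double (float a, float b)
{
  volatile double wide = (double) a + (double) b;
  return (float) wide;
}

/* Nonzero if both forms produce the same value for A and B.  */
static int
same_result (float a, float b)
{
  return add_narrow (a, b) == add_via_double (a, b);
}
```

For IEEE float and double the two forms agree because double carries more than twice the significand precision of float, so the second rounding cannot change the result.  As I understand it, that is the sort of property the real_can_shorten_arithmetic check is about; with a wide type that has too little extra precision, double rounding can make the forms differ.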

> But my simple testcase is

I'd hope a simpler test could be added - one not involving complex 
arithmetic at all, just an intermediate temporary variable or whatever's 
needed for convert_to_real_1 not to apply.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] -fsave-optimization-record: compress the output using zlib

2018-11-13 Thread Kyrill Tkachov

Hi David,

On 09/11/18 21:00, Jeff Law wrote:

On 11/9/18 10:51 AM, David Malcolm wrote:
> One of the concerns noted at Cauldron about -fsave-optimization-record
> was the size of the output files.
>
> This file implements compression of the -fsave-optimization-record
> output, using zlib.
>
> I did some before/after testing of this patch, using SPEC 2017's
> 502.gcc_r with -O3, looking at the sizes of the generated
> FILENAME.opt-record.json[.gz] files.
>
> The largest file was for insn-attrtab.c:
>   before:  171736285 bytes (164M)
>   after: 5304015 bytes (5.1M)
>
> Smallest file was for vasprintf.c:
>   before:  30567 bytes
>   after:4485 bytes
>
> Median file by size before was lambda-mat.c:
>   before:2266738 bytes (2.2M)
>   after:   75988 bytes (15K)
>
> Total of all files in the benchmark:
>   before: 2041720713 bytes (1.9G)
>   after:66870770 bytes (63.8M)
>
> ...so clearly compression is a big win in terms of file size, at the
> cost of making the files slightly more awkward to work with. [1]
> I also wonder if we want to support any pre-filtering of the output
> (FWIW roughly half of the biggest file seems to be "Adding assert for "
> messages from tree-vrp.c).
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>


So does this now add a dependency on zlib?
I can't build GCC on my aarch64-none-linux machine after this patch due to a 
missing zlib.h.
I see there's a zlib in the top-level GCC tree. Is that built/used during the 
GCC build itself?

Thanks,
Kyrill


> [1] I've updated my optrecord.py module to deal with this, which
> simplifies things; it's still not clear to me if that should live
> in "contrib/" or not.
>
> gcc/ChangeLog:
>* doc/invoke.texi (-fsave-optimization-record): Note that the
>output is compressed.
>* optinfo-emit-json.cc: Include <zlib.h>.
>(optrecord_json_writer::write): Compress the output.
OK.
jeff




[PATCH][RFC] Come up with -flive-patching master option.

2018-11-13 Thread Qing Zhao
Hi,


Attached is the patch for the new
-flive-patching=[inline-only-static | inline-clone] master option.

'-flive-patching=LEVEL'
 Control GCC's optimizations to provide a safe compilation for
 live-patching.  Provides multiple-level control on how many of the
 optimizations are enabled by users' request.  The LEVEL argument
 should be one of the following:

 'inline-only-static'

  Only enable inlining of static functions, disable all other
  ipa optimizations/analyses.  As a result, when patching a
  static routine, all its callers need to be patched as well.

 'inline-clone'

  Only enable inlining and all optimizations that internally
  create clone, for example, cloning, ipa-sra, partial inlining,
  etc.; disable all other ipa optimizations/analyses.  As a
  result, when patching a routine, all its callers and its
  clones' callers need to be patched as well.

 When -flive-patching is specified without any value, the default value
 is "inline-clone".

 This flag is disabled by default.
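
To illustrate why the callers matter (a hypothetical sketch; the function names clamp_len and buffer_len are made up and this is not part of the patch), consider a static helper that the compiler is free to inline:

```c
/* A static helper; under 'inline-only-static' it may still be
   inlined into its callers in the same translation unit.  */
static int
clamp_len (int n)
{
  return n < 0 ? 0 : n;
}

/* A caller.  If clamp_len was inlined here, the machine code of
   buffer_len contains a copy of the helper's body, so a live patch
   that changes clamp_len has to replace buffer_len as well.  */
int
buffer_len (int n)
{
  return clamp_len (n) + 1;
}
```

With 'inline-clone' the same reasoning would extend to any clones the optimizers create, which is why their callers need patching too.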

let me know your comments and suggestions on the implementation.

thanks a lot.

Qing



flive-patching.patch
Description: Binary data


> On Nov 12, 2018, at 4:29 PM, Qing Zhao  wrote:
> 
> 
>> On Nov 12, 2018, at 2:53 AM, Martin Liška  wrote:
>> 
>>> 
>>> Okay, I see.
>>> 
>>> I am also working on a similar option as yours, but make the 
>>> -flive-patching as two level control:
>>> 
>>> +flive-patching
>>> +Common RejectNegative Alias(flive-patching=,inline-clone)
>>> +
>>> +flive-patching=
>>> +Common Report Joined RejectNegative Enum(live_patching_level) 
>>> Var(flag_live_patching) Init(LIVE_NONE)
>>> +-flive-patching=[inline-only-static|inline-clone]  Control 
>>> optimizations to provide a safe comp for live-patching purpose.
>>> 
>>> the implementation for -flive-patching=inline-clone (the default) is 
>>> exactly as yours,  the new level -flive-patching=inline-only-static
>>> is to only enable inlining of static function for live patching, which is 
>>> important for multiple-processes live patching to control memory
>>> consumption. 
>>> 
>>> (please see my 2nd version of the -flive-patching proposal).
>>> 
>>> I will send out my complete patch in another email.
>> 
>> Hi, sure, works for me. Let's make 2 level option.
> 
> thank you.
> 
> I will send the patch tomorrow.
> 
> Qing
>> 
>> Martin
> 



Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Peter Bergner
On 11/13/18 5:17 AM, Segher Boessenkool wrote:
> On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
>> On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
>>> On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
>>> On darwin, we (darwin, as a platform decision) like all instructions 
>>> available from the assembler.
>>
>> OK, fair enough.  Another option is to just disable -many when gcc is
>> in development, like we enable checking.
> 
> That is a good plan for GCC 9 at least.

I like the plan too.  We can also continue to pass -many just for darwin
if they really really think they need it.

Peter



[PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-11-13 Thread Peter Bergner
PR87870 shows a problem loading simple constant values into TImode variables.
This has been a regression ever since VSX was added and we started using the
vsx_mov<mode>_64bit pattern.  We still get the correct code on trunk if we
compile with -mno-vsx, since we fall back to using the older mov<mode>_ppc64
move pattern, which has an alternative "r" <- "n".

Our current vsx_mov<mode>_64bit pattern has two alternatives for
loading constants into GPRs, one using "*r" <- "jwM" and one using "??r" <- "W".
These look redundant to me, since "W" contains support for both all-zero
constants (i.e., "j") and all-one constants (i.e., "wM") as well as a few more.
My patch below consolidates them and uses a new mode attribute that
uses "W" for the vector modes and "n" for TImode, like mov<mode>_ppc64
used.

I'll note I didn't change the vsx_mov<mode>_32bit pattern, since TImode
isn't supported with -m32.  However, if you want, I could remove the
redundant "*r" <- "jwM" alternative there too?
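
For readers without the PR at hand, the kind of source involved looks roughly like the following.  This is a hypothetical sketch, not the contents of the new pr87870.c test; the function names are made up and it assumes a 64-bit target where __int128 (TImode) is available.

```c
/* Each function just materializes a simple 128-bit constant.
   The PR is about how such constants get loaded into registers.  */
__int128
load_zero (void)
{
  return 0;
}

__int128
load_minus_one (void)
{
  return -1;
}

__int128
load_small (void)
{
  return 0x1234;
}
```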

This passes bootstrap and regtesting with no regressions.  Ok for trunk?

Peter


gcc/
PR target/87870
	* config/rs6000/vsx.md (nW): New mode attribute.
	(vsx_mov<mode>_64bit): Use it.  Remove redundant GPR 0/-1 alternative.

gcc/testsuite/
PR target/87870
* gcc.target/powerpc/pr87870.c: New test.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 265971)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -183,6 +183,18 @@ (define_mode_attr ??r  [(V16QI "??r")
 (TF"??r")
 (TI"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW   [(V16QI "W")
+(V8HI  "W")
+(V4SI  "W")
+(V4SF  "W")
+(V2DI  "W")
+(V2DF  "W")
+(V1TI  "W")
+(KF"W")
+(TF"W")
+(TI"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
   (V2DF  "v2di")
@@ -1193,17 +1205,17 @@ (define_insn_and_split "*xxspltib_
 
 ;;  VSX store  VSX load   VSX move  VSX->GPR   GPR->VSXLQ (GPR)
 ;;  STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIBVSPLTISW
-;;  VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX (VMX)
+;;  VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO,  , , r, we,?wQ,
 ?&r,   ??r,   ??Y,   , wo,v,
-?,*r,v, ??r,   wZ,v")
+?,v, , wZ,v")
 
(match_operand:VSX_M 1 "input_operand" 
", ZwO,   , we,r, r,
 wQ,Y, r, r, wE,jwM,
-?jwM,  jwM,   W, W, v, wZ"))]
+?jwM,  W, ,  v, wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)
&& (register_operand (operands[0], mode) 
@@ -1214,12 +1226,12 @@ (define_insn "vsx_mov_64bit"
   [(set_attr "type"
"vecstore,  vecload,   vecsimple, mffgpr,mftgpr,load,
 store, load,  store, *, vecsimple, vecsimple,
-vecsimple, *, *, *, vecstore,  vecload")
+vecsimple, *, *, vecstore,  vecload")
 
(set_attr "length"
"4, 4, 4, 8, 4, 8,
 8, 8, 8, 8, 4, 4,
-4, 8, 20,20,4, 4")])
+4, 20,20,4, 4")])
 
 ;;  VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;  XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR const

Re: record_ranges_from_incoming_edge: use value_range API for creating new range

2018-11-13 Thread Aldy Hernandez
With your cleanups, the main raison d'etre for my patch goes away, but 
here is the promised removal of ignore_equivs_equal_p.


I think the == operator is a bit confusing, and equality intent should 
be clearly specified.  I am providing the following for the derived 
class (with no hidden default arguments):


bool equal_p (const value_range &, bool ignore_equivs) const;

and providing the following for the base class:

bool equal_p (const value_range_base &) const;

I am also removing access to both the == and the != operators.  It 
should now be clear from the code whether the equivalence bitmap is 
being taken into account or not.


What do you think?

Aldy


RE: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions

2018-11-13 Thread Tamar Christina
Hi Joseph,

> What types exactly is this meant to apply to?  Floating-point?  Integer?  
> Mixtures of those?  (I'm guessing not mixtures, because those would be 
> something other than "convert" here.)

Originally I had it for both Floating-point and Integer, but not a mix of the 
two.

> For integer types, it's not safe, in that if e.g. F is int and X is unsigned 
> long long, you're changing from defined overflow to undefined overflow.

This is a good point, so I should limit it to floating-point formats only.
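
To spell out the integer case (an illustrative sketch only; the function names are made up), with F as int and X as unsigned long long the two forms look like this:

```c
#include <limits.h>

/* Wide form: the addition is done in unsigned long long, where
   overflow wraps and is fully defined.  Narrowing the result to int
   is implementation-defined; GCC reduces it modulo 2^N.  */
static int
add_via_unsigned_wide (int a, int b)
{
  return (int) ((unsigned long long) a + (unsigned long long) b);
}

/* Narrow form: the addition is done in int, where overflow is
   undefined behavior, so it is only safe when the sum fits.  */
static int
add_narrow (int a, int b)
{
  return a + b;
}
```

The two agree whenever the sum fits in int, but add_via_unsigned_wide (INT_MAX, 1) is well-defined with GCC while add_narrow (INT_MAX, 1) is not, so stripping the casts would turn defined wraparound into undefined overflow.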

> For floating-point types, using TYPE_PRECISION is suspect (it's not 
> wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the 
> machine modes in question.

Would restricting it to flag_unsafe_math_optimizations not be enough in this 
case? Since if it's only done for unsafe math
then you likely won't care about a small loss in precision anyway?

> Specifically, see convert.c:convert_to_real_1, the long comment starting 
> "Sometimes this transformation is safe (cannot change results through 
> affecting double rounding cases) and sometimes it is not.", and the 
> associated code calling real_can_shorten_arithmetic.  I think that code in 
> convert.c ought to apply to your case of half precision converted to float 
> for arithmetic and then converted back to half precision afterwards.  (In the 
> case where the excess precision configuration - which depends on 
> TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half 
> precision, anyway.)

Sorry I hadn't attached a test-case because I wanted to get some feedback 
before.

But my simple testcase is

#include <complex.h>

#define N 200

void foo (_Float16 complex a[N], _Float16 complex b[N], _Float16 complex *c)
{
  for (int x = 0; x < N; x++)
c[x] = a[x] + b[x] * I;
}

A simplified version of one of the expressions in the gimple tree for this is

  _7 = IMAGPART_EXPR <*_4>;
  _13 = REALPART_EXPR <*_3>;
  _8 = (floatD.30) _7;
  _14 = (floatD.30) _13;
  _35 = _14 + _8;
  _19 = (_Float16D.33) _35;
  IMAGPART_EXPR <*_20> = _19;

note that the type of _4, _3 and _19 are that of half float.

  _4 = a_26(D) + _2;
  _3 = b_25(D) + _2;
  _20 = c_28(D) + _2;

  complex _Float16D.43 * a_26(D) = aD.3538;
  complex _Float16D.43 * b_25(D) = bD.3539;
  complex _Float16D.43 * c_28(D) = cD.3540;

If the code is SLP vectorized, then inside the tree (without my match.pd rule)
it will contain the casts, which is evident from the resulting vector code

        ldr     h0, [x6, x3]
        ldr     h2, [x1, x3]
        fcvt    s0, h0
        fcvt    s2, h2
        fadd    s0, s0, s2
        fcvt    h1, s1
        fcvt    h0, s0
        str     h1, [x2, x3]
        str     h0, [x4, x3]

The problem seems to be (as far as I can tell) that when a loop is present, a
new temporary is created to store the intermediate result, while it's creating
the destination variable with the dest + offset.  It can't do the cast and the
computation at the same time.

foo (complex _Float16 * a, complex _Float16 * b, complex _Float16 * c)
{
  complex _Float16 D_3548;
  complex _Float16 D_3549;
  complex float D_3550;

The type of this temporary seems to be the type of the inner computation, so in
this case float due to the 90-degree rotation of b in the C file.

This also has the effect that it seems to make the type of the imaginary and
real expressions all float.

So when convert.c tries to convert them, all it sees are

 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
0x7ffb18e362a0 precision:32
pointer_to_this >
side-effects
arg:0

> -----Original Message-----
> From: Joseph Myers 
> Sent: Tuesday, November 13, 2018 00:49
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> ; l...@redhat.com; i...@airs.com;
> rguent...@suse.de
> Subject: Re: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away
> unneeded type casts in expressions
> 
> On Sun, 11 Nov 2018, Tamar Christina wrote:
> 
> > This patch adds a match.pd rule for stripping away the type converts
> > when you're converting to a type that has twice the precision of the
> > current type in the same class, doing a simple math operation on it
> > and converting back to the smaller type.
> 
> What types exactly is this meant to apply to?  Floating-point?  Integer?
> Mixtures of those?  (I'm guessing not mixtures, because those would be
> something other than "convert" here.)
> 
> For integer types, it's not safe, in that if e.g. F is int and X is unsigned
> long long, you're changing from defined overflow to undefined overflow.
> 
> For floating-point types, using TYPE_PRECISION is suspect (it's not
> wonderfully clear what it means, but it's not the number of significand
> bits) - you need to look at the actual properties of the real format of the
> machine modes in question.
> 
> Specifically, s
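
A minimal sketch of the integer case raised above, assuming F is int and X is
unsigned long long as in the example given.  The helper names add_via_wide and
add_narrow are hypothetical and do not come from the patch.  The widened form
wraps on overflow, while the narrowed form that the proposed rule would produce
has undefined behaviour for the same inputs:

```cpp
#include <climits>

// Hypothetical helper: widen to unsigned long long, add, convert back to int.
// Unsigned arithmetic wraps, so the addition itself is always defined.
int add_via_wide(int a, int b)
{
  unsigned long long wide = (unsigned long long) a + (unsigned long long) b;
  // The narrowing conversion is modular since C++20; before that it is
  // implementation-defined, and GCC documents it as reduction modulo 2^N.
  return (int) wide;
}

// Hypothetical helper: what the expression becomes once the conversions
// are stripped.  Signed overflow here is undefined behaviour.
int add_narrow(int a, int b)
{
  return a + b;
}
```

With GCC, add_via_wide (INT_MAX, 1) yields INT_MIN, whereas
add_narrow (INT_MAX, 1) is undefined and must not be relied on.  The two forms
only agree when the sum fits in int.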

Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 4:51 PM Alexandre Oliva  wrote:
>
> On Nov 13, 2018, Richard Biener  wrote:
>
> > Reworking gnattools build to always use host CC/CXX in "stage1" (or for
> > crosses) rather than doing sth different.
>
> > Yeah, but gnattools is bootstrapped, right?
>
> No, it's not built in stage1, it's a post-bootstrap host subpackage.

Huh, indeed - it's a host_module without bootstrap ...  and libada is
a target_module not bootstrapped either.  So we're indeed in a curious
situation where we have a bootstrap of Ada requiring a host Ada but
nothing of Ada is actually bootstrapped ... ;)

> > For --disable-bootstrap you get binaries built with the host compiler
> > throughout and that's good.
>
> I guess it *could* be built with the host compiler, and linked with that
> runtime.  Then, in order to run it, you might still need the previous
> gnat rts shared libs.  Just like for a non-bootstrapped compiler
> front-end, yeah.

Yeah, I expected that for non-bootstrap.  And I somehow assumed it
was bootstrapped so I'd get gnattools and gnat1 not depending on the
host compiler libs.  I guess we're lucky for gnat1 because it's written
in C?

At least I now somewhat understand why we wind up with the
bootstrapped CC/CXX for the build of the stage3 host modules.

>
> Whereas for a bootstrapped compiler, you'd want to use what, the stage2
> compiler, that built other final host tools, to build gnattools as well?
>
> All doable, but not without additional complexity.  I'm afraid I fail to
> see the upside.
>
> Unfortunately we don't have a lot of other post-bootstrap host tools
> linked (on native builds) with target libraries to draw parallels.  I
> guess we could reason about them as if they were part of the gcc/
> subdir, and this would lead down the path you suggest.  I will then
> point to the complications arising from using a just-built libstdc++ in
> subsequent stages, and in later target builds in the same stage, and
> conclude it's far from trivial, it's actually quite error-prone and
> hardly glitch-free, and so it's not too hard to understand why gnat,
> that had to take care of this before we even thought of bootstrapping
> libstdc++, took a different path.
>
> Anyway...  I won't take your suggestion as an objection to the proposed
> patch, but I will say that I still have a lot to learn about the Ada
> build machinery to be able to grasp the impact your suggestion might
> have on existing uses, so, as much as I like uniformity and symmetry,
> I'm not jumping right onto it ;-)

;)

Richard.

>
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain Engineer                Free Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH][lower-subreg] Fix PR87507

2018-11-13 Thread Peter Bergner
On 11/13/18 2:53 AM, Eric Botcazou wrote:
>> +static rtx
>> +simple_move_operator (rtx x)
>> +{
>> +  /* A word sized rotate of a register pair is equivalent to swapping
>> + the registers in the register pair.  */
>> +  if (GET_CODE (x) == ROTATE
>> +  && GET_MODE (x) == twice_word_mode
>> +  && simple_move_operand (XEXP (x, 0))
>> +  && CONST_INT_P (XEXP (x, 1))
>> +  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
>> +return XEXP (x, 0);;
>> +
>> +  return NULL_RTX;
>> +}
>> +
> 
> Superfluous semi-colon.  Given that the function returns an operand, its name 
> is IMO misleading, so maybe [get_]operand_for_simple_move_operator.

Fixed and renamed function to operand_for_simple_move_operator.
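
The equivalence that the quoted comment relies on can be checked in plain C++.
This is only a sketch, assuming a 32-bit word and a 64-bit double-word value;
rotate_by_word and swap_halves are hypothetical names used for illustration and
are not part of lower-subreg.c:

```cpp
#include <cstdint>

// Rotate a 64-bit value left by one 32-bit word.
std::uint64_t rotate_by_word(std::uint64_t x)
{
  return (x << 32) | (x >> 32);
}

// Treat the value as a pair of 32-bit "registers" and swap them.
std::uint64_t swap_halves(std::uint64_t x)
{
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);
  return (static_cast<std::uint64_t>(lo) << 32) | hi;
}
```

Both functions map 0x1122334455667788 to 0x5566778811223344, which is why a
word-sized rotate of a register pair can be lowered to a plain move with the
two halves exchanged.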


> Can we factor out the duplicate manipulation into a function here, for
> example resolve_operand_for_simple_move_operator?

Good idea.  I went with your name, but would swap_concatn_operands() be a
better name, since that is what it is doing?  I'll leave it up to you.
Updated patch below.

Peter


gcc/
PR rtl-optimization/87507
* lower-subreg.c (operand_for_simple_move_operator): New function.
(simple_move): Strip simple operators.
(find_pseudo_copy): Likewise.
(resolve_operand_for_simple_move_operator): New function.
(resolve_simple_move): Strip simple operators and swap operands.

gcc/testsuite/
PR rtl-optimization/87507
* gcc.target/powerpc/pr87507.c: New test.
* gcc.target/powerpc/pr68805.c: Update expected results.

Index: gcc/lower-subreg.c
===
--- gcc/lower-subreg.c  (revision 265971)
+++ gcc/lower-subreg.c  (working copy)
@@ -320,6 +320,24 @@ simple_move_operand (rtx x)
   return true;
 }
 
+/* If X is an operator that can be treated as a simple move that we
+   can split, then return the operand that is operated on.  */
+
+static rtx
+operand_for_simple_move_operator (rtx x)
+{
+  /* A word sized rotate of a register pair is equivalent to swapping
+ the registers in the register pair.  */
+  if (GET_CODE (x) == ROTATE
+  && GET_MODE (x) == twice_word_mode
+  && simple_move_operand (XEXP (x, 0))
+  && CONST_INT_P (XEXP (x, 1))
+  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
+return XEXP (x, 0);
+
+  return NULL_RTX;
+}
+
 /* If INSN is a single set between two objects that we want to split,
return the single set.  SPEED_P says whether we are optimizing
INSN for speed or size.
@@ -330,7 +348,7 @@ simple_move_operand (rtx x)
 static rtx
 simple_move (rtx_insn *insn, bool speed_p)
 {
-  rtx x;
+  rtx x, op;
   rtx set;
   machine_mode mode;
 
@@ -348,6 +366,9 @@ simple_move (rtx_insn *insn, bool speed_
 return NULL_RTX;
 
   x = SET_SRC (set);
+  if ((op = operand_for_simple_move_operator (x)) != NULL_RTX)
+x = op;
+
   if (x != recog_data.operand[0] && x != recog_data.operand[1])
 return NULL_RTX;
   /* For the src we can handle ASM_OPERANDS, and it is beneficial for
@@ -386,9 +407,13 @@ find_pseudo_copy (rtx set)
 {
   rtx dest = SET_DEST (set);
   rtx src = SET_SRC (set);
+  rtx op;
   unsigned int rd, rs;
   bitmap b;
 
+  if ((op = operand_for_simple_move_operator (src)) != NULL_RTX)
+src = op;
+
   if (!REG_P (dest) || !REG_P (src))
 return false;
 
@@ -846,6 +871,21 @@ can_decompose_p (rtx x)
   return true;
 }
 
+/* OPND is a concatn operand that is used with a simple move operator.
+   Return a new rtx with the concatn's operands swapped.  */
+
+static rtx
+resolve_operand_for_simple_move_operator (rtx opnd)
+{
+  gcc_assert (GET_CODE (opnd) == CONCATN);
+  rtx concatn = copy_rtx (opnd);
+  rtx op0 = XVECEXP (concatn, 0, 0);
+  rtx op1 = XVECEXP (concatn, 0, 1);
+  XVECEXP (concatn, 0, 0) = op1;
+  XVECEXP (concatn, 0, 1) = op0;
+  return concatn;
+}
+
 /* Decompose the registers used in a simple move SET within INSN.  If
we don't change anything, return INSN, otherwise return the start
of the sequence of moves.  */
@@ -853,7 +893,7 @@ can_decompose_p (rtx x)
 static rtx_insn *
 resolve_simple_move (rtx set, rtx_insn *insn)
 {
-  rtx src, dest, real_dest;
+  rtx src, dest, real_dest, src_op;
   rtx_insn *insns;
   machine_mode orig_mode, dest_mode;
   unsigned int orig_size, words;
@@ -876,6 +916,23 @@ resolve_simple_move (rtx set, rtx_insn *
 
   real_dest = NULL_RTX;
 
+  if ((src_op = operand_for_simple_move_operator (src)) != NULL_RTX)
+{
+  if (resolve_reg_p (dest))
+   {
+ /* DEST is a CONCATN, so swap its operands and strip
+SRC's operator.  */
+ dest = resolve_operand_for_simple_move_operator (dest);
+ src = src_op;
+   }
+  else if (resolve_reg_p (src_op))
+   {
+ /* SRC is an operation on a CONCATN, so strip the operator and
+swap the CONCATN's operands.  */
+ src = resolve_operand_for_simple_move_operator (src_op);
+   }
+}
+
   if (GET_CODE (src) == SUBREG
   && resolve_reg_p (

Re: [PATCH] Fix PR86991

2018-11-13 Thread Richard Biener
On Tue, 13 Nov 2018, Richard Biener wrote:

> 
> This PR shows we have stale reduction groups lying around because
> the fixup doesn't work reliably with reduction chains.  Fixed by
> delaying the build to after detection is successful.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.

The following, slightly fixed, is what I have applied.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/86991
* tree-vect-loop.c (vect_is_slp_reduction): Delay reduction
group building until we have successfully detected the SLP
reduction.
(vect_is_simple_reduction): Remove fixup code here.

* gcc.dg/pr86991.c: New testcase.
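
The idea behind the change (collect candidates first, link them only once
detection has succeeded) can be sketched outside GCC as follows.  This is an
illustration under stated assumptions, not the vectorizer code: Stmt, its
valid flag and build_chain_if_valid are hypothetical stand-ins for
stmt_vec_info, the reduction checks and vect_is_slp_reduction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a statement with reduction-group links.
struct Stmt {
  bool valid = true;      // stands in for "passes the reduction checks"
  Stmt *first = nullptr;  // plays the role of REDUC_GROUP_FIRST_ELEMENT
  Stmt *next = nullptr;   // plays the role of REDUC_GROUP_NEXT_ELEMENT
};

// Link the candidates into a chain only if every one of them is valid.
// On failure no link is written, so there is nothing stale to clean up.
bool build_chain_if_valid(const std::vector<Stmt *> &candidates)
{
  if (candidates.empty())
    return false;

  // Phase 1: detection only, no side effects on the statements.
  for (Stmt *s : candidates)
    if (!s->valid)
      return false;

  // Phase 2: detection succeeded, so build up the actual chain.
  for (std::size_t i = 0; i + 1 < candidates.size(); ++i)
    {
      candidates[i]->first = candidates[0];
      candidates[i]->next = candidates[i + 1];
    }
  candidates.back()->first = candidates[0];
  candidates.back()->next = nullptr;
  return true;
}
```

Because a failed detection leaves every first/next pointer untouched, a
separate "dissolve the half-built group" fixup step is no longer needed in
this scheme.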

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266071)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2466,7 +2466,7 @@ vect_is_slp_reduction (loop_vec_info loo
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
   enum tree_code code;
   gimple *loop_use_stmt = NULL;
-  stmt_vec_info use_stmt_info, current_stmt_info = NULL;
+  stmt_vec_info use_stmt_info;
   tree lhs;
   imm_use_iterator imm_iter;
   use_operand_p use_p;
@@ -2476,6 +2476,7 @@ vect_is_slp_reduction (loop_vec_info loo
   if (loop != vect_loop)
 return false;
 
+  auto_vec<stmt_vec_info> reduc_chain;
   lhs = PHI_RESULT (phi);
   code = gimple_assign_rhs_code (first_stmt);
   while (1)
@@ -2528,17 +2529,9 @@ vect_is_slp_reduction (loop_vec_info loo
 
   /* Insert USE_STMT into reduction chain.  */
   use_stmt_info = loop_info->lookup_stmt (loop_use_stmt);
-  if (current_stmt_info)
-{
- REDUC_GROUP_NEXT_ELEMENT (current_stmt_info) = use_stmt_info;
-  REDUC_GROUP_FIRST_ELEMENT (use_stmt_info)
-= REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-}
-  else
-   REDUC_GROUP_FIRST_ELEMENT (use_stmt_info) = use_stmt_info;
+  reduc_chain.safe_push (use_stmt_info);
 
   lhs = gimple_assign_lhs (loop_use_stmt);
-  current_stmt_info = use_stmt_info;
   size++;
}
 
@@ -2548,10 +2541,9 @@ vect_is_slp_reduction (loop_vec_info loo
   /* Swap the operands, if needed, to make the reduction operand be the second
  operand.  */
   lhs = PHI_RESULT (phi);
-  stmt_vec_info next_stmt_info = REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-  while (next_stmt_info)
+  for (unsigned i = 0; i < reduc_chain.length (); ++i)
 {
-  gassign *next_stmt = as_a <gassign *> (next_stmt_info->stmt);
+  gassign *next_stmt = as_a <gassign *> (reduc_chain[i]->stmt);
   if (gimple_assign_rhs2 (next_stmt) == lhs)
{
  tree op = gimple_assign_rhs1 (next_stmt);
@@ -2565,7 +2557,6 @@ vect_is_slp_reduction (loop_vec_info loo
  && vect_valid_reduction_input_p (def_stmt_info))
{
  lhs = gimple_assign_lhs (next_stmt);
- next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
  continue;
}
 
@@ -2600,14 +2591,20 @@ vect_is_slp_reduction (loop_vec_info loo
 }
 
   lhs = gimple_assign_lhs (next_stmt);
-  next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
 }
 
+  /* Build up the actual chain.  */
+  for (unsigned i = 0; i < reduc_chain.length () - 1; ++i)
+{
+  REDUC_GROUP_FIRST_ELEMENT (reduc_chain[i]) = reduc_chain[0];
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain[i]) = reduc_chain[i+1];
+}
+  REDUC_GROUP_FIRST_ELEMENT (reduc_chain.last ()) = reduc_chain[0];
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain.last ()) = NULL;
+
   /* Save the chain for further analysis in SLP detection.  */
-  stmt_vec_info first_stmt_info
-= REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-  LOOP_VINFO_REDUCTION_CHAINS (loop_info).safe_push (first_stmt_info);
-  REDUC_GROUP_SIZE (first_stmt_info) = size;
+  LOOP_VINFO_REDUCTION_CHAINS (loop_info).safe_push (reduc_chain[0]);
+  REDUC_GROUP_SIZE (reduc_chain[0]) = size;
 
   return true;
 }
@@ -3182,16 +3195,6 @@ vect_is_simple_reduction (loop_vec_info
   return def_stmt_info;
 }
 
-  /* Dissolve group eventually half-built by vect_is_slp_reduction.  */
-  stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (def_stmt_info);
-  while (first)
-{
-  stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (first);
-  REDUC_GROUP_FIRST_ELEMENT (first) = NULL;
-  REDUC_GROUP_NEXT_ELEMENT (first) = NULL;
-  first = next;
-}
-
   /* Look for the expression computing loop_arg from loop PHI result.  */
   if (check_reduction_path (vect_location, loop, phi, loop_arg, code))
 return def_stmt_info;
Index: gcc/testsuite/gcc.dg/pr86991.c
===
--- gcc/testsuite/gcc.dg/pr86991.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr86991.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int b;
+extern unsigned c[];
+unsigned d;
+long e;
+
+void f()
+{
+  unsigned g, h;
+  for (; d; d += 2) {
+

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Peter Bergner
On 11/13/18 9:01 AM, Renlin Li wrote:
> I could verify that your patch fixes all the ICEs I saw with
> arm-linux-gnueabihf toolchain!
> There are some differences on the test results, because I compare the latest
> results with something which is old.
> 
> I haven't tested it on bare-metal toolchain yet. But will do to ensure all
> related issues are fixed.

Hi Renlin,

That's excellent news!  My guess on the testsuite results changes is that
they're probably caused by the combine changes/fixes that went in around
the same time as my patches.

If you want to disable the special copy handling, which should only help
things since it gives RA more freedom, you can apply the patch I mentioned
here:

  https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00379.html

which allows you to turn on and off the optimization with an option.


Jeff and Vlad,

With the above results, I think the patch is ready for review.
I'm attaching the latest updated patch below.

Again, this passed bootstrap and regtesting on powerpc64le-linux with
no regressions.  Ok for mainline?

Peter


gcc/
PR rtl-optimization/87899
* lra-lives.c (start_living): Update white space in comment.
(enum point_type): New.
(sparseset_contains_pseudos_p): New function.
(update_pseudo_point): Likewise.
(make_hard_regno_live): Use HARD_REGISTER_NUM_P macro.
(make_hard_regno_dead): Likewise.  Remove ignore_reg_for_conflicts
handling.  Move early exit after adding conflicts.
(mark_pseudo_live): Use HARD_REGISTER_NUM_P macro.  Add early exit
if regno is already live.  Remove all handling of program points.
(mark_pseudo_dead): Use HARD_REGISTER_NUM_P macro.  Add early exit
after adding conflicts.  Remove all handling of program points and
ignore_reg_for_conflicts.
(mark_regno_live): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_live.
(mark_regno_dead): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_dead.
(check_pseudos_live_through_calls): Use HARD_REGISTER_NUM_P macro.
(process_bb_lives): Use HARD_REGISTER_NUM_P and HARD_REGISTER_P macros.
Use new function update_pseudo_point.  Handle register copies by
removing the source register from the live set.  Handle INOUT operands.
Update to the next program point using the unused_set, dead_set and
start_dying sets.
(lra_create_live_ranges_1): Use HARD_REGISTER_NUM_P macro.

Index: gcc/lra-lives.c
===
--- gcc/lra-lives.c (revision 265971)
+++ gcc/lra-lives.c (working copy)
@@ -83,7 +83,7 @@ static HARD_REG_SET hard_regs_live;
 
 /* Set of pseudos and hard registers start living/dying in the current
insn.  These sets are used to update REG_DEAD and REG_UNUSED notes
-   in the insn. */
+   in the insn.  */
 static sparseset start_living, start_dying;
 
 /* Set of pseudos and hard regs dead and unused in the current
@@ -96,10 +96,6 @@ static bitmap_head temp_bitmap;
 /* Pool for pseudo live ranges. */
 static object_allocator<lra_live_range> lra_live_range_pool ("live ranges");
 
-/* If non-NULL, the source operand of a register to register copy for which
-   we should not add a conflict with the copy's destination operand.  */
-static rtx ignore_reg_for_conflicts;
-
 /* Free live range list LR.  */
 static void
 free_live_range_list (lra_live_range_t lr)
@@ -224,6 +220,57 @@ lra_intersected_live_ranges_p (lra_live_
   return false;
 }
 
+enum point_type {
+  DEF_POINT,
+  USE_POINT
+};
+
+/* Return TRUE if set A contains a pseudo register, otherwise, return FALSE.  */
+static bool
+sparseset_contains_pseudos_p (sparseset a)
+{
+  int regno;
+  EXECUTE_IF_SET_IN_SPARSESET (a, regno)
+if (!HARD_REGISTER_NUM_P (regno))
+  return true;
+  return false;
+}
+
+/* Mark pseudo REGNO as living or dying at program point POINT, depending on
+   whether TYPE is a definition or a use.  If this is the first reference to
+   REGNO that we've encountered, then create a new live range for it.  */
+
+static void
+update_pseudo_point (int regno, int point, enum point_type type)
+{
+  lra_live_range_t p;
+
+  /* Don't compute points for hard registers.  */
+  if (HARD_REGISTER_NUM_P (regno))
+return;
+
+  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+{
+  if (type == DEF_POINT)
+   {
+ if (sparseset_bit_p (pseudos_live, regno))
+   {
+ p = lra_reg_info[regno].live_ranges;
+ lra_assert (p != NULL);
+ p->finish = point;
+   }
+   }
+  else /* USE_POINT */
+   {
+ if (!sparseset_bit_p (pseudos_live, regno)
+ && ((p = lra_reg_info[regno].live_ranges) == NULL
+ || (p->finish != point && p->finish + 1 != point)))
+   lra_reg_info[regno].l

Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Alexandre Oliva
On Nov 13, 2018, Richard Biener  wrote:

> Reworking gnattools build to always use host CC/CXX in "stage1" (or for
> crosses) rather than doing sth different.

> Yeah, but gnattools is bootstrapped, right?

No, it's not built in stage1, it's a post-bootstrap host subpackage.

> For --disable-bootstrap you get binaries built with the host compiler
> throughout and that's good.

I guess it *could* be built with the host compiler, and linked with that
runtime.  Then, in order to run it, you might still need the previous
gnat rts shared libs.  Just like for a non-bootstrapped compiler
front-end, yeah.

Whereas for a bootstrapped compiler, you'd want to use what, the stage2
compiler, that built other final host tools, to build gnattools as well?

All doable, but not without additional complexity.  I'm afraid I fail to
see the upside.

Unfortunately we don't have a lot of other post-bootstrap host tools
linked (on native builds) with target libraries to draw parallels.  I
guess we could reason about them as if they were part of the gcc/
subdir, and this would lead down the path you suggest.  I will then
point to the complications arising from using a just-built libstdc++ in
subsequent stages, and in later target builds in the same stage, and
conclude it's far from trivial, it's actually quite error-prone and
hardly glitch-free, and so it's not too hard to understand why gnat,
that had to take care of this before we even thought of bootstrapping
libstdc++, took a different path.

Anyway...  I won't take your suggestion as an objection to the proposed
patch, but I will say that I still have a lot to learn about the Ada
build machinery to be able to grasp the impact your suggestion might
have on existing uses, so, as much as I like uniformity and symmetry,
I'm not jumping right onto it ;-)

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain Engineer                Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: Bug 52869 - [DR 1207] "this" not being allowed in noexcept clauses

2018-11-13 Thread Marek Polacek
On Tue, Nov 13, 2018 at 11:49:55AM +0530, Umesh Kalappa wrote:
> Hi All,
> 
> the following patch fixes the subject issue
> 
> Index: gcc/cp/parser.c
> ===
> --- gcc/cp/parser.c (revision 266026)
> +++ gcc/cp/parser.c (working copy)
> @@ -24615,6 +24615,8 @@
>  {
>tree expr;
>cp_lexer_consume_token (parser->lexer);
> +
> +  inject_this_parameter (current_class_type, TYPE_UNQUALIFIED);
> 
>if (cp_lexer_peek_token (parser->lexer)->type == CPP_OPEN_PAREN)
> {
> 
> 
> ok to commit along the testcase with changelog update ?

Thanks for the patch.

Please also include the testcase along with the patch (and I think it should
also test noexcept in a template).  Please also include a ChangeLog entry
in the patch submission.

Can you describe how this patch has been tested?

Further, wouldn't it be better to call inject_this_parameter inside the
CPP_OPEN_PAREN block?  If noexcept doesn't have any expression, then it
can't refer to "this".

Marek
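
For reference, the kind of code DR 1207 is about can be sketched as below.
This is a hedged illustration, not the testcase requested above: Widget and its
members are hypothetical names, and the example assumes a compiler that already
accepts "this" inside a noexcept-specifier.  It also shows the distinction
behind the CPP_OPEN_PAREN question: a bare noexcept has no operand, so nothing
in it can refer to "this".

```cpp
#include <utility>

// Hypothetical class used only for illustration.
struct Widget {
  void quiet() noexcept {}
  void loud() {}  // not declared noexcept, so it may throw

  // "this" is used inside the noexcept-specifier's expression.
  void call_quiet() noexcept(noexcept(this->quiet())) { this->quiet(); }
  void call_loud() noexcept(noexcept(this->loud())) { this->loud(); }

  // Bare noexcept: no parenthesized expression follows the keyword.
  void plain() noexcept {}
};

// The exception specification follows what the operand expression says.
static_assert(noexcept(std::declval<Widget &>().call_quiet()),
              "call_quiet should be noexcept");
static_assert(!noexcept(std::declval<Widget &>().call_loud()),
              "call_loud should not be noexcept");
```

A test along these lines would presumably also need a class-template variant
to cover the template case mentioned above.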


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Martin Liška
On 11/13/18 4:26 PM, Jakub Jelinek wrote:
> On Tue, Nov 13, 2018 at 04:24:39PM +0100, Martin Liška wrote:
>> 2018-11-13  Martin Liska  
>>
>>  * pr87930.c: Move from gcc/testsuite/gcc.target/i386/
>>  into gcc/testsuite/gcc.dg/asan/.
> 
>   * gcc.target/i386/pr87930.c: Move to ...
>   * gcc.dg/asan/pr87930.c: ... here.  Guard for i?86/x86_64 targets.
> 
> Ok with that change.

Thanks, installed as r266076.

Martin

> 
>> ---
>>  gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>  rename gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c (68%)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c b/gcc/testsuite/gcc.dg/asan/pr87930.c
>> similarity index 68%
>> rename from gcc/testsuite/gcc.target/i386/pr87930.c
>> rename to gcc/testsuite/gcc.dg/asan/pr87930.c
>> index e9cf29c221a..4f8e6999fde 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr87930.c
>> +++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target lp64 } } */
>> +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
>>  /* { dg-options "-fsanitize=address -mabi=ms" } */
>>  
>>  int i;
>> -- 
>> 2.19.1
>>
> 
> 
>   Jakub
> 



Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 04:24:39PM +0100, Martin Liška wrote:
> 2018-11-13  Martin Liska  
> 
>   * pr87930.c: Move from gcc/testsuite/gcc.target/i386/
>   into gcc/testsuite/gcc.dg/asan/.

* gcc.target/i386/pr87930.c: Move to ...
* gcc.dg/asan/pr87930.c: ... here.  Guard for i?86/x86_64 targets.

Ok with that change.

> ---
>  gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>  rename gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c (68%)
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c b/gcc/testsuite/gcc.dg/asan/pr87930.c
> similarity index 68%
> rename from gcc/testsuite/gcc.target/i386/pr87930.c
> rename to gcc/testsuite/gcc.dg/asan/pr87930.c
> index e9cf29c221a..4f8e6999fde 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87930.c
> +++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target lp64 } } */
> +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
>  /* { dg-options "-fsanitize=address -mabi=ms" } */
>  
>  int i;
> -- 
> 2.19.1
> 


Jakub


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Martin Liška
On 11/13/18 4:08 PM, Jakub Jelinek wrote:
> On Tue, Nov 13, 2018 at 04:01:56PM +0100, Martin Liška wrote:
>> On 11/12/18 1:07 PM, Jakub Jelinek wrote:
>>> On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
>>>> The patch reject usage of the mentioned options.
>>>>
>>>> Ready for trunk?
>>>> Thanks,
>>>> Martin
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2018-11-12  Martin Liska  
>>>>
>>>> 	PR sanitizer/87930
>>>> 	* config/i386/i386.c (ix86_option_override_internal): Error
>>>> 	about usage -mabi=ms and -fsanitize={,kernel-}address.
>>>
>>> Please add testcases for this. 
>>
>> Done in attached patch.
>>
>> Can this be changed through attribute too?
>>
>> No.
>>
>> I'm going to install the patch.
> 
> Can you please move the test to gcc.dg/asan/ and guard with
> { i?86-*-* x86_64-*-* } && lp64 ?
> I don't think we want -fsanitize=address tests outside of /asan/
> subdirs where we guard them on the availability of asan.
> 
>   Jakub
> 

Sure, there's a patch I've just tested.

Martin
From 89aa27fb5d816c7d854cd77ba0f157d69bc927f7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 13 Nov 2018 16:23:08 +0100
Subject: [PATCH] Move a test-case to a proper folder.

ChangeLog:

2018-11-13  Martin Liska  

	* pr87930.c: Move from gcc/testsuite/gcc.target/i386/
	into gcc/testsuite/gcc.dg/asan/.
---
 gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.target/i386 => gcc.dg/asan}/pr87930.c (68%)

diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c b/gcc/testsuite/gcc.dg/asan/pr87930.c
similarity index 68%
rename from gcc/testsuite/gcc.target/i386/pr87930.c
rename to gcc/testsuite/gcc.dg/asan/pr87930.c
index e9cf29c221a..4f8e6999fde 100644
--- a/gcc/testsuite/gcc.target/i386/pr87930.c
+++ b/gcc/testsuite/gcc.dg/asan/pr87930.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
 /* { dg-options "-fsanitize=address -mabi=ms" } */
 
 int i;
-- 
2.19.1



[PATCH] update-copyright.py: Add filters for D language sources

2018-11-13 Thread Iain Buclaw
Hi,

This adds filters for upstream dmd, druntime, and phobos libraries, so
that the update-copyright script doesn't complain or try to update the
copyright years for those files.

OK for trunk?
-- 
Iain

---
contrib/ChangeLog:

2018-11-13  Iain Buclaw  

* update-copyright.py (TestsuiteFilter): Skip .d tests.
(LibPhobosFilter): Add filter for upstream D sources.
(GCCCopyright): Add D Language Foundation as external author.
(GCCCmdLine): Add libphobos.

---
diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
index 9295c6b8a30..67c21cab23c 100755
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -574,6 +574,7 @@ class TestsuiteFilter (GenericFilter):
 '.c',
 '.C',
 '.cc',
+'.d',
 '.h',
 '.hs',
 '.f',
@@ -616,6 +617,25 @@ class LibGCCFilter (GenericFilter):
 'soft-fp',
 ])
 
+class LibPhobosFilter (GenericFilter):
+def __init__ (self):
+GenericFilter.__init__ (self)
+
+self.skip_files |= set ([
+# Source module imported from upstream.
+'object.d',
+])
+
+self.skip_dirs |= set ([
+# Contains sources imported from upstream.
+'core',
+'etc',
+'gc',
+'gcstub',
+'rt',
+'std',
+])
+
 class LibStdCxxFilter (GenericFilter):
 def __init__ (self):
 GenericFilter.__init__ (self)
@@ -682,6 +702,7 @@ class GCCCopyright (Copyright):
 self.add_external_author ('Silicon Graphics')
 self.add_external_author ('Stephen L. Moshier')
 self.add_external_author ('Sun Microsystems, Inc. All rights reserved.')
+self.add_external_author ('The D Language Foundation, All Rights Reserved')
 self.add_external_author ('The Go Authors.  All rights reserved.')
 self.add_external_author ('The Go Authors. All rights reserved.')
 self.add_external_author ('The Go Authors.')
@@ -720,6 +741,7 @@ class GCCCmdLine (CmdLine):
 self.add_dir ('libitm')
 self.add_dir ('libobjc')
 # liboffloadmic is imported from upstream.
+self.add_dir ('libphobos', LibPhobosFilter())
 self.add_dir ('libquadmath')
 # libsanitizer is imported from upstream.
 self.add_dir ('libssp')
@@ -745,6 +767,7 @@ class GCCCmdLine (CmdLine):
 'libiberty',
 'libitm',
 'libobjc',
+'libphobos',
 'libssp',
 'libstdc++-v3',
 'libvtv',


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Jakub Jelinek
On Tue, Nov 13, 2018 at 04:01:56PM +0100, Martin Liška wrote:
> On 11/12/18 1:07 PM, Jakub Jelinek wrote:
> > On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
> >> The patch reject usage of the mentioned options.
> >>
> >> Ready for trunk?
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-11-12  Martin Liska  
> >>
> >>PR sanitizer/87930
> >>* config/i386/i386.c (ix86_option_override_internal): Error
> >>about usage -mabi=ms and -fsanitize={,kernel-}address.
> > 
> > Please add testcases for this. 
> 
> Done in attached patch.
> 
> Can this be changed through attribute too?
> 
> No.
> 
> I'm going to install the patch.

Can you please move the test to gcc.dg/asan/ and guard with
{ i?86-*-* x86_64-*-* } && lp64 ?
I don't think we want -fsanitize=address tests outside of /asan/
subdirs where we guard them on the availability of asan.

Jakub


Re: cleanups and unification of value_range dumping code

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 3:43 PM Aldy Hernandez  wrote:
>
>
>
> On 11/13/18 3:12 AM, Richard Biener wrote:
> > On Mon, Nov 12, 2018 at 10:50 AM Aldy Hernandez  wrote:
> >>
> >> I have rebased my value_range dumping patch after your value_range_base
> >> changes.
> >>
> >> I know you are not a fan of the gimple-pretty-print.c chunk, but I still
> >> think having one set of dumping code is preferable to catering to
> >> possible GC corruption while debugging.  If you're strongly opposed (as,
> >> I'm putting my foot down), I can remove it as well as the relevant
> >> pretty_printer stuff.
> >
> > I'd say we do not want to change the gimple-pretty-print.c stuff also
> > because we'll miss the leading #.  I'd rather see a simple wide-int-range
> > class wrapping the interesting bits up.  I guess I'll come up with one then ;)
>
> Ok.  Removed.
>
> >
> >> The patch looks bigger than it is because I moved all the dump routines
> >> into one place.
> >>
> >> OK?
> >>
> >> p.s. After your changes, perhaps get_range_info(const_tree, value_range
> >> &) should take a value_range_base instead?
> >
> > Yes, I missed that and am now testing this change.
>
> Thanks.
>
> >
> > Btw, the patch needs updating again (sorry).  If you leave out the
> > gimple-pretty-print.c stuff there's no requirement to use the pretty-printer
> > API, right?
>
> No need to apologize for contributing code :).  Thanks.  And yes,
> there's no need for the pretty-printer bits.
>
> I've also removed the value_range*::dump() versions with no arguments,
> as now we have an overloaded debug() for use from the debugger.
>
> Testing attached patch.

OK.

Thanks,
Richard.

> Aldy


Re: [PATCH PR84648]Adjust loop exit conditions for loop-until-wrap cases.

2018-11-13 Thread Richard Biener
On Sun, Nov 11, 2018 at 9:02 AM bin.cheng  wrote:
>
> Hi,
> This patch fixes PR84648 by adjusting exit conditions for loop-until-wrap
> cases.  It only handles simple cases in which IV.base are constants because we
> rely on current niter analyzer which doesn't handle parameterized bound in
> wrapped case.  It could be relaxed in the future.
>
> Bootstrap and test on x86_64 in progress.

Please use TYPE_MIN/MAX_VALUE or wi::min/max_value consistently.
Either tree_int_cst_equal (iv0->base, TYPE_MIN_VALUE (type)) or
wide_int_to_tree (niter_type, wi::max_value (TYPE_PRECISION (type),
TYPE_SIGN (type))).

Also

+  iv0->base = low;
+  iv0->step = fold_convert (niter_type, integer_one_node);

build_int_cst (niter_type, 1);

+  iv1->base = high;
+  iv1->step = integer_zero_node;

build_int_cst (niter_type, 0);

With the code, what happens to signed IVs?  I suppose we figure out things
earlier by means of undefined overflow?

Apart from the above nits OK for trunk.

Thanks,
Richard.
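
For readers unfamiliar with the term, a loop-until-wrap loop can be sketched
as below.  This is an illustration only, not the PR's test case (which is not
shown in this thread): it assumes the pattern means an unsigned induction
variable compared against a value that advances with it, so the loop exits
only when the arithmetic wraps.  An 8-bit type keeps the iteration count
small, and the function names are hypothetical.

```c
#include <stdint.h>

/* Loop-until-wrap: the exit test compares i with i + 1, so the loop
   only stops once i + 1 wraps around to 0.  The cast keeps the
   addition in 8 bits; without it, integer promotion would make the
   condition always true.  */
unsigned
count_until_wrap (void)
{
  unsigned count = 0;
  for (uint8_t i = 0; i < (uint8_t) (i + 1); i++)
    count++;
  return count;
}

/* The same loop with the exit condition rewritten against a constant
   bound, the kind of form a simple iteration-count analysis can handle.  */
unsigned
count_until_max (void)
{
  unsigned count = 0;
  for (uint8_t i = 0; i != UINT8_MAX; i++)
    count++;
  return count;
}
```

Both functions run the body the same number of times.  With a signed
induction variable the wrap itself would be undefined behaviour in C, which
is presumably the background to the question about signed IVs above.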

> Thanks,
> bin
> 2018-11-11  Bin Cheng  
>
> PR tree-optimization/84648
> * tree-ssa-loop-niter.c (adjust_cond_for_loop_until_wrap): New.
> (number_of_iterations_cond): Adjust exit cond for loop-until-wrap case
> by calling adjust_cond_for_loop_until_wrap.
>
> 2018-11-11  Bin Cheng  
>
> PR tree-optimization/84648
> * gcc.dg/tree-ssa/pr84648.c: New test.


Re: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR sanitizer/87930).

2018-11-13 Thread Martin Liška
On 11/12/18 1:07 PM, Jakub Jelinek wrote:
> On Mon, Nov 12, 2018 at 01:03:41PM +0100, Martin Liška wrote:
>> The patch rejects usage of the mentioned options.
>>
>> Ready for trunk?
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-11-12  Martin Liska  
>>
>>  PR sanitizer/87930
>>  * config/i386/i386.c (ix86_option_override_internal): Error
>>  about usage -mabi=ms and -fsanitize={,kernel-}address.
> 
> Please add testcases for this. 

Done in attached patch.

> Can this be changed through attribute too?

No.

I'm going to install the patch.

Thanks,
Martin

> If so, a test for that should be there too.
> 
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3546,6 +3546,11 @@ ix86_option_override_internal (bool main_args_p,
>>  error ("-mabi=ms not supported with X32 ABI");
>>gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
>>  
>> +  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi == MS_ABI)
>> +error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
>> +  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi == MS_ABI)
>> +error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
>> +
>>/* For targets using ms ABI enable ms-extensions, if not
>>   explicit turned off.  For non-ms ABI we turn off this
>>   option.  */
>>
> 
> 
>   Jakub
> 

From 7720cb384b33835a29beaaddb4a940a5eeadb13f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 12 Nov 2018 12:55:33 +0100
Subject: [PATCH] Do not allow -mabi=ms and -fsanitize={,kernel-}address (PR
 sanitizer/87930).

gcc/ChangeLog:

2018-11-12  Martin Liska  

	PR sanitizer/87930
	* config/i386/i386.c (ix86_option_override_internal): Error
	about usage -mabi=ms and -fsanitize={,kernel-}address.

gcc/testsuite/ChangeLog:

2018-11-13  Martin Liska  

	PR sanitizer/87930
	* gcc.target/i386/pr87930.c: New test.
---
 gcc/config/i386/i386.c  | 5 +
 gcc/testsuite/gcc.target/i386/pr87930.c | 6 ++
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87930.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 711bec0cc9d..b3e0807b894 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3546,6 +3546,11 @@ ix86_option_override_internal (bool main_args_p,
 error ("-mabi=ms not supported with X32 ABI");
   gcc_assert (opts->x_ix86_abi == SYSV_ABI || opts->x_ix86_abi == MS_ABI);
 
+  if ((opts->x_flag_sanitize & SANITIZE_USER_ADDRESS) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=address%>");
+  if ((opts->x_flag_sanitize & SANITIZE_KERNEL_ADDRESS) && opts->x_ix86_abi == MS_ABI)
+error ("%<-mabi=ms%> not supported with %<-fsanitize=kernel-address%>");
+
   /* For targets using ms ABI enable ms-extensions, if not
  explicit turned off.  For non-ms ABI we turn off this
  option.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr87930.c b/gcc/testsuite/gcc.target/i386/pr87930.c
new file mode 100644
index 000..e9cf29c221a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87930.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-fsanitize=address -mabi=ms" } */
+
+int i;
+
+/* { dg-error ".-mabi=ms. not supported with .-fsanitize=address." "" { target *-*-* } 0 } */
-- 
2.19.1
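
The check added by the patch can be modelled in isolation as follows.  This
is a simplified sketch, not the i386 back end code: the flag values, the
`abi` enum and `check_options` are hypothetical stand-ins for
`x_flag_sanitize`, `x_ix86_abi` and the `error` calls.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC's sanitizer flag bits.  */
enum { SAN_USER_ADDRESS = 1 << 0, SAN_KERNEL_ADDRESS = 1 << 1 };
enum abi { ABI_SYSV, ABI_MS };

/* Return the error message for an unsupported combination, or NULL
   when the options are compatible.  */
const char *
check_options (unsigned sanitize, enum abi abi)
{
  if (abi != ABI_MS)
    return NULL;
  if (sanitize & SAN_USER_ADDRESS)
    return "-mabi=ms not supported with -fsanitize=address";
  if (sanitize & SAN_KERNEL_ADDRESS)
    return "-mabi=ms not supported with -fsanitize=kernel-address";
  return NULL;
}
```

Unlike the patch, which can report both errors, this sketch returns only the
first mismatch it finds.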



Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-13 Thread Renlin Li

Hi Peter,

I can confirm that your patch fixes all the ICEs I saw with the
arm-linux-gnueabihf toolchain!
There are some differences in the test results, because I compared the latest
results with an older set.

I haven't tested it on a bare-metal toolchain yet, but I will do so to ensure
all related issues are fixed.

Thanks for fixing it!

Regards,
Renlin



On 11/12/2018 08:25 PM, Peter Bergner wrote:

On 11/12/18 6:25 AM, Renlin Li wrote:

I tried to build a native arm-linux-gnueabihf toolchain with the patch.
But I got the following ICE:


Ok, the issue was a problem in handling the src reg from a register copy.
I thought I could just remove it from the dead_set, but forgot that the
updating of the program points looks at whether the pseudo is live or
not.  The change below on top of the previous patch fixes the ICE for me.
I now add the src reg back into pseudos_live before we process the insn's
input operands so it doesn't trigger a new program point being added.

Renlin and Jeff, can you apply this patch on top of the previous one
and see whether that is better?

Thanks.

Peter


--- gcc/lra-lives.c.orig	2018-11-12 14:15:18.257657911 -0600
+++ gcc/lra-lives.c 2018-11-12 14:08:55.978795092 -0600
@@ -934,6 +934,18 @@
  || sparseset_contains_pseudos_p (start_dying))
next_program_point (curr_point, freq);
  
+  /* If we removed the source reg from a simple register copy from the
+live set above, then add it back now so we don't accidentally add
+it to the start_living set below.  */
+  if (ignore_reg != NULL_RTX)
+   {
+ int ignore_regno = REGNO (ignore_reg);
+ if (HARD_REGISTER_NUM_P (ignore_regno))
+   SET_HARD_REG_BIT (hard_regs_live, ignore_regno);
+ else
+   sparseset_set_bit (pseudos_live, ignore_regno);
+   }
+
sparseset_clear (start_living);
  
/* Mark each used value as live.	*/

@@ -959,11 +971,6 @@
  
sparseset_and_compl (dead_set, start_living, start_dying);
  
-  /* If we removed the source reg from a simple register copy from the
-live set, then it will appear to be dead, but it really isn't.  */
-  if (ignore_reg != NULL_RTX)
-   sparseset_clear_bit (dead_set, REGNO (ignore_reg));
-
sparseset_clear (start_dying);
  
/* Mark early clobber outputs dead.  */




Re: [PATCH] [ARC] Cleanup, fix and set LRA default.

2018-11-13 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-11-12 13:29:33 +0200]:

> From: claziss 
> 
> Hi Andrew,
> 
> This is a patch which fixes and sets LRA by default.
> 
> OK to apply?
> Claudiu
> 
>   Commit message 
> 
> LP_COUNT register cannot be freely allocated by the compiler as its
> size and/or content may change depending on the ARC hardware
> configuration. Thus, make this register fixed.
> 
> Remove register classes and unused constraint letters.
> 
> Cleanup the implementation of conditional_register_usage hook by using
> macros instead of magic constants and removing all references to
> reg_class_contents which are bringing so much grief when lra is enabled.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.h (reg_class): Reorder registers classes, remove
>   unused register classes.
>   (REG_CLASS_NAMES): Likewise.
>   (REG_CLASS_CONTENTS): Likewise.
>   (FIXED_REGISTERS): Make lp_count fixed.
>   (BASE_REG_CLASS): Remove ACC16_BASE_REGS reference.
>   (PROGRAM_COUNTER_REGNO): Remove.
>   * config/arc/arc.c (arc_conditional_register_usage): Remove unused
>   register classes, use constants for register numbers, remove
>   reg_class_contents references.
>   (arc_process_double_reg_moves): Add asserts.
>   (arc_secondary_reload): Remove LPCOUNT_REG reference, use
>   lra_in_progress predicate.
>   (arc_init_reg_tables): Remove unused register classes.
>   (arc_register_move_cost): Likewise.
>   (arc_preferred_reload_class): Likewise.
>   (hwloop_optimize): Update rtx patterns involving lp_count
>   register.
>   (arc_return_address_register): Rename ILINK1, ILINK2 regnums
>   macros.
>   * config/arc/constraints.md ("c"): Choose between GENERAL_REGS and
>   CHEAP_CORE_REGS.  Former one will be used for LRA.
>   ("Rac"): Choose between GENERAL_REGS and ALL_CORE_REGS.  Former
>   one will be used for LRA.
>   ("w"): Choose between GENERAL_REGS and WRITABLE_CORE_REGS.  Former
>   one will be used for LRA.
>   ("W"): Choose between GENERAL_REGS and MPY_WRITABLE_CORE_REGS.
>   Former one will be used for LRA.
>   ("f"): Delete constraint.
>   ("k"): Likewise.
>   ("e"): Likewise.

The entries below this are for arc.md, but you're missing the filename
in the ChangeLog format.

>   (movqi_insn): Remove unused lp_count constraints.
>   (movhi_insn): Likewise.
>   (movsi_insn): Update pattern.
>   (arc_lp): Likewise.
>   (dbnz): Likewise.
>   ("l"): Change it from register constraint to constraint.
>   (stack_tie): Remove 'b' constraint letter.
>   (R4_REG): Define.
>   (R9_REG, R15_REG, R16_REG, R25_REG): Likewise.
>   (R32_REG, R40_REG, R41_REG, R42_REG, R43_REG, R44_REG): Likewise.
>   (R57_REG, R59_REG, PCL_REG): Likewise.
>   (ILINK1_REGNUM): Renamed to ILINK1_REG.
>   (ILINK2_REGNUM): Renamed to ILINK2_REG.
>   (Rgp): Remove.
>   (SP_REGS): Likewise.
>   (Rcw): Remove unused reg classes.
>   * config/arc/predicates.md (dest_reg_operand): Just default on
>   register_operand predicate.
>   (mpy_dest_reg_operand): Likewise.
>   (move_dest_operand): Use macros instead of constants.

I'm basically happy with this.  There are a few formatting issues as we
saw in previous patches - tabs instead of whitespace in comments and
single whitespace instead of two at the end of a sentence.  But with
that fixed (and the doc fix Eric suggested) I'm happy.

Thanks,
Andrew


> ---
>  gcc/config/arc/arc.c  | 331 +-
>  gcc/config/arc/arc.h  | 106 ---
>  gcc/config/arc/arc.md |  57 --
>  gcc/config/arc/arc.opt|   7 +-
>  gcc/config/arc/constraints.md |  45 ++---
>  gcc/config/arc/predicates.md  |  28 +--
>  6 files changed, 222 insertions(+), 352 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 75c2384eede..6802ca66554 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -734,11 +734,6 @@ arc_secondary_reload (bool in_p,
>if (cl == DOUBLE_REGS)
>  return GENERAL_REGS;
>  
> -  /* The loop counter register can be stored, but not loaded directly.  */
> -  if ((cl == LPCOUNT_REG || cl == WRITABLE_CORE_REGS)
> -  && in_p && MEM_P (x))
> -return GENERAL_REGS;
> -
>   /* If we have a subreg (reg), where reg is a pseudo (that will end in
>  a memory location), then we may need a scratch register to handle
>  the fp/sp+largeoffset address.  */
> @@ -756,8 +751,9 @@ arc_secondary_reload (bool in_p,
> if (regno != -1)
>   return NO_REGS;
>  
> -   /* It is a pseudo that ends in a stack location.  */
> -   if (reg_equiv_mem (REGNO (x)))
> +   /* It is a pseudo that ends in a stack location.  This
> +  procedure only works with the old reload step.  */
> +   if (reg_equiv_mem (REGNO (x)) && !lra_in_progress)
>   {
> /* G

Re: [PATCH, GCC, AARCH64, 6/6] Enable BTI: Add configure option for BTI and PAC-RET

2018-11-13 Thread Sudakshina Das
Hi James

On 07/11/18 15:36, James Greenhalgh wrote:
> On Fri, Nov 02, 2018 at 01:38:46PM -0500, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch adds a new configure option for enabling BTI and return
>> address signing by default with --enable-standard-branch-protection.
>> This is equivalent to -mbranch-protection=standard which would
>> imply -mbranch-protection=pac-ret+bti.
>>
>> Bootstrapped and regression tested with aarch64-none-linux-gnu with
>> and without the configure option turned on.
>> Also tested on aarch64-none-elf with and without configure option with a
>> BTI enabled aem. Only 2 regressions and these were because newlib
>> requires patches to protect hand coded libraries with BTI.
>>
>> Is this ok for trunk?
> 
> With a tweak to the comment above your changes in aarch64.c, yes this is OK.
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64.c (aarch64_override_options): Add case to check
>>  configure option to set BTI and Return Address Signing.
>>  * configure.ac: Add --enable-standard-branch-protection and
>>  --disable-standard-branch-protection.
>>  * configure: Regenerated.
>>  * doc/install.texi: Document the same.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * gcc.target/aarch64/bti-1.c: Update test to not add command
>>  line option when configure with bti.
>>  * gcc.target/aarch64/bti-2.c: Likewise.
>>  * lib/target-supports.exp
>>  (check_effective_target_default_branch_protection):
>>  Add configure check for --enable-standard-branch-protection.
>>
> 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 12a55a640de4fdc5df21d313c7ea6841f1daf3f2..a1a5b7b464eaa2ce67ac66d9aea837159590aa07 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -11558,6 +11558,26 @@ aarch64_override_options (void)
>> if (!selected_tune)
>>   selected_tune = selected_cpu;
>>   
>> +  if (aarch64_enable_bti == 2)
>> +{
>> +#ifdef TARGET_ENABLE_BTI
>> +  aarch64_enable_bti = 1;
>> +#else
>> +  aarch64_enable_bti = 0;
>> +#endif
>> +}
>> +
>> +  /* No command-line option yet.  */
> 
> This is too broad. Can you narrow this down to which command line option this
> relates to, and what the expected default behaviours are (for both LP64 and
> ILP32).
> 

Updated patch attached. Return address signing is not supported for
ILP32 currently. This patch just follows that and hence the extra ILP32
check is added.

Thanks
Sudi

> Thanks,
> James
> 
>> +  if (accepted_branch_protection_string == NULL && !TARGET_ILP32)
>> +{
>> +#ifdef TARGET_ENABLE_PAC_RET
>> +  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
>> +  aarch64_ra_sign_key = AARCH64_KEY_A;
>> +#else
>> +  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
>> +#endif
>> +}
>> +
>>   #ifndef HAVE_AS_MABI_OPTION
>> /* The compiler may have been configured with 2.23.* binutils, which does
>>not have support for ILP32.  */
> 



diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b97d9e4deecf5ca33761dfd1008c39bb4b849881..e267d3441fd7f21105bfba339b69f2ecdb7595ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11579,6 +11579,28 @@ aarch64_override_options (void)
   if (!selected_tune)
 selected_tune = selected_cpu;
 
+  if (aarch64_enable_bti == 2)
+{
+#ifdef TARGET_ENABLE_BTI
+  aarch64_enable_bti = 1;
+#else
+  aarch64_enable_bti = 0;
+#endif
+}
+
+  /* Return address signing is currently not supported for ILP32 targets.  For
+ LP64 targets use the configured option in the absence of a command-line
+ option for -mbranch-protection.  */
+  if (!TARGET_ILP32 && accepted_branch_protection_string == NULL)
+{
+#ifdef TARGET_ENABLE_PAC_RET
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NON_LEAF;
+  aarch64_ra_sign_key = AARCH64_KEY_A;
+#else
+  aarch64_ra_sign_scope = AARCH64_FUNCTION_NONE;
+#endif
+}
+
 #ifndef HAVE_AS_MABI_OPTION
   /* The compiler may have been configured with 2.23.* binutils, which does
  not have support for ILP32.  */
diff --git a/gcc/configure b/gcc/configure
index 03461f1e27538a3a0791c2b61b0e75c3ff1a25be..a0f95106c22ee858bbf4516f14cd9d265dede272 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -947,6 +947,7 @@ with_plugin_ld
 enable_gnu_indirect_function
 enable_initfini_array
 enable_comdat
+enable_standard_branch_protection
 enable_fix_cortex_a53_835769
 enable_fix_cortex_a53_843419
 with_glibc_version
@@ -1677,6 +1678,14 @@ Optional Features:
   --enable-initfini-array	use .init_array/.fini_array sections
   --enable-comdat enable COMDAT group support
 
+  --

Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-11-13 Thread Sudakshina Das
Hi

On 02/11/18 18:38, Sudakshina Das wrote:
> Hi
> 
> This patch is part of a series that enables ARMv8.5-A in GCC and
> adds Branch Target Identification Mechanism.
> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
> 
> This patch adds a new pass called "bti" which is triggered by the
> command line argument -mbranch-protection whenever "bti" is turned on.
> 
> The pass iterates through the instructions and adds appropriate BTI
> instructions based on the following:
>  * Add a new "BTI C" at the beginning of a function, unless it's already
>protected by a "PACIASP/PACIBSP". We exempt the functions that are
>only called directly.
>  * Add a new "BTI J" for every target of an indirect jump, jump table
>targets, non-local goto targets or labels that might be referenced
>by variables, constant pools, etc (NOTE_INSN_DELETED_LABEL)
> 
> Since we have already changed the use of indirect tail calls to only x16
> and x17, we do not have to use "BTI JC".
> (check patch 3/6).
> 

I missed out on the explanation for the changes to the trampoline code.
The patch also updates the trampoline code in case BTI is enabled. Since
the trampoline code is a target of an indirect branch, we need to add an
appropriate BTI instruction at the beginning of it to avoid a branch
target exception.
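
To make the two cases concrete, the C fragment below contains the kinds of
branch targets the description mentions.  It is only a sketch: whether a
compiler actually emits a jump table for the `switch`, or keeps the call
indirect, depends on the target and the optimization options, and the
function names are hypothetical.

```c
/* Called through a pointer below, so its entry is an indirect-call
   target (the "BTI C" case in the description).  */
int
add_one (int x)
{
  return x + 1;
}

int
twice (int x)
{
  return 2 * x;
}

/* Indirect call: the callee is not known at the call site.  */
int
apply (int (*fn) (int), int x)
{
  return fn (x);
}

/* A dense switch may be lowered to a jump table, in which case each
   case label is reached by an indirect jump (the "BTI J" case).  */
int
classify (int n)
{
  switch (n)
    {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    case 4: return 14;
    default: return -1;
    }
}
```

A function that is only ever called directly, by contrast, is the case the
description says is exempted from getting a "BTI C".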

> Bootstrapped and regression tested with aarch64-none-linux-gnu. Added
> new tests.
> Is this ok for trunk?
> 
> Thanks
> Sudi
> 
> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
>   Ramana Radhakrishnan  
> 
>   * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>   * gcc/config/aarch64/aarch64.h: Update comment for
>   TRAMPOLINE_SIZE.
>   * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>   Update if bti is enabled.
>   * config/aarch64/aarch64-bti-insert.c: New file.
>   * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>   bti pass.
>   * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>   Declare the new bti pass.
>   * config/aarch64/aarch64.md (bti_nop): Define.
>   * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>   * gcc.target/aarch64/bti-1.c: New test.
>   * gcc.target/aarch64/bti-2.c: New test.
>   * lib/target-supports.exp
>   (check_effective_target_aarch64_bti_hw): Add new check for
>   BTI hw.
>

Updated patch attached with more comments and a bit of simplification
in aarch64-bti-insert.c. ChangeLog still applies.

Thanks
Sudi

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b108697cfc7b1c9c6dc1f30cca6fd1158182c29e..3e77f9df6ad6ca55fccca50387eab4b2501af647 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -317,7 +317,7 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-bti-insert.c b/gcc/config/aarch64/aarch64-bti-insert.c
new file mode 100644
index ..15202e0def3b514bdbd1564b39a121e43e01a67f
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-bti-insert.c
@@ -0,0 +1,226 @@
+/* Branch Target Identification for AArch64 architecture.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_STRING
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "gimple.h"
+#include "tm_p.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "emit-rtl.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "dumpfile.h"
+#include "rtl-iter.h"
+#include "cfgrtl.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+
+/* This pass enables the support for Branch Target I

Re: [PATCH, GCC, AARCH64, 1/6] Enable ARMv8.5-A in gcc

2018-11-13 Thread Sudakshina Das
Hi James

On 07/11/18 15:16, James Greenhalgh wrote:
> On Fri, Nov 02, 2018 at 01:37:33PM -0500, Sudakshina Das wrote:
>> Hi
>>
>> This patch is part of a series that enables ARMv8.5-A in GCC and
>> adds Branch Target Identification Mechanism.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>
>> This patch adds the march option for armv8.5-a.
>>
>> Bootstrapped and regression tested with aarch64-none-linux-gnu.
>> Is this ok for trunk?
> 
> One minor tweak, otherwise OK.
> 
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>>  * config/aarch64/aarch64-arches.def: Define AARCH64_ARCH for
>>  ARMv8.5-A.
>>  * gcc/config/aarch64/aarch64.h (AARCH64_FL_V8_5): New.
>>  (AARCH64_FL_FOR_ARCH8_5, AARCH64_ISA_V8_5): New.
>>  * gcc/doc/invoke.texi: Document ARMv8.5-A.
> 
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index fa9af26fd40fd23b1c9cd6da9b6300fd77089103..b324cdd2fede33af13c03362750401f9eb1c9a90 100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -170,6 +170,8 @@ extern unsigned aarch64_architecture_version;
>>   #define AARCH64_FL_SHA3  (1 << 18)  /* Has ARMv8.4-a SHA3 and SHA512.  */
>>   #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  */
>>   #define AARCH64_FL_RCPC8_4 (1 << 20)  /* Has ARMv8.4-a RCPC extensions.  */
>> +/* ARMv8.5-A architecture extensions.  */
>> +#define AARCH64_FL_V8_5   (1 << 22)  /* Has ARMv8.5-A features.  */
>>   
>>   /* Statistical Profiling extensions.  */
>>   #define AARCH64_FL_PROFILE (1 << 21)
> 
> Let's keep this in order. 20, 21, 22.
> 

I have moved the Armv8.5 stuff below. Patch attached.
If this looks ok, I will rebase 2/6 on top. Let me know if you
want me to resend the rebased 2/6 too.

Thanks
Sudi

> Thanks,
> James
> 
> 



diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index a37a5553894d6ab1d629017ea204478f69d8773d..7d05cd604093d15f27e5b197803a50c45a260e6e 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -35,5 +35,6 @@ AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
 AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 AARCH64_ARCH("armv8.3-a", generic,	 8_3A,	8,  AARCH64_FL_FOR_ARCH8_3)
 AARCH64_ARCH("armv8.4-a", generic,	 8_4A,	8,  AARCH64_FL_FOR_ARCH8_4)
+AARCH64_ARCH("armv8.5-a", generic,	 8_5A,	8,  AARCH64_FL_FOR_ARCH8_5)
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 8ab21e7bc37c7d5ffba1a365345f70d9f501b3ac..8ce8445586f29963107848604c5e2bab8e853685 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -177,6 +177,9 @@ extern unsigned aarch64_architecture_version;
 /* Statistical Profiling extensions.  */
 #define AARCH64_FL_PROFILE (1 << 21)
 
+/* ARMv8.5-A architecture extensions.  */
+#define AARCH64_FL_V8_5	  (1 << 22)  /* Has ARMv8.5-A features.  */
+
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
 
@@ -195,6 +198,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_4			\
   (AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_V8_4 | AARCH64_FL_F16FML \
| AARCH64_FL_DOTPROD | AARCH64_FL_RCPC8_4)
+#define AARCH64_FL_FOR_ARCH8_5			\
+  (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_5)
 
 /* Macros to test ISA flags.  */
 
@@ -216,6 +221,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_SHA3	   (aarch64_isa_flags & AARCH64_FL_SHA3)
 #define AARCH64_ISA_F16FML	   (aarch64_isa_flags & AARCH64_FL_F16FML)
 #define AARCH64_ISA_RCPC8_4	   (aarch64_isa_flags & AARCH64_FL_RCPC8_4)
+#define AARCH64_ISA_V8_5	   (aarch64_isa_flags & AARCH64_FL_V8_5)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3e54087ab98049ba932caa34ba2fb135eda48396..26770c5aafda1524d63a89cacf8cc069b7c8b9b6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15118,8 +15118,11 @@ more feature modifiers.  This option has the form
 @option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}.
 
 The permissible values for @var{arch} are @samp{armv8-a},
-@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a} or @samp{armv8.4-a}
-or @var{native}.
+@samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a}, @samp{armv8.4-a},
+@samp{armv8.5-a} or @var{native}.
+
+The value @samp{armv8.5-a} implies @samp{armv8.4-a} and enables compiler
+support for the ARMv8.5-A architecture extensions.
 
 The value @samp{armv8.4-a} implies @samp{armv8.3-a} and enables compiler
 support for the ARMv8.4-A architecture extensions.
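
The flag arithmetic in the aarch64.h hunks can be tried out in isolation.
The sketch below uses shortened, hypothetical names (`FL_*`, `has_v8_5`) and
only a few of the bits; the bit positions 21 and 22 follow the patch, while
the position used for the ARMv8.4-A bit is a placeholder.

```c
/* Hypothetical, cut-down versions of the AARCH64_FL_* bits.  */
#define FL_V8_4      (1u << 16)  /* placeholder position */
#define FL_PROFILE   (1u << 21)
#define FL_V8_5      (1u << 22)

/* Each architecture level includes the previous one.  */
#define FL_FOR_ARCH8_4 (FL_V8_4)
#define FL_FOR_ARCH8_5 (FL_FOR_ARCH8_4 | FL_V8_5)

/* Counterpart of an AARCH64_ISA_* test macro: non-zero when the bit is set.  */
unsigned
has_v8_5 (unsigned isa_flags)
{
  return isa_flags & FL_V8_5;
}
```

Keeping the bits in ascending order, as requested in the review, makes it
easy to see that no two features share a position.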




Re: [PATCH] RFC: machine-readable diagnostic output (PR other/19165)

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 8:58 AM David Malcolm  wrote:
>
> This patch implements a -fdiagnostics-format=json option which
> converts the diagnostics to be output to stderr in a JSON format;
> see the documentation in invoke.texi.
>
> Logically-related diagnostics are nested at the JSON level, using
> the auto_diagnostic_group mechanism.

LGTM if people really want it.

Richard.
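
As a rough picture of what "nested at the JSON level" could look like, the
sketch below prints a diagnostic and its children as nested JSON objects.
It is not the patch's implementation and does not claim to match the schema
documented in invoke.texi: `struct diag`, `print_diag` and the key names are
hypothetical, and messages are assumed to contain no characters that need
JSON escaping.

```c
#include <stdio.h>

/* Hypothetical, simplified diagnostic record; not GCC's data structure.  */
struct diag
{
  const char *kind;            /* e.g. "warning" or "note" */
  const char *message;         /* assumed to need no JSON escaping */
  const struct diag *children; /* nested diagnostics, may be NULL */
  size_t n_children;
};

/* Print D and, recursively, its children as one JSON object.  */
void
print_diag (FILE *out, const struct diag *d)
{
  fprintf (out, "{\"kind\": \"%s\", \"message\": \"%s\", \"children\": [",
           d->kind, d->message);
  for (size_t i = 0; i < d->n_children; i++)
    {
      if (i > 0)
        fputs (", ", out);
      print_diag (out, &d->children[i]);
    }
  fputs ("]}", out);
}
```

A consumer can then treat a warning and its follow-up notes as one unit
instead of having to re-associate separate lines of text.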

> gcc/ChangeLog:
> PR other/19165
> * Makefile.in (OBJS): Move json.o to...
> (OBJS-libcommon): ...here and add diagnostic-format-json.o.
> * common.opt (fdiagnostics-format=): New option.
> (diagnostics_output_format): New enum.
> * diagnostic-format-json.cc: New file.
> * diagnostic.c (default_diagnostic_final_cb): New function, taken
> from start of diagnostic_finish.
> (diagnostic_initialize): Initialize final_cb to
> default_diagnostic_final_cb.
> (diagnostic_finish): Move "being treated as errors" messages to
> default_diagnostic_final_cb.  Call any final_cb.
> * diagnostic.h (enum diagnostics_output_format): New enum.
> (struct diagnostic_context): Add "final_cb".
> (diagnostic_output_format_init): New decl.
> * doc/invoke.texi (-fdiagnostics-format): New option.
> * dwarf2out.c (gen_producer_string): Ignore
> OPT_fdiagnostics_format_.
> * gcc.c (driver_handle_option): Handle OPT_fdiagnostics_format_.
> * lto-wrapper.c (append_diag_options): Ignore it.
> * opts.c (common_handle_option): Handle it.
> ---
>  gcc/Makefile.in   |   2 +-
>  gcc/common.opt|  17 +++
>  gcc/diagnostic-format-json.cc | 265 ++
>  gcc/diagnostic.c  |  40 ---
>  gcc/diagnostic.h  |  16 +++
>  gcc/doc/invoke.texi   |  78 +
>  gcc/dwarf2out.c   |   1 +
>  gcc/gcc.c |   5 +
>  gcc/lto-wrapper.c |   1 +
>  gcc/opts.c|   5 +
>  10 files changed, 414 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/diagnostic-format-json.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 16c9ed6..9534d59 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1395,7 +1395,6 @@ OBJS = \
> ira-color.o \
> ira-emit.o \
> ira-lives.o \
> -   json.o \
> jump.o \
> langhooks.o \
> lcm.o \
> @@ -1619,6 +1618,7 @@ OBJS = \
>  # Objects in libcommon.a, potentially used by all host binaries and with
>  # no target dependencies.
>  OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
> +   diagnostic-format-json.o json.o \
> edit-context.o \
> pretty-print.o intl.o \
> sbitmap.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5a5d332..2f669f6 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1273,6 +1273,23 @@ Enum(diagnostic_color_rule) String(always) Value(DIAGNOSTICS_COLOR_YES)
>  EnumValue
>  Enum(diagnostic_color_rule) String(auto) Value(DIAGNOSTICS_COLOR_AUTO)
>
> +fdiagnostics-format=
> +Common Joined RejectNegative Enum(diagnostics_output_format)
> +-fdiagnostics-format=[text|json] Select output format
> +
> +; Required for these enum values.
> +SourceInclude
> +diagnostic.h
> +
> +Enum
> +Name(diagnostics_output_format) Type(int)
> +
> +EnumValue
> +Enum(diagnostics_output_format) String(text) Value(DIAGNOSTICS_OUTPUT_FORMAT_TEXT)
> +
> +EnumValue
> +Enum(diagnostics_output_format) String(json) Value(DIAGNOSTICS_OUTPUT_FORMAT_JSON)
> +
>  fdiagnostics-parseable-fixits
>  Common Var(flag_diagnostics_parseable_fixits)
>  Print fix-it hints in machine-readable form.
> diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
> new file mode 100644
> index 000..7860696
> --- /dev/null
> +++ b/gcc/diagnostic-format-json.cc
> @@ -0,0 +1,265 @@
> +/* JSON output for diagnostics
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   Contributed by David Malcolm .
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "diagnostic.h"
> +#include "json.h"
> +
> +/* The top-level JSON array of pending diagnostics.  */
> +
> +static json::array *toplevel_array;
> +
> +/* The

Re: [PATCH] Improve -fprofile-report.

2018-11-13 Thread Richard Biener
On Tue, Nov 6, 2018 at 3:05 PM Martin Liška  wrote:
>
> Hi.
>
> The patch is based on what was discussed on IRC and in the PR.
> Apart from that the reported layout is improved.
>
> Patch survives regression tests on x86_64-linux-gnu.
>
> Ready for trunk?

OK.

Thanks,
Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2018-11-06  Martin Liska  
>
> PR tree-optimization/87885
> * cfghooks.c (account_profile_record): Rename
> to ...
> (profile_record_check_consistency): ... this.
> Calculate missing num_mismatched_freq_in.
> (profile_record_account_profile): New function
> that calculates time and size of a function.
> * cfghooks.h (struct profile_record): Remove
> all tuples.
> (struct cfg_hooks): Remove after_pass flag.
> (account_profile_record): Rename to ...
> (profile_record_check_consistency): ... this.
> (profile_record_account_profile): New.
> * cfgrtl.c (rtl_account_profile_record): Remove
> after_pass flag.
> * passes.c (check_profile_consistency): Do only
> checking.
> (account_profile): Calculate size and time of
> function only.
> (pass_manager::dump_profile_report): Reformat
> output.
> (execute_one_ipa_transform_pass): Call
> consistency check before clean up and call account_profile
> after a clean up is done.
> (execute_one_pass): Call check_profile_consistency and
> account_profile instead of using after_pass flag.
> * tree-cfg.c (gimple_account_profile_record): Likewise.
> ---
>  gcc/cfghooks.c |  38 +++--
>  gcc/cfghooks.h |  17 ++--
>  gcc/cfgrtl.c   |  12 ++-
>  gcc/passes.c   | 207 ++---
>  gcc/tree-cfg.c |  11 ++-
>  5 files changed, 161 insertions(+), 124 deletions(-)
>
>


Re: cleanups and unification of value_range dumping code

2018-11-13 Thread Aldy Hernandez



On 11/13/18 3:12 AM, Richard Biener wrote:

On Mon, Nov 12, 2018 at 10:50 AM Aldy Hernandez  wrote:


I have rebased my value_range dumping patch after your value_range_base
changes.

I know you are not a fan of the gimple-pretty-print.c chunk, but I still
think having one set of dumping code is preferable to catering to
possible GC corruption while debugging.  If you're strongly opposed (as,
I'm putting my foot down), I can remove it as well as the relevant
pretty_printer stuff.


I'd say we do not want to change the gimple-pretty-print.c stuff also because
we'll miss the leading #.  I'd rather see a simple wide-int-range class
wrapping the interesting bits up.  I guess I'll come up with one then ;)


Ok.  Removed.




The patch looks bigger than it is because I moved all the dump routines
into one place.

OK?

p.s. After your changes, perhaps get_range_info(const_tree, value_range
&) should take a value_range_base instead?


Yes, I missed that and am now testing this change.


Thanks.



Btw, the patch needs updating again (sorry).  If you leave out the
gimple-pretty-print.c stuff there's no requirement to use the pretty-printer
API, right?


No need to apologize for contributing code :).  Thanks.  And yes, 
there's no need for the pretty-printer bits.


I've also removed the value_range*::dump() versions with no arguments, 
as now we have an overloaded debug() for use from the debugger.


Testing attached patch.

Aldy
gcc/

	* tree-vrp.c (value_range_base::dump): Dump type.
	Do not use INF nomenclature for 1-bit types.
	(dump_value_range): Group all variants to common dumping code.
	(debug): New overloaded functions for value_ranges.
	(value_range_base::dump): Remove no argument version.
	(value_range::dump): Same.

gcc/testsuite/

	* gcc.dg/tree-ssa/pr64130.c: Adjust for new value_range pretty
	printer.
	* gcc.dg/tree-ssa/vrp92.c: Same.
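
As an illustration of the ChangeLog entries above (type printed before the range, no INF nomenclature for 1-bit types), here is a minimal standalone sketch.  It is not GCC code: format_signed_range and its parameters are hypothetical, and it only models signed integer ranges with plain 64-bit arithmetic.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of how a signed range could be printed; this is
// not GCC's value_range_base::dump.  PRECISION is assumed to be 1..64.
std::string
format_signed_range (const std::string &type_name, unsigned precision,
		     int64_t min, int64_t max)
{
  // Smallest and largest values representable in PRECISION bits.
  int64_t type_min = precision == 64
    ? INT64_MIN : -(int64_t (1) << (precision - 1));
  int64_t type_max = precision == 64
    ? INT64_MAX : (int64_t (1) << (precision - 1)) - 1;

  // For a 1-bit type the bounds are just -1 and 0, so printing them
  // literally is clearer than -INF/+INF.
  bool use_inf = precision != 1;

  std::string lo = (use_inf && min == type_min)
    ? "-INF" : std::to_string (min);
  std::string hi = (use_inf && max == type_max)
    ? "+INF" : std::to_string (max);

  // The type name comes first, followed by the bracketed range.
  return type_name + " [" + lo + ", " + hi + "]";
}
```

With this sketch a 32-bit range of 1..1 prints as "int [1, 1]", which has the same shape as the adjusted scan-tree-dump patterns in the testsuite changes.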

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
index e068765e2fc..28ffbb76da8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
@@ -15,6 +15,6 @@ int funsigned2 (uint32_t a)
   return (-1 * 0x1L) / a == 0;
 }
 
-/* { dg-final { scan-tree-dump ": \\\[2, 8589934591\\\]" "evrp" } } */
-/* { dg-final { scan-tree-dump ": \\\[-8589934591, -2\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "int \\\[2, 8589934591\\\]" "evrp" } } */
+/* { dg-final { scan-tree-dump "int \\\[-8589934591, -2\\\]" "evrp" } } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
index 5a2dbf0108a..66d74e9b5e9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp92.c
@@ -18,5 +18,5 @@ int foo (int i, int j)
   return j;
 }
 
-/* { dg-final { scan-tree-dump "res_.: \\\[1, 1\\\]" "vrp1" } } */
+/* { dg-final { scan-tree-dump "res_.: int \\\[1, 1\\\]" "vrp1" } } */
 /* { dg-final { scan-tree-dump-not "Threaded" "vrp1" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 27bc1769f11..f498386e8eb 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -365,8 +365,6 @@ value_range_base::type () const
   return TREE_TYPE (min ());
 }
 
-/* Dump value range to FILE.  */
-
 void
 value_range_base::dump (FILE *file) const
 {
@@ -374,21 +372,26 @@ value_range_base::dump (FILE *file) const
 fprintf (file, "UNDEFINED");
   else if (m_kind == VR_RANGE || m_kind == VR_ANTI_RANGE)
 {
-  tree type = TREE_TYPE (min ());
+  tree ttype = type ();
+
+  print_generic_expr (file, ttype);
+  fprintf (file, " ");
 
   fprintf (file, "%s[", (m_kind == VR_ANTI_RANGE) ? "~" : "");
 
-  if (INTEGRAL_TYPE_P (type)
-	  && !TYPE_UNSIGNED (type)
-	  && vrp_val_is_min (min ()))
+  if (INTEGRAL_TYPE_P (ttype)
+	  && !TYPE_UNSIGNED (ttype)
+	  && vrp_val_is_min (min ())
+	  && TYPE_PRECISION (ttype) != 1)
 	fprintf (file, "-INF");
   else
 	print_generic_expr (file, min ());
 
   fprintf (file, ", ");
 
-  if (INTEGRAL_TYPE_P (type)
-	  && vrp_val_is_max (max ()))
+  if (INTEGRAL_TYPE_P (ttype)
+	  && vrp_val_is_max (max ())
+	  && TYPE_PRECISION (ttype) != 1)
 	fprintf (file, "+INF");
   else
 	print_generic_expr (file, max ());
@@ -398,7 +401,7 @@ value_range_base::dump (FILE *file) const
   else if (varying_p ())
 fprintf (file, "VARYING");
   else
-fprintf (file, "INVALID RANGE");
+gcc_unreachable ();
 }
 
 void
@@ -425,17 +428,45 @@ value_range::dump (FILE *file) const
 }
 
 void
-value_range_base::dump () const
+dump_value_range (FILE *file, const value_range *vr)
 {
-  dump_value_range (stderr, this);
-  fprintf (stderr, "\n");
+  if (!vr)
+fprintf (file, "[]");
+  else
+vr->dump (file);
 }
 
 void
-value_range::dump () const
+dump_value_range (FILE *file, const value_range_base *vr)
+{
+  if (!vr)
+fprintf (file, "[]");
+  else
+vr->dump (file);
+}
+
+DEBUG_FUNCTION void
+debug (const value_range_base *vr)
+{
+  dump_value_range (stderr, vr);
+}
+
+DEBUG_FUN

Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 2:10 PM Alexandre Oliva  wrote:
>
> On Nov 13, 2018, Richard Biener  wrote:
>
> >> Please let me know if there are objections to this change in the next
> >> few days, e.g., if enabling C and C++ for an Ada-only build is too
> >> onerous.  It is certainly possible to rework gnattools build machinery
> >> so that it uses CC and CXX as detected by the top-level configure if we
> >> can't find xgcc and xg++ in ../gcc.
>
> > I really wonder why we not _always_ do this for consistency given we
> > already require a host Ada compiler.
>
> Sorry, I can't tell what the 'this' refers to.  Enabling C and C++ for
> an Ada-only build?  Reworking gnattools build machinery to use top-level
> CC and CXX?  Something else?

Reworking gnattools build to always use host CC/CXX in "stage1" (or for crosses)
rather than doing sth different.  That would also not require C++ to be enabled
for crosses.

> FWIW, I see the point of using the just-built gcc/g++ if it's there
> and usable: considering the checks for different versions of Ada
> compilers, you really want to use the last stage of the bootstrap to
> build tools linked with the runtime built with it.  It seems to me you'd
> run into a catch-22 without that.

Yeah, but gnattools is bootstrapped, right?  For --disable-bootstrap
you get binaries built with the host compiler throughout and that's
good.  IIRC I originally stumbled across this with --disable-bootstrap.

Richard.

>
> --
> Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-13 Thread Richard Biener
On Tue, Nov 13, 2018 at 10:48 AM Kyrill Tkachov
 wrote:
>
>
> On 13/11/18 09:28, Richard Biener wrote:
> > On Tue, Nov 13, 2018 at 10:15 AM Kyrill Tkachov
> >  wrote:
> >> Hi Richard,
> >>
> >> On 13/11/18 08:24, Richard Biener wrote:
> >>> On Mon, Nov 12, 2018 at 7:20 PM Kyrill Tkachov
> >>>  wrote:
>  On 12/11/18 14:10, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
> >  wrote:
> >> On 09/11/18 12:18, Richard Biener wrote:
> >>> On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >>>  wrote:
>  Hi all,
> 
>  In this testcase the codegen for VLA SVE is worse than it could be 
>  due to unrolling:
> 
>  fully_peel_me:
>          mov     x1, 5
>          ptrue   p1.d, all
>          whilelo p0.d, xzr, x1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
>          cntd    x2
>          addvl   x3, x0, #1
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0, #1, mul vl]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x3]
>          cntw    x2
>          incb    x0, all, mul #2
>          whilelo p0.d, x2, x1
>          beq     .L1
>          ld1d    z0.d, p0/z, [x0]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0]
>  .L1:
>          ret
> 
>  In this case, due to the vector-length-agnostic nature of SVE the 
>  compiler doesn't know the loop iteration count.
>  For such loops we don't want to unroll if we don't end up 
>  eliminating branches as this just bloats code size
>  and hurts icache performance.
> 
>  This patch introduces a new unroll-known-loop-iterations-only param 
>  that disables cunroll when the loop iteration
>  count is unknown (SCEV_NOT_KNOWN). This case occurs much more often 
>  for SVE VLA code, but it does help some
>  Advanced SIMD cases as well where loops with an unknown iteration 
>  count are not unrolled when it doesn't eliminate
>  the branches.
> 
>  So for the above testcase we generate now:
>  fully_peel_me:
>          mov     x2, 5
>          mov     x3, x2
>          mov     x1, 0
>          whilelo p0.d, xzr, x2
>          ptrue   p1.d, all
>  .L2:
>          ld1d    z0.d, p0/z, [x0, x1, lsl 3]
>          fadd    z0.d, z0.d, z0.d
>          st1d    z0.d, p0, [x0, x1, lsl 3]
>          incd    x1
>          whilelo p0.d, x1, x3
>          bne     .L2
>          ret
> 
>  Not perfect still, but it's preferable to the original code.
>  The new param is enabled by default on aarch64 but disabled for 
>  other targets, leaving their behaviour unchanged
>  (until other target people experiment with it and set it, if 
>  appropriate).
> 
>  Bootstrapped and tested on aarch64-none-linux-gnu.
>  Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences 
>  in performance.
> 
>  Ok for trunk?
> >>> Hum.  Why introduce a new --param and not simply key on
> >>> flag_peel_loops instead?  That is
> >>> enabled by default at -O3 and with FDO but you of course can control
> >>> that in your targets
> >>> post-option-processing hook.
> >> You mean like this?
> >> It's certainly a simpler patch, but I was just a bit hesitant of 
> >> making this change for all targets :)
> >> But I suppose it's a reasonable change.
> > No, that change is backward.  What I said is that peeling is already
> > conditional on
> > flag_peel_loops and that is enabled by -O3.  So you want to disable
> > flag_peel_loops for
> > SVE instead in the target.
>  Sorry, I got confused by the similarly named functions.
>  I'm talking about try_unroll_loop_completely when run as part of 
>  canonicalize_induction_variables i.e. the "ivcanon" pass
>  (sorry about blaming cunroll here). This doesn't get called through the 
>  try_unroll_loops_completely path.
> >>> Well, peeling gets disabled.  From your patch I see you want to
> >>> disable "unrolling" when
> >>> the number of loop iteration is not constant.  That is called peeling
> >>> where we need to
> >>> emit the loop exit test N times.
> >>>
> >>> Did you check your testcases with -fno-peel-loops?
> >> -fno-peel-loops doesn't help in the testcases. The code that does this 
> >> peeling (try_unroll_loop_completely)
> >> can be ca

Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Aldy Hernandez




On 11/13/18 8:58 AM, Richard Biener wrote:

On Tue, 13 Nov 2018, Aldy Hernandez wrote:


On 11/13/18 3:07 AM, Richard Biener wrote:

On Tue, 13 Nov 2018, Aldy Hernandez wrote:




The tricky part starts in the prologue for

 if (vr0->undefined_p ())
   {
 vr0->deep_copy (vr1);
 return;
   }

but yes, we probably can factor out a bit more common code
here.  I'll see to followup with more minor cleanups this
week (noticed a few details myself).


Like this?  (untested)


I would inline value_range_base::union_helper into
value_range_base::union_,
and remove all the undefined/varying/etc stuff from value_range::union_.

It should work because upon return from value_range_base::union_, in the
this->undefined_p case, the base class will copy everything but the
equivalences.  Then the derived union_ only has to nuke the equivalences
if
this->undefined or this->varying, and the equivalences' IOR just works.

For instance, in the case where other->undefined_p, there shouldn't be
anything in the equivalences so the IOR won't copy anything to this as
expected.  Similarly for this->varying_p.

In the case of other->varying, this will already have been set to varying so
neither this nor other should have anything in their equivalence fields,
so
the IOR won't do anything.

I think I covered all of them...the bitmap math should just work.  What do
you
think?


I think the only case that will not work is the case when this->undefined
(when we need the deep copy).  Because we'll not get the bitmap from
other in that case.  So I've settled with the thing below (just
special-casing that very case)


Ah, good point.




Finally, as I've hinted before, I think we need to be careful that any
time we
change state to VARYING / UNDEFINED from a base method, that the derived
class
is in a sane state (there are no equivalences set per the API contract).
This
was not originally enforced in VRP, and I wouldn't be surprised if there
are
dragons if we enforce honesty.  I suppose, since we have an API, we could
enforce this lazily: any time equiv() is called, clear the equivalences or
return NULL if it's varying or undefined?  Just a thought.


I have updated ->update () to adjust equiv when we update to VARYING
or UNDEFINED.


Excellent idea.  I don't see that part in your patch though?


That was part of the previous (or previous previous) change.


Ah ok.






+/* Helper for meet operation for value ranges.  Given two value ranges VR0
and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{


I know this was my fault, but would you mind removing vr0 from union_helper?
Perhaps something like this:

value_range_base::union_helper (const value_range_base *other)

I think it'll be cleaner and more consistent this way.


The method is static now and given it doesn't modify VR0 now
but returns a copy that is better IMHO.


Sure.

Aldy


Re: RFC (branch prediction): PATCH to implement P0479R5, [[likely]] and [[unlikely]].

2018-11-13 Thread Martin Liška
On 11/13/18 5:43 AM, Jason Merrill wrote:
> [[likely]] and [[unlikely]] are equivalent to the GNU hot/cold attributes,
> except that they can be applied to arbitrary statements as well as labels;
> this is most likely to be useful for marking if/else branches as likely or
> unlikely.  Conveniently, PREDICT_EXPR fits the bill nicely as a
> representation.
> 
> I also had to fix marking case labels as hot/cold, which didn't work before.
> Which then required me to force __attribute ((fallthrough)) to apply to the
> statement rather than the label.
> 
> Tested x86_64-pc-linux-gnu.  Does this seem like a sane implementation
> approach to people with more experience with PREDICT_EXPR?
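
For readers who have not seen the attributes, here is a minimal sketch of the usage described above, on if/else branches and on case labels.  The function names and values are made up for illustration and are not taken from the patch or its tests; the attributes are only hints, so the results are the same with or without them.

```cpp
// Hypothetical example: 0 is assumed to be the common status code.
int
classify (int status)
{
  if (status == 0) [[likely]]
    return 1;
  else [[unlikely]]
    return -1;
}

// Hypothetical example: the attributes may also precede case labels.
int
dispatch (int op)
{
  switch (op)
    {
    [[likely]] case 1:
      return 10;
    case 2:
      return 20;
    [[unlikely]] default:
      return 0;
    }
}
```

Compilers that do not know the attributes are expected to ignore them, possibly with a warning.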

Hi.

In general it makes sense to implement it the same way. The question is how
close the hot/cold attributes should be to __builtin_expect.

Let me present a few examples and the differences that I see:

1) ./xgcc -B. -O2 -fdump-tree-profile_estimate=/dev/stdout /tmp/test1.C

;; Function foo (_Z3foov, funcdef_no=0, decl_uid=2301, cgraph_uid=1, symbol_order=3)

Predictions for bb 2
  first match heuristics: 90.00%
  combined heuristics: 90.00%
  __builtin_expect heuristics of edge 2->3: 90.00%

As seen here, __builtin_expect is stronger, as it is a first-match heuristic
and has probability == 90%.

;; Function bar (_Z3barv, funcdef_no=1, decl_uid=2303, cgraph_uid=2, symbol_order=4)

Predictions for bb 2
  DS theory heuristics: 74.48%
  combined heuristics: 74.48%
  opcode values nonequal (on trees) heuristics of edge 2->3: 34.00%
  hot label heuristics of edge 2->3: 85.00%

Here we combine the hot label prediction with the opcode one, resulting in
a fairly poor combined probability of about 75%.
So maybe the cold/hot prediction could also be a first-match heuristic.

2) ./xgcc -B. -O2 -fdump-tree-profile_estimate=/dev/stdout /tmp/test2.C
...
foo ()
{
...
  switch (_3)  [3.33%], case 3:  [3.33%], case 42:  
[3.33%], case 333:  [90.00%]>

while:

bar ()
{
  switch (a.1_1)  [25.00%], case 3:  [25.00%], case 42:  
[25.00%], case 333:  [25.00%]>
...

Note that support for __builtin_expect was enhanced in this stage1. I can
definitely also cover situations where one uses hot/cold for labels.
So there is definitely room for improvement.

3) A last example, where one can use the attribute on a function decl, resulting in:
__attribute__((hot, noinline))
foo ()
{
..

I hope that's desired? If so, I would cover that with a test case in the testsuite.

Jason, can you please point to the C++ specification of the attributes?
Would you please consider an error diagnostic for the situations written in
test4.C?
Such a situation is then silently ignored in the profile_estimate pass:

Predictions for bb 2
  hot label heuristics of edge 2->4 (edge pair duplicate): 85.00%
  hot label heuristics of edge 2->3 (edge pair duplicate): 85.00%
...

Thanks,
Martin

> 
> gcc/
>   * gimplify.c (gimplify_case_label_expr): Handle hot/cold attributes.
> gcc/c-family/
>   * c-lex.c (c_common_has_attribute): Handle likely/unlikely.
> gcc/cp/
>   * parser.c (cp_parser_std_attribute): Handle likely/unlikely.
>   (cp_parser_statement): Call process_stmt_hotness_attribute.
>   (cp_parser_label_for_labeled_statement): Apply attributes to case.
>   * cp-gimplify.c (lookup_hotness_attribute, remove_hotness_attribute)
>   (process_stmt_hotness_attribute): New.
>   * decl.c (finish_case_label): Give label in template type void.
>   * pt.c (tsubst_expr) [CASE_LABEL_EXPR]: Copy attributes.
>   [PREDICT_EXPR]: Handle.
> ---
>  gcc/cp/cp-tree.h  |  2 +
>  gcc/c-family/c-lex.c  |  4 +-
>  gcc/cp/cp-gimplify.c  | 42 +
>  gcc/cp/decl.c |  2 +-
>  gcc/cp/parser.c   | 45 +++
>  gcc/cp/pt.c   | 12 +-
>  gcc/gimplify.c| 10 -
>  gcc/testsuite/g++.dg/cpp2a/attr-likely1.C | 38 +++
>  gcc/testsuite/g++.dg/cpp2a/attr-likely2.C | 10 +
>  gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C   | 12 ++
>  gcc/ChangeLog |  4 ++
>  gcc/c-family/ChangeLog|  4 ++
>  gcc/cp/ChangeLog  | 12 ++
>  13 files changed, 184 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/attr-likely1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/attr-likely2.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index c4d79c0cf7f..c55352ec5ff 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7541,6 +7541,8 @@ extern bool cxx_omp_disregard_value_expr(tree, 
> bool);
>  extern void cp_fold_function (tree);
>  extern tree cp_fully_fold(tree);
>  extern void clear_fold_cache (void);
> +extern tree lookup_hotness_attribute (tree);
> +extern tree process_stmt_hotness_attribute   (tree);
>  
>  /* in name-lookup.c */
>  extern tree strip_usin

PING^2: [PATCH] apply_subst_iterator: Handle define_split/define_insn_and_split

2018-11-13 Thread H.J. Lu
On Sun, Nov 4, 2018 at 7:24 AM H.J. Lu  wrote:
>
> On Fri, Oct 26, 2018 at 12:44 AM H.J. Lu  wrote:
> >
> > On 10/25/18, Uros Bizjak  wrote:
> > > On Fri, Oct 26, 2018 at 8:48 AM H.J. Lu  wrote:
> > >>
> > >> On 10/25/18, Uros Bizjak  wrote:
> > >> > On Fri, Oct 26, 2018 at 8:07 AM H.J. Lu  wrote:
> > >> >>
> > >> >> * read-rtl.c (apply_subst_iterator): Handle
> > >> >> define_insn_and_split.
> > >> >> ---
> > >> >>  gcc/read-rtl.c | 6 --
> > >> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> > >> >>
> > >> >> diff --git a/gcc/read-rtl.c b/gcc/read-rtl.c
> > >> >> index d698dd4af4d..5957c29671a 100644
> > >> >> --- a/gcc/read-rtl.c
> > >> >> +++ b/gcc/read-rtl.c
> > >> >> @@ -275,9 +275,11 @@ apply_subst_iterator (rtx rt, unsigned int, int
> > >> >> value)
> > >> >>if (value == 1)
> > >> >>  return;
> > >> >>gcc_assert (GET_CODE (rt) == DEFINE_INSN
> > >> >> + || GET_CODE (rt) == DEFINE_INSN_AND_SPLIT
> > >> >>   || GET_CODE (rt) == DEFINE_EXPAND);
> > >> >
> > >> > Can we also handle DEFINE_SPLIT here?
> > >> >
> > >>
> > >> Yes, we could if there were a usage for it.  I am reluctant to add
> > >> something
> > >> I have no use nor test for.
> > >
> > > Just split one define_insn_and_split to define_insn and corresponding
> > > define_split.
> > >
> > > define_insn_and_split is a contraction for the define_insn and
> > > corresponding define_split, so it looks weird to only handle
> > > define_insn_and_split without handling define_split.
> > >
> >
> > Here is the updated patch to handle define_split.  Tested with
> >
> > (define_insn "*sse4_1_v8qiv8hi2_2"
> >   [(set (match_operand:V8HI 0 "register_operand")
> > (any_extend:V8HI
> >   (vec_select:V8QI
> > (subreg:V16QI
> >   (vec_concat:V2DI
> > (match_operand:DI 1 "memory_operand")
> > (const_int 0)) 0)
> > (parallel [(const_int 0) (const_int 1)
> >(const_int 2) (const_int 3)
> >(const_int 4) (const_int 5)
> >(const_int 6) (const_int 7)]]
> >   "TARGET_SSE4_1 &&  && "
> >   "#")
> >
> > (define_split
> >   [(set (match_operand:V8HI 0 "register_operand")
> > (any_extend:V8HI
> >   (vec_select:V8QI
> > (subreg:V16QI
> >   (vec_concat:V2DI
> > (match_operand:DI 1 "memory_operand")
> > (const_int 0)) 0)
> > (parallel [(const_int 0) (const_int 1)
> >(const_int 2) (const_int 3)
> >(const_int 4) (const_int 5)
> >(const_int 6) (const_int 7)]]
> >   "TARGET_SSE4_1 &&  && 
> >&& can_create_pseudo_p ()"
> >   [(set (match_dup 0)
> > (any_extend:V8HI (match_dup 1)))]
> > {
> >   operands[1] = adjust_address_nv (operands[1], V8QImode, 0);
> > })
> >
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01665.html
>
> This patch blocks an i386 backend patch.
>

PING.

-- 
H.J.


Re: [PATCH] Fix aarch64_compare_and_swap* constraints (PR target/87839)

2018-11-13 Thread Kyrill Tkachov

Hi Jakub,

On 13/11/18 09:28, Jakub Jelinek wrote:

Hi!

The following testcase ICEs because the predicate and constraints on one of
the operands of @aarch64_compare_and_swapdi aren't consistent.  The RA which
goes according to constraints
(insn 15 13 16 2 (set (reg:DI 104)
(const_int 8589934595 [0x20003])) "pr87839.c":15:3 47 
{*movdi_aarch64}
 (expr_list:REG_EQUIV (const_int 8589934595 [0x20003])
(nil)))
(insn 16 15 21 2 (parallel [
(set (reg:CC 66 cc)
(unspec_volatile:CC [
(const_int 0 [0])
] UNSPECV_ATOMIC_CMPSW))
(set (reg:DI 101)
(mem/v:DI (reg/f:DI 99) [-1  S8 A64]))
(set (mem/v:DI (reg/f:DI 99) [-1  S8 A64])
(unspec_volatile:DI [
(reg:DI 104)
(reg:DI 103)
(const_int 0 [0])
(const_int 32773 [0x8005]) repeated x2
] UNSPECV_ATOMIC_CMPSW))
(clobber (scratch:SI))
]) "pr87839.c":15:3 3532 {aarch64_compare_and_swapdi}
 (expr_list:REG_UNUSED (reg:DI 101)
(expr_list:REG_UNUSED (reg:CC 66 cc)
(nil
when seeing n constraint puts the 0x20003 constant directly into the
atomic instruction, but the predicate requires that it is either a register,
or shifted positive or negative 12-bit constant and so it fails to split.
The positive shifted constant apparently has I constraint and negative one
J, and other uses of aarch64_plus_operand that have some constraint use
rIJ (or r):
config/aarch64/aarch64.md: (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:SI 2 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r"))
config/aarch64/aarch64.md: (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
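
To make the mismatch concrete, here is a rough standalone model of the immediate check described above: a 12-bit unsigned value, optionally shifted left by 12, accepted either as-is or negated.  The helper names are hypothetical and this is only a sketch of the rule as stated in this mail, not the code from the aarch64 backend.

```cpp
#include <cstdint>

// Hypothetical helper: true if VAL fits in 12 bits, either unshifted
// or shifted left by 12 (the "positive shifted constant" case).
static bool
uimm12_shift_p (uint64_t val)
{
  return (val & 0xfffULL) == val
	 || (val & (0xfffULL << 12)) == val;
}

// Hypothetical helper: true if VAL or its negation passes the check
// above, i.e. the constant part of the predicate as described here.
static bool
plus_immediate_p (int64_t val)
{
  uint64_t u = static_cast<uint64_t> (val);
  // 0 - u negates in unsigned arithmetic, which avoids signed overflow.
  return uimm12_shift_p (u) || uimm12_shift_p (0 - u);
}
```

Under this model the constant from the insn dump, 8589934595 (0x200000003), is rejected, while values such as 4095 or -4096 are accepted.  An "n" constraint allows any such constant, which is why the constraint and the predicate disagree.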

I don't have a setup to easily bootstrap/regtest aarch64-linux ATM, could
somebody please include it in their bootstrap/regtest? Thanks.

2018-11-13  Jakub Jelinek  

PR target/87839
* config/aarch64/atomics.md (@aarch64_compare_and_swap): Use
rIJ constraint for aarch64_plus_operand rather than rn.

* gcc.target/aarch64/pr87839.c: New test.



This passes bootstrap and regtesting shows no problems on 
aarch64-none-linux-gnu.
The change looks good to me but you'll still need maintainer approval.

Thanks,
Kyrill


--- gcc/config/aarch64/atomics.md.jj2018-11-01 12:06:43.469963662 +0100
+++ gcc/config/aarch64/atomics.md   2018-11-13 09:59:35.660185116 +0100
@@ -71,7 +71,7 @@ (define_insn_and_split "@aarch64_compare
 (match_operand:GPI 1 "aarch64_sync_memory_operand" "+Q"))   ;; memory
(set (match_dup 1)
 (unspec_volatile:GPI
-  [(match_operand:GPI 2 "aarch64_plus_operand" "rn")   ;; expect
+  [(match_operand:GPI 2 "aarch64_plus_operand" "rIJ")  ;; expect
(match_operand:GPI 3 "aarch64_reg_or_zero" "rZ");; desired
(match_operand:SI 4 "const_int_operand");; is_weak
(match_operand:SI 5 "const_int_operand");; mod_s
--- gcc/testsuite/gcc.target/aarch64/pr87839.c.jj 2018-11-13 10:13:44.353309416 +0100
+++ gcc/testsuite/gcc.target/aarch64/pr87839.c  2018-11-13 10:13:05.496944699 +0100
@@ -0,0 +1,29 @@
+/* PR target/87839 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+long long b[64];
+void foo (void);
+int bar (void (*) (void));
+void qux (long long *, long long) __attribute__((noreturn));
+void quux (long long *, long long);
+
+void
+baz (void)
+{
+  __sync_val_compare_and_swap (b, 4294967298LL, 78187493520LL);
+  __sync_bool_compare_and_swap (b + 1, 8589934595LL, 21474836489LL);
+  __sync_fetch_and_xor (b, 60129542145LL);
+  quux (b, 42949672967LL);
+  __sync_xor_and_fetch (b + 22, 60129542145LL);
+  quux (b + 23, 42949672967LL);
+  if (bar (baz))
+__builtin_abort ();
+  foo ();
+  __sync_val_compare_and_swap (b, 4294967298LL, 0);
+  __sync_bool_compare_and_swap (b + 1, 8589934595LL, 78187493520LL);
+  if (__sync_or_and_fetch (b, 21474836489LL) != 21474836489LL)
+qux (b + 22, 60129542145LL);
+  __atomic_fetch_nand (b + 23, 42949672967LL, __ATOMIC_RELAXED);
+  bar (baz);
+}

Jakub




Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Richard Biener
On Tue, 13 Nov 2018, Aldy Hernandez wrote:

> On 11/13/18 3:07 AM, Richard Biener wrote:
> > On Tue, 13 Nov 2018, Aldy Hernandez wrote:
> > 
> > > 
> > > > > The tricky part starts in the prologue for
> > > > > 
> > > > > if (vr0->undefined_p ())
> > > > >   {
> > > > > vr0->deep_copy (vr1);
> > > > > return;
> > > > >   }
> > > > > 
> > > > > but yes, we probably can factor out a bit more common code
> > > > > here.  I'll see to followup with more minor cleanups this
> > > > > week (noticed a few details myself).
> > > > 
> > > > Like this?  (untested)
> > > 
> > > I would inline value_range_base::union_helper into
> > > value_range_base::union_,
> > > and remove all the undefined/varying/etc stuff from value_range::union_.
> > > 
> > > It should work because upon return from value_range_base::union_, in the
> > > this->undefined_p case, the base class will copy everything but the
> > > equivalences.  Then the derived union_ only has to nuke the equivalences
> > > if
> > > this->undefined or this->varying, and the equivalences' IOR just works.
> > > 
> > > For instance, in the case where other->undefined_p, there shouldn't be
> > > anything in the equivalences so the IOR won't copy anything to this as
> > > expected.  Similarly for this->varying_p.
> > > 
> > > In the case of other->varying, this will already have been set to varying so
> > > neither this nor other should have anything in their equivalence fields,
> > > so
> > > the IOR won't do anything.
> > > 
> > > I think I covered all of them...the bitmap math should just work.  What do
> > > you
> > > think?
> > 
> > I think the only case that will not work is the case when this->undefined
> > (when we need the deep copy).  Because we'll not get the bitmap from
> > other in that case.  So I've settled with the thing below (just
> > special-casing that very case)
> 
> Ah, good point.
> 
> > 
> > > Finally, as I've hinted before, I think we need to be careful that any
> > > time we
> > > change state to VARYING / UNDEFINED from a base method, that the derived
> > > class
> > > is in a sane state (there are no equivalences set per the API contract).
> > > This
> > > was not originally enforced in VRP, and I wouldn't be surprised if there
> > > are
> > > dragons if we enforce honesty.  I suppose, since we have an API, we could
> > > enforce this lazily: any time equiv() is called, clear the equivalences or
> > > return NULL if it's varying or undefined?  Just a thought.
> > 
> > I have updated ->update () to adjust equiv when we update to VARYING
> > or UNDEFINED.
> 
> Excellent idea.  I don't see that part in your patch though?

That was part of the previous (or previous previous) change.

> 
> > +/* Helper for meet operation for value ranges.  Given two value ranges VR0
> > and
> > +   VR1, return a range that contains both VR0 and VR1.  This may not be the
> > +   smallest possible such range.  */
> > +
> > +value_range_base
> > +value_range_base::union_helper (const value_range_base *vr0,
> > +   const value_range_base *vr1)
> > +{
> 
> I know this was my fault, but would you mind removing vr0 from union_helper?
> Perhaps something like this:
> 
> value_range_base::union_helper (const value_range_base *other)
> 
> I think it'll be cleaner and more consistent this way.

The method is static now and given it doesn't modify VR0 now
but returns a copy that is better IMHO.

Richard.


[PATCH] Fix PR86991

2018-11-13 Thread Richard Biener


This PR shows we have stale reduction groups lying around because
the fixup doesn't work reliably with reduction chains.  Fixed by
delaying the build to after detection is successful.
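
The idea of delaying the build can be sketched outside of GCC as follows: collect the candidate statements in a temporary vector and only write the first/next links once every candidate has been accepted, so a failed detection leaves nothing to undo.  The stmt_info type and build_chain function below are hypothetical stand-ins, not the vectorizer's data structures, and the sketch sets the head pointer on every element for simplicity.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a statement record with group links.
struct stmt_info
{
  explicit stmt_info (int i) : id (i), first (nullptr), next (nullptr) {}
  int id;
  stmt_info *first;	// head of the group, or null if not grouped
  stmt_info *next;	// next element in the group, or null
};

// Link CANDIDATES into a chain only if all of them pass VALID.
// On failure no links have been written, so nothing is left stale.
bool
build_chain (const std::vector<stmt_info *> &candidates,
	     bool (*valid) (const stmt_info *))
{
  std::vector<stmt_info *> chain;
  for (stmt_info *s : candidates)
    {
      if (!valid (s))
	return false;	// detection failed before any link was made
      chain.push_back (s);
    }
  if (chain.empty ())
    return false;

  // Detection succeeded: build up the actual chain.
  for (std::size_t i = 0; i < chain.size (); ++i)
    {
      chain[i]->first = chain[0];
      chain[i]->next = i + 1 < chain.size () ? chain[i + 1] : nullptr;
    }
  return true;
}
```

Compared with linking eagerly and dissolving the group on failure, this keeps the shared state untouched until the result is known.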

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/86991
* tree-vect-loop.c (vect_is_slp_reduction): Delay reduction
group building until we have successfully detected the SLP
reduction.
(vect_is_simple_reduction): Remove fixup code here.

* gcc.dg/pr86991.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 266071)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2476,6 +2476,7 @@ vect_is_slp_reduction (loop_vec_info loo
   if (loop != vect_loop)
 return false;
 
+  auto_vec reduc_chain;
   lhs = PHI_RESULT (phi);
   code = gimple_assign_rhs_code (first_stmt);
   while (1)
@@ -2528,17 +2529,9 @@ vect_is_slp_reduction (loop_vec_info loo
 
   /* Insert USE_STMT into reduction chain.  */
   use_stmt_info = loop_info->lookup_stmt (loop_use_stmt);
-  if (current_stmt_info)
-{
- REDUC_GROUP_NEXT_ELEMENT (current_stmt_info) = use_stmt_info;
-  REDUC_GROUP_FIRST_ELEMENT (use_stmt_info)
-= REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-}
-  else
-   REDUC_GROUP_FIRST_ELEMENT (use_stmt_info) = use_stmt_info;
+  reduc_chain.safe_push (use_stmt_info);
 
   lhs = gimple_assign_lhs (loop_use_stmt);
-  current_stmt_info = use_stmt_info;
   size++;
}
 
@@ -2548,10 +2541,9 @@ vect_is_slp_reduction (loop_vec_info loo
   /* Swap the operands, if needed, to make the reduction operand be the second
  operand.  */
   lhs = PHI_RESULT (phi);
-  stmt_vec_info next_stmt_info = REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
-  while (next_stmt_info)
+  for (unsigned i = 0; i < reduc_chain.length (); ++i)
 {
-  gassign *next_stmt = as_a  (next_stmt_info->stmt);
+  gassign *next_stmt = as_a  (reduc_chain[i]->stmt);
   if (gimple_assign_rhs2 (next_stmt) == lhs)
{
  tree op = gimple_assign_rhs1 (next_stmt);
@@ -2565,7 +2557,6 @@ vect_is_slp_reduction (loop_vec_info loo
  && vect_valid_reduction_input_p (def_stmt_info))
{
  lhs = gimple_assign_lhs (next_stmt);
- next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
  continue;
}
 
@@ -2600,9 +2591,16 @@ vect_is_slp_reduction (loop_vec_info loo
 }
 
   lhs = gimple_assign_lhs (next_stmt);
-  next_stmt_info = REDUC_GROUP_NEXT_ELEMENT (next_stmt_info);
 }
 
+  /* Build up the actual chain.  */
+  for (unsigned i = 0; i < reduc_chain.length () - 1; ++i)
+{
+  REDUC_GROUP_FIRST_ELEMENT (reduc_chain[i]) = reduc_chain[0];
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain[i]) = reduc_chain[i+1];
+}
+  REDUC_GROUP_NEXT_ELEMENT (reduc_chain.last ()) = NULL;
+
   /* Save the chain for further analysis in SLP detection.  */
   stmt_vec_info first_stmt_info
 = REDUC_GROUP_FIRST_ELEMENT (current_stmt_info);
@@ -3182,16 +3196,6 @@ vect_is_simple_reduction (loop_vec_info
   return def_stmt_info;
 }
 
-  /* Dissolve group eventually half-built by vect_is_slp_reduction.  */
-  stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (def_stmt_info);
-  while (first)
-{
-  stmt_vec_info next = REDUC_GROUP_NEXT_ELEMENT (first);
-  REDUC_GROUP_FIRST_ELEMENT (first) = NULL;
-  REDUC_GROUP_NEXT_ELEMENT (first) = NULL;
-  first = next;
-}
-
   /* Look for the expression computing loop_arg from loop PHI result.  */
   if (check_reduction_path (vect_location, loop, phi, loop_arg, code))
 return def_stmt_info;
Index: gcc/testsuite/gcc.dg/pr86991.c
===
--- gcc/testsuite/gcc.dg/pr86991.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr86991.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int b;
+extern unsigned c[];
+unsigned d;
+long e;
+
+void f()
+{
+  unsigned g, h;
+  for (; d; d += 2) {
+  g = 1;
+  for (; g; g += 3) {
+ h = 2;
+ for (; h < 6; h++)
+   c[h] = c[h] - b - ~e;
+  }
+  }
+}
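
For readers skimming the vect_is_slp_reduction hunks above: the change collects
the candidate statements in a vector first and only links them into a group
once the walk has finished, so a failed detection leaves no half-built chain
behind.  A minimal standalone sketch of that idea follows.  The Stmt type and
build_chain are hypothetical stand-ins, not GCC's stmt_vec_info or its
REDUC_GROUP_* accessors, and the sketch also sets the head pointer on the last
element, which the hunk shown above does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for stmt_vec_info; only the two group links are modeled.
struct Stmt {
  Stmt *first = nullptr;  // plays the role of REDUC_GROUP_FIRST_ELEMENT
  Stmt *next = nullptr;   // plays the role of REDUC_GROUP_NEXT_ELEMENT
};

// Link the collected statements into a chain.  Meant to be called only
// after detection succeeded, so nothing has to be dissolved on failure.
void build_chain(const std::vector<Stmt *> &chain) {
  assert(!chain.empty());
  for (std::size_t i = 0; i + 1 < chain.size(); ++i) {
    chain[i]->first = chain[0];
    chain[i]->next = chain[i + 1];
  }
  // Assumption of this sketch: the tail also points back at the head.
  chain.back()->first = chain[0];
  chain.back()->next = nullptr;
}
```

Pushing three statements and calling build_chain once yields a singly linked
list whose elements all point at the first statement as group head.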


[PATCH][libbacktrace] Handle DW_FORM_GNU_strp_alt

2018-11-13 Thread Tom de Vries
Hi,

The dwz tool attempts to optimize DWARF debugging information contained in ELF
shared libraries and ELF executables for size.

With the dwz -m option, it attempts to optimize by moving DWARF debugging
information entries (DIEs), strings and macro descriptions duplicated in
more than one object into a newly created ELF ET_REL object whose filename is
given as -m option argument.  The debug sections in the executables and
shared libraries specified on the command line are then modified again,
referring to the entities in the newly created object.

After a dwz invocation:
...
$ dwz -m c.debug a.out b.out
...
both a.out and b.out contain a .gnu_debugaltlink section referring to c.debug,
and use "DWZ DWARF multifile extensions" such as DW_FORM_GNU_strp_alt and
DW_FORM_GNU_ref_alt to refer to the content of c.debug.

The .gnu_debugaltlink section consists of a filename and the expected buildid.

This patch adds to libbacktrace:
- finding a file matching the .gnu_debugaltlink filename
- verifying the .gnu_debugaltlink buildid
- reading the dwarf of the .gnu_debugaltlink
- support for DW_FORM_GNU_strp_alt
- a testcase btest_dwz.c, which is btest.c minimized to the point that it only
  requires DW_FORM_GNU_strp_alt.
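
To make the "filename plus expected buildid" layout concrete, here is a small
standalone sketch of splitting a .gnu_debugaltlink payload and comparing the
build ID.  It is not the code added to elf.c.  AltLink, parse_debugaltlink and
build_id_matches are hypothetical names, and the sketch assumes the section
holds a NUL-terminated filename immediately followed by the raw build ID
bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Parsed view of a .gnu_debugaltlink payload (hypothetical type).
struct AltLink {
  std::string filename;                // e.g. "c.debug"
  std::vector<std::uint8_t> build_id;  // expected build ID of that file
};

// Split the payload at the first NUL byte: the filename comes before it,
// the build ID is everything after it.  Returns nullopt if no NUL is found.
std::optional<AltLink> parse_debugaltlink(const std::uint8_t *data,
                                          std::size_t size) {
  if (data == nullptr || size == 0)
    return std::nullopt;
  const void *nul = std::memchr(data, 0, size);
  if (nul == nullptr)
    return std::nullopt;
  const std::size_t name_len = static_cast<std::size_t>(
      static_cast<const std::uint8_t *>(nul) - data);
  AltLink link;
  link.filename.assign(reinterpret_cast<const char *>(data), name_len);
  link.build_id.assign(data + name_len + 1, data + size);
  return link;
}

// The alternate file is only accepted if its own build ID is identical.
bool build_id_matches(const AltLink &link,
                      const std::vector<std::uint8_t> &actual) {
  return !link.build_id.empty() && link.build_id == actual;
}
```

A caller would parse the section contents, open the named file, read that
file's build ID note and only use its DWARF if build_id_matches returns true.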

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[libbacktrace] Handle DW_FORM_GNU_strp_alt

2018-11-11  Tom de Vries  

* dwarf.c (struct dwarf_data): Add altlink field.
(read_attribute): Add altlink parameter.  Handle DW_FORM_GNU_strp_alt
using altlink.
(find_address_ranges, build_address_map, build_dwarf_data): Add and
handle altlink parameter.
(read_referenced_name, read_function_entry): Add argument to
read_attribute call.
(backtrace_dwarf_add): Add and handle fileline_entry and
fileline_altlink parameters.
* elf.c (elf_open_debugfile_by_debugaltlink): New function.
(elf_add): Add and handle fileline_entry, with_buildid_data and
with_buildid_size parameters.  Handle .gnu_debugaltlink section.
(phdr_callback, backtrace_initialize): Add arguments to elf_add calls.
* internal.h (backtrace_dwarf_add): Add fileline_entry and
fileline_altlink parameters.
* configure.ac (DWZ): Set with AC_CHECK_PROG.
(HAVE_DWZ): Set with AM_CONDITIONAL.
* configure: Regenerate.
* Makefile.am (check_PROGRAMS): Add btest_dwz.
(TESTS): Add btest_dwz_2 and btest_dwz_3.
* Makefile.in: Regenerate.
* btest_dwz.c: New file.

---
 libbacktrace/Makefile.am  |  22 +
 libbacktrace/Makefile.in  |  95 ---
 libbacktrace/btest_dwz.c  | 237 ++
 libbacktrace/configure    |  57 ++-
 libbacktrace/configure.ac |   3 +
 libbacktrace/dwarf.c      |  50 +++---
 libbacktrace/elf.c        | 120 +--
 libbacktrace/internal.h   |   4 +-
 8 files changed, 548 insertions(+), 40 deletions(-)

diff --git a/libbacktrace/Makefile.am b/libbacktrace/Makefile.am
index 3c1bd49dd7b..2fec9bbb4b6 100644
--- a/libbacktrace/Makefile.am
+++ b/libbacktrace/Makefile.am
@@ -96,6 +96,28 @@ btest_LDADD = libbacktrace.la
 
 check_PROGRAMS += btest
 
+if HAVE_DWZ
+
+btest_dwz_SOURCES = btest_dwz.c testlib.c
+btest_dwz_CFLAGS = $(AM_CFLAGS) -g -O0
+btest_dwz_LDADD = libbacktrace.la
+
+check_PROGRAMS += btest_dwz
+
+TESTS += btest_dwz_2 btest_dwz_3
+
+btest_dwz_2 btest_dwz_3: btest_dwz_23
+
+.PHONY: btest_dwz_23
+
+btest_dwz_23: btest_dwz
+   rm -f btest_dwz.debug
+   cp btest_dwz btest_dwz_2
+   cp btest_dwz btest_dwz_3
+   $(DWZ) -m btest_dwz.debug btest_dwz_2 btest_dwz_3
+
+endif HAVE_DWZ
+
 stest_SOURCES = stest.c
 stest_LDADD = libbacktrace.la
 
diff --git a/libbacktrace/Makefile.in b/libbacktrace/Makefile.in
index 60a9d887dba..b3e9b9a4eec 100644
--- a/libbacktrace/Makefile.in
+++ b/libbacktrace/Makefile.in
@@ -120,12 +120,16 @@ POST_UNINSTALL = :
 build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
-check_PROGRAMS = $(am__EXEEXT_1) $(am__EXEEXT_2) $(am__EXEEXT_3)
-@NATIVE_TRUE@am__append_1 = btest stest ztest edtest
-@HAVE_ZLIB_TRUE@@NATIVE_TRUE@am__append_2 = -lz
-@HAVE_PTHREAD_TRUE@@NATIVE_TRUE@am__append_3 = ttest
-@HAVE_OBJCOPY_DEBUGLINK_TRUE@@NATIVE_TRUE@am__append_4 = dtest
-@HAVE_COMPRESSED_DEBUG_TRUE@@NATIVE_TRUE@am__append_5 = ctestg ctesta
+check_PROGRAMS = $(am__EXEEXT_1) $(am__EXEEXT_2) $(am__EXEEXT_3) \
+   $(am__EXEEXT_4) $(am__EXEEXT_5)
+@NATIVE_TRUE@am__append_1 = btest
+@HAVE_DWZ_TRUE@@NATIVE_TRUE@am__append_2 = btest_dwz
+@HAVE_DWZ_TRUE@@NATIVE_TRUE@am__append_3 = btest_dwz_2 btest_dwz_3
+@NATIVE_TRUE@am__append_4 = stest ztest edtest
+@HAVE_ZLIB_TRUE@@NATIVE_TRUE@am__append_5 = -lz
+@HAVE_PTHREAD_TRUE@@NATIVE_TRUE@am__append_6 = ttest
+@HAVE_OBJCOPY_DEBUGLINK_TRUE@@NATIVE_TRUE@am__append_7 = dtest
+@HAVE_COMPRESSED_DEBUG_TRUE@@NATIVE_TRUE@am__append_8 = ctestg ctesta
 subdir = .
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4

Re: [PATCH] More value_range API cleanup

2018-11-13 Thread Aldy Hernandez

On 11/13/18 3:07 AM, Richard Biener wrote:

On Tue, 13 Nov 2018, Aldy Hernandez wrote:




The tricky part starts in the prologue for

if (vr0->undefined_p ())
  {
vr0->deep_copy (vr1);
return;
  }

but yes, we probably can factor out a bit more common code
here.  I'll see to followup with more minor cleanups this
week (noticed a few details myself).


Like this?  (untested)


I would inline value_range_base::union_helper into value_range_base::union_,
and remove all the undefined/varying/etc stuff from value_range::union_.

It should work because upon return from value_range_base::union_, in the
this->undefined_p case, the base class will copy everything but the
equivalences.  Then the derived union_ only has to nuke the equivalences if
this->undefined or this->varying, and the equivalences' IOR just works.

For instance, in the case where other->undefined_p, there shouldn't be
anything in the equivalences so the IOR won't copy anything to this as
expected.  Similarly for this->varying_p.

In the case of other->varying, this will already have been set to varying so
neither this nor other should have anything in their equivalence fields, so
the IOR won't do anything.

I think I covered all of them...the bitmap math should just work.  What do you
think?


I think the only case that will not work is the case when this->undefined
(when we need the deep copy).  Because we'll not get the bitmap from
other in that case.  So I've settled with the thing below (just
special-casing that very case)


Ah, good point.




Finally, as I've hinted before, I think we need to be careful that any time we
change state to VARYING / UNDEFINED from a base method, that the derived class
is in a sane state (there are no equivalences set per the API contract).  This
was not originally enforced in VRP, and I wouldn't be surprised if there are
dragons if we enforce honesty.  I suppose, since we have an API, we could
enforce this lazily: any time equiv() is called, clear the equivalences or
return NULL if it's varying or undefined?  Just a thought.


I have updated ->update () to adjust equiv when we update to VARYING
or UNDEFINED.


Excellent idea.  I don't see that part in your patch though?



+/* Helper for meet operation for value ranges.  Given two value ranges VR0 and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{


I know this was my fault, but would you mind removing vr0 from 
union_helper?  Perhaps something like this:


value_range_base::union_helper (const value_range_base *other)

I think it'll be cleaner and more consistent this way.

Thanks.
Aldy
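
The lazy enforcement idea floated in this message (clear the equivalences or
return NULL from equiv() whenever the range is VARYING or UNDEFINED) can be
illustrated with a toy model.  This is only a sketch: ToyRange and Kind are
hypothetical, the equivalence set is a std::set rather than a GCC bitmap, and
nothing here reflects how value_range actually stores its state.

```cpp
#include <cassert>
#include <set>
#include <utility>

enum class Kind { Undefined, Range, Varying };

// Toy range with an equivalence set (hypothetical, not GCC's value_range).
class ToyRange {
public:
  void set_range(std::set<unsigned> equivs) {
    kind_ = Kind::Range;
    equiv_ = std::move(equivs);
  }
  // State changes do not touch the equivalences; cleanup is deferred.
  void set_varying() { kind_ = Kind::Varying; }
  void set_undefined() { kind_ = Kind::Undefined; }
  Kind kind() const { return kind_; }

  // Lazy enforcement: a VARYING or UNDEFINED range never exposes
  // equivalences, and any stale ones are dropped on first access.
  const std::set<unsigned> *equiv() {
    if (kind_ != Kind::Range) {
      equiv_.clear();
      return nullptr;
    }
    return &equiv_;
  }

private:
  Kind kind_ = Kind::Undefined;
  std::set<unsigned> equiv_;
};
```

The trade-off is that the accessor can no longer be const, which is one reason
eagerly adjusting the equivalences in the update path, as the reply proposes,
may be the cleaner contract.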


Re: [PR81878]: fix --disable-bootstrap --enable-languages=ada, and cross-back gnattools build

2018-11-13 Thread Alexandre Oliva
On Nov 13, 2018, Richard Biener  wrote:

>> Please let me know if there are objections to this change in the next
>> few days, e.g., if enabling C and C++ for an Ada-only build is too
>> onerous.  It is certainly possible to rework gnattools build machinery
>> so that it uses CC and CXX as detected by the top-level configure if we
>> can't find xgcc and xg++ in ../gcc.

> I really wonder why we not _always_ do this for consistency given we
> already require a host Ada compiler.

Sorry, I can't tell what the 'this' refers to.  Enabling C and C++ for
an Ada-only build?  Reworking gnattools build machinery to use top-level
CC and CXX?  Something else?

FWIW, I see the point of using the just-built gcc/g++ if it's there
and usable: considering the checks for different versions of Ada
compilers, you really want to use the last stage of the bootstrap to
build tools linked with the runtime built with it.  It seems to me you'd
run into a catch-22 without that.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain Engineer                Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - GNUChe


[PATCH] Fix PR87974

2018-11-13 Thread Richard Biener


Do not look at constant or external defs in reduction stmts to
determine the reduction PHI vector type.  Those are promoted/demoted
as required.

This is another fragile area, I'll poke around a bit but nevertheless,
bootstrap & regtest queued.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87974
* tree-vect-loop.c (vectorizable_reduction): When computing
the vectorized reduction PHI vector type ignore constant
and external defs.

* g++.dg/opt/pr87974.C: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c  (revision 266061)
+++ gcc/tree-vect-loop.c  (working copy)
@@ -6061,13 +6070,17 @@ vectorizable_reduction (stmt_vec_info st
return true;
 
   gassign *reduc_stmt = as_a <gassign *> (reduc_stmt_info->stmt);
+  code = gimple_assign_rhs_code (reduc_stmt);
   for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
{
  tree op = gimple_op (reduc_stmt, k);
  if (op == phi_result)
continue;
- if (k == 1
- && gimple_assign_rhs_code (reduc_stmt) == COND_EXPR)
+ if (k == 1 && code == COND_EXPR)
+   continue;
+ bool is_simple_use = vect_is_simple_use (op, loop_vinfo, &dt);
+ gcc_assert (is_simple_use);
+ if (dt == vect_constant_def || dt == vect_external_def)
continue;
  if (!vectype_in
  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
Index: gcc/testsuite/g++.dg/opt/pr87974.C
===
--- gcc/testsuite/g++.dg/opt/pr87974.C  (nonexistent)
+++ gcc/testsuite/g++.dg/opt/pr87974.C  (working copy)
@@ -0,0 +1,33 @@
+// { dg-do compile }
+// { dg-options "-O3" }
+
+struct h {
+typedef int &c;
+};
+class i {
+struct j {
+   using c = int *;
+};
+using as = j::c;
+};
+template  class k {
+public:
+using as = i::as;
+h::c operator[](long l) {
+   k::as d = 0;
+   return d[l];
+}
+};
+class : public k { } a;
+long c, f;
+void m()
+{
+  for (long b; b <= 6; b++)
+for (long g; g < b; g++) {
+   unsigned long e = g;
+   c = 0;
+   for (; c < b; c++)
+ f = e >>= 1;
+   a[g] = f;
+}
+}
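
As an illustration of the PR87974 change, the sketch below models "ignore
constant and external defs when choosing the input vector type".  It is a toy
model, not the vectorizer code: DefKind, Operand and pick_input_elem_bytes are
hypothetical, and picking the widest remaining operand is an assumption, since
the size comparison in the hunk above is cut off.

```cpp
#include <cassert>
#include <vector>

// Simplified operand classification, loosely mirroring vect_internal_def,
// vect_constant_def and vect_external_def.
enum class DefKind { Internal, Constant, External };

struct Operand {
  DefKind dt;
  unsigned elem_bytes;  // scalar element size of the operand's type
};

// Return the element size that would drive the input vector type, or 0 if
// every operand is invariant.  Constants and externals are skipped because
// they can be promoted or demoted to whatever type is chosen.
unsigned pick_input_elem_bytes(const std::vector<Operand> &ops) {
  unsigned best = 0;
  for (const Operand &op : ops) {
    if (op.dt == DefKind::Constant || op.dt == DefKind::External)
      continue;
    if (op.elem_bytes > best)  // assumption: the widest operand wins
      best = op.elem_bytes;
  }
  return best;
}
```

With this rule an 8-byte constant next to a 4-byte loop-defined operand no
longer forces an 8-byte element type.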


[PATCH] Fix PR87931

2018-11-13 Thread Richard Biener


We need to restrict what we handle as last operation in a nested
cycle because vectorizable_reduction performs the code-generation
in the end.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87931
* tree-vect-loop.c (vect_is_simple_reduction): Restrict
nested cycles we support to latch computations vectorizable_reduction
handles.

* gcc.dg/graphite/pr87931.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c  (revision 266061)
+++ gcc/tree-vect-loop.c  (working copy)
@@ -2983,6 +2976,22 @@ vect_is_simple_reduction (loop_vec_info
 
   if (nested_in_vect_loop && !check_reduction)
 {
+  /* FIXME: Even for non-reductions code generation is funneled
+through vectorizable_reduction for the stmt defining the
+PHI latch value.  So we have to artificially restrict ourselves
+for the supported operations.  */
+  switch (get_gimple_rhs_class (code))
+   {
+   case GIMPLE_BINARY_RHS:
+   case GIMPLE_TERNARY_RHS:
+ break;
+   default:
+ /* Not supported by vectorizable_reduction.  */
+ if (dump_enabled_p ())
+   report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
+   "nested cycle: not handled operation: ");
+ return NULL;
+   }
   if (dump_enabled_p ())
report_vect_op (MSG_NOTE, def_stmt, "detected nested cycle: ");
   return def_stmt_info;
Index: gcc/testsuite/gcc.dg/graphite/pr87931.c
===
--- gcc/testsuite/gcc.dg/graphite/pr87931.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr87931.c (working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-copy-prop -fgraphite-identity" } */
+
+#define N 40
+#define M 128
+float in[N+M];
+float coeff[M];
+float fir_out[N];
+
+void fir ()
+{
+  int i,j,k;
+  float diff;
+
+  for (i = 0; i < N; i++) {
+diff = 0;
+for (j = 0; j < M; j++) {
+  diff += in[j+i]*coeff[j];
+}
+fir_out[i] = diff;
+  }
+}
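
The PR87931 restriction boils down to a filter on the RHS class of the
statement defining the PHI latch value.  A standalone sketch of that filter
follows.  RhsClass is a hypothetical enum that only mirrors the names of GCC's
gimple RHS classes, and nested_cycle_op_supported is not a GCC function.

```cpp
#include <cassert>

// Toy mirror of the gimple RHS classes (hypothetical enum).
enum class RhsClass { Invalid, Ternary, Binary, Unary, Single };

// Only binary and ternary operations are accepted as the last operation of
// a nested cycle, since code generation is funneled through
// vectorizable_reduction; everything else is rejected.
bool nested_cycle_op_supported(RhsClass rhs_class) {
  switch (rhs_class) {
    case RhsClass::Binary:
    case RhsClass::Ternary:
      return true;
    default:
      return false;
  }
}
```

In the patch the rejected case returns NULL from vect_is_simple_reduction, so
the nested cycle is simply not treated as one.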


[PATCH] Fix PR87967

2018-11-13 Thread Richard Biener


A simple omission...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87967
* tree-vect-loop.c (vect_transform_loop): Also copy PHIs
for constants for the scalar loop.

* g++.dg/opt/pr87967.C: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c  (revision 266061)
+++ gcc/tree-vect-loop.c  (working copy)
@@ -8264,7 +8257,7 @@ vect_transform_loop (loop_vec_info loop_
   e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo));
   if (! single_pred_p (e->dest))
{
- split_loop_exit_edge (e);
+ split_loop_exit_edge (e, true);
  if (dump_enabled_p ())
dump_printf (MSG_NOTE, "split exit edge of scalar loop\n");
}
Index: gcc/testsuite/g++.dg/opt/pr87967.C
===
--- gcc/testsuite/g++.dg/opt/pr87967.C  (nonexistent)
+++ gcc/testsuite/g++.dg/opt/pr87967.C  (working copy)
@@ -0,0 +1,50 @@
+// { dg-do compile }
+// { dg-options "-O3" }
+
+void h();
+template  struct k { using d = b; };
+template  class> using e = k;
+template  class f>
+using g = typename e::d;
+struct l {
+  template  using ab = typename i::j;
+};
+struct n : l {
+  using j = g;
+};
+class o {
+public:
+  long r();
+};
+char m;
+char s() {
+  if (m)
+return '0';
+  return 'A';
+}
+class t {
+public:
+  typedef char *ad;
+  ad m_fn2();
+};
+void fn3() {
+  char *a;
+  t b;
+  bool p = false;
+  while (*a) {
+h();
+o c;
+if (*a)
+  a++;
+if (c.r()) {
+  n::j q;
+  for (t::ad d = b.m_fn2(), e; d != e; d++) {
+char f = *q;
+*d = f + s();
+  }
+  p = true;
+}
+  }
+  if (p)
+throw;
+}


[PATCH] Fix PR87962

2018-11-13 Thread Richard Biener


The following better detects invalid nested cycles in particular
those part of an outer reduction.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-11-13  Richard Biener  

PR tree-optimization/87962
* tree-vect-loop.c (vect_is_simple_reduction): More reliably
detect outer reduction for disqualifying in-loop uses.

* gcc.dg/pr87962.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c  (revision 266061)
+++ gcc/tree-vect-loop.c  (working copy)
@@ -2807,11 +2807,11 @@ vect_is_simple_reduction (loop_vec_info
   gphi *phi = as_a <gphi *> (phi_info->stmt);
   struct loop *loop = (gimple_bb (phi))->loop_father;
   struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
+  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
   gimple *phi_use_stmt = NULL;
   enum tree_code orig_code, code;
   tree op1, op2, op3 = NULL_TREE, op4 = NULL_TREE;
   tree type;
-  int nloop_uses;
   tree name;
   imm_use_iterator imm_iter;
   use_operand_p use_p;
@@ -2827,7 +2827,7 @@ vect_is_simple_reduction (loop_vec_info
  can be constant.  See PR60382.  */
   if (has_zero_uses (phi_name))
 return NULL;
-  nloop_uses = 0;
+  unsigned nphi_def_loop_uses = 0;
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, phi_name)
 {
   gimple *use_stmt = USE_STMT (use_p);
@@ -2843,20 +2843,7 @@ vect_is_simple_reduction (loop_vec_info
   return NULL;
 }
 
-  /* For inner loop reductions in nested vectorization there are no
- constraints on the number of uses in the inner loop.  */
-  if (loop == vect_loop->inner)
-   continue;
-
-  nloop_uses++;
-  if (nloop_uses > 1)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"reduction value used in loop.\n");
-  return NULL;
-}
-
+  nphi_def_loop_uses++;
   phi_use_stmt = use_stmt;
 }
 
@@ -2894,26 +2881,32 @@ vect_is_simple_reduction (loop_vec_info
   return NULL;
 }
 
-  nloop_uses = 0;
+  unsigned nlatch_def_loop_uses = 0;
   auto_vec lcphis;
+  bool inner_loop_of_double_reduc = false;
   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
 {
   gimple *use_stmt = USE_STMT (use_p);
   if (is_gimple_debug (use_stmt))
continue;
   if (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
-   nloop_uses++;
+   nlatch_def_loop_uses++;
   else
-   /* We can have more than one loop-closed PHI.  */
-   lcphis.safe_push (as_a <gphi *> (use_stmt));
+   {
+ /* We can have more than one loop-closed PHI.  */
+ lcphis.safe_push (as_a <gphi *> (use_stmt));
+ if (nested_in_vect_loop
+ && (STMT_VINFO_DEF_TYPE (loop_info->lookup_stmt (use_stmt))
+ == vect_double_reduction_def))
+   inner_loop_of_double_reduc = true;
+   }
 }
 
   /* If this isn't a nested cycle or if the nested cycle reduction value
  is used ouside of the inner loop we cannot handle uses of the reduction
  value.  */
-  bool nested_in_vect_loop = flow_loop_nested_p (vect_loop, loop);
-  if ((!nested_in_vect_loop || !lcphis.is_empty ())
-  && nloop_uses > 1)
+  if ((!nested_in_vect_loop || inner_loop_of_double_reduc)
+  && (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/testsuite/gcc.dg/pr87962.c
===
--- gcc/testsuite/gcc.dg/pr87962.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/pr87962.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-march=bdver2" { target { x86_64-*-* i?86-*-* } } } */
+
+int a, b;
+
+int c()
+{
+  long d, e;
+  while (a) {
+  a++;
+  b = 0;
+  for (; b++ - 2; d = d >> 1)
+   e += d;
+  }
+  return e;
+}
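
The rejection condition introduced by the PR87962 patch can be read as a pure
predicate over two use counters and two flags.  The sketch below restates it
outside the vectorizer.  UseCounts and reject_in_loop_uses are hypothetical
names; the field and parameter names are taken from the diff.

```cpp
#include <cassert>

// In-loop use counts gathered by the two immediate-use walks.
struct UseCounts {
  unsigned nphi_def_loop_uses;    // uses of the PHI result inside the loop
  unsigned nlatch_def_loop_uses;  // uses of the latch value inside the loop
};

// True when the reduction must be rejected: outside of a plain nested
// cycle (or when the inner loop feeds a double reduction), neither the
// PHI result nor the latch definition may have more than one in-loop use.
bool reject_in_loop_uses(bool nested_in_vect_loop,
                         bool inner_loop_of_double_reduc,
                         UseCounts counts) {
  return (!nested_in_vect_loop || inner_loop_of_double_reduc)
         && (counts.nlatch_def_loop_uses > 1
             || counts.nphi_def_loop_uses > 1);
}
```

Compared with the old check, the PHI-result uses are now counted in every case
instead of being skipped for inner loops, and the double-reduction flag
replaces the "has any loop-closed PHI" test.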


RE: [PATCH 0/3] [ARC] Glibc required patches

2018-11-13 Thread Claudiu Zissulescu
Thank you for the quick review. All the patches are pushed with the suggested mods.
Claudiu

From: Claudiu Zissulescu [claz...@gmail.com]
Sent: Monday, November 12, 2018 12:25 PM
To: gcc-patches@gcc.gnu.org
Cc: francois.bed...@synopsys.com; andrew.burg...@embecosm.com; claudiu.zissule...@synopsys.com
Subject: [PATCH 0/3] [ARC] Glibc required patches

Hi Andrew,

The attached three patches are required to reduce/enable glibc
builds. Although not all of them are glibc-related, they were found
while porting this library to ARC.

OK to apply?
Claudiu

Claudiu Zissulescu (3):
  [ARC] Update EH code.
  [ARC] Do not emit ZOL in the presence of text jump tables.
  [ARC] Add support for profiling in glibc.

 gcc/config/arc/arc-protos.h   |  2 +-
 gcc/config/arc/arc.c                      | 25 +--
 gcc/config/arc/arc.h                      | 14 +++--
 gcc/config/arc/arc.md                     | 15 ++
 gcc/config/arc/elf.h                      |  9
 gcc/config/arc/linux.h                    | 10 +
 gcc/testsuite/gcc.target/arc/builtin_eh.c | 22 
 7 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/builtin_eh.c

--
2.19.1



[PATCH 6/6] [RS6000] inline plt call sequences

2018-11-13 Thread Alan Modra
Version 2.

Finally, the point of the previous patches in this series, support for
inline PLT calls, keyed off -fno-plt.  This emits code using new
relocations that tie all insns in the sequence together, so that the
linker can edit the sequence back to a direct call should the call
target turn out to be local.  An example of ELFv2 code to call puts is
as follows:

 .reloc .,R_PPC64_PLTSEQ,puts
std 2,24(1)
 .reloc .,R_PPC64_PLT16_HA,puts
addis 12,2,0
 .reloc .,R_PPC64_PLT16_LO_DS,puts
ld 12,0(12)
 .reloc .,R_PPC64_PLTSEQ,puts
mtctr 12
 .reloc .,R_PPC64_PLTCALL,puts
bctrl
ld 2,24(1)

"addis 12,2,puts@plt@ha" and "ld 12,puts@plt@l(12)" are also supported
by the assembler.  gcc instead uses the explicit R_PPC64_PLT16_HA and
R_PPC64_PLT16_LO_DS relocs because when the call is to __tls_get_addr
an extra reloc is emitted at every place where one is shown above, to
specify the __tls_get_addr arg.  The linker expects the extra reloc to
come first.  .reloc enforces that ordering.

The patch also changes code emitted for longcalls if the assembler
supports the new marker relocs, so that these too can be edited.  One
side effect of longcalls using PLT16 relocs is that they can now be
resolved lazily by ld.so.

I don't support lazy inline PLT calls for ELFv1, because ELFv1 would
need barriers to reliably load both the function address and toc
pointer from the PLT.  ELFv1 -fno-plt uses the longcall sequence
instead, which isn't edited by GNU ld.

* config.in (HAVE_AS_PLTSEQ): Add.
* config/rs6000/predicates.md (indirect_call_operand): New.
* config/rs6000/rs6000-protos.h (rs6000_pltseq_template),
(rs6000_sibcall_sysv): Declare.
* config/rs6000/rs6000.c (init_cumulative_args): Set cookie
CALL_LONG for -fno-plt.
(print_operand ): Handle UNSPEC_PLTSEQ.
(rs6000_indirect_call_template_1): Emit .reloc directives for
UNSPEC_PLTSEQ calls.
(rs6000_pltseq_template): New function.
(rs6000_longcall_ref): Add arg parameter.  Use PLT16 insns if
relocs supported by assembler.  Move SYMBOL_REF test to callers.
(rs6000_call_aix): Adjust rs6000_longcall_ref call.  Package
insns in UNSPEC_PLTSEQ, preserving original func_desc.
(rs6000_call_sysv): Likewise.
(rs6000_sibcall_sysv): New function.
* config/rs6000/rs6000.h (HAVE_AS_PLTSEQ): Provide default.
* config/rs6000/rs6000.md (UNSPEC_PLTSEQ, UNSPEC_PLT16_HA,
UNSPEC_PLT16_LO): New.
(pltseq_tocsave, pltseq_plt16_ha, pltseq_plt16_lo, pltseq_mtctr): New.
(call_indirect_nonlocal_sysv): Don't differentiate zero from non-zero
cookie in constraints.  Test explicitly for flags in length attr.
Handle unspec operand 1.
(call_value_indirect_nonlocal_sysv): Likewise.
(call_indirect_aix, call_value_indirect_aix): Handle unspec operand 1.
(call_indirect_elfv2, call_value_indirect_elfv2): Likewise.
(sibcall, sibcall_value): Use rs6000_sibcall_sysv.
(sibcall_indirect_nonlocal_sysv): New pattern.
(sibcall_value_indirect_nonlocal_sysv): Likewise.
(sibcall_nonlocal_sysv, sibcall_value_nonlocal_sysv): Remove indirect
call alternatives.
* configure.ac: Check for gas plt sequence marker support.
* configure: Regenerate.

diff --git a/gcc/config.in b/gcc/config.in
index 67a1e6cfc4c..86ff5e8636b 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -577,6 +577,12 @@
 #endif
 
 
+/* Define if your assembler supports R_PPC*_PLTSEQ relocations. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_PLTSEQ
+#endif
+
+
 /* Define if your assembler supports .ref */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_REF
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 7e45d2f0371..1af01935b5e 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1055,6 +1055,24 @@ (define_predicate "call_operand"
  || REGNO (op) >= FIRST_PSEUDO_REGISTER")
  (match_code "symbol_ref")))
 
+;; Return 1 if the operand, used inside a MEM, is a valid first argument
+;; to an indirect CALL.  This is LR, CTR, or a PLTSEQ unspec using CTR.
+(define_predicate "indirect_call_operand"
+  (match_code "reg,unspec")
+{
+  if (REG_P (op))
+return (REGNO (op) == LR_REGNO
+   || REGNO (op) == CTR_REGNO);
+  if (GET_CODE (op) == UNSPEC)
+{
+  if (XINT (op, 1) != UNSPEC_PLTSEQ)
+   return false;
+  op = XVECEXP (op, 0, 0);
+  return REG_P (op) && REGNO (op) == CTR_REGNO;
+}
+  return false;
+})
+
 ;; Return 1 if the operand is a SYMBOL_REF for a function known to be in
 ;; this file.
 (define_predicate "current_file_function_operand"
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3fd89dc20db..35209d4525d 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.

[PATCH 5/6] [RS6000] Use standard call patterns for __tls_get_addr calls

2018-11-13 Thread Alan Modra
Version 2.

The current code handling __tls_get_addr calls for powerpc*-linux
generates a call then overwrites the call insn with a special
tls_{gd,ld}_{aix,sysv} pattern.  It's done that way to support
!TARGET_TLS_MARKERS, where the arg setup insns need to be emitted
immediately before the branch and link.  When TARGET_TLS_MARKERS, the
arg setup insns are split from the actual call, but we then have a
non-standard call pattern that needs to be carried through to output.

This patch changes that scheme, to instead use the standard call
patterns for __tls_get_addr calls, except for the now rare
!TARGET_TLS_MARKERS case.  Doing it this way should be better for
maintenance as the !TARGET_TLS_MARKERS code can eventually disappear.
It also makes it possible to support longcalls (and in following
patches, inline plt calls) for __tls_get_addr without introducing yet
more special call patterns.

__tls_get_addr calls do however need to be different to standard
calls, because when TARGET_TLS_MARKERS the calls are decorated with an
argument specifier, eg. "bl __tls_get_addr(thread_var@tlsgd)" that
causes a reloc to be emitted by the assembler tying the call to its
arg setup insns.  I chose to smuggle the arg in the currently unused
stack size rtl.

I've also introduced rs6000_call_sysv to generate rtl for sysv calls,
as rs6000_call_aix does for aix and elfv2 calls.  This allows
rs6000_longcall_ref to be local to rs6000.c since the calls in the
expanders never did anything for darwin.

* config/rs6000/predicates.md (unspec_tls): New.
* config/rs6000/rs6000-protos.h (rs6000_call_template),
(rs6000_sibcall_template): Update prototype.
(rs6000_longcall_ref): Delete.
(rs6000_call_sysv): Declare.
* config/rs6000/rs6000.c (edit_tls_call_insn): New function.
(global_tlsarg): New variable.
(rs6000_legitimize_tls_address): Rewrite __tls_get_addr call
handling.
(print_operand): Extract UNSPEC_TLSGD address operand.
(rs6000_call_template, rs6000_sibcall_template): Remove arg
parameter, extract from second call operand instead.
(rs6000_longcall_ref): Make static, localize vars.
(rs6000_call_aix): Rename parameter to reflect new usage.  Take
tlsarg from global_tlsarg.  Don't create unused rtl or nop insns.
(rs6000_sibcall_aix): Rename parameter to reflect new usage.  Take
tlsarg from global_tlsarg.
(rs6000_call_sysv): New function.
* config/rs6000/rs6000.md: Adjust rs6000_call_template and
rs6000_sibcall_template throughout.
(tls_gd_aix, tls_gd_sysv, tls_gd_call_aix, tls_gd_call_sysv): Delete.
(tls_ld_aix, tls_ld_sysv, tls_ld_call_aix, tls_ld_call_sysv): Delete.
(tls_gdld_aix, tls_gdld_sysv): New insns, replacing above.
(tls_gd): Swap operand order.  Simplify mode selection.
(tls_gd_high, tls_gd_low): Swap operand order.
(tls_ld): Remove const_int 0 vector element from UNSPEC_TLSLD.
Simplify mode selection.
(tls_ld_high, tls_ld_low): Similarly adjust UNSPEC_TLSLD.
(call, call_value): Don't assert for second call operand.
Use rs6000_call_sysv.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index b80c278d742..7e45d2f0371 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1039,6 +1039,13 @@ (define_predicate "rs6000_tls_symbol_ref"
   (and (match_code "symbol_ref")
(match_test "RS6000_SYMBOL_REF_TLS_P (op)")))
 
+;; Return 1 for the UNSPEC used in TLS call operands
+(define_predicate "unspec_tls"
+  (match_code "unspec")
+{
+  return XINT (op, 1) == UNSPEC_TLSGD || XINT (op, 1) == UNSPEC_TLSLD;
+})
+
 ;; Return 1 if the operand, used inside a MEM, is a valid first argument
 ;; to CALL.  This is a SYMBOL_REF, a pseudo-register, LR or CTR.
 (define_predicate "call_operand"
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 967f65e2d94..3fd89dc20db 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -111,8 +111,8 @@ extern int ccr_bit (rtx, int);
 extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
-extern const char *rs6000_call_template (rtx *, unsigned int, const char *);
-extern const char *rs6000_sibcall_template (rtx *, unsigned int, const char *);
+extern const char *rs6000_call_template (rtx *, unsigned int);
+extern const char *rs6000_sibcall_template (rtx *, unsigned int);
 extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
 extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
@@ -136,7 +136,6 @@ extern void rs6000_expand_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern void rs6000_emit_swdiv (rtx, rtx, rtx, bool);
 e

[PATCH 4/6] [RS6000] Remove constraints on call rounded_stack_size_rtx arg

2018-11-13 Thread Alan Modra
Version 2.  (Same as before, here for completeness.)

This call arg is unused on rs6000.

* config/rs6000/darwin.md (call_indirect_nonlocal_darwin64),
(call_nonlocal_darwin64, call_value_indirect_nonlocal_darwin64),
(call_value_nonlocal_darwin64): Remove constraints from second call
arg, the rounded_stack_size_rtx arg.
* config/rs6000/rs6000.md (tls_gd_aix, tls_gd_sysv, tls_gd_call_aix),
(tls_gd_call_sysv, tls_ld_aix, tls_ld_sysv, tls_ld_call_aix),
(tls_ld_call_sysv, call_local32, call_local64, call_value_local32),
(call_value_local64, call_indirect_nonlocal_sysv),
(call_nonlocal_sysv, call_nonlocal_sysv_secure),
(call_value_indirect_nonlocal_sysv, call_value_nonlocal_sysv),
(call_value_nonlocal_sysv_secure, call_local_aix),
(call_value_local_aix, call_nonlocal_aix, call_value_nonlocal_aix),
(call_indirect_aix, call_value_indirect_aix, call_indirect_elfv2),
(call_value_indirect_elfv2, sibcall_local32, sibcall_local64),
(sibcall_value_local32, sibcall_value_local64, sibcall_aix),
(sibcall_value_aix): Likewise.

diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md
index 2d6d1ca57dd..a1c07702d6f 100644
--- a/gcc/config/rs6000/darwin.md
+++ b/gcc/config/rs6000/darwin.md
@@ -302,7 +302,7 @@ (define_insn "macho_correct_pic_di"
 
 (define_insn "*call_indirect_nonlocal_darwin64"
   [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l,c,*l"))
-(match_operand 1 "" "g,g,g,g"))
+(match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,O,n,n"))
(clobber (reg:SI LR_REGNO))]
   "DEFAULT_ABI == ABI_DARWIN && TARGET_64BIT"
@@ -314,7 +314,7 @@ (define_insn "*call_indirect_nonlocal_darwin64"
 
 (define_insn "*call_nonlocal_darwin64"
   [(call (mem:SI (match_operand:DI 0 "symbol_ref_operand" "s,s"))
-(match_operand 1 "" "g,g"))
+(match_operand 1))
(use (match_operand:SI 2 "immediate_operand" "O,n"))
(clobber (reg:SI LR_REGNO))]
   "(DEFAULT_ABI == ABI_DARWIN)
@@ -332,7 +332,7 @@ (define_insn "*call_nonlocal_darwin64"
 (define_insn "*call_value_indirect_nonlocal_darwin64"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l,c,*l"))
- (match_operand 2 "" "g,g,g,g")))
+ (match_operand 2)))
(use (match_operand:SI 3 "immediate_operand" "O,O,n,n"))
(clobber (reg:SI LR_REGNO))]
   "DEFAULT_ABI == ABI_DARWIN"
@@ -345,7 +345,7 @@ (define_insn "*call_value_indirect_nonlocal_darwin64"
 (define_insn "*call_value_nonlocal_darwin64"
   [(set (match_operand 0 "" "")
(call (mem:SI (match_operand:DI 1 "symbol_ref_operand" "s,s"))
- (match_operand 2 "" "g,g")))
+ (match_operand 2)))
(use (match_operand:SI 3 "immediate_operand" "O,n"))
(clobber (reg:SI LR_REGNO))]
   "(DEFAULT_ABI == ABI_DARWIN)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 793a0a9d840..c261c8bb9c1 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9439,7 +9439,7 @@ (define_peephole2
 (define_insn_and_split "tls_gd_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
- (match_operand 4 "" "g")))
+ (match_operand 4)))
(unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
@@ -9473,7 +9473,7 @@ (define_insn_and_split "tls_gd_aix"
 (define_insn_and_split "tls_gd_sysv"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
- (match_operand 4 "" "g")))
+ (match_operand 4)))
(unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
  (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
@@ -9540,7 +9540,7 @@ (define_insn "*tls_gd_low"
 (define_insn "*tls_gd_call_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 1 "symbol_ref_operand" "s"))
- (match_operand 2 "" "g")))
+ (match_operand 2)))
(unspec:P [(match_operand:P 3 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
@@ -9555,7 +9555,7 @@ (define_insn "*tls_gd_call_aix"
 (define_insn "*tls_gd_call_sysv"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 1 "symbol_ref_operand" "s"))
- (match_operand 2 "" "g")))
+ (match_operand 2)))
(unspec:P [(match_operand:P 3 "rs6000_tls_symbol_ref" "")]
 UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
@@ -9568,7 +9568,7 @@ (define_insn "*tls_gd_call_sysv"
 (define_insn_and_split "tls_ld_aix"
   [(set (match_operand:P 0 "gpc_reg_operand" "=b")
 (call (mem:SI (match_operand:P 2 "symbol_ref_operand" "s"))
- 

[PATCH 3/6] [RS6000] Replace TLSmode with P, and correct tls call mems

2018-11-13 Thread Alan Modra
Version 2.

There is really no need to define a TLSmode mode iterator that is
identical (since !TARGET_64BIT == TARGET_32BIT) to the much-used P
mode iterator.  It's nonsense to think we might ever want to support
32-bit TLS on 64-bit or vice versa!  The patch also fixes a minor
error in the call mems.  All other direct calls use (call (mem:SI ..)).

* config/rs6000/rs6000.md (TLSmode): Delete mode iterator.  Replace
with P throughout except for call mems which should use SI.
(tls_abi_suffix, tls_sysv_suffix, tls_insn_suffix): Delete mode
attributes.  Replace with bits, mode and ptrload respectively.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fe904b1966b..793a0a9d840 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9436,19 +9436,13 @@ (define_peephole2
 
 ;; TLS support.
 
-;; Mode attributes for different ABIs.
-(define_mode_iterator TLSmode [(SI "! TARGET_64BIT") (DI "TARGET_64BIT")])
-(define_mode_attr tls_abi_suffix [(SI "32") (DI "64")])
-(define_mode_attr tls_sysv_suffix [(SI "si") (DI "di")])
-(define_mode_attr tls_insn_suffix [(SI "wz") (DI "d")])
-
-(define_insn_and_split "tls_gd_aix"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-(call (mem:TLSmode (match_operand:TLSmode 3 "symbol_ref_operand" "s"))
+(define_insn_and_split "tls_gd_aix"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+(call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
  (match_operand 4 "" "g")))
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-  UNSPEC_TLSGD)
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+ (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)"
 {
@@ -9461,28 +9455,28 @@ (define_insn_and_split "tls_gd_aix"
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
-   (unspec:TLSmode [(match_dup 1)
-(match_dup 2)]
-   UNSPEC_TLSGD))
+   (unspec:P [(match_dup 1)
+  (match_dup 2)]
+ UNSPEC_TLSGD))
(parallel [(set (match_dup 0)
-  (call (mem:TLSmode (match_dup 3))
-(match_dup 4)))
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGD)
+  (call (mem:SI (match_dup 3))
+(match_dup 4)))
+ (unspec:P [(match_dup 2)] UNSPEC_TLSGD)
  (clobber (reg:SI LR_REGNO))])]
   ""
   [(set_attr "type" "two")
(set (attr "length")
  (if_then_else (ne (symbol_ref "TARGET_CMODEL") (symbol_ref "CMODEL_SMALL"))
-  (const_int 16)
-  (const_int 12)))])
+  (const_int 16)
+  (const_int 12)))])
 
-(define_insn_and_split "tls_gd_sysv"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-(call (mem:TLSmode (match_operand:TLSmode 3 "symbol_ref_operand" "s"))
+(define_insn_and_split "tls_gd_sysv"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+(call (mem:SI (match_operand:P 3 "symbol_ref_operand" "s"))
  (match_operand 4 "" "g")))
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-   (match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-  UNSPEC_TLSGD)
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+ (match_operand:P 2 "rs6000_tls_symbol_ref" "")]
+UNSPEC_TLSGD)
(clobber (reg:SI LR_REGNO))]
   "HAVE_AS_TLS && DEFAULT_ABI == ABI_V4"
 {
@@ -9491,64 +9485,64 @@ (define_insn_and_split "tls_gd_sysv"
 }
   "&& TARGET_TLS_MARKERS"
   [(set (match_dup 0)
-   (unspec:TLSmode [(match_dup 1)
-(match_dup 2)]
-   UNSPEC_TLSGD))
+   (unspec:P [(match_dup 1)
+  (match_dup 2)]
+ UNSPEC_TLSGD))
(parallel [(set (match_dup 0)
-  (call (mem:TLSmode (match_dup 3))
-(match_dup 4)))
- (unspec:TLSmode [(match_dup 2)] UNSPEC_TLSGD)
+  (call (mem:SI (match_dup 3))
+(match_dup 4)))
+ (unspec:P [(match_dup 2)] UNSPEC_TLSGD)
  (clobber (reg:SI LR_REGNO))])]
   ""
   [(set_attr "type" "two")
(set_attr "length" "8")])
 
-(define_insn_and_split "*tls_gd"
-  [(set (match_operand:TLSmode 0 "gpc_reg_operand" "=b")
-   (unspec:TLSmode [(match_operand:TLSmode 1 "gpc_reg_operand" "b")
-(match_operand:TLSmode 2 "rs6000_tls_symbol_ref" "")]
-   UNSPEC_TLSGD))]
+(define_insn_and_split "*tls_gd"
+  [(set (match_operand:P 0 "gpc_reg_operand" "=b")
+   (unspec:P [(match_operand:P 1 "gpc_reg_operand" "b")
+  (match_operand:P 2 "rs6000

[PATCH 2/6] [RS6000] rs6000_indirect_call_template

2018-11-13 Thread Alan Modra
Version 2.

Like the last patch for external calls, now handle most assembly code
for indirect calls in one place.  The patch also merges some insns,
correcting some !rs6000_speculate_indirect_jumps cases branching to
LR, which don't require a speculation barrier.

* config/rs6000/rs6000-protos.h (rs6000_indirect_call_template),
(rs6000_indirect_sibcall_template): Declare.
* config/rs6000/rs6000.c (rs6000_indirect_call_template_1),
(rs6000_indirect_call_template, rs6000_indirect_sibcall_template):
New functions.
* config/rs6000/rs6000.md (call_indirect_nonlocal_sysv),
(call_value_indirect_nonlocal_sysv, sibcall_nonlocal_sysv),
(call_indirect_aix, call_value_indirect_aix): Use
rs6000_indirect_call_template and rs6000_indirect_sibcall_template.
(call_indirect_elfv2, call_value_indirect_elfv2): Likewise, and
handle both speculation and non-speculation cases.
(call_indirect_aix_nospec, call_value_indirect_aix_nospec): Delete.
(call_indirect_elfv2_nospec, call_value_indirect_elfv2_nospec): Delete.
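
To make the effect of the merged insns concrete, here is a minimal
standalone sketch of the string that the ELFv2 branch of
rs6000_indirect_call_template_1 in the diff below would build.  It is
not the GCC code: indirect_call_template_sketch and its is_64bit and
speculate parameters are hypothetical stand-ins for TARGET_64BIT and
the speculate flag computed in the patch, and the sketch writes into a
caller-supplied buffer with snprintf rather than returning a static
buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the ELFv2 case only.  FUNOP is the operand
   number of the call address.  SPECULATE is nonzero when no
   speculation barrier is needed (speculation allowed, or the call
   goes via LR), mirroring the flag of the same name in the patch.  */
static const char *
indirect_call_template_sketch (char *out, size_t outlen,
			       unsigned int funop, int is_64bit,
			       int speculate)
{
  /* Load suffix: "ld" on 64-bit, "lwz" on 32-bit.  */
  const char *ptrload = is_64bit ? "d" : "wz";

  if (speculate)
    /* Plain indirect branch-and-link, then the TOC restore.  */
    snprintf (out, outlen,
	      "b%%T%ul\n\t"
	      "l%s 2,%%%u(1)",
	      funop, ptrload, funop + 2);
  else
    /* crset 2 followed by a conditional branch hinted not-taken is
       the sequence the patch emits to stop speculation.  */
    snprintf (out, outlen,
	      "crset 2\n\t"
	      "beq%%T%ul-\n\t"
	      "l%s 2,%%%u(1)",
	      funop, ptrload, funop + 2);
  return out;
}
```

The %T and %N sequences left in the result are output-template
operand codes that GCC expands later; the sketch only shows how the
operand numbers are spliced into the template text.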

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 303ba7b91c3..967f65e2d94 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -113,6 +113,8 @@ extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
 extern const char *rs6000_call_template (rtx *, unsigned int, const char *);
 extern const char *rs6000_sibcall_template (rtx *, unsigned int, const char *);
+extern const char *rs6000_indirect_call_template (rtx *, unsigned int);
+extern const char *rs6000_indirect_sibcall_template (rtx *, unsigned int);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6e84f5053c2..cd1ab95166e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21416,6 +21416,83 @@ rs6000_sibcall_template (rtx *operands, unsigned int funop, const char *arg)
   return rs6000_call_template_1 (operands, funop, true, arg);
 }
 
+/* As above, for indirect calls.  */
+
+static const char *
+rs6000_indirect_call_template_1 (rtx *operands, unsigned int funop,
+bool sibcall)
+{
+  /* -Wformat-overflow workaround, without which gcc thinks that %u
+  might produce 10 digits.  */
+  gcc_assert (funop <= MAX_RECOG_OPERANDS);
+
+  static char str[144];
+  const char *ptrload = TARGET_64BIT ? "d" : "wz";
+
+  /* We don't need the extra code to stop indirect call speculation if
+ calling via LR.  */
+  bool speculate = (TARGET_MACHO
+   || rs6000_speculate_indirect_jumps
+   || (REG_P (operands[funop])
+   && REGNO (operands[funop]) == LR_REGNO));
+
+  if (DEFAULT_ABI == ABI_AIX)
+{
+  if (speculate)
+   sprintf (str,
+"l%s 2,%%%u\n\t"
+"b%%T%ul\n\t"
+"l%s 2,%%%u(1)",
+ptrload, funop + 2, funop, ptrload, funop + 3);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"l%s 2,%%%u\n\t"
+"beq%%T%ul-\n\t"
+"l%s 2,%%%u(1)",
+ptrload, funop + 2, funop, ptrload, funop + 3);
+}
+  else if (DEFAULT_ABI == ABI_ELFv2)
+{
+  if (speculate)
+   sprintf (str,
+"b%%T%ul\n\t"
+"l%s 2,%%%u(1)",
+funop, ptrload, funop + 2);
+  else
+   sprintf (str,
+"crset 2\n\t"
+"beq%%T%ul-\n\t"
+"l%s 2,%%%u(1)",
+funop, ptrload, funop + 2);
+}
+  else
+{
+  if (speculate)
+   sprintf (str,
+"b%%T%u%s",
+funop, sibcall ? "" : "l");
+  else
+   sprintf (str,
+"crset 2\n\t"
+"beq%%T%u%s-%s",
+funop, sibcall ? "" : "l", sibcall ? "\n\tb $" : "");
+}
+  return str;
+}
+
+const char *
+rs6000_indirect_call_template (rtx *operands, unsigned int funop)
+{
+  return rs6000_indirect_call_template_1 (operands, funop, false);
+}
+
+const char *
+rs6000_indirect_sibcall_template (rtx *operands, unsigned int funop)
+{
+  return rs6000_indirect_call_template_1 (operands, funop, true);
+}
+
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
VISIBILITY_TYPE.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index db9cfe92c72..fe904b1966b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10539,11 +10539,7 @@ (define_insn "*call_indirect_nonlocal_sysv"
   else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
 output_asm_insn ("creqv 6,6,6", operands);
 
-  if (

[PATCH 1/6] [RS6000] rs6000_call_template for external call insn assembly output

2018-11-13 Thread Alan Modra
Version 2.

This is a first step in tidying rs6000 call patterns, in preparation
for supporting inline PLT calls.

* config/rs6000/rs6000-protos.h (rs6000_call_template): Declare.
(rs6000_sibcall_template): Declare.
(macho_call_template): Rename from output_call.
* config/rs6000/rs6000.c (rs6000_call_template_1): New function.
(rs6000_call_template, rs6000_sibcall_template): Likewise.
(macho_call_template): Rename from output_call.
* config/rs6000/rs6000.md (tls_gd_aix, tls_gd_sysv),
(tls_gd_call_aix, tls_gd_call_sysv, tls_ld_aix, tls_ld_sysv),
(tls_ld_call_aix, tls_ld_call_sysv, call_nonlocal_sysv),
(call_nonlocal_sysv_secure, call_value_nonlocal_sysv),
(call_value_nonlocal_sysv_secure, call_nonlocal_aix),
(call_value_nonlocal_aix): Use rs6000_call_template and update
occurrences of output_call to macho_call_template.
(sibcall_nonlocal_sysv, sibcall_value_nonlocal_sysv, sibcall_aix),
(sibcall_value_aix): Use rs6000_sibcall_template.
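
As an illustration of what the new helpers return, here is a minimal
standalone sketch modelled on rs6000_call_template_1 from the diff
below.  It is not the GCC code: call_template_sketch, enum abi_sketch
and the secure_plt and pic parameters are hypothetical stand-ins for
DEFAULT_ABI, TARGET_SECURE_PLT and flag_pic.  The sketch also assumes
that callers pass an empty string when there is no TLS argument
specifier, and it writes into a caller-supplied buffer instead of a
static one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the DEFAULT_ABI values used below.  */
enum abi_sketch { SK_ABI_AIX, SK_ABI_ELFv2, SK_ABI_V4 };

/* Build a direct-call template into OUT.  FUNOP is the operand number
   of the call target.  ARG is "" or a specifier such as "(%2@tlsgd)";
   this sketch assumes it is never NULL.  */
static const char *
call_template_sketch (char *out, size_t outlen, enum abi_sketch abi,
		      unsigned int funop, int sibcall, const char *arg,
		      int secure_plt, int pic)
{
  /* "%zN", plus the 32768 offset for secure-PLT -fPIC on V4.  */
  char z[16];
  snprintf (z, sizeof z, "%%z%u%s", funop,
	    (abi == SK_ABI_V4 && secure_plt && pic == 2) ? "+32768" : "");

  if (abi == SK_ABI_AIX || abi == SK_ABI_ELFv2)
    /* Non-sibling calls are followed by a nop.  */
    snprintf (out, outlen, "b%s %s%s%s", sibcall ? "" : "l", z, arg,
	      sibcall ? "" : "\n\tnop");
  else
    /* V4: append @plt when compiling PIC.  */
    snprintf (out, outlen, "b%s %s%s%s", sibcall ? "" : "l", z, arg,
	      pic ? "@plt" : "");
  return out;
}
```

With the AIX and ELFv2 ABIs a normal call gets "bl" and a trailing
nop, while a sibcall gets a bare "b".  On V4 the @plt suffix and the
+32768 offset depend only on the PIC settings.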

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index fb69019c47c..303ba7b91c3 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -111,6 +111,8 @@ extern int ccr_bit (rtx, int);
 extern void rs6000_output_function_entry (FILE *, const char *);
 extern void print_operand (FILE *, rtx, int);
 extern void print_operand_address (FILE *, rtx);
+extern const char *rs6000_call_template (rtx *, unsigned int, const char *);
+extern const char *rs6000_sibcall_template (rtx *, unsigned int, const char *);
 extern enum rtx_code rs6000_reverse_condition (machine_mode,
   enum rtx_code);
 extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
@@ -228,7 +230,7 @@ extern void (*rs6000_target_modify_macros_ptr) (bool, HOST_WIDE_INT,
 extern void rs6000_d_target_versions (void);
 
 #if TARGET_MACHO
-char *output_call (rtx_insn *, rtx *, int, int);
+char *macho_call_template (rtx_insn *, rtx *, int, int);
 #endif
 
 #ifdef NO_DOLLAR_IN_LABEL
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 516e69724cc..6e84f5053c2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21372,6 +21372,50 @@ rs6000_assemble_integer (rtx x, unsigned int size, int aligned_p)
   return default_assemble_integer (x, size, aligned_p);
 }
 
+/* Return a template string for assembly to emit when making an
+   external call.  FUNOP is the call mem argument operand number,
+   ARG is either NULL or a @TLSGD or @TLSLD __tls_get_addr argument
+   specifier.  */
+
+static const char *
+rs6000_call_template_1 (rtx *operands ATTRIBUTE_UNUSED, unsigned int funop,
+   bool sibcall, const char *arg)
+{
+  /* -Wformat-overflow workaround, without which gcc thinks that %u
+  might produce 10 digits.  */
+  gcc_assert (funop <= MAX_RECOG_OPERANDS);
+
+  /* The magic 32768 offset here corresponds to the offset of
+ r30 in .got2, as given by LCTOC1.  See sysv4.h:toc_section.  */
+  char z[11];
+  sprintf (z, "%%z%u%s", funop,
+  (DEFAULT_ABI == ABI_V4 && TARGET_SECURE_PLT && flag_pic == 2
+   ? "+32768" : ""));
+
+  static char str[32];  /* 4 spare */
+  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
+sprintf (str, "b%s %s%s%s", sibcall ? "" : "l", z, arg,
+sibcall ? "" : "\n\tnop");
+  else if (DEFAULT_ABI == ABI_V4)
+sprintf (str, "b%s %s%s%s", sibcall ? "" : "l", z, arg,
+flag_pic ? "@plt" : "");
+  else
+gcc_unreachable ();
+  return str;
+}
+
+const char *
+rs6000_call_template (rtx *operands, unsigned int funop, const char *arg)
+{
+  return rs6000_call_template_1 (operands, funop, false, arg);
+}
+
+const char *
+rs6000_sibcall_template (rtx *operands, unsigned int funop, const char *arg)
+{
+  return rs6000_call_template_1 (operands, funop, true, arg);
+}
+
 #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
 /* Emit an assembler directive to set symbol visibility for DECL to
VISIBILITY_TYPE.  */
@@ -32810,8 +32854,8 @@ get_prev_label (tree function_name)
CALL_DEST is the routine we are calling.  */
 
 char *
-output_call (rtx_insn *insn, rtx *operands, int dest_operand_number,
-int cookie_operand_number)
+macho_call_template (rtx_insn *insn, rtx *operands, int dest_operand_number,
+int cookie_operand_number)
 {
   static char buf[256];
   if (darwin_emit_branch_islands
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 65f5fa6e66b..db9cfe92c72 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9453,10 +9453,11 @@ (define_insn_and_split "tls_gd_aix"
   "HAVE_AS_TLS && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)"
 {
   if (TARGET_CMODEL != CMODEL_SMALL)
-return "addis %0,%1,%2@got@tlsgd@ha\;addi %0,%0,%2@got@tlsgd@l\;"
-  "bl %z3\;nop";
+output_

Re: [RS6000] Remove unnecessary rtx_equal_p

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 02:16:09PM +1030, Alan Modra wrote:
> REGs are unique.  This patch recognizes that fact, speeding up rs6000
> gcc infinitesimally.  Bootstrapped etc. powerpc64le-linux.  OK?

Of course, fine for trunk.  Thanks!


Segher


>   * gcc/config/rs6000/rs6000.c (rs6000_legitimate_address_p): Replace
>   rtx_equal_p call for known REGs with pointer comparison.
>   (rs6000_secondary_reload_memory): Likewise.
>   (rs6000_secondary_reload_inner): Likewise.


Re: [RS6000] Don't put large integer constants in TOC for -mcmodel=medium

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 01:53:20PM +1030, Alan Modra wrote:
> For -mcmodel=medium we can use toc-relative addressing to access
> constants placed in read-only data, which is better since they can be
> merged when in .rodata.cst8.
> 
> Bootstrapped etc. powerpc64le-linux.  OK?

Okay, thanks!


Segher


>   * config/rs6000/linux64.h (ASM_OUTPUT_SPECIAL_POOL_ENTRY_P): Exclude
>   integer constants when -mcmodel=medium.


Re: [RS6000] Don't pass -many to the assembler

2018-11-13 Thread Segher Boessenkool
On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
> On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
> > On Nov 12, 2018, at 3:13 PM, Alan Modra wrote:
> > > 
> > > For people developing new code, it's the right way to go, and
> > > especially so for people working on gcc itself.  For people just
> > > wanting stuff to compile, not so much.  I fully expect a chorus of
> > > *MORON* or worse to come from the likes of the linux kernel rabble.
> > 
> > So, if you just want to hear people whine...
> 
> I'm happy to hear other points of view.  Ignore my hyperbole.
> 
> > On darwin, we (darwin, as a platform decision) like all instructions 
> > available from the assembler.
> 
> OK, fair enough.  Another option is to just disable -many when gcc is
> in development, like we enable checking.

That is a good plan for GCC 9 at least.


Segher

