date:20160427

[patch][wwwdocs] Fixed incorrect copy paste in codingconventions.html

2016-04-27 Thread Chris Gregory

In the `Extern "C"` commentary, the coding conventions said:

Definitions within the body of a namespace are not indented.

Now it reads:

Definitions within the body of an extern "C" block
are not indented.

Initially reported at:
https://gcc.gnu.org/ml/gcc/2016-04/msg00211.html

Cheers,

Chris Gregory!
Index: htdocs/codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.73
diff -r1.73 codingconventions.html
1306c1306
< Definitions within the body of a namespace are not indented.
---
> Definitions within the body of an extern "C" block are not 
> indented.

Re: [PATCH 08/18] make side_effects a vec

2016-04-27 Thread Jeff Law


On 04/22/2016 06:11 AM, Trevor Saunders wrote:

On Thu, Apr 21, 2016 at 11:12:48PM -0600, Jeff Law wrote:

On 04/20/2016 12:22 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2016-04-19  Trevor Saunders  

* var-tracking.c (struct adjust_mem_data): Make side_effects a vector.
(adjust_mems): Adjust.
(adjust_insn): Likewise.
(prepare_call_arguments): Likewise.
---
 gcc/var-tracking.c | 30 +++---
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
index 9f09d30..7fc6ed3 100644
--- a/gcc/var-tracking.c
+++ b/gcc/var-tracking.c
@@ -926,7 +926,7 @@ struct adjust_mem_data
   bool store;
   machine_mode mem_mode;
   HOST_WIDE_INT stack_adjust;
-  rtx_expr_list *side_effects;
+  auto_vec side_effects;
 };

Is auto_vec what you really want here?  AFAICT this object is never
destructed, so we're not releasing the memory.  Am I missing something here?


It is destructed: auto_vec has a destructor, therefore adjust_mem_data
has a destructor, since it has a field with a destructor.
adjust_mem_data is always on the stack, so the compiler makes sure
the destructor is called when it goes out of scope.

Duh :-)

OK for the trunk.  It looks like Bernd and others are handling the bulk 
of these, so I'll step aside.
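
The point about implicit destructors can be illustrated with a small standalone sketch. It does not use GCC's auto_vec; tracked_buffer and adjust_data_model are hypothetical stand-ins, and the live_buffers counter exists only to make the destructor call observable.

```cpp
#include <cassert>

// Counts live buffers so the destructor call is observable.
static int live_buffers = 0;

// Hypothetical stand-in for auto_vec: acquires in the constructor,
// releases in the destructor.
struct tracked_buffer
{
  tracked_buffer () { ++live_buffers; }
  ~tracked_buffer () { --live_buffers; }
  tracked_buffer (const tracked_buffer &) = delete;
  tracked_buffer &operator= (const tracked_buffer &) = delete;
};

// Simplified model of adjust_mem_data: no user-written destructor, but the
// compiler generates one because side_effects has a destructor.
struct adjust_data_model
{
  bool store;
  tracked_buffer side_effects;
};

// Returns the number of live buffers while the object is still in scope.
int
live_buffers_inside_scope ()
{
  adjust_data_model amd;
  amd.store = true;
  assert (live_buffers == 1);
  return live_buffers;  // evaluated before amd is destroyed
}
```

After live_buffers_inside_scope returns, the counter is back to zero, which is the behaviour described above for a stack-allocated object.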


jeff

Re: [PATCH] Take known zero bits into account when checking extraction.

2016-04-27 Thread Jeff Law


On 04/27/2016 02:20 AM, Dominik Vogt wrote:

The attached patch is a result of discussing an S/390 issue with
"and with complement" in some cases.

  https://gcc.gnu.org/ml/gcc/2016-03/msg00163.html
  https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01586.html

Combine would merge a ZERO_EXTEND and a SET taking the known zero
bits into account, resulting in an AND.  Later on,
make_compound_operation() fails to replace that with a ZERO_EXTEND
which we get for free on S/390 but leaves the AND, eventually
resulting in two consecutive AND instructions.

The current code in make_compound_operation() that detects
opportunities for ZERO_EXTEND does not work here because it does
not take the known zero bits into account:

  /* If the constant is one less than a power of two, this might be
 representable by an extraction even if no shift is present.
 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
 we are in a COMPARE.  */
  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
new_rtx = make_extraction (mode,
   make_compound_operation (XEXP (x, 0),
next_code),
   0, NULL_RTX, i, 1, 0, in_code == COMPARE);

An attempt to use the zero bits in the above conditions resulted
in many situations that generated worse code, so the patch tries
to fix this in a more conservative way.  While the effect is
completely positive on S/390, this will very likely have
unforeseeable consequences on other targets.

Bootstrapped and regression tested on s390 and s390x only at the
moment.

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0001-ChangeLog


gcc/ChangeLog

* combine.c (make_compound_operation): Take known zero bits into
account when checking for possible zero_extend.

I'd strongly recommend writing some tests for this.  Extra credit if 
they can be run on an x86 target which gets more testing than s390.


If I go back to our original discussion, we have this going into combine:

(insn 6 3 7 2 (parallel [
(set (reg:SI 64)
(and:SI (mem:SI (reg/v/f:DI 63 [ a ]) [1 *a_2(D)+0 S4 A32])
(const_int -65521 [0xffffffffffff000f])))
(clobber (reg:CC 33 %cc))
]) andc-immediate.c:21 1481 {*andsi3_zarch}
 (expr_list:REG_DEAD (reg/v/f:DI 63 [ a ])
(expr_list:REG_UNUSED (reg:CC 33 %cc)
(nil
(insn 7 6 12 2 (set (reg:DI 65)
(zero_extend:DI (reg:SI 64))) andc-immediate.c:21 1207 {*zero_extendsidi2}
 (expr_list:REG_DEAD (reg:SI 64)
(nil)))
(insn 12 7 13 2 (set (reg/i:DI 2 %r2)
(reg:DI 65)) andc-immediate.c:22 1073 {*movdi_64}
 (expr_list:REG_DEAD (reg:DI 65)
(nil)))

Which combine turns into:

(insn 6 3 7 2 (parallel [
(set (reg:SI 64)
(and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
(const_int -65521 [0xffffffffffff000f])))
(clobber (reg:CC 33 %cc))
]) andc-immediate.c:21 1481 {*andsi3_zarch}
 (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
(expr_list:REG_UNUSED (reg:CC 33 %cc)
(nil
(insn 12 7 13 2 (parallel [
(set (reg/i:DI 2 %r2)
(and:DI (subreg:DI (reg:SI 64) 0)
 ^^^
(const_int 4294901775 [0xffff000f])))
   ^^
(clobber (reg:CC 33 %cc))
]) andc-immediate.c:22 1474 {*anddi3}
 (expr_list:REG_UNUSED (reg:CC 33 %cc)
(expr_list:REG_DEAD (reg:SI 64)
(nil


Instead you want insn 12 to use a zero-extend to extend (reg:SI 64) into 
(reg:DI 2)?


Can't you achieve this in this clause:

 /* If the constant is one less than a power of two, this might be
 representable by an extraction even if no shift is present.
 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
 we are in a COMPARE.  */

You extract the constant via UINTVAL (XEXP (x, 1)), then munge it based 
on nonzero_bits and pass the result to exact_log2?


Though I do like how you've conditionalized on the cost of the and vs 
the cost of the zero-extend.  So maybe your approach is ultimately 
better.  Still curious about your thoughts on doing it by just munging 
the constant you pass off to exact_log2 in that earlier clause.
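
The "munge the constant" idea can be sketched outside of GCC as plain integer arithmetic. This is only an illustration under assumptions: exact_log2_u64 and zero_extend_width are hypothetical helpers (not combine.c code), and the nonzero-bits mask used in the usage note assumes the upper 32 bits of the paradoxical subreg are unknown, so they count as possibly nonzero.

```cpp
#include <cstdint>

// Hypothetical helper: log2 of X if X is a power of two, otherwise -1.
int
exact_log2_u64 (std::uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    ++n;
  return n;
}

// Bits known to be zero in the operand cannot be changed by the AND, so they
// may be set in the constant without changing the result.  If the munged
// constant is one less than a power of two, the AND acts as a zero extension
// from that many low bits.  Returns the width, or -1 if there is none.
int
zero_extend_width (std::uint64_t constant, std::uint64_t nonzero_bits)
{
  std::uint64_t munged = constant | ~nonzero_bits;
  return exact_log2_u64 (munged + 1);
}
```

With the constant 0xffff000f from insn 12 and an assumed nonzero-bits mask of 0xffffffffffff000f, the munged constant is 0xffffffff and the width comes out as 32. Without any known zero bits (mask of all ones) the same constant gives -1.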




+  /* If the one operand is a paradoxical subreg of a register or memory and
+the constant (limited to the smaller mode) has only zero bits where
+the sub expression has known zero bits, this can be expressed as
+a zero_extend.  */
+  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
+   {
+ rtx sub;
+
+ sub = XEXP (XEXP (x, 0), 0);
+ machine_mode sub_mode = GET_MODE (sub);
+ if ((REG_P (sub) || MEM_P (sub))
+ && GET_MODE_PRECISION (sub_mode) < mode_width
+ &&

Re: [PING 5, PATCH] PR/68089: C++-11: Ingore "alignas(0)".

2016-04-27 Thread Jeff Law


On 04/05/2016 03:43 AM, Dominik Vogt wrote:

On Mon, Jan 04, 2016 at 12:33:21PM +0100, Dominik Vogt wrote:

On Fri, Jan 01, 2016 at 05:53:08PM -0700, Martin Sebor wrote:

On 12/31/2015 04:50 AM, Dominik Vogt wrote:

The attached patch fixes C++-11 handling of "alignas(0)" which
should be ignored but currently generates an error message.  A
test case is included; the patch has been tested on S390x.  Since
it's a language issue it should be independent of the backend
used.


The patch doesn't handle value-dependent expressions(*).



It seems that the problem is in handle_aligned_attribute() calling
check_user_alignment() with the second argument (ALLOW_ZERO)
set to false.  Calling it with true fixes the problem and handles
value-dependent expressions (I haven't done any more testing beyond
that).
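
As a minimal illustration of the intended language behaviour, the following sketch assumes a compiler that already ignores alignas(0) as C++11 requires. The type names are made up for the example and are not taken from the alignas5.C test.

```cpp
#include <cstddef>

struct plain
{
  int i;
};

// alignas(0) is expected to have no effect on the type's alignment.
struct alignas (0) zero_aligned
{
  int i;
};

static_assert (alignof (zero_aligned) == alignof (plain),
               "alignas(0) must not change the alignment");

// Returns the alignment the compiler actually chose.
std::size_t
zero_aligned_alignment ()
{
  return alignof (zero_aligned);
}
```

A compiler without the fix would reject the definition of zero_aligned instead of reaching the static_assert.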


Like the attached patch?  (Passes the testsuite on s390x.)

But wouldn't an "aligned" attribute be added, allowing the backend
to possibly generate an error or a warning?


Also, in the test, I noticed the definition of the first struct
is missing the terminating semicolon.


Yeah.



gcc/c-family/ChangeLog

PR/69089
* c-common.c (handle_aligned_attribute): Allow 0 as an argument to the
"aligned" attribute.

gcc/testsuite/ChangeLog

PR/69089
* g++.dg/cpp0x/alignas5.C: New test.

OK for the trunk.
jeff

[patch] cleanup *finish_omp_clauses

2016-04-27 Thread Cesar Philippidis

This patch replaces all of the bool arguments to c_finish_omp_clauses and
finish_omp_clauses in the C and C++ front ends, respectively. Right now
there are three bool arguments: is_omp/allow_fields, declare_simd and
is_cilk; the latter two have default values.
OpenACC will require some special handling in *finish_omp_clauses in the
near future, too, so rather than add an is_oacc argument, I introduced
an enum c_omp_region_type, similar to the one in gimplify.c.

Is this patch ok for trunk? I'll make use of C_ORT_ACC shortly in a
follow up patch.
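
To show how one bitmask argument can carry what the three bools carried, here is a small standalone sketch. The enum values are copied from the patch; legacy_flags and decode_region_type are hypothetical names introduced only for this illustration, and the mapping back to the old bools is inferred from the call-site changes in the diff.

```cpp
enum c_omp_region_type
{
  C_ORT_NONE     = 0,
  C_ORT_OMP      = 1 << 0,
  C_ORT_SIMD     = 1 << 1,
  C_ORT_CILK     = 1 << 2,
  C_ORT_ACC      = 1 << 3,
  C_ORT_OMP_SIMD = C_ORT_OMP | C_ORT_SIMD,
  C_ORT_OMP_CILK = C_ORT_OMP | C_ORT_CILK
};

// Hypothetical view of the three bool arguments that the enum replaces.
struct legacy_flags
{
  bool is_omp;
  bool declare_simd;
  bool is_cilk;
};

// Recover the old flags by testing individual bits of the region type.
legacy_flags
decode_region_type (c_omp_region_type ort)
{
  legacy_flags f;
  f.is_omp = (ort & C_ORT_OMP) != 0;
  f.declare_simd = (ort & C_ORT_SIMD) != 0;
  f.is_cilk = (ort & C_ORT_CILK) != 0;
  return f;
}
```

For example, the old call c_finish_omp_clauses (clauses, true, true) corresponds to C_ORT_OMP_SIMD, while C_ORT_ACC sets none of the three old flags, which is why a fourth bool would otherwise have been needed.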

Cesar
2016-04-27  Cesar Philippidis  

	gcc/c-family/
	* c-common.h (enum c_omp_region_type): Define.
	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_all_clauses): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Replace bool arguments
	is_omp, declare_simd, and is_cilk with enum c_omp_region_type.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_simd_all_clauses): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Likewise.
	(tsubst_omp_clauses): Likewise.
	(tsubst_omp_for_iterator): Likewise.
	* semantics.c (finish_omp_clauses): Replace bool arguments
	allow_fields, declare_simd, and is_cilk with enum
	c_omp_region_type.
	(finish_omp_for): Update call to finish_omp_clauses.


diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b631e7d..303269f 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1261,6 +1261,17 @@ enum c_omp_clause_split
   C_OMP_CLAUSE_SPLIT_TASKLOOP = C_OMP_CLAUSE_SPLIT_FOR
 };
 
+enum c_omp_region_type
+{
+  C_ORT_NONE		= 0,
+  C_ORT_OMP		= 1 << 0,
+  C_ORT_SIMD		= 1 << 1,
+  C_ORT_CILK		= 1 << 2,
+  C_ORT_ACC		= 1 << 3,
+  C_ORT_OMP_SIMD	= C_ORT_OMP | C_ORT_SIMD,
+  C_ORT_OMP_CILK	= C_ORT_OMP | C_ORT_CILK
+};
+
 extern tree c_finish_omp_master (location_t, tree);
 extern tree c_finish_omp_taskgroup (location_t, tree);
 extern tree c_finish_omp_critical (location_t, tree, tree, tree);
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 36c44ab..aa8ef3e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13183,7 +13183,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
   c_parser_skip_to_pragma_eol (parser);
 
   if (finish_p)
-return c_finish_omp_clauses (clauses, false);
+return c_finish_omp_clauses (clauses, C_ORT_ACC);
 
   return clauses;
 }
@@ -13468,8 +13468,8 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
   if (finish_p)
 {
   if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_UNIFORM)) != 0)
-	return c_finish_omp_clauses (clauses, true, true);
-  return c_finish_omp_clauses (clauses, true);
+	return c_finish_omp_clauses (clauses, C_ORT_OMP_SIMD);
+  return c_finish_omp_clauses (clauses, C_ORT_OMP);
 }
 
   return clauses;
@@ -13503,7 +13503,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
   tree stmt, clauses;
 
   clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE__CACHE_, NULL);
-  clauses = c_finish_omp_clauses (clauses, false);
+  clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
 
   c_parser_skip_to_pragma_eol (parser);
 
@@ -13837,9 +13837,9 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 {
   clauses = c_oacc_split_loop_clauses (clauses, cclauses);
   if (*cclauses)
-	*cclauses = c_finish_omp_clauses (*cclauses, false);
+	*cclauses = c_finish_omp_clauses (*cclauses, C_ORT_ACC);
   if (clauses)
-	clauses = c_finish_omp_clauses (clauses, false);
+	clauses = c_finish_omp_clauses (clauses, C_ORT_ACC);
 }
 
   tree block = c_begin_compound_stmt (true);
@@ -15015,7 +15015,7 @@ omp_split_clauses (location_t loc, enum tree_code code,
   c_omp_split_clauses (loc, code, mask, clauses, cclauses);
   for (i = 0; i < C_OMP_CLAUSE_SPLIT_COUNT; i++)
 if (cclauses[i])
-  cclauses[i] = c_finish_omp_clauses (cclauses[i], true);
+  cclauses[i] = c_finish_omp_clauses (cclauses[i], C_ORT_OMP);
 }
 
 /* OpenMP 4.0:
@@ -16546,7 +16546,7 @@ c_parser_omp_declare_target (c_parser *parser)
 {
   clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_TO_DECLARE,
 	  clauses);
-  clauses = c_finish_omp_clauses (clauses, true);
+  clauses = c_finish_omp_clauses (clauses, C_ORT_OMP);
   c_parser_skip_to_pragma_eol

[SH][committed] Remove SH5 support in compiler

2016-04-27 Thread Oleg Endo

Hi,

The removal of SH5 support from GCC has been announced here
https://gcc.gnu.org/ml/gcc/2015-08/msg00101.html

The attached patch removes support for SH5 in the compiler back end.
There are still some leftovers and new simplification opportunities.
These will be addressed in later follow up patches.

Tested on sh-elf with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r235544.

Cheers,
Oleg

gcc/ChangeLog:
* common/config/sh/sh-common.c: Remove SH5 support.
* config/sh/constraints.md: Likewise.
* config/sh/config/sh/elf.h: Likewise.
* config/sh/linux.h: Likewise.
* config/sh/netbsd-elf.h: Likewise.
* config/sh/predicates.md: Likewise.
* config/sh/sh-c.c: Likewise.
* config/sh/sh-protos.h: Likewise.
* config/sh/sh.c: Likewise.
* config/sh/sh.h: Likewise.
* config/sh/sh.md: Likewise.
* config/sh/sh.opt: Likewise.
* config/sh/sync.md: Likewise.
* config/sh/sh64.h: Delete.
* config/sh/shmedia.h: Likewise.
* config/sh/shmedia.md: Likewise.
* config/sh/sshmedia.h: Likewise.
* config/sh/t-netbsd-sh5-64: Likewise.
* config/sh/t-sh64: Likewise.
* config/sh/ushmedia.h: Likewise.

remove_sh64_sh5_gcc_2.patch.tar.gz
Description: application/compressed-tar

Contents of PO file 'cpplib-6.1.0.vi.po'

2016-04-27 Thread Translation Project Robot



cpplib-6.1.0.vi.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

New Vietnamese PO file for 'cpplib' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/vi.po

(This file, 'cpplib-6.1.0.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

New Chinese (simplified) PO file for 'gcc' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

http://translationproject.org/latest/gcc/zh_CN.po

(This file, 'gcc-6.1.0.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH, i386]: Fix ix86_spill_class condition

2016-04-27 Thread Uros Bizjak

On Wed, Apr 27, 2016 at 8:05 PM, Uros Bizjak  wrote:
> Hello!
>
> Based on recent discussion, the attached patch fixes ix86_spill_class
> condition. The spills to SSE registers are now enabled for real on
> SSE2 target, where inter-unit moves to/from vector registers are
> enabled.
>
> Since this is new functionality, the patch can cause some minor
> runtime regressions (or unwanted regmove chains), so IMO the beginning
> of stage1 is appropriate timing for this kind of change.
>
> TARGET_GENERAL_REGS_SSE_SPILL flag is enabled by default on all Intel
> Core processors, so the change will be picked by SPEC testers and any
> problems will soon be detected.
>
> 2016-04-27  Uros Bizjak  
>
> * config/i386/i386.c (ix86_spill_class): Enable for TARGET_SSE2 when
> inter-unit moves to/from vector registers are enabled.  Do not disable
> for TARGET_MMX.
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu
> {,-m32}, configured with --with-arch=corei7.
>
> Committed to mainline SVN.

And, yes - the patch did trigger a bootstrap failure for march=corei7
in 32bit libjava multilib.

Now we have many more moves between SSE and general registers, so
there are some peephole2s that are not prepared for the fact that
SImode and DImode values can also live in SSE registers. One example:

(define_peephole2
  [(set (match_operand:SI 0 "memory_operand")
(match_operand:SI 1 "register_operand"))
   (set (match_operand:SI 2 "register_operand") (match_dup 1))
   (parallel [(set (match_dup 2)
   (ashiftrt:SI (match_dup 2) (const_int 31)))
   (clobber (reg:CC FLAGS_REG))])
   (set (match_operand:SI 3 "memory_operand") (match_dup 2))]
  "REGNO (operands[1]) != REGNO (operands[2])
   && peep2_reg_dead_p (2, operands[1])
   && peep2_reg_dead_p (4, operands[2])
   && !reg_mentioned_p (operands[2], operands[3])"
  [(set (match_dup 0) (match_dup 1))
   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
  (clobber (reg:CC FLAGS_REG))])
   (set (match_dup 3) (match_dup 1))])

It isn't hard to see the problem when operand 2 is an SSE register, since we get:

(insn 284 283 285 2 (parallel [
(set (reg:SI 21 xmm0 [orig:92 _11 ] [92])
(ashiftrt:SI (reg:SI 21 xmm0 [orig:92 _11 ] [92])
(const_int 31 [0x1f])))
(clobber (reg:CC 17 flags))
]) /home/uros/gcc-svn/trunk/libjava/classpath/java/util/zip/ZipFile.java:755 565 {ashrsi3_cvt}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

Fortunately, we get an ICE, so these peephole2 problems will be easy
to catch. The fix is simply to change the "register_operand" predicate to
the "general_reg_operand" predicate, which allows - as we already figured
out - only general registers.
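
The difference between the two predicates can be pictured with a toy model. This is not how GCC implements predicates: reg_class_model, hard_reg_model and the two functions are hypothetical and only mirror the idea that one predicate accepts any register while the other accepts general registers only. Register number 21 for xmm0 is taken from the RTL dump above; number 0 for a general register is an assumption.

```cpp
// Toy register classes for the illustration.
enum class reg_class_model
{
  general,
  sse
};

struct hard_reg_model
{
  int regno;
  reg_class_model cls;
};

// Models "register_operand": any register is accepted.
bool
register_operand_model (const hard_reg_model &)
{
  return true;
}

// Models "general_reg_operand": only general registers are accepted, so a
// pattern using it can no longer match a value living in an SSE register.
bool
general_reg_operand_model (const hard_reg_model &r)
{
  return r.cls == reg_class_model::general;
}
```

With the stricter predicate the peephole2 simply does not match when operand 2 is xmm0, so the invalid arithmetic shift on an SSE register is never generated.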

2016-04-28  Uros Bizjak  

* config/i386/i386.md (sign_extend to memory peephole2s): Use
general_reg_operand instead of register_operand predicate.

Bootstrapped on x86_64-linux-gnu (with 32-bit multilib), regression
test in progress.

Committed to mainline to restore bootstrap on corei7 autotesters.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 235535)
+++ config/i386/i386.md (working copy)
@@ -3992,8 +3992,8 @@
 ;; being split with the above splitter.
 (define_peephole2
   [(set (match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "register_operand"))
-   (set (match_operand:SI 2 "register_operand") (match_dup 1))
+   (match_operand:SI 1 "general_reg_operand"))
+   (set (match_operand:SI 2 "general_reg_operand") (match_dup 1))
(parallel [(set (match_dup 2)
   (ashiftrt:SI (match_dup 2) (const_int 31)))
   (clobber (reg:CC FLAGS_REG))])
@@ -4009,8 +4009,8 @@
 
 (define_peephole2
   [(set (match_operand:SI 0 "memory_operand")
-   (match_operand:SI 1 "register_operand"))
-   (parallel [(set (match_operand:SI 2 "register_operand")
+   (match_operand:SI 1 "general_reg_operand"))
+   (parallel [(set (match_operand:SI 2 "general_reg_operand")
   (ashiftrt:SI (match_dup 1) (const_int 31)))
   (clobber (reg:CC FLAGS_REG))])
(set (match_operand:SI 3 "memory_operand") (match_dup 2))]

Re: match.pd: unsigned A - B > A --> A < B

2016-04-27 Thread Marc Glisse


On Wed, 27 Apr 2016, Richard Biener wrote:


Please use types_match_p () instead


Ah, thanks, I couldn't remember the name and spent a bit of time looking 
for it with a name like same_type, equal_types, etc, and eventually 
assumed the patch adding it had never been committed when I saw we 
still had several


  (if ((GIMPLE && useless_type_conversion_p (type, TREE_TYPE (@0)))
   || (GENERIC && type == TREE_TYPE (@0)))



I'm fine if you want to disable all this on GENERIC


The main goal of disabling it was to avoid the ugly test, but with 
types_match I left it enabled for both.
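
The transformation named in the subject, unsigned A - B > A --> A < B, can be checked exhaustively for a small unsigned type. The sketch below is only a sanity check of the arithmetic identity for 8-bit values; the function name is made up, and it says nothing about how the match.pd pattern itself is written.

```cpp
#include <cstdint>

// Returns true if (A - B > A) == (A < B) for every pair of 8-bit unsigned
// values, with the subtraction wrapping modulo 256.
bool
identity_holds_for_all_u8 ()
{
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      {
        // Truncate to 8 bits to model unsigned wrap-around.
        std::uint8_t diff = static_cast<std::uint8_t> (a - b);
        if ((diff > a) != (a < b))
          return false;
      }
  return true;
}
```

The identity holds because the subtraction wraps exactly when B is larger than A, and a wrapped result is always larger than A.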


--
Marc Glisse

Re: [PATCH] Turn some compile-time tests into run-time tests

2016-04-27 Thread Jeff Law


On 03/10/2016 04:38 PM, Patrick Palka wrote:

I ran the command

  git grep -l "dg-do compile" | xargs grep -l __builtin_abort | xargs grep -lw main

to find tests marked as compile-time tests that likely ought to instead
be marked as run-time tests, by the rationale that they use
__builtin_abort and they also define main().  (I also then confirmed that they
compile, link and run cleanly on my machine.)
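
For readers unfamiliar with the distinction, a minimal sketch of such a test follows. It is a made-up example, not one of the files touched by the patch; in a real testsuite file the body of check_add would normally be main, and the function is named differently here only so the snippet can be embedded in other code.

```cpp
/* { dg-do run } */
/* With "dg-do compile" the check below would be compiled but never
   executed, so a wrong result could not make the test fail.  */

static int
add (int a, int b)
{
  return a + b;
}

int
check_add ()
{
  if (add (2, 3) != 5)
    __builtin_abort ();
  return 0;
}
```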

After this patch, the remaining test files reported by the above command
are:

  These do not define all the functions they use:
gcc/testsuite/g++.dg/ipa/devirt-41.C
gcc/testsuite/g++.dg/ipa/devirt-44.C
gcc/testsuite/g++.dg/ipa/devirt-45.C
gcc/testsuite/gcc.target/i386/pr55672.c

  These are non-x86 tests so I can't confirm that they run cleanly:
gcc/testsuite/gcc.target/arm/pr58041.c
gcc/testsuite/gcc.target/powerpc/pr35907.c
gcc/testsuite/gcc.target/s390/dwarfregtable-1.c
gcc/testsuite/gcc.target/s390/dwarfregtable-2.c
gcc/testsuite/gcc.target/s390/dwarfregtable-3.c

  These use dg-error:
libstdc++-v3/testsuite/20_util/forward/c_neg.cc
libstdc++-v3/testsuite/20_util/forward/f_neg.cc

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?  Does anyone have another heuristic one can use to help find
these kinds of typos?

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-aggr2.C: Make it a run-time test.
* g++.dg/cpp0x/nullptr32.C: Likewise.
* g++.dg/cpp1y/digit-sep-cxx11-neg.C: Likewise.
* g++.dg/cpp1y/digit-sep.C: Likewise.
* g++.dg/ext/flexary13.C: Likewise.
* gcc.dg/alias-14.c: Likewise.
* gcc.dg/ipa/PR65282.c: Likewise.
* gcc.dg/pr69644.c: Likewise.
* gcc.dg/tree-ssa/pr38533.c: Likewise.
* gcc.dg/tree-ssa/pr61385.c: Likewise.

My worry with the 38533 test is that while the ASM defines "f" from the 
standpoint of dataflow, it does not actually emit any code to ensure "f" 
is actually defined.  This could lead to spurious aborts due to use of 
an uninitialized value at runtime.  Similarly for alias-14.c


I'd be worried that we don't necessarily have sync_bool_compare_and_swap 
on all targets for 69644.


flexary13.C probably won't link on a cross target unless the cross 
libraries are available.  But that's probably OK.


The rest seem OK to me.  Note that I'm not convinced all these tests 
were designed to be execution tests, even though they use 
__builtin_abort and friends.  Though it's a good marker of something 
that can/should be looked at.



jeff

[PATCH] Improve AVX512F sse4_1_round* patterns

2016-04-27 Thread Jakub Jelinek

Hi!

While AVX512F doesn't contain EVEX encoded vround{ss,sd,ps,pd} instructions,
it contains vrndscale*, which performs the same operation if bits [4:7] of
the immediate are zero.
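
A rough scalar model may help to see why a zero upper nibble makes the two instructions agree. This is a simplification under assumptions: the helper names are invented, the model always rounds to nearest-even using the default floating-point environment, and it ignores the low four control bits of the immediate as well as exception behaviour.

```cpp
#include <cmath>

// The upper four bits of the immediate select how many fraction bits
// vrndscale keeps; zero means "round to an integer", like vround.
bool
same_as_round (unsigned imm)
{
  return (imm & 0xf0) == 0;
}

// Simplified model: round X to a multiple of 2^-M, where M is taken from
// bits [4:7] of IMM.  The rounding mode is the current default
// (round-to-nearest-even) regardless of the low bits.
double
rndscale_model (double x, unsigned imm)
{
  int m = (imm >> 4) & 0xf;
  double scale = std::ldexp (1.0, m);
  return std::nearbyint (x * scale) / scale;
}
```

In this model rndscale_model (2.3, 0x10) keeps one fraction bit and yields 2.5, whereas any immediate with a zero upper nibble rounds to a whole number.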

For _mm*_round_{ps,pd} we actually already emit vrndscale* for -mavx512f
instead of vround* unconditionally (because
_rndscale
instruction has the same RTL as _round
and the former, enabled for TARGET_AVX512F, comes first), for the scalar
cases (thus __builtin_round* or _mm*_round_s{s,d}) the patterns we have
don't allow extended registers and thus we end up with unnecessary moves
if the inputs and/or outputs are or could be most effectively allocated
in the xmm16+ registers.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-04-27  Jakub Jelinek  

* config/i386/i386.md (sse4_1_round2): Add avx512f alternative.
* config/i386/sse.md (sse4_1_round): Likewise.

* gcc.target/i386/avx-vround-1.c: New test.
* gcc.target/i386/avx-vround-2.c: New test.
* gcc.target/i386/avx512vl-vround-1.c: New test.
* gcc.target/i386/avx512vl-vround-2.c: New test.

--- gcc/config/i386/i386.md.jj  2016-04-27 14:34:43.897064531 +0200
+++ gcc/config/i386/i386.md 2016-04-27 14:34:52.402950392 +0200
@@ -15510,15 +15510,19 @@ (define_expand "significand2"
 
 
 (define_insn "sse4_1_round2"
-  [(set (match_operand:MODEF 0 "register_operand" "=x")
-   (unspec:MODEF [(match_operand:MODEF 1 "register_operand" "x")
-  (match_operand:SI 2 "const_0_to_15_operand" "n")]
+  [(set (match_operand:MODEF 0 "register_operand" "=x,v")
+   (unspec:MODEF [(match_operand:MODEF 1 "register_operand" "x,v")
+  (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
  UNSPEC_ROUND))]
   "TARGET_ROUND"
-  "%vround\t{%2, %1, %d0|%d0, %1, %2}"
+  "@
+   %vround\t{%2, %1, %d0|%d0, %1, %2}
+   vrndscale\t{%2, %1, %d0|%d0, %1, %2}"
   [(set_attr "type" "ssecvt")
-   (set_attr "prefix_extra" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix_extra" "1,*")
+   (set_attr "length_immediate" "*,1")
+   (set_attr "prefix" "maybe_vex,evex")
+   (set_attr "isa" "noavx512f,avx512f")
(set_attr "mode" "")])
 
 (define_insn "rintxf2"
--- gcc/config/i386/sse.md.jj   2016-04-27 14:34:43.903064451 +0200
+++ gcc/config/i386/sse.md  2016-04-27 14:34:52.407950325 +0200
@@ -14867,25 +14867,26 @@ (define_expand "_round"
-  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x")
+  [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
(vec_merge:VF_128
  (unspec:VF_128
-   [(match_operand:VF_128 2 "register_operand" "Yr,*x,x")
-(match_operand:SI 3 "const_0_to_15_operand" "n,n,n")]
+   [(match_operand:VF_128 2 "register_operand" "Yr,*x,x,v")
+(match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
UNSPEC_ROUND)
- (match_operand:VF_128 1 "register_operand" "0,0,x")
+ (match_operand:VF_128 1 "register_operand" "0,0,x,v")
  (const_int 1)))]
   "TARGET_ROUND"
   "@
round\t{%3, %2, %0|%0, %2, %3}
round\t{%3, %2, %0|%0, %2, %3}
-   vround\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "isa" "noavx,noavx,avx")
+   vround\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "noavx,noavx,avx,avx512f")
(set_attr "type" "ssecvt")
(set_attr "length_immediate" "1")
-   (set_attr "prefix_data16" "1,1,*")
+   (set_attr "prefix_data16" "1,1,*,*")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "prefix" "orig,orig,vex,evex")
(set_attr "mode" "")])
 
 (define_expand "round2"
--- gcc/testsuite/gcc.target/i386/avx-vround-1.c.jj 2016-04-27 14:34:12.785482013 +0200
+++ gcc/testsuite/gcc.target/i386/avx-vround-1.c 2016-04-27 11:49:20.282759808 +0200
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx -mno-avx2" } */
+
+#include 
+
+__attribute__((noinline, noclone)) double
+f1 (double x)
+{
+  return __builtin_round (x);
+}
+
+__attribute__((noinline, noclone)) float
+f2 (float x)
+{
+  return __builtin_roundf (x);
+}
+
+__attribute__((noinline, noclone)) __m128d
+f3 (__m128d x, __m128d y)
+{
+  return _mm_round_sd (x, y, _MM_FROUND_NINT);
+}
+
+__attribute__((noinline, noclone)) __m128
+f4 (__m128 x, __m128 y)
+{
+  return _mm_round_ss (x, y, _MM_FROUND_NINT);
+}
+
+__attribute__((noinline, noclone)) __m128d
+f5 (__m128d x)
+{
+  return _mm_round_pd (x, _MM_FROUND_NINT);
+}
+
+__attribute__((noinline, noclone)) __m128
+f6 (__m128 x)
+{
+  return _mm_round_ps (x, _MM_FROUND_NINT);
+}
+
+__attribute__((noinline, noclone)) __m256d
+f7 (__m256d x)
+{
+  return _mm256_round_pd (x, _MM_FROUND_NINT);
+}
+
+__attribute__((noinline, noclone)) __m256
+f8 (__m256 x)
+{
+  return _mm256_round_ps (x, _MM_FROUND_NINT);
+}
+
+/* { dg-final { scan-assembler-times "vroundsd\[^\n\r\]*xmm" 2 } } */
+/* { dg-final {

Re: [PATCH] Turn some compile-time tests into run-time tests

2016-04-27 Thread Jeff Law


On 03/11/2016 09:38 AM, Patrick Palka wrote:

On Thu, Mar 10, 2016 at 6:38 PM, Patrick Palka  wrote:

I ran the command

  git grep -l "dg-do compile" | xargs grep -l __builtin_abort | xargs grep -lw main

to find tests marked as compile-time tests that likely ought to instead
be marked as run-time tests, by the rationale that they use
__builtin_abort and they also define main().  (I also then confirmed that they
compile, link and run cleanly on my machine.)

After this patch, the remaining test files reported by the above command
are:

  These do not define all the functions they use:
gcc/testsuite/g++.dg/ipa/devirt-41.C
gcc/testsuite/g++.dg/ipa/devirt-44.C
gcc/testsuite/g++.dg/ipa/devirt-45.C
gcc/testsuite/gcc.target/i386/pr55672.c

  These are non-x86 tests so I can't confirm that they run cleanly:
gcc/testsuite/gcc.target/arm/pr58041.c
gcc/testsuite/gcc.target/powerpc/pr35907.c
gcc/testsuite/gcc.target/s390/dwarfregtable-1.c
gcc/testsuite/gcc.target/s390/dwarfregtable-2.c
gcc/testsuite/gcc.target/s390/dwarfregtable-3.c

  These use dg-error:
libstdc++-v3/testsuite/20_util/forward/c_neg.cc
libstdc++-v3/testsuite/20_util/forward/f_neg.cc

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?  Does anyone have another heuristic one can use to help find
these kinds of typos?

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-aggr2.C: Make it a run-time test.
* g++.dg/cpp0x/nullptr32.C: Likewise.
* g++.dg/cpp1y/digit-sep-cxx11-neg.C: Likewise.
* g++.dg/cpp1y/digit-sep.C: Likewise.
* g++.dg/ext/flexary13.C: Likewise.
* gcc.dg/alias-14.c: Likewise.
* gcc.dg/ipa/PR65282.c: Likewise.
* gcc.dg/pr69644.c: Likewise.
* gcc.dg/tree-ssa/pr38533.c: Likewise.
* gcc.dg/tree-ssa/pr61385.c: Likewise.


Here's another I found:

diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-return1.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-return1.C
index 4b353b6..ea7ae6f 100644
--- a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-return1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-return1.C

The lambda-return1.C change is fine with an appropriate ChangeLog.  I'll 
be looking at the others momentarily.


jeff

[PATCH 3/3][AArch64] Emit division using the Newton series

2016-04-27 Thread Evandro Menezes


   gcc/
* config/aarch64/aarch64-protos.h
(tune_params): Add new member "approx_div_modes".
(aarch64_emit_approx_div): Declare new function.
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_div_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_emit_approx_div): Define new function.
* config/aarch64/aarch64.md ("div3"): New expansion.
* config/aarch64/aarch64-simd.md ("div3"): Likewise.
* config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
* doc/invoke.texi (-mlow-precision-div): Describe new option.
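
For context, the Newton iteration for division refines an estimate x of 1/d with x' = x * (2 - d * x) and then multiplies by the numerator. The sketch below is a generic software illustration, not the code in aarch64_emit_approx_div: the function names, the linear initial estimate and the iteration count are all assumptions chosen for the example, whereas hardware would normally supply the initial estimate.

```cpp
#include <cassert>
#include <cmath>

// Approximates 1/D with Newton's method.  D must be finite and nonzero.
double
approx_recip (double d, int iterations = 4)
{
  assert (d != 0.0);
  int exp = 0;
  // |d| = m * 2^exp with m in [0.5, 1).
  double m = std::frexp (std::fabs (d), &exp);
  // Linear initial estimate of 1/m, relative error at most 1/17.
  double x = 48.0 / 17.0 - 32.0 / 17.0 * m;
  // Each step roughly squares the relative error.
  for (int i = 0; i < iterations; ++i)
    x = x * (2.0 - m * x);
  // Undo the scaling and restore the sign of D.
  return std::copysign (std::ldexp (x, -exp), d);
}

// Division expressed as a multiplication by the approximate reciprocal.
double
approx_div (double n, double d)
{
  return n * approx_recip (d);
}
```

Fewer iterations trade accuracy for speed, which is presumably the kind of trade-off an option such as -mlow-precision-div is meant to expose.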


--
Evandro Menezes

From 0bdd18af83a82377dd6b954c4e64904f6022a2b2 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 4 Apr 2016 14:02:24 -0500
Subject: [PATCH 3/3] [AArch64] Emit division using the Newton series

2016-04-04  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-protos.h
	(tune_params): Add new member "approx_div_modes".
	(aarch64_emit_approx_div): Declare new function.
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_div_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(aarch64_emit_approx_div): Define new function.
	* config/aarch64/aarch64.md ("div3"): New expansion.
	* config/aarch64/aarch64-simd.md ("div3"): Likewise.
	* config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
	* doc/invoke.texi (-mlow-precision-div): Describe new option.
---
 gcc/config/aarch64/aarch64-protos.h |  2 +
 gcc/config/aarch64/aarch64-simd.md  | 14 +-
 gcc/config/aarch64/aarch64.c| 85 +
 gcc/config/aarch64/aarch64.md   | 19 +++--
 gcc/config/aarch64/aarch64.opt  |  5 +++
 gcc/doc/invoke.texi | 10 +
 6 files changed, 130 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 437f6af..ce7d147 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -244,6 +244,7 @@ struct tune_params
   } autoprefetcher_model;
 
   unsigned int extra_tuning_flags;
+  unsigned int approx_div_modes;
   unsigned int approx_sqrt_modes;
   unsigned int approx_rsqrt_modes;
 };
@@ -398,6 +399,7 @@ void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
+bool aarch64_emit_approx_div (rtx, rtx, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 47ccb18..7e99e16 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
   [(set_attr "type" "neon_fp_mul_")]
 )
 
-(define_insn "div3"
+(define_expand "div3"
+ [(set (match_operand:VDQF 0 "register_operand")
+   (div:VDQF (match_operand:VDQF 1 "general_operand")
+		 (match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+DONE;
+
+  operands[1] = force_reg (mode, operands[1]);
+})
+
+(define_insn "*div3"
  [(set (match_operand:VDQF 0 "register_operand" "=w")
(div:VDQF (match_operand:VDQF 1 "register_operand" "w")
 		 (match_operand:VDQF 2 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 589871b..d3e73bf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -417,6 +417,7 @@ static const struct tune_params generic_tunings =
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  (AARCH64_APPROX_NONE),	/* approx_div_modes.  */
   (AARCH64_APPROX_NONE),	/* approx_sqrt_modes.  */
   (AARCH64_APPROX_NONE)	/* approx_rsqrt_modes.  */
 };
@@ -444,6 +445,7 @@ static const struct tune_params cortexa35_tunings =
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  (AARCH64_APPROX_NONE),	/* approx_div_modes.  */
   (AARCH64_APPROX_NONE),	/* approx_sqrt_modes.  */
   (AARCH64_APPROX_NONE)	/* approx_rsqrt_modes.  */
 };
@@ -471,6 +473,7 @@ static const struct tune_params cortexa53_tunings =
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */

[PATCH 2/3][AArch64] Emit square root using the Newton series

2016-04-27 Thread Evandro Menezes


   gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(tune_params): New member "approx_sqrt_modes".
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_rsqrt_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(aarch64_override_options_after_change_1): Handle new option.
* config/aarch64/aarch64-simd.md
(rsqrt2): Use new function instead.
(sqrt2): New expansion and insn definitions.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-sqrt): Add new option description.
* doc/invoke.texi (mlow-precision-sqrt): Likewise.


--
Evandro Menezes

From 753115a8691afd7aed4a510d9e9cb0a8e859acf4 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 4 Apr 2016 11:23:29 -0500
Subject: [PATCH 2/3] [AArch64] Emit square root using the Newton series

2016-04-04  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_rsqrt): Replace with new function
	"aarch64_emit_approx_sqrt".
	(tune_params): New member "approx_sqrt_modes".
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_rsqrt_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(aarch64_emit_approx_rsqrt): Replace with new function
	"aarch64_emit_approx_sqrt".
	(aarch64_override_options_after_change_1): Handle new option.
	* config/aarch64/aarch64-simd.md
	(rsqrt2): Use new function instead.
	(sqrt2): New expansion and insn definitions.
	* config/aarch64/aarch64.md: Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-sqrt): Add new option description.
	* doc/invoke.texi (mlow-precision-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +-
 gcc/config/aarch64/aarch64-simd.md  | 13 -
 gcc/config/aarch64/aarch64.c| 99 +++--
 gcc/config/aarch64/aarch64.md   | 11 -
 gcc/config/aarch64/aarch64.opt  |  9 +++-
 gcc/doc/invoke.texi | 10 
 6 files changed, 113 insertions(+), 32 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 50f1d24..437f6af 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -244,6 +244,7 @@ struct tune_params
   } autoprefetcher_model;
 
   unsigned int extra_tuning_flags;
+  unsigned int approx_sqrt_modes;
   unsigned int approx_rsqrt_modes;
 };
 
@@ -396,7 +397,7 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
-void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..47ccb18 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -405,7 +405,7 @@
 		 UNSPEC_RSQRT))]
   "TARGET_SIMD"
 {
-  aarch64_emit_approx_rsqrt (operands[0], operands[1]);
+  aarch64_emit_approx_sqrt (operands[0], operands[1], true);
   DONE;
 })
 
@@ -4307,7 +4307,16 @@
 
 ;; sqrt
 
-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_sqrt (operands[0], operands[1], false))
+DONE;
+})
+
+(define_insn "*sqrt2"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
 (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
   "TARGET_SIMD"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 68381bf..589871b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -38,6 +38,7 @@
 #include "recog.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
+#include "insn-flags.h"
 #include "insn-modes.h"
 #include "alias.h"
 #include "fold-const.h"
@@ -416,6 +417,7 @@ static const struct tune_params generic_tunings =
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  (AARCH64_APPROX_NONE),	/* approx_sqrt_modes.  */
   (AARCH64_APPROX_NONE)	/*

[PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation

2016-04-27 Thread Evandro Menezes


   gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_APPROX_MODE): New macro.
(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
   Likewise.
(tune_params): New member "approx_rsqrt_modes".
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_rsqrt_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(use_rsqrt_p): New argument for the mode and use new member from
"tune_params".
(aarch64_builtin_reciprocal): Devise mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.
* doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.


--
Evandro Menezes

From 2cb6c0f35bbdc3b4cc6f88c61a50f3fbb168ec99 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH 1/3] [AArch64] Add more choices for the reciprocal square root
 approximation

Allow a target to prefer such operation depending on the operation mode.
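
As a rough illustration of the per-mode selection described above, the
sketch below keeps one bit per floating-point mode in an unsigned mask
and tests membership.  The fp_mode enum, the APPROX_* names and
use_approx_p are hypothetical stand-ins chosen for this example; they do
not reproduce GCC's mode numbering or the identifiers in the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the scalar and vector FP machine modes.  */
enum fp_mode { SF, DF, V2SF, V4SF, V2DF, FP_MODE_COUNT };

/* One bit per mode.  */
#define APPROX_MODE(MODE) (1u << (MODE))
#define APPROX_NONE (0u)
#define APPROX_SP (APPROX_MODE (SF) | APPROX_MODE (V2SF) | APPROX_MODE (V4SF))
#define APPROX_DP (APPROX_MODE (DF) | APPROX_MODE (V2DF))
#define APPROX_SCALAR (APPROX_MODE (SF) | APPROX_MODE (DF))
#define APPROX_ALL (~0u)

/* Return nonzero if a core whose tuning enables TUNED_MODES should use
   the approximation for MODE.  */
static int
use_approx_p (unsigned int tuned_modes, enum fp_mode mode)
{
  assert (mode < FP_MODE_COUNT);
  return (tuned_modes & APPROX_MODE (mode)) != 0;
}
```

With this layout a tuning entry can name any subset of modes, for
example single precision only, instead of a single on/off flag.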

2016-03-03  Evandro Menezes  

gcc/
	* config/aarch64/aarch64-protos.h
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}): Likewise.
	(tune_params): New member "approx_rsqrt_modes".
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_rsqrt_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(use_rsqrt_p): New argument for the mode and use new member from
	"tune_params".
	(aarch64_builtin_reciprocal): Devise mode from builtin.
	(aarch64_optab_supported_p): New argument for the mode.
	* doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.
---
 gcc/config/aarch64/aarch64-protos.h | 27 
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 --
 gcc/config/aarch64/aarch64.c| 39 ++---
 gcc/doc/invoke.texi |  2 +-
 4 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index f22a31c..50f1d24 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,32 @@ struct cpu_branch_cost
   const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
+	  + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
+			   | AARCH64_APPROX_MODE (V2SFmode) \
+			   | AARCH64_APPROX_MODE (V4SFmode))
+#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmode) \
+			   | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_DFORM (AARCH64_APPROX_MODE (SFmode) \
+			  | AARCH64_APPROX_MODE (DFmode) \
+			  | AARCH64_APPROX_MODE (V2SFmode))
+#define AARCH64_APPROX_QFORM (AARCH64_APPROX_MODE (V4SFmode) \
+			  | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_SCALAR (AARCH64_APPROX_MODE (SFmode) \
+			   | AARCH64_APPROX_MODE (DFmode))
+#define AARCH64_APPROX_VECTOR (AARCH64_APPROX_MODE (V2SFmode) \
+			   | AARCH64_APPROX_MODE (V4SFmode) \
+			   | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_ALL (-1)
+
 struct tune_params
 {
   const struct cpu_cost_table *insn_extra_cost;
@@ -218,6 +244,7 @@ struct tune_params
   } autoprefetcher_model;
 
   unsigned int extra_tuning_flags;
+  unsigned int approx_rsqrt_modes;
 };
 
 #define AARCH64_FUSION_PAIR(x, name) \
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..048c2a3 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,3 @@
  AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9995494..68381bf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -38,6 +38,7 @@

[PATCH/AARCH64/ILP32] Fix unwinding (libgcc)

2016-04-27 Thread Andrew Pinski

Hi,
  AARCH64 ILP32 is like x32 in that UNITS_PER_WORD > sizeof(void*), so we
need to define REG_VALUE_IN_UNWIND_CONTEXT for ILP32.  This fixes
unwinding through the signal handler.  This is independent of the ABI
which the Linux kernel uses to store the registers.
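
To make the size mismatch concrete, here is a small model of why a
register value cannot be kept in a pointer-sized slot when the word is
wider than a pointer.  The typedefs and function names are hypothetical
and only imitate the situation; this is not the libgcc unwinder code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an ILP32-style target: registers are 64 bits
   wide, pointers only 32.  */
typedef uint64_t model_word;	/* Plays the role of _Unwind_Word.  */
typedef uint32_t model_ptr;	/* Plays the role of a 32-bit pointer.  */

/* Keep a register value in a pointer-sized slot: the top half is lost.  */
static model_word
via_pointer_slot (model_word reg)
{
  model_ptr slot = (model_ptr) reg;
  return slot;
}

/* Keep the value in a word-sized slot: nothing is lost.  */
static model_word
via_word_slot (model_word reg)
{
  model_word slot = reg;
  return slot;
}
```

In this model only the word-sized slot round-trips a full 64-bit
register value.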

OK?  Bootstrapped and tested on aarch64 with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/value-unwind.h: New file.
* config.host (aarch64*-*-*): Add aarch64/value-unwind.h to tm_file.
Index: libgcc/config.host
===
--- libgcc/config.host  (revision 235529)
+++ libgcc/config.host  (working copy)
@@ -1385,4 +1385,8 @@ i[34567]86-*-linux* | x86_64-*-linux*)
fi
tm_file="${tm_file} i386/value-unwind.h"
;;
+aarch64*-*-*)
+   # ILP32 needs an extra header for unwinding
+   tm_file="${tm_file} aarch64/value-unwind.h"
+   ;;
 esac
Index: libgcc/ChangeLog
===
--- libgcc/ChangeLog(revision 235529)
+++ libgcc/ChangeLog(working copy)
@@ -1,3 +1,9 @@
+2016-04-27  Andrew Pinski  
+
+   * config/aarch64/value-unwind.h: New file.
+   * config.host (aarch64*-*-*): Add aarch64/value-unwind.h
+   to tm_file.
+
 2016-04-25  Nick Clifton  
 
* config/msp430/cmpd.c (__mspabi_cmpf): Add prototype.
Index: libgcc/config/aarch64/value-unwind.h
===
--- libgcc/config/aarch64/value-unwind.h(revision 0)
+++ libgcc/config/aarch64/value-unwind.h(revision 0)
@@ -0,0 +1,25 @@
+/* Store register values as _Unwind_Word type in DWARF2 EH unwind context.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Define this macro if the target stores register values as _Unwind_Word
+   type in unwind context.  Only enable it for ilp32.  */
+#if defined __aarch64__ && !defined __LP64__
+# define REG_VALUE_IN_UNWIND_CONTEXT
+#endif

[PATCH 0/3][AArch64] Add infrastructure for more approximate FP operations

2016-04-27 Thread Evandro Menezes

This patch suite increases the granularity of target selection for
approximate FP operations and adds options to emit approximate
square root and division.
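
For readers unfamiliar with the technique, the following is a minimal
sketch of division by Newton iteration on the reciprocal.  It is plain C
rather than the RTL the patches emit; rough_recip is a hypothetical
stand-in for a hardware reciprocal estimate, and the step count is an
assumption of this example, not a value taken from the patches.

```c
#include <assert.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: scale |d|
   into [0.5, 1) and apply the linear guess 48/17 - 32/17 * m, which is
   within a relative error of 1/17 of 1/m on that interval.  */
static double
rough_recip (double d)
{
  double sign = d < 0.0 ? -1.0 : 1.0;
  double m = d < 0.0 ? -d : d;
  double scale = 1.0;

  while (m >= 1.0)
    {
      m *= 0.5;
      scale *= 0.5;
    }
  while (m < 0.5)
    {
      m *= 2.0;
      scale *= 2.0;
    }
  return sign * scale * (48.0 / 17.0 - 32.0 / 17.0 * m);
}

/* Approximate n / d with STEPS Newton iterations x = x * (2 - d * x),
   each of which roughly squares the relative error of x ~ 1/d.  */
static double
approx_div (double n, double d, int steps)
{
  /* D must be nonzero and finite (d - d is NaN for infinities/NaNs).  */
  assert (d != 0.0 && d - d == 0.0);

  double x = rough_recip (d);
  for (int i = 0; i < steps; i++)
    x = x * (2.0 - d * x);
  return n * x;
}
```

Because the error is squared on every step, the number of steps needed
depends on the accuracy of the initial estimate and on the target
precision.  The reciprocal square root uses the analogous step
x = x * (3 - d * x * x) / 2.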


The full suite is contained in the emails tagged:

1.

   [PATCH 1/3][AArch64] Add more choices for the reciprocal square root 
approximation

2.

   [PATCH 2/3][AArch64] Emit square root using the Newton series

3.

   [PATCH 3/3][AArch64] Emit division using the Newton series


Thank you,

--
Evandro Menezes

New French PO file for 'cpplib' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the French team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/fr.po

(This file, 'cpplib-6.1.0.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-6.1.0.fr.po'

2016-04-27 Thread Translation Project Robot



cpplib-6.1.0.fr.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

Re: Fix PR44281 (bad RA with global regs)

2016-04-27 Thread Jeff Law


On 04/27/2016 03:02 PM, Bernd Schmidt wrote:

On 04/27/2016 10:59 PM, Jeff Law wrote:

PR rtl-optimization/44281
* hard-reg-set.h (struct target_hard_regs): New field
x_fixed_nonglobal_reg_set.
(fixed_nonglobal_reg_set): New macro.
* reginfo.c (init_reg_sets_1): Initialize it.
* ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead
of fixed_reg_set.

PR rtl-optimization/44281
* gcc.target/i386/pr44281.c: New test.

So my recollection of where we left things was that we needed the
dataflow information fixed for implicit uses/sets in asms.  With that
infrastructure in place I think we'll be ready to resolve this old
issue.  Right?


Yeah. I had prepared the following, and I was planning on getting around
to retesting it with current trunk on top of the other patch, but I
haven't quite got there yet.

Looks like it'll do the trick to me.  Consider the combination of the
two patches pre-approved once you've run it through the usual testing.


Thanks,
Jeff

Re: [C PATCH] Don't print -Waddress comparison warnings for macros (PR c/48778)

2016-04-27 Thread Jeff Law


On 03/01/2016 07:10 AM, Marek Polacek wrote:

This PR from 2011 reports that -Waddress prints an unhelpful warning when the
comparison comes from a macro.  Since I've added from_macro_expansion_at,
this is easy to circumvent.  I'm not so sure we actually want to disable
the warning in the case of a macro, but probably yes.
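
A hypothetical example of the situation described above (not the test
case from the patch): a generic null-check macro is applied once to an
array and once to a real pointer.  IS_SET, name_buf and count_set are
names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper of the kind found in many projects.  */
#define IS_SET(p) ((p) != NULL)

static char name_buf[16];

/* Count how many of name_buf and MAYBE_NULL are non-null.  The first
   test expands to `name_buf != NULL'; an array's address is never
   null, which is the kind of comparison -Waddress reports, even though
   here it only arises because the macro is applied to an array.  */
static int
count_set (const char *maybe_null)
{
  return IS_SET (name_buf) + IS_SET (maybe_null);
}
```

Written out by hand, the always-true comparison would probably be a
mistake; coming from a shared macro it is usually harmless, which is
the case the patch stops warning about.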

Bootstrapped/regtested on x86_64-linux, ok for trunk or should I defer to
GCC 7?

2016-03-01  Marek Polacek  

PR c/48778
* c-typeck.c (build_binary_op): Don't issue -Waddress warnings
for macro expansions.

* gcc.dg/Waddress-2.c: New test.

I've got mixed feelings about this patch.  However, we have traditionally
desired to suppress some warnings that occur due to macro expansions, so
based on that, I'll ack for the trunk.


Thanks,
jeff

Re: Fix PR44281 (bad RA with global regs)

2016-04-27 Thread Bernd Schmidt


On 04/27/2016 10:59 PM, Jeff Law wrote:

PR rtl-optimization/44281
* hard-reg-set.h (struct target_hard_regs): New field
x_fixed_nonglobal_reg_set.
(fixed_nonglobal_reg_set): New macro.
* reginfo.c (init_reg_sets_1): Initialize it.
* ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead
of fixed_reg_set.

PR rtl-optimization/44281
* gcc.target/i386/pr44281.c: New test.

So my recollection of where we left things was that we needed the
dataflow information fixed for implicit uses/sets in asms.  With that
infrastructure in place I think we'll be ready to resolve this old
issue.  Right?


Yeah. I had prepared the following, and I was planning on getting around 
to retesting it with current trunk on top of the other patch, but I 
haven't quite got there yet.



Bernd

Index: gcc/df-scan.c
===
--- gcc/df-scan.c   (revision 234341)
+++ gcc/df-scan.c   (working copy)
@@ -3223,11 +3223,22 @@ df_insn_refs_collect (struct df_collecti
 }
 }

+  int flags = (is_cond_exec) ? DF_REF_CONDITIONAL : 0;
   /* For CALL_INSNs, first record DF_REF_BASE register defs, as well as
  uses from CALL_INSN_FUNCTION_USAGE. */
   if (CALL_P (insn_info->insn))
-df_get_call_refs (collection_rec, bb, insn_info,
- (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
+df_get_call_refs (collection_rec, bb, insn_info, flags);
+
+  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+  if (global_regs[i])
+   {
+ /* As with calls, asm statements reference all global regs. */
+ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+NULL, bb, insn_info, DF_REF_REG_USE, flags);
+ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+   }

   /* Record other defs.  These should be mostly for DF_REF_REGULAR, so
  that a qsort on the defs is unnecessary in most cases.  */

Re: Fix PR44281 (bad RA with global regs)

2016-04-27 Thread Jeff Law


On 02/19/2016 03:03 PM, Bernd Schmidt wrote:

In this PR, we generate unnecessarily bad code for code that declares a
global register var. Since global regs get added to fixed_regs, IRA
never considers them as candidates. However, we do seem to have proper
data flow information for them. In the testcase, the global reg dies,
some operations are done on temporary results, and the final result
stored back in the global reg. We can achieve the desired code
generation by reusing the global reg for those temporaries.

Bootstrapped and tested on x86_64-linux. Ok? An argument could be made
not to use this for gcc-6 since global register vars are both not very
important and not very well represented in the testsuite.


Bernd

global-regalloc.diff


PR rtl-optimization/44281
* hard-reg-set.h (struct target_hard_regs): New field
x_fixed_nonglobal_reg_set.
(fixed_nonglobal_reg_set): New macro.
* reginfo.c (init_reg_sets_1): Initialize it.
* ira.c (setup_alloc_regs): Use fixed_nonglobal_reg_set instead
of fixed_reg_set.

PR rtl-optimization/44281
* gcc.target/i386/pr44281.c: New test.

So my recollection of where we left things was that we needed the
dataflow information fixed for implicit uses/sets in asms.  With that
infrastructure in place I think we'll be ready to resolve this old
issue.  Right?


jeff

Re: [PATCH v2] gcov: Runtime configurable destination output

2016-04-27 Thread Aaron Conole

Apologies for the top post. Pinging on this again. It still applies
cleanly, so no need to resubmit, I think. Is there anything else missing
or required before this can go in?

Thanks,
-Aaron

Aaron Conole  writes:

> The previous gcov behavior was to always output errors on the stderr channel.
> This is fine for most uses, but some programs will require stderr to be
> untouched by libgcov for certain tests. This change allows configuring
> the gcov output via an environment variable which will be used to open
> the appropriate file.
> ---
> v2:
> * Retitled subject
> * Cleaned up whitespace in libgcov-driver-system.c diff
> * Lazy error file opening
> * non-static error file
> * No warnings during compilation
>
>  libgcc/libgcov-driver-system.c | 35 ++-
>  libgcc/libgcov-driver.c|  6 ++
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/libgcc/libgcov-driver-system.c b/libgcc/libgcov-driver-system.c
> index 4e3b244..0eb9755 100644
> --- a/libgcc/libgcov-driver-system.c
> +++ b/libgcc/libgcov-driver-system.c
> @@ -23,6 +23,24 @@ a copy of the GCC Runtime Library Exception along with this program;
>  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  .  */
>  
> +FILE *__gcov_error_file = NULL;
> +
> +static FILE *
> +get_gcov_error_file(void)
> +{
> +  char *gcov_error_filename = getenv("GCOV_ERROR_FILE");
> +  FILE *gcov_error_file = NULL;
> +  if (gcov_error_filename)
> +{
> +  FILE *openfile = fopen(gcov_error_filename, "a");
> +  if (openfile)
> +gcov_error_file = openfile;
> +}
> +  if (!gcov_error_file)
> +gcov_error_file = stderr;
> +  return gcov_error_file;
> +}
> +
>  /* A utility function for outputing errors.  */
>  
>  static int __attribute__((format(printf, 1, 2)))
> @@ -30,12 +48,27 @@ gcov_error (const char *fmt, ...)
>  {
>int ret;
>va_list argp;
> +
> +  if (!__gcov_error_file)
> +__gcov_error_file = get_gcov_error_file();
> +
>va_start (argp, fmt);
> -  ret = vfprintf (stderr, fmt, argp);
> +  ret = vfprintf (__gcov_error_file, fmt, argp);
>va_end (argp);
>return ret;
>  }
>  
> +#if !IN_GCOV_TOOL
> +static void
> +gcov_error_exit(void)
> +{
> +  if (__gcov_error_file && __gcov_error_file != stderr)
> +{
> +  fclose(__gcov_error_file);
> +}
> +}
> +#endif
> +
>  /* Make sure path component of the given FILENAME exists, create
> missing directories. FILENAME must be writable.
> Returns zero on success, or -1 if an error occurred.  */
> diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
> index 9c4eeca..83d84c5c 100644
> --- a/libgcc/libgcov-driver.c
> +++ b/libgcc/libgcov-driver.c
> @@ -46,6 +46,10 @@ void __gcov_init (struct gcov_info *p __attribute__ ((unused))) {}
>  /* A utility function for outputing errors.  */
>  static int gcov_error (const char *, ...);
>  
> +#if !IN_GCOV_TOOL
> +static void gcov_error_exit(void);
> +#endif
> +
>  #include "gcov-io.c"
>  
>  struct gcov_fn_buffer
> @@ -878,6 +882,8 @@ gcov_exit (void)
>  __gcov_root.prev->next = __gcov_root.next;
>else
>  __gcov_master.root = __gcov_root.next;
> +
> +  gcov_error_exit();
>  }
>  
>  /* Add a new object file onto the bb chain.  Invoked automatically

Re: [PATCH][PR tree-optimization/65917] Record both equivalences from if (x == y) style conditionals

2016-04-27 Thread Jeff Law


On 02/09/2016 12:55 AM, Bernhard Reutner-Fischer wrote:

On February 8, 2016 9:18:03 AM GMT+01:00, Jeff Law  wrote:


This turns out to be far easier than expected.  Given a conditional like
x == y, we already record the canonicalized x = y equivalence.  If we
just record y = x then this "just works".

The only tricky thing is the following of the SSA_NAME_VALUE chain for
the source argument -- we need to avoid that when recording the y = x
variant (since following x's value chain would lead back to y).  That's
easily resolved with an _raw variant which doesn't follow the source
value chain.
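
The issue with following the source's value chain can be shown with a
toy model.  Everything below is hypothetical: names are small integers,
value_of[] merely plays the role of SSA_NAME_VALUE, and the functions
are simplified stand-ins rather than the actual DOM code.

```c
#include <assert.h>

/* Toy model: names are small integers and value_of[] plays the role of
   SSA_NAME_VALUE.  All identifiers here are hypothetical.  */
enum { X, Y, N_NAMES, NO_VALUE = -1 };
static int value_of[N_NAMES] = { NO_VALUE, NO_VALUE };

/* The value SRC currently stands for: one step along its chain.  */
static int
resolve_source (int src)
{
  return value_of[src] != NO_VALUE ? value_of[src] : src;
}

/* Record DEST = SRC after following SRC's value chain.  */
static void
record_equiv (int dest, int src)
{
  assert (dest >= 0 && dest < N_NAMES);
  value_of[dest] = resolve_source (src);
}

/* Record DEST = SRC exactly as given, without following SRC's chain.  */
static void
record_equiv_raw (int dest, int src)
{
  assert (dest >= 0 && dest < N_NAMES);
  value_of[dest] = src;
}
```

Once x = y has been recorded, resolve_source (X) yields Y, so the
chain-following record_equiv (Y, X) would store the useless y = y; the
raw variant stores y = x as intended.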

While working through the code, I saw the old comment WRT loop depth and
PR 61757 in record_equality.  With the code to follow backedges in the
CFG gone from the old threader, that code is no longer needed.  So I
removed it, restoring Richi's cleanup work from early 2015.

Bootstrapped & regression tested on x86-linux.  Installed on the trunk.


+  /* We already recorded that LHS = RHS, with canonicalization,
+value chain following, etc.
+
+We also want to return RHS = LHS, but without any canonicalization

Just curious and for my education, we want to *record*, not to return, don't we?

Correct.  Thanks for pointing it out.  I've just fixed this typo on the
trunk.


jeff

Re: [PATCH][cilkplus] fix c++ implicit conversions with cilk_spawn (PR/69024, PR/68997)

2016-04-27 Thread Jeff Law


On 01/20/2016 10:57 AM, Ryan Burn wrote:

This patch follows on from
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02142.html

As discussed, it creates a separate function
cilk_cp_detect_spawn_and_unwrap in gcc/cp to handle processing
cilk_spawn expressions for c++ and adds support for implicit
constructor and type conversions.

Bootstrapped and regression tested on x86_64-linux.

gcc/c-family/ChangeLog:
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * cilk.c (cilk_ignorable_spawn_rhs_op): Change to have external linkage.
  * cilk.c (recognize_spawn): Rename to cilk_recognize_spawn. Change to have
  external linkage.
  * cilk.c (cilk_detect_and_unwrap): Rename to recognize_spawn to
  cilk_recognize_spawn.
  * cilk.c (extract_free_variables): Don't extract free variables from
  AGGR_INIT_EXPR slot.

gcc/cp/ChangeLog
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * cp-gimplify.c (cp_gimplify_expr): Call cilk_cp_detect_spawn_and_unwrap
  instead of cilk_detect_spawn_and_unwrap.
  * cp-cilkplus.c (is_conversion_operator_function_decl_p): New.
  * cp-cilkplus.c (find_spawn): New.
  * cp-cilkplus.c (cilk_cp_detect_spawn_and_unwrap): New.

gcc/testsuite/ChangeLog
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * g++.dg/cilk-plus/CK/pr68001.cc: Fix to not depend on broken diagnostic.
  * g++.dg/cilk-plus/CK/pr69024.cc: New test.
  * g++.dg/cilk-plus/CK/pr68997.cc: New test.

The updated patch (as expected) bootstrapped and regression tested.  I 
fixed a few more whitespace/formatting nits, updated the ChangeLogs and 
committed the change.


Jeff
commit adc44ab035896ab23180ed4bd552226610e958ab
Author: Jeff Law 
Date:   Wed Apr 27 14:39:01 2016 -0600

PR c++/69024
PR c++/68997
* cilk.c (cilk_ignorable_spawn_rhs_op): Change to external linkage.
(cilk_recognize_spawn): Renamed from recognize_spawn and change to
external linkage.
(cilk_detect_and_unwrap): Corresponding changes.
(extract_free_variables): Don't extract free variables from
AGGR_INIT_EXPR slot.
* c-common.h (cilk_ignorable_spawn_rhs_op): Prototype.
(cilk_recognize_spawn): Likewise.

PR c++/69024
PR c++/68997
* cp-gimplify.c (cp_gimplify_expr): Call cilk_cp_detect_spawn_and_unwrap
instead of cilk_detect_spawn_and_unwrap.
* cp-cilkplus.c (is_conversion_operator_function_decl_p): New.
(find_spawn): New.
(cilk_cp_detect_spawn_and_unwrap): New.
* lambda.c: Include cp-cilkplus.h.
* parser.c: Include cp-cilkplus.h.
* cp-tree.h (cpp_validate_cilk_plus_loop): Move prototype into...
* cp-cilkplus.h: New file.

PR c++/69024
PR c++/68997
* g++.dg/cilk-plus/CK/pr68001.cc: Fix to not depend on broken
diagnostic.
* g++.dg/cilk-plus/CK/pr69024.cc: New test.
* g++.dg/cilk-plus/CK/pr68997.cc: New test.

diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index 1d87d9d..ac3be53 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,16 @@
+2015-04-27  Ryan Burn  
+
+   PR c++/69024
+   PR c++/68997
+   * cilk.c (cilk_ignorable_spawn_rhs_op): Change to external linkage.
+   (cilk_recognize_spawn): Renamed from recognize_spawn and change to
+   external linkage.
+   (cilk_detect_and_unwrap): Corresponding changes.
+   (extract_free_variables): Don't extract free variables from
+   AGGR_INIT_EXPR slot.
+   * c-common.h (cilk_ignorable_spawn_rhs_op): Prototype.
+   (cilk_recognize_spawn): Likewise.
+
 2016-04-27  Bernd Schmidt  
 
* c.opt (Wmemset-elt-size): New option.
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b631e7d..1309549 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1468,4 +1468,7 @@ extern bool reject_gcc_builtin (const_tree, location_t = UNKNOWN_LOCATION);
 extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec **);
 extern bool valid_array_size_p (location_t, tree, tree);
 
+extern bool cilk_ignorable_spawn_rhs_op (tree);
+extern bool cilk_recognize_spawn (tree, tree *);
+
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/cilk.c b/gcc/c-family/cilk.c
index 0b876b9..69a79ba 100644
--- a/gcc/c-family/cilk.c
+++ b/gcc/c-family/cilk.c
@@ -185,7 +185,7 @@ call_graph_add_fn (tree fndecl)
A comparison to constant is simple enough to allow, and
is used to convert to bool.  */
 
-static bool
+bool
 cilk_ignorable_spawn_rhs_op (tree exp)
 {
   enum tree_code code = TREE_CODE (exp);
@@ -223,8 +223,8 @@ unwrap_cilk_spawn_stmt (tree *tp, int *walk_subtrees, void 
*)
 /* Returns true when EXP is a CALL_EXPR with _Cilk_spawn in front.  Unwraps
CILK_SPAWN_STMT wrapper from the CALL_EXPR in *EXP0 statement.  */
 
-static bool
-recognize_spawn (tree exp, tree

Re: [RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-27 Thread H.J. Lu

On Wed, Apr 27, 2016 at 12:58 PM, Uros Bizjak  wrote:
> Hello!
>
> This RFC patch illustrates the idea of using STV pass to load/store
> any TImode constant using SSE insns. The testcase:
>
> --cut here--
> __int128 x;
>
> __int128 test_1 (void)
> {
>   x = (__int128) 0x00112233;
> }
>
> __int128 test_2 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64);
> }
>
> __int128 test_3 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
> }
> --cut here--
>
> currently compiles (-O2) on x86_64 to:
>
> test_1:
>         movq    $1122867, x(%rip)
>         movq    $0, x+8(%rip)
>         ret
>
> test_2:
>         xorl    %eax, %eax
>         movabsq $4822678189205111, %rdx
>         movq    %rax, x(%rip)
>         movq    %rdx, x+8(%rip)
>         ret
>
> test_3:
>         movabsq $4822678189205111, %rax
>         movabsq $4822678189205111, %rdx
>         movq    %rax, x(%rip)
>         movq    %rdx, x+8(%rip)
>         ret
>
> However, using the attached patch, we compile all tests to:
>
> test:
>         movdqa  .LC0(%rip), %xmm0
>         movaps  %xmm0, x(%rip)
>         ret
>
> Ilya, HJ - do you think new sequences are better, or - as suggested by
> Jakub - they are beneficial with STV pass, as we are now able to load

I like it.  It is on my todo list :-).

> any immediate value? A variant of this patch can also be used to load
> DImode values to 32bit STV pass.
>

Yes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70763

-- 
H.J.

[RFC patch, i386]: Use STV pass to load/store any TImode constant using SSE insns

2016-04-27 Thread Uros Bizjak

Hello!

This RFC patch illustrates the idea of using STV pass to load/store
any TImode constant using SSE insns. The testcase:

--cut here--
__int128 x;

__int128 test_1 (void)
{
  x = (__int128) 0x00112233;
}

__int128 test_2 (void)
{
  x = ((__int128) 0x0011223344556677 << 64);
}

__int128 test_3 (void)
{
  x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
}
--cut here--

currently compiles (-O2) on x86_64 to:

test_1:
        movq    $1122867, x(%rip)
        movq    $0, x+8(%rip)
        ret

test_2:
        xorl    %eax, %eax
        movabsq $4822678189205111, %rdx
        movq    %rax, x(%rip)
        movq    %rdx, x+8(%rip)
        ret

test_3:
        movabsq $4822678189205111, %rax
        movabsq $4822678189205111, %rdx
        movq    %rax, x(%rip)
        movq    %rdx, x+8(%rip)
        ret

However, using the attached patch, we compile all tests to:

test:
        movdqa  .LC0(%rip), %xmm0
        movaps  %xmm0, x(%rip)
        ret

Ilya, HJ - do you think new sequences are better, or - as suggested by
Jakub - they are beneficial with STV pass, as we are now able to load
any immediate value? A variant of this patch can also be used to load
DImode values to 32bit STV pass.
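
As a rough illustration (not part of the patch), the immediates in the scalar sequences above are just the low and high 64-bit halves of each 128-bit constant. The sketch below assumes a compiler with the __int128 extension; the names Halves, split and const_1..const_3 are made up for the example.

```cpp
#include <cstdint>

// Illustration only: the two 64-bit halves that the scalar sequences
// store to x (low half) and x+8 (high half).
struct Halves
{
  std::uint64_t lo;  // value stored at x
  std::uint64_t hi;  // value stored at x+8
};

// Split a 128-bit value into its low and high 64-bit halves.
static Halves
split (unsigned __int128 v)
{
  Halves h;
  h.lo = static_cast<std::uint64_t> (v);
  h.hi = static_cast<std::uint64_t> (v >> 64);
  return h;
}

// The three constants from the testcase (hypothetical helper names).
static unsigned __int128
const_1 ()
{
  return (unsigned __int128) 0x00112233;
}

static unsigned __int128
const_2 ()
{
  return (unsigned __int128) 0x0011223344556677 << 64;
}

static unsigned __int128
const_3 ()
{
  return ((unsigned __int128) 0x0011223344556677 << 64)
	 + (unsigned __int128) 0x0011223344556677;
}
```

Calling split on each constant gives the values seen in the movq/movabsq operands; with the patch, the whole 128-bit value would instead come from a single constant pool entry.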

Uros.
Index: i386.c
===
--- i386.c  (revision 235526)
+++ i386.c  (working copy)
@@ -2854,29 +2854,16 @@ timode_scalar_to_vector_candidate_p (rtx_insn *ins
 
   if (MEM_P (dst))
 {
-  /* Check for store.  Only support store from register or standard
-SSE constants.  Memory must be aligned or unaligned store is
-optimal.  */
-  if (misaligned_operand (dst, TImode)
- && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
-   return false;
-
-  switch (GET_CODE (src))
-   {
-   default:
- return false;
-
-   case REG:
- return true;
-
-   case CONST_INT:
- return standard_sse_constant_p (src, TImode);
-   }
+  /* Check for store.  Memory must be aligned
+or unaligned store is optimal.  */
+  return ((REG_P (src) || CONST_SCALAR_INT_P (src))
+ && (!misaligned_operand (dst, TImode)
+ || TARGET_SSE_UNALIGNED_STORE_OPTIMAL));
 }
   else if (MEM_P (src))
 {
-  /* Check for load.  Memory must be aligned or unaligned load is
-optimal.  */
+  /* Check for load.  Memory must be aligned
+or unaligned load is optimal.  */
   return (REG_P (dst)
  && (!misaligned_operand (src, TImode)
  || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL));
@@ -3744,6 +3731,7 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
  PUT_MODE (XEXP (tmp, 0), V1TImode);
   }
   /* FALLTHRU */
+
 case MEM:
   PUT_MODE (dst, V1TImode);
   break;
@@ -3759,28 +3747,26 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
   PUT_MODE (src, V1TImode);
   break;
 
-case CONST_INT:
-  switch (standard_sse_constant_p (src, TImode))
-   {
-   case 1:
- src = CONST0_RTX (GET_MODE (dst));
- break;
-   case 2:
- src = CONSTM1_RTX (GET_MODE (dst));
- break;
-   default:
- gcc_unreachable ();
-   }
-  if (NONDEBUG_INSN_P (insn))
-   {
- rtx tmp = gen_reg_rtx (V1TImode);
- /* Since there are no instructions to store standard SSE
-constant, temporary register usage is required.  */
- emit_conversion_insns (gen_rtx_SET (dst, tmp), insn);
- dst = tmp;
-   }
-  break;
+CASE_CONST_SCALAR_INT:
+  {
+   rtx vec = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
 
+   if (NONDEBUG_INSN_P (insn))
+ {
+   rtx tmp = gen_reg_rtx (V1TImode);
+
+   if (!standard_sse_constant_p (src, TImode))
+ vec = validize_mem (force_const_mem (V1TImode, vec));
+
+   /* We can only store from a SSE register.  */
+   emit_conversion_insns (gen_rtx_SET (dst, tmp), insn);
+   dst = tmp;
+ }
+
+   src = vec;
+   break;
+  }
+  
 default:
   gcc_unreachable ();
 }
@@ -14784,8 +14770,7 @@ ix86_legitimate_constant_p (machine_mode mode, rtx
 #endif
   break;
 
-case CONST_INT:
-case CONST_WIDE_INT:
+CASE_CONST_SCALAR_INT:
   switch (mode)
{
case TImode:
@@ -14823,10 +14808,7 @@ ix86_cannot_force_const_mem (machine_mode mode, rt
   /* We can always put integral constants and vectors in memory.  */
   switch (GET_CODE (x))
 {
-case CONST_INT:
-case CONST_WIDE_INT:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_ANY:
   return false;
 
 default:

Re: [PATCH][AArch64] Replace insn to zero up SIMD registers

2016-04-27 Thread Evandro Menezes


On 04/26/16 08:25, Wilco Dijkstra wrote:

Evandro Menezes wrote:

On 03/10/16 10:37, James Greenhalgh wrote:

Thanks for sticking with it. This is OK for GCC 7 when development
opens.

Remember to mention the most recent changes in your Changelog entry
(Remove "fp" attribute from *movhf_aarch64 and *movtf_aarch64).


OK to commit?

The updated Changelog looks fine - James already OK'd this.


Bootstrapped and checked on aarch64-unknown-linux-gnu and committed as 
r235532.


Thank you,

--
Evandro Menezes

Re: [PATCH] Fix a recent warning in reorg.c

2016-04-27 Thread Mike Stump

> On Apr 26, 2016, at 5:56 PM, Trevor Saunders  wrote:
> So pre ISO C++ gave the second decl the same scope as the first one?
> that's... exciting ;)

So, all the code in the world that is meant to be ported up to an ANSI 
standard for C++ has already been so ported; we could remove all notion that 
the scoping changed from the warnings given for new C++ (any ISO C++ and later).  
The old semantics and warning can remain for users that select the old 
languages.  This means that in 5 years, once the new compiler is released, 
distributed, and everyone has it in their OS distribution, we could _then_ not 
get the warning anymore and we could then write code that way.  We could also 
maybe turn the warning off, with -ffor-scope, if that works.  The benefit of 
that is we could then do that now and start using that style of code now.  We 
should excise that warning in any event.
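
For reference, a minimal sketch of the ISO C++ scoping being discussed: a name declared in the for-init-statement is local to that loop, so a second loop can declare it again. The function names are invented for the example and are not from reorg.c.

```cpp
// ISO C++ scoping: each `i' below lives only inside its own loop.
static int
sum_twice (int n)
{
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += i;
  // The first `i' is out of scope here, so this is a fresh declaration
  // rather than a redeclaration in the enclosing scope.
  for (int i = 0; i < n; ++i)
    total += i;
  return total;
}

// An outer variable of the same name is only shadowed inside the loop.
static int
outer_unchanged ()
{
  int i = 42;
  int count = 0;
  for (int i = 0; i < 3; ++i)
    ++count;
  return i + count;  // outer i is still 42 after the loop
}
```

Under the pre-ISO rules the loop variable stayed visible after the loop, which is the case the old warning is about.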

Re: [PATCH, rs6000] Add support for vector element-reversal built-ins

2016-04-27 Thread Bill Schmidt

Hi,

While looking into documenting the new built-ins, I realized that these
instructions provide correct support for the vec_xl and vec_xst
built-ins required by the vector API.  I've therefore reworked the patch
to provide those as overloaded built-ins, rather than the separate
per-mode built-ins in the original patch, and then documented those.
(Note that vec_xl and vec_xst were previously incorrectly aliased to
vec_vsx_ld and vec_vsx_st, which does not provide the proper
element-reversal semantics.)

This in turn required support for the RS6000_BTM_P9_VECTOR and
MASK_P9_VECTOR macros.  This currently exists on the ibm/pre-gcc7 branch
but not upstream, so I've copied the necessary pieces of that into this
patch to avoid future conflicts.  Other than changing the test cases,
the rest of the patch is pretty much as before.  As a reminder:

ISA 3.0 adds the lxvh8x, lxvb16x, stxvh8x, and stxvb16x instructions,
which perform vector loads in big-endian order, regardless of the target
endianness.  These join the similar lxvd2x, lxvw4x, stxvd2x, and stxvw4x
instructions introduced in ISA 2.06.  These existing instructions have been
used in several ways, but we don't yet have built-ins to allow them to
be specifically generated for little-endian.  This patch corrects that,
and adds built-ins for the new ISA 3.0 instructions as well.

Note that the behaviors of lxvd2x, lxvw4x, lxvh8x, and lxvb16x are
indistinguishable from one another in big-endian mode, and similarly for
the stores.  So we can treat these as simple moves that will generate
any applicable load or store (such as lxvx and stxvx for ISA 3.0).  For
little-endian, however, we require separate patterns for each of these
loads and stores to ensure that we get the correct element-reversal
semantics for each of them, depending on the vector mode.
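
A toy model of why the little-endian patterns depend on the vector mode (an illustration under stated assumptions, not the rs6000 implementation): relative to a plain little-endian 16-byte load, each of these instructions can be pictured as reversing the order of the elements while keeping the bytes inside each element in place. The names Vec16 and reverse_elements are hypothetical; element widths 8, 4, 2 and 1 are meant to correspond roughly to lxvd2x, lxvw4x, lxvh8x and lxvb16x.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Reverse the order of the ELEM_BYTES-wide elements of V, leaving the
// bytes within each element untouched.  ELEM_BYTES must divide 16.
static Vec16
reverse_elements (const Vec16 &v, std::size_t elem_bytes)
{
  Vec16 out{};
  const std::size_t n = v.size () / elem_bytes;
  for (std::size_t e = 0; e < n; ++e)
    for (std::size_t b = 0; b < elem_bytes; ++b)
      out[(n - 1 - e) * elem_bytes + b] = v[e * elem_bytes + b];
  return out;
}
```

With a 16-byte "element" the function is the identity, which is the analogue of the big-endian case where all four loads are indistinguishable; with smaller widths the results differ from one another, so each mode needs its own pattern.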

I've added four new tests to demonstrate correct behavior of the new
built-in functions.  These include variants for big- and little-endian,
and variants for -mcpu=power8 and -mcpu=power9.

Bootstrapped and tested on powerpc64-unknown-linux-gnu and
powerpc64le-unknown-linux-gnu with no regressions.  Is this revised
version ok for trunk?

Thanks!
Bill


[gcc]

2016-04-27  Bill Schmidt  

* config/rs6000/altivec.h: Change definitions of vec_xl and
vec_xst.
* config/rs6000/rs6000-builtin.def (LD_ELEMREV_V2DF): New.
(LD_ELEMREV_V2DI): New.
(LD_ELEMREV_V4SF): New.
(LD_ELEMREV_V4SI): New.
(LD_ELEMREV_V8HI): New.
(LD_ELEMREV_V16QI): New.
(ST_ELEMREV_V2DF): New.
(ST_ELEMREV_V2DI): New.
(ST_ELEMREV_V4SF): New.
(ST_ELEMREV_V4SI): New.
(ST_ELEMREV_V8HI): New.
(ST_ELEMREV_V16QI): New.
(XL): New.
(XST): New.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
descriptions for VSX_BUILTIN_VEC_XL and VSX_BUILTIN_VEC_XST.
* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Map from
TARGET_P9_VECTOR to RS6000_BTM_P9_VECTOR.
(altivec_expand_builtin): Add handling for
VSX_BUILTIN_ST_ELEMREV_ and VSX_BUILTIN_LD_ELEMREV_.
(rs6000_invalid_builtin): Add error-checking for
RS6000_BTM_P9_VECTOR.
(altivec_init_builtins): Define builtins used to implement vec_xl
and vec_xst.
(rs6000_builtin_mask_names): Define power9-vector.
* config/rs6000/rs6000.h (MASK_P9_VECTOR): Define.
(RS6000_BTM_P9_VECTOR): Define.
(RS6000_BTM_COMMON): Include RS6000_BTM_P9_VECTOR.
* config/rs6000/vsx.md (vsx_ld_elemrev_v2di): New define_insn.
(vsx_ld_elemrev_v2df): Likewise.
(vsx_ld_elemrev_v4sf): Likewise.
(vsx_ld_elemrev_v4si): Likewise.
(vsx_ld_elemrev_v8hi): Likewise.
(vsx_ld_elemrev_v16qi): Likewise.
(vsx_st_elemrev_v2df): Likewise.
(vsx_st_elemrev_v2di): Likewise.
(vsx_st_elemrev_v4sf): Likewise.
(vsx_st_elemrev_v4si): Likewise.
(vsx_st_elemrev_v8hi): Likewise.
(vsx_st_elemrev_v16qi): Likewise.
* doc/extend.texi: Add prototypes for vec_xl and vec_xst.  Correct
grammar.

[gcc/testsuite]

2016-04-27  Bill Schmidt  

* gcc.target/powerpc/vsx-elemrev-1.c: New.
* gcc.target/powerpc/vsx-elemrev-2.c: New.
* gcc.target/powerpc/vsx-elemrev-3.c: New.
* gcc.target/powerpc/vsx-elemrev-4.c: New.


diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index ea6af8d..5fc1cce 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -327,8 +327,8 @@
 #define vec_sqrt __builtin_vec_sqrt
 #define vec_vsx_ld __builtin_vec_vsx_ld
 #define vec_vsx_st __builtin_vec_vsx_st
-#define vec_xl __builtin_vec_vsx_ld
-#define vec_xst __builtin_vec_vsx_st
+#define vec_xl __builtin_vec_xl
+#define vec_xst __builtin_vec_xst
 
 /* Note, xxsldi and xxpermdi were added as __builtin_vsx_ functions
instead of

[Ada] Implement AI05-0151 (incomplete types in profiles) in gigi

2016-04-27 Thread Eric Botcazou

This implements support in gigi for incomplete types in profiles introduced
in Ada 2012 and removes the various kludges present in the FE and gigi that
were added to make it work in the simple cases.

Tested on x86_64-suse-linux, applied on the mainline.


2016-04-27  Eric Botcazou  

* sem_aux.adb (Is_By_Reference_Type): Also return true for a tagged
incomplete type without full view.
* sem_ch6.adb (Exchange_Limited_Views): Change into a function and
return the list of changes.
(Restore_Limited_Views): New procedure to undo the transformation
made by Exchange_Limited_Views.
(Analyze_Subprogram_Body_Helper): Adjust call to Exchange_Limited_Views
and call Restore_Limited_Views at the end, if need be.
(Possible_Freeze): Do not delay freezing because of incomplete types.
(Process_Formals): Remove kludges for class-wide types.
* types.h (By_Copy_Return): Delete.
* gcc-interface/ada-tree.h (TYPE_MAX_ALIGN): Move around.
(TYPE_DUMMY_IN_PROFILE_P): New macro.
* gcc-interface/gigi.h (update_profiles_with): Declare.
(finish_subprog_decl): Likewise.
(get_minimal_subprog_decl): Delete.
(create_subprog_type): Likewise.
(create_param_decl): Adjust prototype.
(create_subprog_decl): Likewise.
* gcc-interface/decl.c (defer_limited_with): Rename into...
(defer_limited_with_list): ...this.
(gnat_to_gnu_entity): Adjust to above renaming.
(finalize_from_limited_with): Likewise.
(tree_entity_vec_map): New structure.
(gt_pch_nx): New helpers.
(dummy_to_subprog_map): New hash table.
(gnat_to_gnu_param): Set the SLOC here.  Remove MECH parameter and
add FIRST parameter.  Deal with the mechanism here instead of...
Do not make read-only variant of types.  Simplify expressions.
In the by-ref case, test the mechanism before must_pass_by_ref
and also TYPE_IS_BY_REFERENCE_P before building the reference type.
(gnat_to_gnu_subprog_type): New static function extracted from...
Do not special-case the type_annotate_only mode.  Call
gnat_to_gnu_profile_type instead of gnat_to_gnu_type on return type.
Deal with dummy return types.  Likewise for parameter types.  Deal
with by-reference types explicitly and add kludge for null procedures
with untagged incomplete types.  Remove assertion on the types and be
prepared for multiple elaboration of the declarations.  Skip whole
CICO processing if the profile is incomplete.  Handle the completion
of a previously incomplete profile.
(gnat_to_gnu_entity) : Rename local variable.
Adjust couple of calls to create_param_decl.
:
Remove specific deferring code.
: Also deal with E_Subprogram_Type designated type.
Simplify handling of dummy types and remove obsolete comment.
Constify a couple of variables.  Do not set TYPE_UNIVERSAL_ALIASING_P
on dummy types.
: Tweak comment and simplify condition.
: ...here.  Call it and clean up handling.  Remove
obsolete comment and adjust call to gnat_to_gnu_param.  Adjust call
to create_subprog_decl.
: Add a couple of 'const' qualifiers and get rid
of inner break statements.  Tidy up condition guarding direct use of
the full view.
(get_minimal_subprog_decl): Delete.
(finalize_from_limited_with): Call update_profiles_with on dummy
types with TYPE_DUMMY_IN_PROFILE_P set.
(is_from_limited_with_of_main): Delete.
(associate_subprog_with_dummy_type): New function.
(update_profile): Likewise.
(update_profiles_with): Likewise.
(gnat_to_gnu_profile_type): Likewise.
(init_gnat_decl): Initialize dummy_to_subprog_map.
(destroy_gnat_decl): Destroy dummy_to_subprog_map.
* gcc-interface/misc.c (gnat_get_alias_set): Add guard for accessing
TYPE_UNIVERSAL_ALIASING_P.
(gnat_get_array_descr_info): Minor tweak.
* gcc-interface/trans.c (gigi): Adjust calls to create_subprog_decl.
(build_raise_check): Likewise.
(Compilation_Unit_to_gnu): Likewise.
(Identifier_to_gnu): Accept mismatches coming from a limited context.
(Attribute_to_gnu): Remove kludge for dispatch table entities.
(process_freeze_entity): Do not retrieve old definition if there is
an address clause on the entity.  Call update_profiles_with on dummy
types with TYPE_DUMMY_IN_PROFILE_P set.
* gcc-interface/utils.c (build_dummy_unc_pointer_types): Also set
TYPE_REFERENCE_TO to the fat pointer type.
(create_subprog_type): Delete.
(create_param_decl): Remove READONLY parameter.
(finish_subprog_decl): New

[PATCH, i386]: Fix ix86_spill_class condition

2016-04-27 Thread Uros Bizjak

Hello!

Based on recent discussion, the attached patch fixes ix86_spill_class
condition. The spills to SSE registers are now enabled for real on
SSE2 target, where inter-unit moves to/from vector registers are
enabled.

Since this is new functionality, the patch can cause some minor
runtime regressions (or unwanted regmove chains), so IMO the beginning
of stage1 is appropriate timing for this kind of change.

TARGET_GENERAL_REGS_SSE_SPILL flag is enabled by default on all Intel
Core processors, so the change will be picked up by SPEC testers and any
problems will soon be detected.

2016-04-27  Uros Bizjak  

* config/i386/i386.c (ix86_spill_class): Enable for TARGET_SSE2 when
inter-unit moves to/from vector registers are enabled.  Do not disable
for TARGET_MMX.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}, configured with --with-arch=corei7.

Committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 235516)
+++ config/i386/i386.c  (working copy)
@@ -53560,9 +53560,12 @@
 static reg_class_t
 ix86_spill_class (reg_class_t rclass, machine_mode mode)
 {
-  if (TARGET_SSE && TARGET_GENERAL_REGS_SSE_SPILL && ! TARGET_MMX
+  if (TARGET_GENERAL_REGS_SSE_SPILL
+  && TARGET_SSE2
+  && TARGET_INTER_UNIT_MOVES_TO_VEC
+  && TARGET_INTER_UNIT_MOVES_FROM_VEC
   && (mode == SImode || (TARGET_64BIT && mode == DImode))
-  && rclass != NO_REGS && INTEGER_CLASS_P (rclass))
+  && INTEGER_CLASS_P (rclass))
 return ALL_SSE_REGS;
   return NO_REGS;
 }
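
The new condition in the patch above can be modelled as a standalone predicate. This is only a sketch: the toy_* types and fields are stand-ins for the real target macros, and INTEGER_CLASS_P is simplified to a comparison against a single general register class.

```cpp
// Hypothetical stand-ins for GCC's register classes and machine modes.
enum toy_reg_class { TOY_NO_REGS, TOY_GENERAL_REGS, TOY_ALL_SSE_REGS };
enum toy_mode { TOY_SImode, TOY_DImode, TOY_TImode };

// Hypothetical stand-ins for the TARGET_* flags tested by the patch.
struct toy_target
{
  bool general_regs_sse_spill;     // TARGET_GENERAL_REGS_SSE_SPILL
  bool sse2;                       // TARGET_SSE2
  bool inter_unit_moves_to_vec;    // TARGET_INTER_UNIT_MOVES_TO_VEC
  bool inter_unit_moves_from_vec;  // TARGET_INTER_UNIT_MOVES_FROM_VEC
  bool is_64bit;                   // TARGET_64BIT
};

// Mirror of the patched condition: spill integer registers to SSE
// registers only when SSE2 and moves in both directions are available.
static toy_reg_class
toy_spill_class (const toy_target &t, toy_reg_class rclass, toy_mode mode)
{
  if (t.general_regs_sse_spill
      && t.sse2
      && t.inter_unit_moves_to_vec
      && t.inter_unit_moves_from_vec
      && (mode == TOY_SImode || (t.is_64bit && mode == TOY_DImode))
      && rclass == TOY_GENERAL_REGS)
    return TOY_ALL_SSE_REGS;
  return TOY_NO_REGS;
}
```

Note that, as in the patch, MMX no longer appears in the condition at all.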

Re: Cilk Plus testsuite needs massive cleanup (PR testsuite/70595)

2016-04-27 Thread Mike Stump


> On Apr 27, 2016, at 2:22 AM, Rainer Orth  
> wrote:
> Will commit to mainline in a day or two, giving interested parties an
> opportunity to comment.

:-)  Always nice to see cleanups.

[PATCH, i386] Set mode of input operand ...

2016-04-27 Thread Uros Bizjak

... to avoid build warnings.

2016-04-27  Uros Bizjak  

* config/i386/i386.md
(lea arith with mem operand + setcc peephole2): Set operator mode.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 235516)
+++ config/i386/i386.md (working copy)
@@ -18023,8 +18023,8 @@
   [(set (match_operand:SWI 0 "register_operand")
(match_operand:SWI 1 "memory_operand"))
(set (match_operand:SWI 3 "register_operand")
-   (plus (match_dup 0)
- (match_operand:SWI 2 "")))
+   (plus:SWI (match_dup 0)
+ (match_operand:SWI 2 "")))
(set (match_dup 1) (match_dup 3))
(set (reg FLAGS_REG) (compare (match_dup 3) (const_int 0)))]
   "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())

Re: [PATCH][AArch64][wwwdocs] Summarise some more AArch64 changes for GCC6

2016-04-27 Thread Jim Wilson

On Wed, Apr 27, 2016 at 3:33 AM, Kyrill Tkachov
 wrote:
> Thanks, I've incorporated your and James' feedback.
> Since James ok'd the content of the patch from an AArch64 perspective
> I'll commit this later today if I receive no further feedback.

There is no paragraph for the Qualcomm qdf24xx.  Do you want me to
write that and submit it?  That could take a while as I will have to
discuss it with Qualcomm first.

Jim

Re: [PATCH][cilkplus] fix c++ implicit conversions with cilk_spawn (PR/69024, PR/68997)

2016-04-27 Thread Jeff Law


On 01/20/2016 10:57 AM, Ryan Burn wrote:

This patch follows on from
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02142.html

As discussed, it creates a separate function
cilk_cp_detect_spawn_and_unwrap in gcc/cp to handle processing
cilk_spawn expressions for c++ and adds support for implicit
constructor and type conversions.

Bootstrapped and regression tested on x86_64-linux.

gcc/c-family/ChangeLog:
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * cilk.c (cilk_ignorable_spawn_rhs_op): Change to have external linkage.
  * cilk.c (recognize_spawn): Rename to cilk_recognize_spawn. Change to have
  external linkage.
  * cilk.c (cilk_detect_and_unwrap): Rename to recognize_spawn to
  cilk_recognize_spawn.
  * cilk.c (extract_free_variables): Don't extract free variables from
  AGGR_INIT_EXPR slot.

gcc/cp/ChangeLog
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * cp-gimplify.c (cp_gimplify_expr): Call cilk_cp_detect_spawn_and_unwrap
  instead of cilk_detect_spawn_and_unwrap.
  * cp-cilkplus.c (is_conversion_operator_function_decl_p): New.
  * cp-cilkplus.c (find_spawn): New.
  * cp-cilkplus.c (cilk_cp_detect_spawn_and_unwrap): New.

gcc/testsuite/ChangeLog
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * g++.dg/cilk-plus/CK/pr68001.cc: Fix to not depend on broken diagnostic.
  * g++.dg/cilk-plus/CK/pr69024.cc: New test.
  * g++.dg/cilk-plus/CK/pr68997.cc: New test.


cilk3.diff


Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c(revision 232444)
+++ gcc/cp/cp-gimplify.c(working copy)
@@ -39,6 +39,7 @@
 static tree cp_fold_r (tree *, int *, void *);
 static void cp_genericize_tree (tree*);
 static tree cp_fold (tree);
+bool cilk_cp_detect_spawn_and_unwrap (tree *);
The right thing to do here is create cp-cilkplus.h and put the prototype 
in there, along with cpp_validate_cilk_plus_loop.







Index: gcc/cp/cp-cilkplus.c
===
--- gcc/cp/cp-cilkplus.c(revision 232444)
+++ gcc/cp/cp-cilkplus.c(working copy)
@@ -27,6 +27,108 @@
 #include "tree-iterator.h"
 #include "cilk.h"

+bool cilk_ignorable_spawn_rhs_op (tree);
+bool cilk_recognize_spawn (tree, tree *);
These should be prototyped in an appropriate .h file.  c-common.h, while 
not ideal, would be OK.  c-common seems to be a fairly bad dumping 
ground and we'll want to untangle separately.




+
+/* Return TRUE if T is a FUNCTION_DECL for a type-conversion operator.  */
+
+static bool
+is_conversion_operator_function_decl_p (tree t) {
+  if (TREE_CODE (t) != FUNCTION_DECL)
+return false;
+
+  return DECL_NAME (t) && IDENTIFIER_TYPENAME_P (DECL_NAME (t));
+}

Formatting.  The open-curly goes on a line by itself
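
For comparison, a small sketch of the expected layout: return type on its own line, a space before the argument list, and the opening brace on a line by itself. The function small_even_p is a hypothetical helper used only to show the formatting.

```cpp
/* Return true if N is even and below LIMIT.  Hypothetical helper, used
   only to show the layout.  */

static bool
small_even_p (int n, int limit)
{
  if (n >= limit)
    return false;

  return n % 2 == 0;
}
```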

I'm spinning up those changes for testing.  Assuming they pass, I'll 
update the ChangeLog appropriately as well.



Jeff

[ubsan PATCH] Fix compile-time hog with &TARGET_EXPRs (PR sanitizer/70342)

2016-04-27 Thread Marek Polacek

This test took forever to compile with -fsanitize=null, because the
instrumentation was creating an incredible number of duplicated expressions,
in a quadratic fashion.  I think the problem is that we instrument
&TARGET_EXPR <> expressions, which doesn't seem to be needed -- we only need
to instrument the initializers in TARGET_EXPRs.  With this patch, we avoid
creating tons of useless expressions and the compile time is reduced from
~ infinity to <1s.

Jakub, do you see any problem with this?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-27  Marek Polacek  

PR sanitizer/70342
* c-ubsan.c (ubsan_maybe_instrument_reference_or_call): Don't
null-instrument &TARGET_EXPR <...>.

* g++.dg/ubsan/null-7.C: New test.

diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index 4022bdf..b829c04 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -395,8 +395,11 @@ ubsan_maybe_instrument_reference_or_call (location_t loc, 
tree op, tree ptype,
  int save_flag_delete_null_pointer_checks
= flag_delete_null_pointer_checks;
  flag_delete_null_pointer_checks = 1;
- if (!tree_single_nonzero_warnv_p (op, &strict_overflow_p)
- || strict_overflow_p)
+ if ((!tree_single_nonzero_warnv_p (op, &strict_overflow_p)
+  || strict_overflow_p)
+ /* Instrumenting &TARGET_EXPR <...> is a waste and can result
+in compile-time hog; see PR70342.  */
+ && TREE_CODE (TREE_OPERAND (op, 0)) != TARGET_EXPR)
instrument = true;
  flag_delete_null_pointer_checks
= save_flag_delete_null_pointer_checks;
diff --git gcc/testsuite/g++.dg/ubsan/null-7.C 
gcc/testsuite/g++.dg/ubsan/null-7.C
index e69de29..8284bc7 100644
--- gcc/testsuite/g++.dg/ubsan/null-7.C
+++ gcc/testsuite/g++.dg/ubsan/null-7.C
@@ -0,0 +1,24 @@
+// PR sanitizer/70342
+// { dg-do compile }
+// { dg-options "-fsanitize=null" }
+
+class A {};
+class B {
+public:
+  B(A);
+};
+class C {
+public:
+  C operator<<(B);
+};
+class D {
+  D(const int &);
+  C m_blackList;
+};
+D::D(const int &) {
+  m_blackList << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A()
+  << A() << A() << A() << A() << A() << A() << A() << A() << A();
+}

Marek

Re: [PATCH] Convert DF_SCAN etc from #define to an enum

2016-04-27 Thread Bernd Schmidt


On 04/27/2016 07:06 PM, David Malcolm wrote:

Whilst debugging an issue in df, I noticed that there are
some #define constants that could be an enum (thus making them known
to gdb).

Convert them to a new enum, and update the "id" field of
struct df_problem.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu

OK for trunk?

gcc/ChangeLog:
* df.h (DF_SCAN, DF_LR, DF_LIVE, DF_RD, DF_CHAIN, DF_WORD_LR,
DF_NOTE, DF_MD, DF_MIR, DF_LAST_PROBLEM_PLUS1): Convert from
#define to...
(enum df_problem_id): ...this new enum.
(struct df_problem): Convert field "id" from "int" to
enum df_problem_id.


Ok.


Bernd

Re: [PATCH] maybe_set_first_label_num can take an rtx_code_label *

2016-04-27 Thread Bernd Schmidt


On 04/27/2016 07:12 PM, David Malcolm wrote:

The function maybe_set_first_label_num acts on a CODE_LABEL; we can
capture that in the type system.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; it's
only used from stmt.c (not in any of the config subdirs), so I didn't
attempt a many-config test.

OK for trunk?

gcc/ChangeLog:
* emit-rtl.c (maybe_set_first_label_num): Strengthen param from
rtx to rtx_code_label *.
* rtl.h (maybe_set_first_label_num): Likewise.


Ok.


Bernd

[PATCH] maybe_set_first_label_num can take an rtx_code_label *

2016-04-27 Thread David Malcolm

The function maybe_set_first_label_num acts on a CODE_LABEL; we can
capture that in the type system.
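
To make the idea concrete, here is a toy sketch of strengthening a parameter from a generic node pointer to a subclass pointer. The toy_* types are hypothetical stand-ins and are much simpler than the real rtx hierarchy.

```cpp
// Generic node and a derived "code label" node (hypothetical stand-ins).
struct toy_rtx
{
  int code;
};

struct toy_code_label : toy_rtx
{
  int label_num;
};

static int toy_first_label_num = 100;

// Taking toy_code_label * instead of toy_rtx * means that passing some
// other kind of node is rejected at compile time, not checked at run time.
static void
toy_maybe_set_first_label_num (toy_code_label *x)
{
  if (x->label_num < toy_first_label_num)
    toy_first_label_num = x->label_num;
}
```

A caller holding only a toy_rtx * would now need an explicit, checked downcast before calling the function.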

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; it's
only used from stmt.c (not in any of the config subdirs), so I didn't
attempt a many-config test.

OK for trunk?

gcc/ChangeLog:
* emit-rtl.c (maybe_set_first_label_num): Strengthen param from
rtx to rtx_code_label *.
* rtl.h (maybe_set_first_label_num): Likewise.
---
 gcc/emit-rtl.c | 2 +-
 gcc/rtl.h  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 0fcd9d9..4e5ba41 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1355,7 +1355,7 @@ get_first_label_num (void)
Fix this now so that array indices work later.  */
 
 void
-maybe_set_first_label_num (rtx x)
+maybe_set_first_label_num (rtx_code_label *x)
 {
   if (CODE_LABEL_NUMBER (x) < first_label_num)
 first_label_num = CODE_LABEL_NUMBER (x);
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 8267252..b531ab7 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3508,7 +3508,7 @@ extern int condjump_in_parallel_p (const rtx_insn *);
 extern int max_reg_num (void);
 extern int max_label_num (void);
 extern int get_first_label_num (void);
-extern void maybe_set_first_label_num (rtx);
+extern void maybe_set_first_label_num (rtx_code_label *);
 extern void delete_insns_since (rtx_insn *);
 extern void mark_reg_pointer (rtx, int);
 extern void mark_user_reg (rtx);
-- 
1.8.5.3

Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-04-27 Thread Wilco Dijkstra

James Greenhalgh wrote:
> So the part of this patch removing the fallthrough to general operand
> is not OK for trunk.
>
> The other parts look reasonable to me, please resubmit just those.

Right, I removed the removal of the fallthrough. Here is the revised version:

ChangeLog:
2016-04-27  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.md
(add3_compareC_cconly_imm): Remove use of %w.
(add3_compareC_imm): Likewise.
(si3_uxtw): Split into register and immediate variants.
(andsi3_compare0_uxtw): Likewise.
(and3_compare0): Likewise.
(and3nr_compare0): Likewise.
(stack_protect_test_): Don't use %x for memory operands.

--

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
19981c205d3e2a6102510647bde9b29906a4fdc9..4e41b3b0f5b2369431ffec1a0029af53fc5aebd9
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1755,7 +1755,7 @@
   "aarch64_zero_extend_const_eq (mode, operands[2],
 mode, operands[1])"
   "@
-  cmn\\t%0, %1
+  cmn\\t%0, %1
   cmp\\t%0, #%n1"
   [(set_attr "type" "alus_imm")]
 )
@@ -1787,11 +1787,11 @@
   "aarch64_zero_extend_const_eq (mode, operands[3],
  mode, operands[2])"
   "@
-  adds\\t%0, %1, %2
+  adds\\t%0, %1, %2
   subs\\t%0, %1, #%n2"
   [(set_attr "type" "alus_imm")]
 )
- 
+
 (define_insn "add3_compareC"
   [(set (reg:CC_C CC_REGNUM)
(ne:CC_C
@@ -3394,7 +3394,9 @@
  (LOGICAL:SI (match_operand:SI 1 "register_operand" "%r,r")
 (match_operand:SI 2 "aarch64_logical_operand" "r,K"]
   ""
-  "\\t%w0, %w1, %w2"
+  "@
+   \\t%w0, %w1, %w2
+   \\t%w0, %w1, %2"
   [(set_attr "type" "logic_reg,logic_imm")]
 )
 
@@ -3407,7 +3409,9 @@
(set (match_operand:GPI 0 "register_operand" "=r,r")
(and:GPI (match_dup 1) (match_dup 2)))]
   ""
-  "ands\\t%0, %1, %2"
+  "@
+   ands\\t%0, %1, %2
+   ands\\t%0, %1, %2"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
@@ -3421,7 +3425,9 @@
(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (and:SI (match_dup 1) (match_dup 2]
   ""
-  "ands\\t%w0, %w1, %w2"
+  "@
+   ands\\t%w0, %w1, %w2
+   ands\\t%w0, %w1, %2"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
@@ -3775,7 +3781,9 @@
  (match_operand:GPI 1 "aarch64_logical_operand" "r,"))
 (const_int 0)))]
   ""
-  "tst\\t%0, %1"
+  "@
+   tst\\t%0, %1
+   tst\\t%0, %1"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
@@ -5170,7 +5178,7 @@
 UNSPEC_SP_TEST))
(clobber (match_scratch:PTR 3 "="))]
   ""
-  "ldr\t%3, %x1\;ldr\t%0, %x2\;eor\t%0, %3, %0"
+  "ldr\t%3, %1\;ldr\t%0, %2\;eor\t%0, %3, %0"
   [(set_attr "length" "12")
(set_attr "type" "multiple")])

Re: [PATCH] df: make df_problem instances "const"

2016-04-27 Thread Bernd Schmidt


On 04/27/2016 07:08 PM, David Malcolm wrote:

The various struct df_problem instances are constant data; mark them
as such.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu

OK for trunk?

gcc/ChangeLog:
* df-core.c (df_add_problem): Make the problem param be const.
(df_remove_problem): Make local "problem" be const.
* df-problems.c (problem_RD): Make const.
(problem_LR): Likewise.
(problem_LIVE): Likewise.
(problem_MIR): Likewise.
(problem_CHAIN): Likewise.
(problem_WORD_LR): Likewise.
(problem_NOTE): Likewise.
(problem_MD): Likewise.
* df-scan.c (problem_SCAN): Likewise.
* df.h (struct df_problem): Make field "dependent_problem" be
const.
(struct dataflow): Likewise for field "problem".
(df_add_problem): Make param const.


Ok.


Bernd

Re: [PATCH] Fix comment in rtl.def

2016-04-27 Thread Bernd Schmidt




On 04/27/2016 07:03 PM, David Malcolm wrote:

Commit r210360 removed the first "i" field from the various instruction
nodes in rtl.def, moving it to an explicit "int insn_uid;" field
of the union "u2" within rtx_def.

Update the comment in rtl.def to reflect this change.  Also, fix
a stray apostrophe.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu

OK for trunk?

gcc/ChangeLog:
* rtl.def: Update comment for "things in the instruction chain" to
reflect the removal of the leading "i" field for INSN_UID in
r210360.  Fix bogus apostrophe.


Ok.


Bernd

[PATCH] df: make df_problem instances "const"

2016-04-27 Thread David Malcolm

The various struct df_problem instances are constant data; mark them
as such.
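
As an illustration only (this is not code from the patch), the pattern can be sketched in plain C. The struct, its fields, and the instance names below are hypothetical stand-ins rather than GCC's struct df_problem; the point is that once the instances are const, every pointer that refers to them, including the dependency link inside the struct, has to become a pointer to const as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a problem descriptor; not GCC's
   struct df_problem.  */
struct problem
{
  int id;
  const char *name;
  /* Must be pointer-to-const, because the instances below are const.  */
  const struct problem *dependent_problem;
};

/* Constant data: the compiler is free to place these in a read-only
   section.  */
static const struct problem problem_scan = { 0, "scan", NULL };
static const struct problem problem_lr = { 1, "lr", &problem_scan };

/* Count PROBLEM plus everything it (transitively) depends on.  The
   parameter is const-qualified so that const instances can be passed.  */
static int
count_with_dependencies (const struct problem *problem)
{
  int n = 0;

  assert (problem != NULL);
  for (; problem != NULL; problem = problem->dependent_problem)
    n++;
  return n;
}
```

A caller would write count_with_dependencies (&problem_lr); dropping the const from the parameter would make that call discard a qualifier, which is why a change like this tends to touch function signatures and struct fields together.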

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu

OK for trunk?

gcc/ChangeLog:
* df-core.c (df_add_problem): Make the problem param be const.
(df_remove_problem): Make local "problem" be const.
* df-problems.c (problem_RD): Make const.
(problem_LR): Likewise.
(problem_LIVE): Likewise.
(problem_MIR): Likewise.
(problem_CHAIN): Likewise.
(problem_WORD_LR): Likewise.
(problem_NOTE): Likewise.
(problem_MD): Likewise.
* df-scan.c (problem_SCAN): Likewise.
* df.h (struct df_problem): Make field "dependent_problem" be
const.
(struct dataflow): Likewise for field "problem".
(df_add_problem): Make param const.
---
 gcc/df-core.c |  4 ++--
 gcc/df-problems.c | 16 
 gcc/df-scan.c |  2 +-
 gcc/df.h  |  6 +++---
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/df-core.c b/gcc/df-core.c
index 5464bc3..c1e9714 100644
--- a/gcc/df-core.c
+++ b/gcc/df-core.c
@@ -411,7 +411,7 @@ struct df_d *df;
 /* Add PROBLEM (and any dependent problems) to the DF instance.  */
 
 void
-df_add_problem (struct df_problem *problem)
+df_add_problem (const struct df_problem *problem)
 {
   struct dataflow *dflow;
   int i;
@@ -584,7 +584,7 @@ df_set_blocks (bitmap blocks)
 void
 df_remove_problem (struct dataflow *dflow)
 {
-  struct df_problem *problem;
+  const struct df_problem *problem;
   int i;
 
   if (!dflow)
diff --git a/gcc/df-problems.c b/gcc/df-problems.c
index f7bf3c8..132c127 100644
--- a/gcc/df-problems.c
+++ b/gcc/df-problems.c
@@ -668,7 +668,7 @@ df_rd_bottom_dump (basic_block bb, FILE *file)
 
 /* All of the information associated with every instance of the problem.  */
 
-static struct df_problem problem_RD =
+static const struct df_problem problem_RD =
 {
   DF_RD,  /* Problem id.  */
   DF_FORWARD, /* Direction.  */
@@ -1190,7 +1190,7 @@ df_lr_verify_solution_end (void)
 
 /* All of the information associated with every instance of the problem.  */
 
-static struct df_problem problem_LR =
+static const struct df_problem problem_LR =
 {
   DF_LR,  /* Problem id.  */
   DF_BACKWARD,/* Direction.  */
@@ -1718,7 +1718,7 @@ df_live_verify_solution_end (void)
 
 /* All of the information associated with every instance of the problem.  */
 
-static struct df_problem problem_LIVE =
+static const struct df_problem problem_LIVE =
 {
   DF_LIVE,  /* Problem id.  */
   DF_FORWARD,   /* Direction.  */
@@ -2169,7 +2169,7 @@ df_mir_verify_solution_end (void)
 
 /* All of the information associated with every instance of the problem.  */
 
-static struct df_problem problem_MIR =
+static const struct df_problem problem_MIR =
 {
   DF_MIR,   /* Problem id.  */
   DF_FORWARD,   /* Direction.  */
@@ -2641,7 +2641,7 @@ df_chain_insn_bottom_dump (const rtx_insn *insn, FILE 
*file)
 }
 }
 
-static struct df_problem problem_CHAIN =
+static const struct df_problem problem_CHAIN =
 {
   DF_CHAIN,   /* Problem id.  */
   DF_NONE,/* Direction.  */
@@ -3008,7 +3008,7 @@ df_word_lr_bottom_dump (basic_block bb, FILE *file)
 
 /* All of the information associated with every instance of the problem.  */
 
-static struct df_problem problem_WORD_LR =
+static const struct df_problem problem_WORD_LR =
 {
   DF_WORD_LR,  /* Problem id.  */
   DF_BACKWARD, /* Direction.  */
@@ -3683,7 +3683,7 @@ df_note_free (void)
 
 /* All of the information associated every instance of the problem.  */
 
-static struct df_problem problem_NOTE =
+static const struct df_problem problem_NOTE =
 {
   DF_NOTE,/* Problem id.  */
   DF_NONE,/* Direction.  */
@@ -4693,7 +4693,7 @@ df_md_bottom_dump (basic_block bb, FILE *file)
   df_print_regset (file, _info->out);
 }
 
-static struct df_problem problem_MD =
+static const struct df_problem problem_MD =
 {
   DF_MD,  /* Problem id.  */
   DF_FORWARD, /* Direction.  */
diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 98de844..e6d01d6 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -396,7 +396,7 @@ df_scan_start_block (basic_block bb, FILE *file)
 #endif
 }
 
-static struct df_problem problem_SCAN =
+static const struct df_problem problem_SCAN =
 {
   DF_SCAN,/* Problem id.  */
   DF_NONE,/* Direction.  */
diff --git a/gcc/df.h b/gcc/df.h
index 7741ea5..40c3794 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -275,7 +275,7 @@ struct df_problem {
   df_dump_insn_problem_function dump_insn_bottom_fun;
   df_verify_solution_start verify_start_fun;
   df_verify_solution_end verify_end_fun;
-  struct df_problem *dependent_problem;
+  const struct

[PATCH] Convert DF_SCAN etc from #define to an enum

2016-04-27 Thread David Malcolm

Whilst debugging an issue in df, I noticed that there are
some #define constants that could be an enum (thus making them known
to gdb).

Convert them to a new enum, and update the "id" field of
struct df_problem.
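
A minimal sketch of the idea, assuming nothing about df.h beyond what is described above: the names here are hypothetical and only mirror the shape of the change. Unlike #define constants, which are gone after preprocessing, enumerators normally appear in the debug information, so a debugger can usually print a field of the enum type by name.

```c
#include <assert.h>

/* Hypothetical ids; they mirror the shape of the change, not GCC's
   actual names.  Enumerators count up from 0 just as the old
   #define values did.  */
enum problem_id
{
  PROBLEM_SCAN,
  PROBLEM_LR,
  PROBLEM_LIVE,

  PROBLEM_LAST_PLUS1	/* One past the last real id.  */
};

/* Return a printable name for ID.  */
static const char *
problem_id_name (enum problem_id id)
{
  static const char *const names[PROBLEM_LAST_PLUS1]
    = { "scan", "lr", "live" };

  assert ((int) id >= 0 && id < PROBLEM_LAST_PLUS1);
  return names[id];
}
```

Keeping a trailing "last plus one" enumerator preserves the old idiom for sizing arrays indexed by id without a separately maintained constant.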

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu

OK for trunk?

gcc/ChangeLog:
* df.h (DF_SCAN, DF_LR, DF_LIVE, DF_RD, DF_CHAIN, DF_WORD_LR,
DF_NOTE, DF_MD, DF_MIR, DF_LAST_PROBLEM_PLUS1): Convert from
#define to...
(enum df_problem_id): ...this new enum.
(struct df_problem): Convert field "id" from "int" to
enum df_problem_id.
---
 gcc/df.h | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/gcc/df.h b/gcc/df.h
index 34de926..7741ea5 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -43,17 +43,20 @@ union df_ref_d;
a uniform manner.  The last four problems can be added or deleted
at any time are always defined (though LIVE is always there at -O2
or higher); the others are always there.  */
-#define DF_SCAN0
-#define DF_LR  1  /* Live Registers backward. */
-#define DF_LIVE2  /* Live Registers & Uninitialized Registers */
-#define DF_RD  3  /* Reaching Defs. */
-#define DF_CHAIN   4  /* Def-Use and/or Use-Def Chains. */
-#define DF_WORD_LR 5  /* Subreg tracking lr.  */
-#define DF_NOTE6  /* REG_DEAD and REG_UNUSED notes.  */
-#define DF_MD  7  /* Multiple Definitions. */
-#define DF_MIR 8  /* Must-initialized Registers.  */
-
-#define DF_LAST_PROBLEM_PLUS1 (DF_MIR + 1)
+enum df_problem_id
+  {
+DF_SCAN,
+DF_LR,/* Live Registers backward. */
+DF_LIVE,  /* Live Registers & Uninitialized Registers */
+DF_RD,/* Reaching Defs. */
+DF_CHAIN, /* Def-Use and/or Use-Def Chains. */
+DF_WORD_LR,   /* Subreg tracking lr.  */
+DF_NOTE,  /* REG_DEAD and REG_UNUSED notes.  */
+DF_MD,/* Multiple Definitions. */
+DF_MIR,   /* Must-initialized Registers.  */
+
+DF_LAST_PROBLEM_PLUS1
+  };
 
 /* Dataflow direction.  */
 enum df_flow_dir
@@ -251,7 +254,7 @@ typedef void (*df_verify_solution_end) (void);
 struct df_problem {
   /* The unique id of the problem.  This is used it index into
  df->defined_problems to make accessing the problem data easy.  */
-  unsigned int id;
+  enum df_problem_id id;
   enum df_flow_dir dir;/* Dataflow direction.  */
   df_alloc_function alloc_fun;
   df_reset_function reset_fun;
-- 
1.8.5.3

[PATCH] Fix comment in rtl.def

2016-04-27 Thread David Malcolm

Commit r210360 removed the first "i" field from the various instruction
nodes in rtl.def, moving it to an explicit "int insn_uid;" field
of the union "u2" within rtx_def.

Update the comment in rtl.def to reflect this change.  Also, fix
a stray apostrophe.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu

OK for trunk?

gcc/ChangeLog:
* rtl.def: Update comment for "things in the instruction chain" to
reflect the removal of the leading "i" field for INSN_UID in
r210360.  Fix bogus apostrophe.
---
 gcc/rtl.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/rtl.def b/gcc/rtl.def
index 0b7f894..45a9272 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -131,10 +131,10 @@ DEF_RTL_EXPR(ADDRESS, "address", "i", RTX_EXTRA)
 /* --
Expression types used for things in the instruction chain.
 
-   All formats must start with "iuu" to handle the chain.
+   All formats must start with "uu" to handle the chain.
Each insn expression holds an rtl instruction and its semantics
during back-end processing.
-   See macros's in "rtl.h" for the meaning of each rtx->u.fld[].
+   See macros in "rtl.h" for the meaning of each rtx->u.fld[].
 
-- */
 
-- 
1.8.5.3

Re: [PATCH 1/4] Add gcc-auto-profile script

2016-04-27 Thread Andi Kleen

On Wed, Apr 27, 2016 at 05:42:48PM +0200, Bernd Schmidt wrote:
> On 03/28/2016 06:44 AM, Andi Kleen wrote:
> >This patch adds a new gcc-auto-profile script that figures out the
> >correct event and runs perf. The script is installed on Linux systems.
> 
> That sounds useful, and I think we'll want to accept this.
> 
> >So Linux just hardcodes installing the script, but it may fail at runtime.
> 
> For this reason it would probably be best to retain the documentation for
> the old method alongside the new one.

The old method actually doesn't work, unless you apply a very obscure
patch to your perf. I don't think it is very useful for users.

> 
> >+
> >+baseurl = "https://download.01.org/perfmon"
> 
> Slightly scary to see a random unknown download URL. Apparently it's an
> Intel thing? Is this referenced somewhere on an intel.com web page?

http://www.intel.com/content/www/us/en/search.html?toplevelcategory=none=01.org


> 
> >  E.g.
> >  @smallexample
> >  create_gcov --binary=your_program.unstripped --profile=perf.data \
> >---gcov=profile.afdo
> >+--gcov=profile.afdo -gcov_version 1
> >  @end smallexample
> >  @end table
> 
> Why this change? What does it do?

It actually makes it work. The google autofdo distribution defaults
to some google internal magic gcov version number that doesn't work
with standard gcc.

I can split it out.

> 
> Why isn't the new script in contrib? Does it have to be in gcc to be
> installed?

Because autoprofiledfeedback needs it.

Also the idea was to eventually install it by default (although the patch
doesn't do that yet)

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only

Re: Fix for PR70498 in Libiberty Demangler

2016-04-27 Thread Bernd Schmidt


On 04/15/2016 07:39 PM, Marcel Böhme wrote:


Sure. The updated patch, including Changelog and more test cases. Regression 
tested.


This patch seems seriously damaged by sending it through the email body. 
Please attach it (text/plain) instead.



Bernd

Re: [PATCH 4/4] Add make autoprofiledbootstrap

2016-04-27 Thread Andi Kleen

On Wed, Apr 27, 2016 at 05:36:09PM +0200, Bernd Schmidt wrote:
> On 03/28/2016 06:44 AM, Andi Kleen wrote:
> >From: Andi Kleen 
> >
> >Add support for profiledbootstrap with autofdo. Will be useful
> >to get better testing coverage of autofdo.
> 
> Is this the only purpose? I'll admit this is the patch I like least out of
> the series.

It's the main purpose, since we already have profiled feedback. 
bootstrap is one of the best tests we have and autofdo badly needs it.

It's also a useful benchmark, so if autofdo gets improved it can catch up more
with explicit profiling.

> >The autofdo'ed compiler is ~7% faster on insn-recog.i (vs ~11% for
> >profiledfeedback), and ~4% faster for tramp3d-v4 (vs 10% for
> >profiledfeedback)
> 
> So it seems like we get worse results than with a feature we already have,
> so I don't quite see the value.

I hope as it gets tuned that will improve. Also autofdo is so much
easier to use on other software that the main usage elsewhere may well be 
autofdo,
not explicit instrumentation.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only

Re: [PATCH] add -fprolog-pad=N option to c-family

2016-04-27 Thread Szabolcs Nagy

On 27/04/16 16:22, Torsten Duwe wrote:
> Hi Maxim,
> 
> thanks for starting the work on this; I have added the missing
> command line option. It builds now and the resulting compiler generates
> a linux kernel with the desired properties, so work can continue there.
> 
>   Torsten

i guess the flag should be documented in invoke.texi

it's not clear what N means in -fprolog-pad=N, how
location recording is enabled and how it interacts
with -fipa-ra. (-pg disables -fipa-ra, but -fprolog-pad
works without -pg.)

with -mfentry, by default the user only has to
implement the fentry call (linux wants nops there, but
e.g. glibc could use -pg -mfentry for profiling on
aarch64 and the target specific details are easier to
document for an -m option than for something general).

the nop-padding is more general, but the size and
layout of nops and the call abi will be target
specific and the user will most likely need to modify
the binary (to get the right sequence) which needs
additional tooling.  i don't know who might use it
other than linux (which already has tools to deal with
-mfentry).

i'm not against nop-padding, but i think more evidence
is needed that the generalization is a good idea and
users can deal with the resulting issues.

Contents of PO file 'cpplib-6.1.0.ru.po'

2016-04-27 Thread Translation Project Robot



cpplib-6.1.0.ru.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

New Russian PO file for 'cpplib' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Russian team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/ru.po

(This file, 'cpplib-6.1.0.ru.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

New Ukrainian PO file for 'gcc' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

http://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-6.1.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-04-27 Thread Dhole

Thanks again for the review Bernd,

On 16-04-27 01:33:47, Bernd Schmidt wrote:
> >+  epoch = strtoll (source_date_epoch, &endptr, 10);
> >+  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
> >+  || (errno != 0 && epoch == 0))
> >+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >$SOURCE_DATE_EPOCH: "
> >+ "strtoll: %s\n", xstrerror(errno));
> >+  if (endptr == source_date_epoch)
> >+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >$SOURCE_DATE_EPOCH: "
> >+ "No digits were found: %s\n", endptr);
> >+  if (*endptr != '\0')
> >+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >$SOURCE_DATE_EPOCH: "
> >+ "Trailing garbage: %s\n", endptr);
> >+  if (epoch < 0)
> >+fatal_error (UNKNOWN_LOCATION, "environment variable 
> >$SOURCE_DATE_EPOCH: "
> >+ "Value must be nonnegative: %lld \n", epoch);
> 
> These are somewhat unusual for error messages, but I think the general
> principle of no capitalization probably applies, so "No", "Trailing", and
> "Value" should be lowercase.

Done.

> >+  time_t source_date_epoch = (time_t) -1;
> >+
> >+  source_date_epoch = get_source_date_epoch ();
> 
> First initialization seems unnecessary. Might want to merge the declaration
> with the initialization.

And done.

I'm attaching the updated patch with the two minor issues fixed.
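
For readers who want to try the validation steps outside the compiler, here is a standalone sketch of the same checks. It is an illustration, not the patch itself: parse_epoch is a hypothetical helper, and it reports failure through its return value instead of calling fatal_error.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse TEXT as a nonnegative decimal number of seconds.  Return 0
   and store the value in *OUT on success, -1 otherwise.  Hypothetical
   helper for illustration only.  */
static int
parse_epoch (const char *text, long long *out)
{
  char *endptr;
  long long value;

  assert (text != NULL && out != NULL);

  errno = 0;
  value = strtoll (text, &endptr, 10);
  if (errno != 0)		/* Overflow (ERANGE) or another error.  */
    return -1;
  if (endptr == text)		/* No digits were found.  */
    return -1;
  if (*endptr != '\0')		/* Trailing garbage.  */
    return -1;
  if (value < 0)		/* Value must be nonnegative.  */
    return -1;

  *out = value;
  return 0;
}
```

A caller would typically pass the result of getenv ("SOURCE_DATE_EPOCH") after checking it for NULL. Clearing errno before the call matters because strtoll does not reset it on success.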

Cheers,
-- 
Dhole
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f2846bb..5315475 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -12741,4 +12741,37 @@ valid_array_size_p (location_t loc, tree type, tree 
name)
   return true;
 }
 
+/* Read SOURCE_DATE_EPOCH from environment to have a deterministic
+   timestamp to replace embedded current dates to get reproducible
+   results.  Returns -1 if SOURCE_DATE_EPOCH is not defined.  */
+time_t
+get_source_date_epoch ()
+{
+  char *source_date_epoch;
+  long long epoch;
+  char *endptr;
+
+  source_date_epoch = getenv ("SOURCE_DATE_EPOCH");
+  if (!source_date_epoch)
+return (time_t) -1;
+
+  errno = 0;
+  epoch = strtoll (source_date_epoch, &endptr, 10);
+  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
+  || (errno != 0 && epoch == 0))
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"strtoll: %s\n", xstrerror(errno));
+  if (endptr == source_date_epoch)
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"no digits were found: %s\n", endptr);
+  if (*endptr != '\0')
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"trailing garbage: %s\n", endptr);
+  if (epoch < 0)
+fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
+"value must be nonnegative: %lld \n", epoch);
+
+  return (time_t) epoch;
+}
+
 #include "gt-c-family-c-common.h"
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index fa3746c..656bc75 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1467,4 +1467,9 @@ extern bool reject_gcc_builtin (const_tree, location_t = 
UNKNOWN_LOCATION);
 extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec **);
 extern bool valid_array_size_p (location_t, tree, tree);
 
+/* Read SOURCE_DATE_EPOCH from environment to have a deterministic
+   timestamp to replace embedded current dates to get reproducible
+   results.  Returns -1 if SOURCE_DATE_EPOCH is not defined.  */
+extern time_t get_source_date_epoch (void);
+
 #endif /* ! GCC_C_COMMON_H */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 96da4fc..bf1db6c 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -385,6 +385,9 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned 
char *cpp_flags,
   enum cpp_ttype type;
   unsigned char add_flags = 0;
   enum overflow_type overflow = OT_NONE;
+  time_t source_date_epoch = get_source_date_epoch ();
+
+  cpp_init_source_date_epoch (parse_in, source_date_epoch);
 
   timevar_push (TV_CPP);
  retry:
diff --git a/gcc/doc/cppenv.texi b/gcc/doc/cppenv.texi
index 22c8cb3..e958e93 100644
--- a/gcc/doc/cppenv.texi
+++ b/gcc/doc/cppenv.texi
@@ -79,4 +79,21 @@ main input file is omitted.
 @ifclear cppmanual
 @xref{Preprocessor Options}.
 @end ifclear
+
+@item SOURCE_DATE_EPOCH
+
+If this variable is set, its value specifies a UNIX timestamp to be
+used in replacement of the current date and time in the @code{__DATE__}
+and @code{__TIME__} macros, so that the embedded timestamps become
+reproducible.
+
+The value of @env{SOURCE_DATE_EPOCH} must be a UNIX timestamp,
+defined as the number of seconds (excluding leap seconds) since
+01 Jan 1970 00:00:00 represented in ASCII, identical to the output of
+@samp{@command{date +%s}}.
+
+The value should be a known timestamp such as the last modification
+time of the source or package and it should be set by the build
+process.
+
 @end vtable
diff --git

[PATCH GCC]Do more tree if-conversions by handling PHIs with more than two arguments.

2016-04-27 Thread Bin Cheng

Hi,
Currently tree if-conversion only supports PHIs with no more than two arguments 
unless the loop is marked with "simd pragma".  This patch makes such PHIs 
supported unconditionally if they have no more than MAX_PHI_ARG_NUM arguments, 
thus cases like PR56541 can be fixed.  Note that because a chain of "?:" operators
is needed to compute a multi-arg PHI, this patch records the case and versions the
loop so that the vectorizer can fall back to the original loop if
if-conversion+vectorization isn't beneficial.  Ideally, cost computation in 
vectorizer should be improved to measure benefit against the original loop, 
rather than if-converted loop.  So far MAX_PHI_ARG_NUM is set to (4) because 
cases with more arguments are rare and not likely beneficial.

Apart from the above change, the patch also only splits critical edges when we
have to, and cleans up the code logic in if_convertible_loop_p around
aggressive_if_conv.

Bootstrapped and tested on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-04-26  Bin Cheng  

PR tree-optimization/56541
* tree-if-conv.c (MAX_PHI_ARG_NUM): New macro.
(any_complicated_phi): New static variable.
(aggressive_if_conv): Delete.
(if_convertible_phi_p): Support PHIs with more than two arguments.
(if_convertible_bb_p): Remove check on aggressive_if_conv and
critical pred edges.
(ifcvt_split_critical_edges): Support PHIs with more than two
arguments by checking new parameter.  Only split critical edges
if needed.
(tree_if_conversion): Handle simd pragma marked loop using new
local variable aggressive_if_conv.  Check any_complicated_phi.

gcc/testsuite/ChangeLog
2016-04-26  Bin Cheng  

PR tree-optimization/56541
* gcc.dg/tree-ssa/ifc-pr56541.c: New test.
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 32ced16..31fe390 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -113,11 +113,21 @@ along with GCC; see the file COPYING3.  If not see
 #include "varasm.h"
 #include "builtins.h"
 #include "params.h"
- 
+
+/* Only handle PHIs with no more arguments unless we are asked to by
+   simd pragma.  */
+#define MAX_PHI_ARG_NUM (4)
+
 /* Indicate if new load/store that needs to be predicated is introduced
during if conversion.  */
 static bool any_pred_load_store;
 
+/* Indicate if there are any complicated PHIs that need to be handled in
+   if-conversion.  Complicated PHI has more than two arguments and can't
+   be degenerated to two arguments PHI.  See more information in comment
+   before phi_convertible_by_degenerating_args.  */
+static bool any_complicated_phi;
+
 /* Hash for struct innermost_loop_behavior.  It depends on the user to
free the memory.  */
 
@@ -172,9 +182,6 @@ innermost_loop_behavior_hash::equal (const value_type ,
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
-/* Apply more aggressive (extended) if-conversion if true.  */
-static bool aggressive_if_conv;
-
 /* Hash table to store  pairs.  */
 static hash_map *innermost_DR_map;
@@ -639,13 +646,9 @@ phi_convertible_by_degenerating_args (gphi *phi)
 }
 
 /* Return true when PHI is if-convertible.  PHI is part of loop LOOP
-   and it belongs to basic block BB.
-
-   PHI is not if-convertible if:
-   - it has more than 2 arguments.
-
-   When the aggressive_if_conv is set, PHI can have more than
-   two arguments.  */
+   and it belongs to basic block BB.  Note at this point, it is sure
+   that PHI is if-convertible.  This function updates global variable
+   ANY_COMPLICATED_PHI if PHI is complicated.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi)
@@ -656,17 +659,10 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, 
gphi *phi)
   print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
 }
 
-  if (bb != loop->header)
-{
-  if (gimple_phi_num_args (phi) > 2
- && !aggressive_if_conv
- && !phi_convertible_by_degenerating_args (phi))
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Phi can't be predicated by single cond.\n");
- return false;
-}
-}
+  if (bb != loop->header
+  && gimple_phi_num_args (phi) > 2
+  && !phi_convertible_by_degenerating_args (phi))
+any_complicated_phi = true;
 
   return true;
 }
@@ -1012,8 +1008,6 @@ has_pred_critical_p (basic_block bb)
- it is after the exit block but before the latch,
- its edges are not normal.
 
-   Last restriction is valid if aggressive_if_conv is false.
-
EXIT_BB is the basic block containing the exit of the LOOP.  BB is
inside LOOP.  */
 
@@ -1062,19 +1056,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, 
basic_block exit_bb)
return false;
   }
 
-  /* At least one

Re: [PATCH][AArch64] Simplify ashl3 expander for SHORT modes

2016-04-27 Thread Evandro Menezes


On 04/27/16 09:10, Kyrill Tkachov wrote:

2016-04-27 Kyrylo Tkachov  

* config/aarch64/aarch64.md (ashl3, SHORT modes):
Use const_int_operand for operand 2 predicate.  Simplify expand code
as a result.


LGTM

--
Evandro Menezes

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-27 Thread Bernd Edlinger



Am 27.04.2016 um 17:37 schrieb Rainer Orth:
> Bernd Edlinger  writes:
>
>> On 26.04.2016 22:14, Joseph Myers wrote:
>>> On Tue, 26 Apr 2016, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> as we all know, it's high time now to adjust the minimum supported
>>>> gmp/mpfr/mpc versions for gcc-7.
>>>
>>> I think updating the minimum versions (when using previously built
>>> libraries, not in-tree) is only appropriate when it allows some cleanup in
>>> GCC, such as removing conditionals on whether a more recently added
>>> function is available, adding functionality that depends on a newer
>>> interface, or using newer interfaces instead of older ones that are now
>>> deprecated.
>>>
>>> For example, you could justify a move to requiring MPFR 3.0.0 or later
>>> with cleanups to use MPFR_RND* instead of the older GMP_RND*, and
>>> similarly mpfr_rnd_t instead of the older mp_rnd_t and likewise mpfr_exp_t
>>> and mpfr_prec_t in fortran/.  You could justify a move to requiring MPC
>>> 1.0.0 (or 1.0.2) by optimizing clog10 using mpc_log10.  I don't know what
>>> if any newer GMP interfaces would be beneficial in GCC.  And as always in
>>> such cases, it's a good idea to look at e.g. how widespread the newer
>>> versions are in GNU/Linux distributions, which indicates how many people
>>> might be affected by an increase in the version requirement.
>>>
>>
>> Yes I see.
>>
>> I would justify it this way: gmp-6.0.0 is the first version that does
>> not invoke undefined behavior in gmp.h, once we update to gmp-6.0.0
>> we could emit at least a warning in cstddef for this invalid code.
>>
>> Once we have gmp-6.0.0, the earliest mpfr version that compiles at all
>> is mpfr-3.1.1 and the earliest mpc version that compiles at all is
>> mpc-0.9.  This would be the supported installed versions.
>>
>> In-tree gmp-6.0.0 does _not_ work for ARM.  But gmp-6.1.0 does (with a
>> little quirk).  All supported mpfr and mpc versions are working in-tree
>> too, even for the ARM target.
>>
>> When we have at least mpfr-3.1.1, it is straightforward to remove the
>> pre-3.1.0 compatibility code from gcc/fortran/simplify.c for instance.
>>
>> So I would propose this updated patch for gcc-7.
>
> would this version combo (gmp 6.0.0, mpfr 3.1.1, mpc 0.9) also work on
> the active release branches (gcc-5 and gcc-6, gcc-4.9 is on its way
> out)?  Having to install two different sets of the libraries for trunk
> and branch work would be extremely tedious.
>
>   Rainer
>

Yes, when they are pre-installed there should be no problem.
Also newer versions than these seem to work.

In-tree only the versions that download_prerequisites picks are
tested and guaranteed to work.

Re: [AArch64] Emit square root using the Newton series

2016-04-27 Thread Evandro Menezes


On 04/27/16 09:23, James Greenhalgh wrote:

On Tue, Apr 12, 2016 at 01:14:51PM -0500, Evandro Menezes wrote:

On 04/05/16 17:30, Evandro Menezes wrote:

On 04/05/16 13:37, Wilco Dijkstra wrote:

I can't get any of these to work... Not only do I get a large
number of collisions and duplicated
code between these patches, when I try to resolve them, all I
get is crashes whenever I try
to use sqrt (even rsqrt stopped working). Do you have a patchset
that applies cleanly so I can
try all approximation routines?

The original patches should be independent of each other, so
indeed they duplicate code.

This patch suite should be suitable for testing.

Take a look at other patch sets posted to this list for examples of how
to make review easier.

Please send a series of emails tagged:

[Patch 0/3 AArch64] Add infrastructure for more approximate FP operations
[PATCH 1/3 AArch64] Add more choices for the reciprocal square root 
approximation
[PATCH 2/3 AArch64] Emit square root using the Newton series
[PATCH 3/3 AArch64] Emit division using the Newton series

One patch per email, with the dependencies explicit like this, is
infinitely easier to follow than the current structure of your patch set.

I'm not trying to be pedantic for the sake of it, I'm genuinely unsure where
the latest patch versions currently are and how I should apply them to a
clean tree for review.


I can certainly create such a series, but the patch above should be 
suitable for testing.


Thank you,

--
Evandro Menezes

Re: [AArch64] Emit division using the Newton series

2016-04-27 Thread Evandro Menezes


On 04/27/16 09:15, James Greenhalgh wrote:
So this is off for all cores currently supported by GCC? I'm not sure 
I understand why we should take this if it will immediately be dead code? 


Excuse me?  Not only are other target maintainers free to evaluate if 
this code is useful to them, but so are users to use it through the 
command line option -mlow-precision-div.
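
As background on what a Newton series division looks like, here is a small sketch in portable C. It is not the instruction sequence GCC emits for AArch64; the function name, the linear starting estimate, and the step count are all assumptions chosen for illustration. Each step x = x * (2 - m * x) roughly squares the relative error of the reciprocal estimate, so fewer steps trade precision for speed.

```c
#include <assert.h>
#include <math.h>

/* Approximate 1/D for finite, nonzero D using STEPS Newton-Raphson
   iterations.  Hypothetical helper for illustration only.  */
static double
approx_recip (double d, int steps)
{
  int exponent;
  double m, x;

  assert (d != 0.0 && steps >= 0);

  /* |D| = M * 2^EXPONENT with M in [0.5, 1).  */
  m = frexp (fabs (d), &exponent);

  /* Linear starting estimate for 1/M; relative error at most 1/17.  */
  x = 48.0 / 17.0 - 32.0 / 17.0 * m;

  /* Each step roughly squares the relative error.  */
  for (int i = 0; i < steps; i++)
    x = x * (2.0 - m * x);

  /* 1/|D| = (1/M) * 2^-EXPONENT; then restore the sign.  */
  x = ldexp (x, -exponent);
  return d < 0 ? -x : x;
}
```

A quotient a / b is then approximated as a * approx_recip (b, steps). With this starting estimate a single step already brings the relative error below about 0.4%, and four steps reach double precision to within rounding.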


--
Evandro Menezes

Re: [PATCH 1/4] Add gcc-auto-profile script

2016-04-27 Thread Bernd Schmidt


On 03/28/2016 06:44 AM, Andi Kleen wrote:

This patch adds a new gcc-auto-profile script that figures out the
correct event and runs perf. The script is installed on Linux systems.


That sounds useful, and I think we'll want to accept this.


So Linux just hardcodes installing the script, but it may fail at runtime.


For this reason it would probably be best to retain the documentation 
for the old method alongside the new one.



+
+baseurl = "https://download.01.org/perfmon"


Slightly scary to see a random unknown download URL. Apparently it's an 
Intel thing? Is this referenced somewhere on an intel.com web page?



  E.g.
  @smallexample
  create_gcov --binary=your_program.unstripped --profile=perf.data \
---gcov=profile.afdo
+--gcov=profile.afdo -gcov_version 1
  @end smallexample
  @end table


Why this change? What does it do?

Why isn't the new script in contrib? Does it have to be in gcc to be 
installed? As a target-specific thing it probably needs to live at least 
inside config/.


Please review the patch yourself for proper sentences everywhere.


Bernd

Re: C, C++: New warning for memset without multiply by elt size

2016-04-27 Thread Martin Sebor


On 04/27/2016 03:55 AM, Bernd Schmidt wrote:

On 04/26/2016 11:23 PM, Martin Sebor wrote:

The documentation for the new option implies that it should warn
for calls to memset where the third argument contains the number
of elements not multiplied by the element size.  But in my (quick)
testing it only warns when the argument is a constant equal to
the number of elements and less than the size of the array.  For
example, neither of the following is diagnosed:

 int a [4];
 __builtin_memset (a, 0, 2 + 2);
 __builtin_memset (a, 0, 4 * 1);
 __builtin_memset (a, 0, 3);
 __builtin_memset (a, 0, 4 * sizeof a);

If it's possible and not too difficult, it would be nice if
the detection logic could be made a bit smarter to also diagnose
these less trivial cases (and matched the documented behavior).


I've thought about some of these cases. The problem is there are
legitimate cases of calling memset for only part of an array. I wanted
to start with something that is unlikely to give false positives.

A multiplication by the wrong sizeof would be a nice thing to spot.
Would you like to work on followup patches? I probably won't get to it
in a while.


Yes, I think enhancing this warning would be in line with
the _FORTIFY_SOURCE improvements I'm starting to look into now.
I agree that minimizing false positives is important.  I'm not
sure there is complete consensus on exactly what qualifies as
a false positive, but that's probably a discussion we can have
once we have a patch and some tests to look at.
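
For illustration only, here is a minimal C sketch of the pattern under
discussion: a memset whose third argument is an element count rather than
a byte count. The helper names clear_ints and clear_ints_buggy are made up
for this example and are not part of GCC or of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Correct: memset takes a size in bytes, so the element count is
   multiplied by the element size.  */
static void
clear_ints (int *a, size_t n_elems)
{
  memset (a, 0, n_elems * sizeof *a);
}

/* Suspicious: the element count is passed as if it were a byte count,
   so only the first n_elems bytes are cleared.  (Hypothetical helper,
   shown only to illustrate the mistake the warning targets.)  */
static void
clear_ints_buggy (int *a, size_t n_elems)
{
  memset (a, 0, n_elems);
}
```

With a four-element int array, the second helper leaves the trailing
elements untouched on any target where sizeof (int) is larger than 1,
which is the kind of mistake the new warning is meant to catch.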

Martin

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-27 Thread Rainer Orth

Bernd Edlinger  writes:

> On 26.04.2016 22:14, Joseph Myers wrote:
>> On Tue, 26 Apr 2016, Bernd Edlinger wrote:
>>
>>> Hi,
>>>
>>> as we all know, it's high time now to adjust the minimum supported
>>> gmp/mpfr/mpc versions for gcc-7.
>>
>> I think updating the minimum versions (when using previously built
>> libraries, not in-tree) is only appropriate when it allows some cleanup in
>> GCC, such as removing conditionals on whether a more recently added
>> function is available, adding functionality that depends on a newer
>> interface, or using newer interfaces instead of older ones that are now
>> deprecated.
>>
>> For example, you could justify a move to requiring MPFR 3.0.0 or later
>> with cleanups to use MPFR_RND* instead of the older GMP_RND*, and
>> similarly mpfr_rnd_t instead of the older mp_rnd_t and likewise mpfr_exp_t
>> and mpfr_prec_t in fortran/.  You could justify a move to requiring MPC
>> 1.0.0 (or 1.0.2) by optimizing clog10 using mpc_log10.  I don't know what
>> if any newer GMP interfaces would be beneficial in GCC.  And as always in
>> such cases, it's a good idea to look at e.g. how widespread the newer
>> versions are in GNU/Linux distributions, which indicates how many people
>> might be affected by an increase in the version requirement.
>>
>
> Yes I see.
>
> I would justify it this way: gmp-6.0.0 is the first version that does
> not invoke undefined behavior in gmp.h; once we update to gmp-6.0.0
> we could emit at least a warning in cstddef for this invalid code.
>
> Once we have gmp-6.0.0, the earliest mpfr version that compiles at all
> is mpfr-3.1.1 and the earliest mpc version that compiles at all is
> mpc-0.9.  These would be the supported installed versions.
>
> In-tree gmp-6.0.0 does _not_ work for ARM.  But gmp-6.1.0 does (with a
> little quirk).  All supported mpfr and mpc versions are working in-tree
> too, even for the ARM target.
>
> When we have at least mpfr-3.1.1, it is straightforward to remove the
> pre-3.1.0 compatibility code from gcc/fortran/simplify.c, for instance.
>
> So I would propose this updated patch for gcc-7.

Would this version combo (gmp 6.0.0, mpfr 3.1.1, mpc 0.9) also work on
the active release branches (gcc-5 and gcc-6; gcc-4.9 is on its way
out)?  Having to install two different sets of the libraries for trunk
and branch work would be extremely tedious.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH 4/4] Add make autoprofiledbootstrap

2016-04-27 Thread Bernd Schmidt


On 03/28/2016 06:44 AM, Andi Kleen wrote:

From: Andi Kleen 

Add support for profiledbootstrap with autofdo. Will be useful
to get better testing coverage of autofdo.


Is this the only purpose? I'll admit this is the patch I like least out 
of the series.



The autofdo'ed compiler is ~7% faster on insn-recog.i (vs ~11% for
profiledfeedback), and ~4% faster for tramp3d-v4 (vs 10% for
profiledfeedback)


So it seems like we get worse results than with a feature we already 
have, so I don't quite see the value.



+AUTO_PROFILE = gcc-auto-profile -c 100


Shouldn't this be from $(srcdir) somewhere?


+# get from configure?
+CREATE_GCOV = create_gcov


Probably.

Please remove diffs against autogenerated files when submitting patches.


+ifeq ($(shell cat ../stage_current),stageautofeedback)
+$(CXX_AND_OBJCXX_OBJS): CFLAGS += -fauto-profile=cc1plus.fda
+$(CXX_AND_OBJCXX_OBJS): cc1plus.fda
+endif



+cc1.fda: ../stage1-gcc/cc1$(exeext) ../prev-gcc/$(PERF_DATA)
+   $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov cc1.fda 
-profile ../prev-gcc/$(PERF_DATA) -gcov_version 1
+


These Makefile bits all look somewhat hackish to me. I'll defer to 
build system maintainers if they like it better.



Bernd

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-27 Thread Bernd Edlinger

On 26.04.2016 22:14, Joseph Myers wrote:
> On Tue, 26 Apr 2016, Bernd Edlinger wrote:
>
>> Hi,
>>
>> as we all know, it's high time now to adjust the minimum supported
>> gmp/mpfr/mpc versions for gcc-7.
>
> I think updating the minimum versions (when using previously built
> libraries, not in-tree) is only appropriate when it allows some cleanup in
> GCC, such as removing conditionals on whether a more recently added
> function is available, adding functionality that depends on a newer
> interface, or using newer interfaces instead of older ones that are now
> deprecated.
>
> For example, you could justify a move to requiring MPFR 3.0.0 or later
> with cleanups to use MPFR_RND* instead of the older GMP_RND*, and
> similarly mpfr_rnd_t instead of the older mp_rnd_t and likewise mpfr_exp_t
> and mpfr_prec_t in fortran/.  You could justify a move to requiring MPC
> 1.0.0 (or 1.0.2) by optimizing clog10 using mpc_log10.  I don't know what
> if any newer GMP interfaces would be beneficial in GCC.  And as always in
> such cases, it's a good idea to look at e.g. how widespread the newer
> versions are in GNU/Linux distributions, which indicates how many people
> might be affected by an increase in the version requirement.
>

Yes I see.

I would justify it this way: gmp-6.0.0 is the first version that does
not invoke undefined behavior in gmp.h; once we update to gmp-6.0.0
we could emit at least a warning in cstddef for this invalid code.

Once we have gmp-6.0.0, the earliest mpfr version that compiles at all
is mpfr-3.1.1 and the earliest mpc version that compiles at all is
mpc-0.9.  These would be the supported installed versions.

In-tree gmp-6.0.0 does _not_ work for ARM.  But gmp-6.1.0 does (with a
little quirk).  All supported mpfr and mpc versions are working in-tree
too, even for the ARM target.

When we have at least mpfr-3.1.1, it is straightforward to remove the
pre-3.1.0 compatibility code from gcc/fortran/simplify.c, for instance.

So I would propose this updated patch for gcc-7.
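
As a rough sketch of what such a minimum-version check amounts to, the
following C fragment compares installed versions against the minimums
proposed above (gmp 6.0.0, mpfr 3.1.1, mpc 0.9). Everything here is
hypothetical: struct lib_version, version_num and prerequisites_ok are
names invented for this example, mpc 0.9 is assumed to mean 0.9.0, and
the real check lives in configure.ac, not in C code like this.

```c
#include <stdbool.h>

/* A major.minor.patch triple (hypothetical type for this sketch).  */
struct lib_version
{
  int major, minor, patch;
};

/* Pack a version into one integer so that versions compare numerically.  */
static long
version_num (struct lib_version v)
{
  return ((long) v.major << 16) | ((long) v.minor << 8) | (long) v.patch;
}

static bool
version_at_least (struct lib_version have, struct lib_version min)
{
  return version_num (have) >= version_num (min);
}

/* Minimum versions as proposed in this message.  */
static const struct lib_version min_gmp = { 6, 0, 0 };
static const struct lib_version min_mpfr = { 3, 1, 1 };
static const struct lib_version min_mpc = { 0, 9, 0 };

/* True if all three installed libraries meet the proposed minimums.  */
static bool
prerequisites_ok (struct lib_version gmp, struct lib_version mpfr,
		  struct lib_version mpc)
{
  return (version_at_least (gmp, min_gmp)
	  && version_at_least (mpfr, min_mpfr)
	  && version_at_least (mpc, min_mpc));
}
```

Under these assumptions an installation with gmp 6.1.0, mpfr 3.1.4 and
mpc 1.0.3 passes, while one with mpfr 3.1.0 is rejected.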


Thanks
Bernd.
2016-04-26  Bernd Edlinger  

* configure.ac (mpfr): Remove pre-3.1.0 mpfr compatibility code.
Adjust check to new minimum gmp, mpfr and mpc versions.
* configure: Regenerated.
* Makefile.def (gmp): Explicitly disable assembler.
(mpfr): Adjust lib_path.
(mpc): Likewise.
* Makefile.in: Regenerated.

gcc/
2016-04-26  Bernd Edlinger  

* doc/install.texi: Adjust gmp/mpfr/mpc minimum versions.

gcc/fortran/
2016-04-26  Bernd Edlinger  

* simplify.c (gfc_simplify_fraction): Remove pre-3.1.0 mpfr
compatibility code.

contrib/
2016-04-26  Bernd Edlinger  

* download_prerequisites: Adjust gmp/mpfr/mpc versions.
Index: Makefile.def
===
--- Makefile.def	(revision 235487)
+++ Makefile.def	(working copy)
@@ -50,6 +50,7 @@ host_modules= { module= gcc; bootstrap=true;
 host_modules= { module= gmp; lib_path=.libs; bootstrap=true;
 		// Work around in-tree gmp configure bug with missing flex.
 		extra_configure_flags='--disable-shared LEX="touch lex.yy.c"';
+		extra_make_flags='AM_CFLAGS="-DNO_ASM"';
 		no_install= true;
 		// none-*-* disables asm optimizations, bootstrap-testing
 		// the compiler more thoroughly.
@@ -57,11 +58,11 @@ host_modules= { module= gmp; lib_path=.libs; boots
 		// gmp's configure will complain if given anything
 		// different from host for target.
 	target="none-${host_vendor}-${host_os}"; };
-host_modules= { module= mpfr; lib_path=.libs; bootstrap=true;
+host_modules= { module= mpfr; lib_path=src/.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_mpfr_configure_flags@';
 		extra_make_flags='AM_CFLAGS="-DNO_ASM"';
 		no_install= true; };
-host_modules= { module= mpc; lib_path=.libs; bootstrap=true;
+host_modules= { module= mpc; lib_path=src/.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_mpc_gmp_configure_flags@ @extra_mpc_mpfr_configure_flags@';
 		no_install= true; };
 host_modules= { module= isl; lib_path=.libs; bootstrap=true;
Index: Makefile.in
===
--- Makefile.in	(revision 235487)
+++ Makefile.in	(working copy)
@@ -639,12 +639,12 @@ HOST_LIB_PATH_gmp = \
 
 @if mpfr
 HOST_LIB_PATH_mpfr = \
-  $$r/$(HOST_SUBDIR)/mpfr/.libs:$$r/$(HOST_SUBDIR)/prev-mpfr/.libs:
+  $$r/$(HOST_SUBDIR)/mpfr/src/.libs:$$r/$(HOST_SUBDIR)/prev-mpfr/src/.libs:
 @endif mpfr
 
 @if mpc
 HOST_LIB_PATH_mpc = \
-  $$r/$(HOST_SUBDIR)/mpc/.libs:$$r/$(HOST_SUBDIR)/prev-mpc/.libs:
+  $$r/$(HOST_SUBDIR)/mpc/src/.libs:$$r/$(HOST_SUBDIR)/prev-mpc/src/.libs:
 @endif mpc
 
 @if isl
@@ -11300,7 +11300,7 @@ all-gmp: configure-gmp
 	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
 	$(HOST_EXPORTS)  \
 	(cd $(HOST_SUBDIR)/gmp && \
-	  $(MAKE) $(BASE_FLAGS_TO_PASS)

Re: Fix some i386 testcases for -frename-registers

2016-04-27 Thread Bernd Schmidt


On 04/27/2016 05:16 PM, H.J. Lu wrote:


This works for -m32, -mx32 and -m64.  OK for trunk?


Yes, thanks.


Bernd

Re: [PATCH 3/4] Run profile feedback tests with autofdo

2016-04-27 Thread Bernd Schmidt

On 03/28/2016 06:44 AM, Andi Kleen wrote:

From: Andi Kleen 

Extend the existing bprob and tree-prof tests to also run with autofdo.
The test runtimes are really a bit too short for autofdo, but it's
a reasonable sanity check.

This only works natively for now.

dejagnu doesn't seem to support a wrapper for unix tests, so I had
to open code running these tests.  That should be ok due to the
native run restrictions.

Ideally this would be reviewed by someone who knows tcl (and autofdo) a 
little better. Some observations.

+set profile_wrapper [profopt-perf-wrapper]
+set profile_options "-g"
+set feedback_options "-fauto-profile"
+set run_autofdo 1
+
+foreach profile_option $profile_options feedback_option $feedback_options {
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/bprob-*.c]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+   set base [file tail $src]
+profopt-execute $src
+}
+}
+
+set run_autofdo ""
+set profile_wrapper ""

This block appears duplicated across several files. Is there a way to 
unify that?

> +  if { $run_autofdo == 1 } {
> +  # unix_load does not support wrappers in $PATH, so implement
> +  # it manually here

Please write full sentences with proper capitalization and punctuation. 
This occurs across several of these patches, I'll only mention it here.

@@ -313,6 +320,7 @@ proc profopt-execute { src } {
# valid, by running it after dg-additional-files-options.
foreach ext $prof_ext {
profopt-target-cleanup $tmpdir $base $ext
+   profopt-target-cleanup $tmpdir perf data
}

We have this, and then...

@@ -400,6 +451,7 @@ proc profopt-execute { src } {
foreach ext $prof_ext {
profopt-target-cleanup $tmpdir $base $ext
}
+   # XXX remove perf.data

... this - does that need to look the same as the above?

+   # Should check if create_gcov exists

So maybe do that?

Bernd

New Brazilian Portuguese PO file for 'cpplib' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Brazilian Portuguese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/pt_BR.po

(This file, 'cpplib-6.1.0.pt_BR.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-6.1.0.pt_BR.po'

2016-04-27 Thread Translation Project Robot



cpplib-6.1.0.pt_BR.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

[PATCH] add -fprolog-pad=N option to c-family

2016-04-27 Thread Torsten Duwe

Hi Maxim,

thanks for starting the work on this; I have added the missing
command line option. It builds now and the resulting compiler generates
a linux kernel with the desired properties, so work can continue there.

Torsten

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9bc02fc..57265c5 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -393,6 +393,7 @@ static tree handle_designated_init_attribute (tree *, tree, 
tree, int, bool *);
 static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool 
*);
 static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
 static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
+static tree handle_prolog_pad_attribute (tree *, tree, tree, int, bool *);
 
 static void check_function_nonnull (tree, int, tree *);
 static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
@@ -833,6 +834,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_bnd_legacy, false },
   { "bnd_instrument", 0, 0, true, false, false,
  handle_bnd_instrument, false },
+  { "prolog_pad",1, 1, false, true, true,
+ handle_prolog_pad_attribute, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
@@ -9663,6 +9666,16 @@ handle_designated_init_attribute (tree *node, tree name, 
tree, int,
   return NULL_TREE;
 }
 
+static tree
+handle_prolog_pad_attribute (tree *, tree name, tree, int,
+bool *)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute is used", name);
+
+  return NULL_TREE;
+}
+
 
 /* Check for valid arguments being passed to a function with FNTYPE.
There are NARGS arguments in the array ARGARRAY.  */
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 9ae181f..31a8026 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -532,6 +532,10 @@ c_common_handle_option (size_t scode, const char *arg, int 
value,
   cpp_opts->ext_numeric_literals = value;
   break;
 
+case OPT_fprolog_pad_:
+  prolog_nop_pad_size = value;
+  break;
+
 case OPT_idirafter:
   add_path (xstrdup (arg), AFTER, 0, true);
   break;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index aafd802..929ebb6 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1407,6 +1407,10 @@ fpreprocessed
 C ObjC C++ ObjC++
 Treat the input file as already preprocessed.
 
+fprolog-pad=
+C ObjC C++ ObjC++ RejectNegative Joined UInteger Var(prolog_nop_pad_size) 
Init(0)
+Pad NOPs before each function prolog
+
 ftrack-macro-expansion
 C ObjC C++ ObjC++ JoinedOrMissing RejectNegative UInteger
 ; converted into ftrack-macro-expansion=
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 1ce7181..9d10b10 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4553,6 +4553,10 @@ will select the smallest suitable mode.
 This section describes the macros that output function entry
 (@dfn{prologue}) and exit (@dfn{epilogue}) code.
 
+@deftypefn {Target Hook} void TARGET_ASM_PRINT_PROLOG_PAD (FILE *@var{file}, 
unsigned HOST_WIDE_INT @var{pad_size}, bool @var{record_p})
+Generate prologue pad
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_ASM_FUNCTION_PROLOGUE (FILE *@var{file}, 
HOST_WIDE_INT @var{size})
 If defined, a function that outputs the assembler code for entry to a
 function.  The prologue is responsible for setting up the stack frame,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index a0a0a81..bda6d5c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3662,6 +3662,8 @@ will select the smallest suitable mode.
 This section describes the macros that output function entry
 (@dfn{prologue}) and exit (@dfn{epilogue}) code.
 
+@hook TARGET_ASM_PRINT_PROLOG_PAD
+
 @hook TARGET_ASM_FUNCTION_PROLOGUE
 
 @hook TARGET_ASM_FUNCTION_END_PROLOGUE
diff --git a/gcc/final.c b/gcc/final.c
index 1edc446..e0cff80 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1753,6 +1753,7 @@ void
 final_start_function (rtx_insn *first, FILE *file,
  int optimize_p ATTRIBUTE_UNUSED)
 {
+  unsigned HOST_WIDE_INT pad_size = prolog_nop_pad_size;
   block_depth = 0;
 
   this_is_asm_operands = 0;
@@ -1765,6 +1766,21 @@ final_start_function (rtx_insn *first, FILE *file,
 
   high_block_linenum = high_function_linenum = last_linenum;
 
+  tree prolog_pad_attr
+= lookup_attribute ("prolog_pad", TYPE_ATTRIBUTES (TREE_TYPE 
(current_function_decl)));
+  if (prolog_pad_attr)
+{
+  tree prolog_pad_value = TREE_VALUE (TREE_VALUE (prolog_pad_attr));
+
+  if (tree_fits_uhwi_p (prolog_pad_value))
+   pad_size = tree_to_uhwi (prolog_pad_value);
+  else
+   gcc_unreachable ();
+
+}
+  if (pad_size > 0)
+targetm.asm_out.print_prolog_pad (file, pad_size, true);
+
   if (flag_sanitize & SANITIZE_ADDRESS)
 asan_function_start ();
 
diff --git a/gcc/target.def

Re: [AArch64] Emit division using the Newton series

2016-04-27 Thread Wilco Dijkstra

James Greenhalgh wrote:
> So this is off for all cores currently supported by GCC?
> 
> I'm not sure I understand why we should take this if it will immediately
> be dead code?

I presume it was meant to have the vector variants enabled with -mcpu=exynos-m1
as that is where you can get a good gain if you only have a single divide+sqrt 
unit.
The same applies to the sqrt case too, and I guess -mcpu=xgene-1.

Wilco

Re: Fix some i386 testcases for -frename-registers

2016-04-27 Thread H.J. Lu

On Wed, Apr 27, 2016 at 2:09 AM, Bernd Schmidt  wrote:
> On 04/27/2016 02:10 AM, H.J. Lu wrote:
>>
>> On Tue, Apr 26, 2016 at 3:11 PM, Bernd Schmidt 
>> wrote:
>>>
>>> On 04/26/2016 09:39 PM, H.J. Lu wrote:


 make check-gcc RUNTESTFLAGS="--target_board='unix{-mx32}'
 i386.exp=avx512vl-vmovdqa64-1.c"
>>>
>>>
>>>
>>> Unfortunately, that doesn't work:
>>>
>>> /usr/include/gnu/stubs.h:13:28: fatal error: gnu/stubs-x32.h: No such
>>> file
>>> or directory
>>> compilation terminated.
>>>
>>> Trying to follow the recipe to get an x32 glibc built fails with the same
>>> error when trying to build an x32 libgcc. I think I'll need you to send
>>> me
>>> before/after assembly files (I'm assuming it's -frename-registers which
>>> makes the test fail on x32).
>>>
>>
>> Here are avx512vl-vmovdqa64-1.i, old.s and new.s.
>
>
> Still somewhat at a loss. Is it trying to match a register name and close
> paren with the '.{5}'? What if you replace that with something like
> '%[re][0-9a-z]*//)'? Or maybe '.{5,6}'?
>

This works for -m32, -mx32 and -m64.  OK for trunk?
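
To illustrate why widening the interval helps, here is a small C sketch
using POSIX extended regular expressions. It is only an approximation:
the testsuite uses Tcl regexps, the patterns below are simplified to the
tail of the real one, and the register names are assumptions based on the
guess earlier in the thread that ".{5}" is matching a register name plus
the closing paren.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the POSIX extended regex PATTERN.
   A pattern that fails to compile is treated as "no match".  */
static bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  bool matched;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}

/* Simplified tails of the old and new scan-assembler patterns: an open
   paren followed by exactly 5, or by 5 to 6, characters at end of line.  */
static const char old_tail[] = "\\(.{5}$";
static const char new_tail[] = "\\(.{5,6}$";
```

A memory operand such as "(%rax)" has five characters after the open
paren, so both patterns match it. If register renaming picks a register
whose name is one character longer, for example "(%r10d)", only the
".{5,6}" form still matches.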

Thanks.

-- 
H.J.
	* gcc.target/i386/avx512vl-vmovdqa64-1.c: Replace ".{5}" with
	".{5,6}".

diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
index 6930f79..14fe4b8 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
@@ -10,7 +10,7 @@
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*\\)\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5}(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\nxy\]*\\(.{5,6}(?:\n|\[ \\t\]+#)" 1 { target nonpic } } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\nxy\]*\\((?:\n|\[ \\t\]+#)" 1 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*\\)\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */

Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-04-27 Thread James Greenhalgh

On Fri, Apr 22, 2016 at 02:24:49PM +, Wilco Dijkstra wrote:
> Some patterns are using '%w2' for immediate operands, which means that a zero
> immediate is actually emitted as 'wzr' or 'xzr'. This not only changes an
> immediate operand into a register operand but may emit illegal instructions
> from legal RTL (eg. ORR x0, SP, xzr rather than ORR x0, SP, 0).
> 
> Remove the fallthrough in aarch64_print_operand from the 'w' and 'x' case
> into the '0' case that created this issue. Modify a few patterns to use '%2'
> rather than '%w2' for an immediate or memory operand so they now print
> correctly without the fallthrough.
> 
> OK for trunk?
> 
> (note this requires https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01265.html 
> to
> be committed first)

If you've got dependencies like this, formatting the mails as a patch set
makes review easier.  e.g.:

  [PATCH 1/2 AArch64] Fix shift attributes
  [PATCH 2/2 AArch64] print_operand should not fallthrough from register
operand into generic operand

My biggest concern about this patch is that it might break code that is in
the wild. Looks to me like this breaks (at least)
arch/arm64/asm/percpu.h in the kernel.

So the part of this patch removing the fallthrough to general operand
is not OK for trunk. 

The other parts look reasonable to me, please resubmit just those.
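
For readers following along, the behaviour being discussed can be modelled
with a toy C function. This is not GCC's aarch64_print_operand; the types
and the function print_operand_model are invented for this sketch, and the
output formats are simplified.

```c
#include <stdio.h>

/* Toy operand: either a register number or an immediate value.  */
enum operand_kind { OP_REG, OP_IMM };

struct operand
{
  enum operand_kind kind;
  long value;
};

/* Model of printing OP with modifier CODE ('w', 'x', or 0 for none).
   The 'w'/'x' case prints a zero immediate as the zero register and
   otherwise falls through to the generic case for non-register operands,
   which is the fallthrough the patch proposes to remove.  */
static void
print_operand_model (char *buf, size_t size, struct operand op, int code)
{
  switch (code)
    {
    case 'w':
    case 'x':
      if (op.kind == OP_IMM && op.value == 0)
	{
	  snprintf (buf, size, "%czr", code);
	  return;
	}
      if (op.kind == OP_REG)
	{
	  snprintf (buf, size, "%c%ld", code, op.value);
	  return;
	}
      /* Fall through */

    case 0:
      if (op.kind == OP_REG)
	snprintf (buf, size, "x%ld", op.value);
      else
	snprintf (buf, size, "%ld", op.value);
      return;

    default:
      snprintf (buf, size, "?");
      return;
    }
}
```

In this model a zero immediate printed with 'w' comes out as "wzr" while
the same operand with no modifier prints as "0", which is the mismatch the
pattern changes address. A non-zero immediate printed with 'w' still
reaches the generic case through the fallthrough, which is presumably what
existing inline asm in the wild depends on.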

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 881dc52e2de03231abb217a9ce22cbb1cc44bc6c..bcef50825c8315c39e29dbe57c387ea2a4fe445d
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4608,7 +4608,8 @@ aarch64_print_operand (FILE *f, rtx x, int code)
> break;
>   }
>  
> -  /* Fall through */
> +  output_operand_lossage ("invalid operand for '%%%c'", code);
> +  return;
>  
>  case 0:
>/* Print a normal operand, if it's a general register, then we

To be clear, this is the hunk that is not OK.

Thanks,
James

Re: [PATCH] DWARF: turn dw_loc_descr_node field into hash map for frame offset check

2016-04-27 Thread Pierre-Marie de Rodat


On 04/27/2016 01:31 PM, Richard Biener wrote:

Ok.

Thanks,
Richard.


Thank you for the very quick feedback! I just committed the change.

--
Pierre-Marie de Rodat

Re: moxie-rtems patch for libgcc/config.host

2016-04-27 Thread Jeff Law


On 04/18/2016 03:43 PM, Joel Sherrill wrote:

Hi

For some unknown reason, moxie-rtems has its own stanza
in libgcc/config.host which does not include extra_parts.
This results in C++ RTEMS applications not linking.

Also the tmake_file variable is overridden by the
shared moxie stanza rather than being added to.

This patch addresses both issues. This patch (or some
minor variant) needs to be applied to every branch from
4.9 to master.

Comments?


2015-04-18  Joel Sherrill 

* config.host (moxie-*-rtems*): Merge this stanza with
other moxie targets so the same extra_parts are built.
Also have tmake_file add on to its value rather than override.


OK for the trunk and branches.
jeff

Re: [PATCH GCC]Refactor IVOPT.

2016-04-27 Thread Bin.Cheng

On Fri, Apr 22, 2016 at 8:20 AM, Richard Biener
 wrote:
> On Thu, Apr 21, 2016 at 7:26 PM, Bin Cheng  wrote:
>> Hi,
>> This patch refactors IVOPT in three major aspects:
>> Firstly it rewrites iv_use groups.  Use group is originally introduced only 
>> for address type uses, this patch makes it general to all (generic, compare, 
>> address) types.  Currently generic/compare groups contain only one iv_use, 
>> and compare groups can be extended to contain multiple uses.  As far as 
>> generic use is concerned, it won't contain multiple uses because IVOPT 
>> reuses one iv_use structure for generic uses at different places already.  
>> This change also cleans up algorithms as well as data structures.
>> Secondly it implements group data structure in vector rather than in list as 
>> originally.  List was used because it's easy to split.  Of course list is 
>> hard to sort (For example, we use quadratic insertion sort now).  This 
>> problem will become more critical since I plan to introduce fine-control 
>> over splitting small address groups by checking if target supports 
>> load/store pair instructions or not.  In this case address group needs to be 
>> sorted more than once and against complex conditions, for example, memory 
>> loads in one basic block should be sorted together in offset ascending 
>> order.  With vector group, sorting can be done very efficiently with quick 
>> sort.
>> Thirdly, this patch cleans up and reformats IVOPT's dump information.  I think the 
>> information is easier to read/test now.  Since change of dump information is 
>> entangled with group data-structure change, it's hard to make it a 
>> standalone patch.  Given this part patch is quite straightforward, I hope it 
>> won't be confusing.
>>
>> Bootstrapped and tested on x86_64 and AArch64, no regressions.  I also checked 
>> generated assembly for spec2k and spec2k6 on both platforms; it turns out the 
>> output assembly is almost unchanged except for several files.  After 
>> further investigation, I can confirm the difference is caused by a small change 
>> when sorting groups. Given the original sorting condition as below:
>> -  /* Sub use list is maintained in offset ascending order.  */
>> -  if (addr_offset <= group->addr_offset)
>> -{
>> -  use->related_cands = group->related_cands;
>> -  group->related_cands = NULL;
>> -  use->next = group;
>> -  data->iv_uses[id_group] = use;
>> -}
>> iv_uses with same addr_offset are sorted in reverse control flow order.  
>> This might be a typo since I don't remember any specific reason for it.  If 
>> this patch sorts groups in the same way, there will be no difference in 
>> generated assembly at all.  So this patch is a pure refactoring work which 
>> doesn't have any functional change.
>>
>> Any comments?
>
> Looks good to me.

Hi
Attached is what I applied as r235513, picking up two new tests
that need to be revised.  Also applied Martin's patch on top of it as
r2355134.
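
As a side note on the sorting change described in the patch summary, the
idea of sorting a vector of uses by offset can be sketched in plain C with
qsort. This is a simplified model, not the code that was committed:
struct use_model, compare_offset and sort_uses are hypothetical names, and
the tie-break on id is an assumption added here to make the order of uses
with equal offsets explicit.

```c
#include <stdlib.h>

/* Simplified stand-in for an address use: its offset and the order in
   which it was recorded.  */
struct use_model
{
  long addr_offset;
  int id;
};

/* Order uses by ascending offset.  qsort is not guaranteed to be stable,
   so uses with the same offset are ordered by id to keep the result
   deterministic.  */
static int
compare_offset (const void *a, const void *b)
{
  const struct use_model *u1 = a;
  const struct use_model *u2 = b;

  if (u1->addr_offset != u2->addr_offset)
    return u1->addr_offset < u2->addr_offset ? -1 : 1;
  return (u1->id > u2->id) - (u1->id < u2->id);
}

/* Sort N uses in place.  */
static void
sort_uses (struct use_model *uses, size_t n)
{
  qsort (uses, n, sizeof *uses, compare_offset);
}
```

How ties are broken is exactly the kind of detail that produced the small
assembly differences mentioned above, so a real comparator needs to pick
that order deliberately.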

Thanks,
bin

2016-04-27  Martin Liska  

* tree-ssa-loop-ivopts.c (iv_ca_dump): Fix level of indentation.
(free_loop_data): Release vuses of groups.

>
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-04-19  Bin Cheng  
>>
>> * tree-ssa-loop-ivopts.c (struct iv): Use pointer to struct iv_use
>> instead of redundant use_id and boolean have_use_for.
>> (struct iv_use): Change sub_id into group_id.  Remove field next.
>> Move fields: related_cands, n_map_members, cost_map and selected
>> to ...
>> (struct iv_group): ... here.  New structure.
>> (struct iv_common_cand): Use structure declaration directly.
>> (struct ivopts_data, iv_ca, iv_ca_delta): Rename fields.
>> (MAX_CONSIDERED_USES): Rename macro to ...
>> (MAX_CONSIDERED_GROUPS): ... here.
>> (n_iv_uses, iv_use, n_iv_cands, iv_cand): Delete.
>> (dump_iv, dump_use, dump_cand): Refactor format of dump information.
>> (dump_uses): Rename to ...
>> (dump_groups): ... here.  Update all uses.
>> (tree_ssa_iv_optimize_init, alloc_iv): Update all uses.
>> (find_induction_variables): Refactor format of dump information.
>> (record_sub_use): Delete.
>> (record_use): Update all uses.
>> (record_group): New function.
>> (record_group_use, find_interesting_uses_op): Call above functions.
>> Update all uses.
>> (find_interesting_uses_cond): Ditto.
>> (group_compare_offset): New function.
>> (split_all_small_groups): Rename to ...
>> (split_small_address_groups_p): ... here.  Update all uses.
>> (split_address_groups):  Update all uses.
>> (find_interesting_uses): Refactor format of dump information.
>> (add_candidate_1): Update all uses.  Remove redundant check on iv,
>> base and step.
>> (add_candidate, record_common_cand): Remove

Re: [PATCH][AArch64] Fix shift attributes

2016-04-27 Thread James Greenhalgh

On Fri, Apr 22, 2016 at 02:11:52PM +, Wilco Dijkstra wrote:
> This patch fixes the attributes of integer immediate shifts which were
> incorrectly modelled as register controlled shifts. Also change EXTR
> attribute to being a rotate.
> 
> OK for trunk?

OK. Thanks for the fix.

Thanks,
James

> 
> ChangeLog:
> 2016-04-22  Wilco Dijkstra  
> 
>   * gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3):
>   Split integer shifts into shift_reg and bfm.
>   (aarch64_lshr_sisd_or_int_3): Likewise.
>   (aarch64_ashr_sisd_or_int_3): Likewise.
>   (ror3_insn): Likewise.
>   (si3_insn_uxtw): Likewise.
>   (3_insn): Change to rotate_imm.
>   (extr5_insn_alt): Likewise.
>   (extrsi5_insn_uxtw): Likewise.
>   (extrsi5_insn_uxtw_alt): Likewise.

Re: [PATCH][RFC] Gimplify "into SSA"

2016-04-27 Thread Jeff Law


On 04/21/2016 06:55 AM, Richard Biener wrote:


The following patch makes us not allocate decls but SSA names for
temporaries required during gimplification.  This is basically the
same thing as we do when calling the gimplifier on GENERIC expressions
from optimization passes (when we are already in SSA).

There are two benefits of doing this.

1) SSA names are smaller (72 bytes) than VAR_DECLs (144 bytes) and we
rewrite them into anonymous SSA names later anyway, leaving up the
VAR_DECLs for GC reclaim (but not their UID)

2) We keep expressions "connected" by having the use->def link via
SSA_NAME_DEF_STMT for example allowing match-and-simplify of
larger expressions on early GIMPLE
I like it -- I can't see any significant reason to keep the _DECL nodes 
for these temporaries.  They're not useful for end-user debugging or 
debugging GCC itself.  In fact, I would claim that these temporary _DECL 
nodes just add noise when diffing debugging dumps.


While GC would reclaim the _DECL nodes, I'm all for avoiding placing 
work on the GC system when it can be easily avoided.




Complications arise from the fact that there is no CFG built and thus
we have to make sure to not use SSA names where we'd need PHIs.  Or
when CFG build may end up separating SSA def and use in a way current
into-SSA doesn't fix up (adding of abnormal edges, save-expr placement,
gimplification of type sizes, etc.).

:(



As-is the patch has the downside of effectively disabling the
lookup_tmp_var () CSE for register typed temporaries and not
preserving the "fancy" names we derive from val in
create_tmp_from_val (that can be recovered easily though if
deemed worthwhile).

I don't think it's worthwhile.

ISTM this will affect something like the gimple front-end project which 
would need to see the anonymous name and map it back to a suitable type, 
but I don't think that should stop this from moving forward.


Jeff

Re: [PATCH, ARM] Fix gcc.c-torture/execute/loop-2b.c execution failure on cortex-m0

2016-04-27 Thread Ramana Radhakrishnan

>
> Ping? Note that the patch has been on GCC 6 for more than 3 months now without
> any issue reported against it.

OK.

Ramana

>
> Best regards,
>
> Thomas

Re: [PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-04-27 Thread James Greenhalgh

On Fri, Apr 22, 2016 at 01:22:51PM +, Wilco Dijkstra wrote:
> Improve modes_tieable by returning true in more cases: allow scalar access
> within vectors without requiring an extra move. Removing these moves helps
> the register allocator in deciding whether to use integer or FP registers on
> operations that can be done on both. This saves about 100 instructions in the
> gcc.target/aarch64 tests.
>

[snip]

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index abc864c..6e921f0 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12294,7 +12294,14 @@ aarch64_reverse_mask (enum machine_mode mode)
>return force_reg (V16QImode, mask);
>  }
>  
> -/* Implement MODES_TIEABLE_P.  */
> +/* Implement MODES_TIEABLE_P.  In principle we should always return true.
> +   However due to issues with register allocation it is preferable to avoid
> +   tieing integer scalar and FP scalar modes.  Executing integer operations
> +   in general registers is better than treating them as scalar vector
> +   operations.  This reduces latency and avoids redundant int<->FP moves.
> +   So tie modes if they are either the same class, or vector modes with
> +   other vector modes, vector structs or any scalar mode.
> +*/

*/ shouldn't be on the newline, just "[...] scalar mode.  */"

It would be handy if you could raise something in bugzilla for the
register allocator deficiency.

>  bool
>  aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> @@ -12305,9 +12312,12 @@ aarch64_modes_tieable_p (machine_mode mode1, 
> machine_mode mode2)
>/* We specifically want to allow elements of "structure" modes to
>   be tieable to the structure.  This more general condition allows
>   other rarer situations too.  */
> -  if (TARGET_SIMD
> -  && aarch64_vector_mode_p (mode1)
> -  && aarch64_vector_mode_p (mode2))
> +  if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
> +return true;

This relaxes the TARGET_SIMD check that would have prevented
OImode/CImode/XImode ties when !TARGET_SIMD. What's the reasoning
behind that?

> +  /* Also allow any scalar modes with vectors.  */
> +  if (aarch64_vector_mode_supported_p (mode1)
> +  || aarch64_vector_mode_supported_p (mode2))
>  return true;

Does this always hold? It seems like you might need to be more restrictive
with what we allow to avoid ties with some of the more obscure modes
(V4DF etc.).

Thanks,
James

Re: C, C++: New warning for memset without multiply by elt size

2016-04-27 Thread Jeff Law


On 04/27/2016 03:55 AM, Bernd Schmidt wrote:

On 04/26/2016 11:23 PM, Martin Sebor wrote:

The documentation for the new option implies that it should warn
for calls to memset where the third argument contains the number
of elements not multiplied by the element size.  But in my (quick)
testing it only warns when the argument is a constant equal to
the number of elements and less than the size of the array.  For
example, none of the following is diagnosed:

 int a [4];
 __builtin_memset (a, 0, 2 + 2);
 __builtin_memset (a, 0, 4 * 1);
 __builtin_memset (a, 0, 3);
 __builtin_memset (a, 0, 4 * sizeof a);

If it's possible and not too difficult, it would be nice if
the detection logic could be made a bit smarter to also diagnose
these less trivial cases (and matched the documented behavior).


I've thought about some of these cases. The problem is there are
legitimate cases of calling memset for only part of an array. I wanted
to start with something that is unlikely to give false positives.

So I wonder if what we really want is to track which bytes in the object 
are set and which are not -- utilizing both memset and standard stores 
and if the object as a whole is not initialized, then warn.


We've actually got a lot of the code that would be necessary to detect 
this in tree DSE, with more coming in this stage1 as I extend it to 
handle some missing cases.


Clearly a follow-up rather than a requirement for the current patch to 
move forward.


Jeff
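
The byte-tracking idea suggested above can be sketched roughly as follows. This is only an illustration of the bookkeeping, not how tree DSE represents it; the `ByteCoverage` class, its method names, and the 4-byte `int` are all assumptions made for the example.

```python
class ByteCoverage:
    """Track which bytes of a fixed-size object have been written."""

    def __init__(self, size):
        self.size = size
        self.written = [False] * size

    def store(self, offset, length):
        """Record a store (memset or ordinary assignment) of `length` bytes."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("store outside the object")
        for i in range(offset, offset + length):
            self.written[i] = True

    def uninitialized_bytes(self):
        """Return the offsets that no recorded store has touched."""
        return [i for i, done in enumerate(self.written) if not done]

    def fully_initialized(self):
        return all(self.written)


# int a[4] with a hypothetical 4-byte int: 16 bytes in total.
a = ByteCoverage(4 * 4)
a.store(0, 4)                       # memset (a, 0, 4): element count, not bytes
partial = a.fully_initialized()     # only the first element is covered
missing = a.uninitialized_bytes()   # offsets a warning could point at
a.store(4, 12)                      # later stores cover the remaining bytes
complete = a.fully_initialized()
```

With this kind of tracking, a partial memset followed by stores that cover the rest of the object would stay quiet, while an object left partly unwritten could be reported.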

Re: [PING^3] Re: [PATCH 1/4] Add gcc-auto-profile script

2016-04-27 Thread Andi Kleen

Andi Kleen  writes:

Ping^3 for the patch series!

> Andi Kleen  writes:
>
> Ping^2 for the patch series!
>
>> Andi Kleen  writes:
>>
>> Ping for the patch series!
>>
>>> From: Andi Kleen 
>>>
>>> Using autofdo is currently somewhat difficult. It requires using the
>>> model specific branches taken event, which differs on different CPUs.
>>> The example shown in the manual requires a special patched version of
>>> perf that is non standard, and also will likely not work everywhere.
>>>
>>> This patch adds a new gcc-auto-profile script that figures out the
>>> correct event and runs perf. The script is installed on Linux systems.
>>>
>>> Since maintaining the script would be somewhat tedious (needs changes
>>> every time a new CPU comes out) I auto generated it from the online
>>> Intel event database. The script to do that is in contrib and can be
>>> rerun.
>>>
>>> Right now there is no test if perf works in configure. This
>>> would vary depending on the build and target system, and since
>>> it currently doesn't work in virtualization and needs an up-to-date
>>> kernel it may often fail in common distribution build setups.
>>>
>>> So Linux just hardcodes installing the script, but it may fail at runtime.
>>>
>>> This is needed to actually make use of autofdo in a generic way
>>> in the build system and in the test suite.
>>>
>>> So far the script is not installed.
>>>
>>> gcc/:
>>> 2016-03-27  Andi Kleen  
>>>
>>> * doc/invoke.texi: Document gcc-auto-profile
>>> * gcc-auto-profile: Create.
>>>
>>> contrib/:
>>>
>>> 2016-03-27  Andi Kleen  
>>>
>>> * gen_autofdo_event.py: New file to regenerate
>>> gcc-auto-profile.
>>> ---
>>>  contrib/gen_autofdo_event.py | 155 
>>> +++
>>>  gcc/doc/invoke.texi  |  31 +++--
>>>  gcc/gcc-auto-profile |  70 +++
>>>  3 files changed, 251 insertions(+), 5 deletions(-)
>>>  create mode 100755 contrib/gen_autofdo_event.py
>>>  create mode 100755 gcc/gcc-auto-profile
>>>
>>> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
>>> new file mode 100755
>>> index 000..db4db33
>>> --- /dev/null
>>> +++ b/contrib/gen_autofdo_event.py
>>> @@ -0,0 +1,155 @@
>>> +#!/usr/bin/python
>>> +# generate Intel taken branches Linux perf event script for autofdo profiling
>>> +
>>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>>> +#
>>> +# GCC is free software; you can redistribute it and/or modify it under
>>> +# the terms of the GNU General Public License as published by the Free
>>> +# Software Foundation; either version 3, or (at your option) any later
>>> +# version.
>>> +#
>>> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +# for more details.
>>> +#
>>> +# You should have received a copy of the GNU General Public License
>>> +# along with GCC; see the file COPYING3.  If not see
>>> +# .  */
>>> +
>>> +# run it with perf record -b -e EVENT program ...
>>> +# The Linux Kernel needs to support the PMU of the current CPU, and
>>> +# it will likely not work in VMs.
>>> +# add --all to print for all cpus, otherwise for current cpu
>>> +# add --script to generate shell script to run correct event
>>> +#
>>> +# requires internet (https) access. this may require setting up a proxy
>>> +# with export https_proxy=...
>>> +#
>>> +import urllib2
>>> +import sys
>>> +import json
>>> +import argparse
>>> +import collections
>>> +
>>> +baseurl = "https://download.01.org/perfmon"
>>> +
>>> +target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
>>> + u'BR_INST_EXEC.TAKEN',
>>> + u'BR_INST_RETIRED.TAKEN_JCC',
>>> + u'BR_INST_TYPE_RETIRED.COND_TAKEN')
>>> +
>>> +ap = argparse.ArgumentParser()
>>> +ap.add_argument('--all', '-a', help='Print for all CPUs', 
>>> action='store_true')
>>> +ap.add_argument('--script', help='Generate shell script', 
>>> action='store_true')
>>> +args = ap.parse_args()
>>> +
>>> +eventmap = collections.defaultdict(list)
>>> +
>>> +def get_cpu_str():
>>> +    with open('/proc/cpuinfo', 'r') as c:
>>> +        vendor, fam, model = None, None, None
>>> +        for j in c:
>>> +            n = j.split()
>>> +            if n[0] == 'vendor_id':
>>> +                vendor = n[2]
>>> +            elif n[0] == 'model' and n[1] == ':':
>>> +                model = int(n[2])
>>> +            elif n[0] == 'cpu' and n[1] == 'family':
>>> +                fam = int(n[3])
>>> +            if vendor and fam and model:
>>> +                return "%s-%d-%X" % (vendor, fam, model), model
>>> +    return None, None
>>> +
>>> +def find_event(eventurl, model):
>>> +print >>sys.stderr,
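
For readers following the CPU detection step in the quoted script, here is a minimal Python 3 sketch of the same idea. It takes the cpuinfo text as an argument instead of opening /proc/cpuinfo so that it can be tried anywhere. The function name `cpu_str_from_cpuinfo` and the sample text are assumptions for illustration, and unlike the quoted code it splits on the colon and tests the fields against None.

```python
def cpu_str_from_cpuinfo(text):
    """Return ("vendor-family-MODEL", model) from /proc/cpuinfo-style text.

    Uses the same "%s-%d-%X" format as get_cpu_str in the quoted patch and
    returns (None, None) if any of the three fields is missing.
    """
    vendor = family = model = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key : value"
        key, value = key.strip(), value.strip()
        if key == "vendor_id":
            vendor = value
        elif key == "cpu family":
            family = int(value)
        elif key == "model":
            model = int(value)
        if vendor is not None and family is not None and model is not None:
            return "%s-%d-%X" % (vendor, family, model), model
    return None, None


# Hypothetical cpuinfo excerpt; real files contain many more fields.
sample = """processor\t: 0
vendor_id\t: GenuineIntel
cpu family\t: 6
model\t\t: 63
model name\t: Hypothetical CPU
"""

cpu_id, model_number = cpu_str_from_cpuinfo(sample)
```

The resulting string is what a script of this kind would presumably use to look up the matching event list for the current CPU.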

Re: [AArch64] Emit square root using the Newton series

2016-04-27 Thread James Greenhalgh

On Tue, Apr 12, 2016 at 01:14:51PM -0500, Evandro Menezes wrote:
> On 04/05/16 17:30, Evandro Menezes wrote:
> >On 04/05/16 13:37, Wilco Dijkstra wrote:
> >>I can't get any of these to work... Not only do I get a large
> >>number of collisions and duplicated
> >>code between these patches, when I try to resolve them, all I
> >>get is crashes whenever I try
> >>to use sqrt (even rsqrt stopped working). Do you have a patchset
> >>that applies cleanly so I can
> >>try all approximation routines?
> >
> >The original patches should be independent of each other, so
> >indeed they duplicate code.
> >
> >This patch suite should be suitable for testing.

Take a look at other patch sets posted to this list for examples of how
to make review easier.

Please send a series of emails tagged:

[PATCH 0/3 AArch64] Add infrastructure for more approximate FP operations
[PATCH 1/3 AArch64] Add more choices for the reciprocal square root approximation
[PATCH 2/3 AArch64] Emit square root using the Newton series
[PATCH 3/3 AArch64] Emit division using the Newton series

One patch per email, with the dependencies explicit like this, is
infinitely easier to follow than the current structure of your patch set.

I'm not trying to be pedantic for the sake of it, I'm genuinely unsure where
the latest patch versions currently are and how I should apply them to a
clean tree for review.

Thanks,
James

Re: [arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

2016-04-27 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On Thursday 17 December 2015 17:32:48 Thomas Preud'homme wrote:
> Hi,
> 
> We decided to apply the following patch to the ARM embedded 5 branch.
> 
> Best regards,
> 
> Thomas
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Wednesday, December 16, 2015 7:59 PM
> > To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> > Kyrylo Tkachov
> > Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM
> > multilibs
> > 
> > Currently in config.gcc, only the first multilib in a multilib list is
> > checked for validity and the following elements are ignored due to the
> > break which only breaks out of loop in shell. A loop is also done over
> > the multilib list elements despite no combination being legal. This patch
> > reworks the code to address both issues.
> > 
> > ChangeLog entry is as follows:
> > 
> > 
> > 2015-11-24  Thomas Preud'homme  
> > 
> > * config.gcc: Error out when conflicting multilib is detected.  Do not
> > loop over multilibs since no combination is legal.
> > 
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 59aee2c..be3c720 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -3772,38 +3772,40 @@ case "${target}" in
> > 
> > # Add extra multilibs
> > if test "x$with_multilib_list" != x; then
> > 
> > arm_multilibs=`echo $with_multilib_list | sed -e
> > 
> > 's/,/ /g'`
> > -   for arm_multilib in ${arm_multilibs}; do
> > -   case ${arm_multilib} in
> > -   aprofile)
> > +   case ${arm_multilibs} in
> > +   aprofile)
> > 
> > # Note that arm/t-aprofile is a
> > # stand-alone make file fragment to be
> > # used only with itself.  We do not
> > # specifically use the
> > # TM_MULTILIB_OPTION framework
> > 
> > because
> > 
> > # this shorthand is more
> > 
> > -   # pragmatic. Additionally it is only
> > -   # designed to work without any
> > -   # with-cpu, with-arch with-mode
> > +   # pragmatic.
> > +   tmake_profile_file="arm/t-aprofile"
> > +   ;;
> > +   default)
> > +   ;;
> > +   *)
> > +   echo "Error: --with-multilib-
> > list=${with_multilib_list} not supported." 1>&2
> > +   exit 1
> > +   ;;
> > +   esac
> > +
> > +   if test "x${tmake_profile_file}" != x ; then
> > +   # arm/t-aprofile is only designed to work
> > +   # without any with-cpu, with-arch, with-
> > mode,
> > 
> > # with-fpu or with-float options.
> > 
> > -   if test "x$with_arch" != x \
> > -   || test "x$with_cpu" != x \
> > -   || test "x$with_float" != x \
> > -   || test "x$with_fpu" != x \
> > -   || test "x$with_mode" != x ;
> > then
> > -   echo "Error: You cannot use
> > any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile"
> > 1>&2
> > -   exit 1
> > -   fi
> > -   tmake_file="${tmake_file}
> > arm/t-aprofile"
> > -   break
> > -   ;;
> > -   default)
> > -   ;;
> > -   *)
> > -   echo "Error: --with-multilib-
> > list=${with_multilib_list} not supported." 1>&2
> > -   exit 1
> > -   ;;
> > -   esac
> > -   done
> > +   if test "x$with_arch" != x \
> > +   || test "x$with_cpu" != x \
> > +   || test "x$with_float" != x \
> > +   || test "x$with_fpu" != x \
> > +   || test "x$with_mode" != x ; then
> > +   echo "Error: You cannot use any of --
> > with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}"
> > 1>&2
> > +   exit 1
> > +   fi
> > +
> > +
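
The control flow that the reworked config.gcc fragment is aiming for can be sketched outside of shell as follows. This is a rough model under stated assumptions: the function name `check_arm_multilib_list` and the keyword arguments standing in for --with-arch, --with-cpu, --with-float, --with-fpu and --with-mode are hypothetical, and errors are raised as exceptions rather than printed before `exit 1`.

```python
def check_arm_multilib_list(with_multilib_list, **with_options):
    """Return the make fragment to add, or None; raise ValueError on error.

    `with_options` holds hypothetical values for arch, cpu, float, fpu and
    mode; an empty or absent value means the option was not given.
    """
    if not with_multilib_list:
        return None  # nothing requested, nothing to check

    # Like the sed call in the patch, turn commas into spaces.  The whole
    # list is then matched at once, so any list with more than one entry
    # ends up in the error branch.
    multilibs = with_multilib_list.replace(",", " ")
    if multilibs == "aprofile":
        profile_file = "arm/t-aprofile"
    elif multilibs == "default":
        profile_file = None
    else:
        raise ValueError(
            "--with-multilib-list=%s not supported." % with_multilib_list)

    if profile_file is not None:
        conflicting = [name for name in ("arch", "cpu", "float", "fpu", "mode")
                       if with_options.get(name)]
        if conflicting:
            raise ValueError(
                "cannot use --with-%s with --with-multilib-list=%s"
                % ("/".join(conflicting), with_multilib_list))
    return profile_file


fragment = check_arm_multilib_list("aprofile")
```

Matching the whole list in one step is what removes the need for the loop and its `break`: a second entry can no longer be silently ignored.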

Re: [AArch64] Emit division using the Newton series

2016-04-27 Thread James Greenhalgh

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index b7086dd..21af809 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -414,7 +414,8 @@ static const struct tune_params generic_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params cortexa35_tunings =
> @@ -439,7 +440,8 @@ static const struct tune_params cortexa35_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params cortexa53_tunings =
> @@ -464,7 +466,8 @@ static const struct tune_params cortexa53_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params cortexa57_tunings =
> @@ -489,7 +492,8 @@ static const struct tune_params cortexa57_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)   /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),  /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params cortexa72_tunings =
> @@ -514,7 +518,8 @@ static const struct tune_params cortexa72_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params exynosm1_tunings =
> @@ -538,7 +543,8 @@ static const struct tune_params exynosm1_tunings =
>48,/* max_case_values.  */
>64,/* cache_line_size.  */
>tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_APPROX_RSQRT), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE) /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params thunderx_tunings =
> @@ -562,7 +568,8 @@ static const struct tune_params thunderx_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };
>  
>  static const struct tune_params xgene1_tunings =
> @@ -586,7 +593,8 @@ static const struct tune_params xgene1_tunings =
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_APPROX_RSQRT)  /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_APPROX_RSQRT), /* tune_flags.  */
> +  (AARCH64_APPROX_NONE)  /* approx_div_modes.  */
>  };

So this is off for all cores currently supported by GCC?

I'm not sure I understand why we should take this if it will immediately
be dead code?

Thanks,
James
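
For context on the subject line, division by the Newton series generally means refining an estimate of 1/d and multiplying by the numerator. The sketch below shows only that textbook iteration. It says nothing about the instruction sequence the patch emits, and the caller-supplied `initial_estimate` stands in for whatever hardware estimate would be used, which is an assumption here.

```python
def approx_divide(n, d, initial_estimate, steps=3):
    """Approximate n / d by refining an estimate of 1 / d with Newton's method."""
    x = initial_estimate
    for _ in range(steps):
        # Newton step for f(x) = 1/x - d.  The relative error of x is
        # roughly squared by each step, provided the estimate is close.
        x = x * (2.0 - d * x)
    return n * x


quotient = approx_divide(10.0, 4.0, initial_estimate=0.2)
third = approx_divide(1.0, 3.0, initial_estimate=0.3, steps=4)
```

Whether such a sequence beats a hardware divide depends on the core, which is presumably why the patch makes it a per-core tuning flag.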

[PATCH][ARM] Fix costing of sign-extending load in rtx costs

2016-04-27 Thread Kyrill Tkachov


Hi all,

Another costs issue that came out of the investigation for PR 65932 is that
sign-extending loads get a higher cost than they should in the arm backend.
The problem is that when handling a sign-extend of a MEM we add the cost
of the load_sign_extend cost field and then recursively add the cost of the
inner MEM rtx, which is bogus. This will end up adding an extra load cost on it.

The solution in this patch is to just remove that recursive step.
With this patch, various CSE dumps show much more sane costs assigned to
these expressions (such as 12 instead of 32 or higher).

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-04-27  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs, SIGN_EXTEND case):
Don't add cost of inner memory when handling sign-extended
loads.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7781b4b449ed48a8d902802d8e6a5c8e1ae7793f..7f2babe7339de3586de190bbe2cf8112919dd96f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10911,8 +10911,6 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
   if ((arm_arch4 || GET_MODE (XEXP (x, 0)) == SImode)
 	  && MEM_P (XEXP (x, 0)))
 	{
-	  *cost = rtx_cost (XEXP (x, 0), VOIDmode, code, 0, speed_p);
-
 	  if (mode == DImode)
 	*cost += COSTS_N_INSNS (1);

New Ukrainian PO file for 'cpplib' (version 6.1.0)

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Ukrainian team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/uk.po

(This file, 'cpplib-6.1.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Contents of PO file 'cpplib-6.1.0.uk.po'

2016-04-27 Thread Translation Project Robot



cpplib-6.1.0.uk.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.

Re: [RFC] Update gmp/mpfr/mpc minimum versions

2016-04-27 Thread Bernd Edlinger

On 26.04.2016 21:28, Marc Glisse wrote:
> On Tue, 26 Apr 2016, Bernd Edlinger wrote:
>
>> For instance PR libstdc++/69881: gmp.h did this:
>>
>> #define __need_size_t  /* tell gcc stddef.h we only want size_t */
>> #include <stddef.h> /* for size_t */
>>
>> I've persuaded Jonathan to work around that in libstdc++.
>>
>> Of course the in-tree build does work with less versions than
>> otherwise.
>
> IIUC, the bug only shows up if you compile in C++11 or later, so
> basically g++-6 or later, and there is a workaround in libstdc++
> starting from version 6 that means that it doesn't cause any problem. So
> there might be a problem if someone tries to build gcc using CXX='g++-5
> -std=c++11' or CXX='clang++ -stdlib=libc++' on a glibc system (I don't
> think others use __need_size_t?), but those are rather odd cases.
>

Yea, but Jonathan did not like this workaround at all, and my personal
preference would also just have been a better error message for this
clearly invalid code.


Bernd.

[PATCH][AArch64] Delete obsolete CC_ZESWP and CC_SESWP CC modes

2016-04-27 Thread Kyrill Tkachov


Hi all,

The CC_ZESWP and CC_SESWP modes are not used anywhere and seem to be a remnant of
some old code that was removed. The various compare+extend patterns in aarch64.md
don't use these modes. So it should be safe to remove them to avoid future confusion.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2016-04-27  Kyrylo Tkachov  

* config/aarch64/aarch64-modes.def (CC_ZESWP, CC_SESWP): Delete.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Remove condition
that returns CC_SESWPmode and CC_ZESWPmode.
(aarch64_get_condition_code_1): Remove handling of CC_ZESWPmode
and CC_SESWPmode.
(aarch64_rtx_costs): Likewise.
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 7de0b3f2fec1024946e40c66088b5b48675c4b7a..de8227f0ce47f4268761047d4e7bc46627c34bc7 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -21,8 +21,6 @@
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
 CC_MODE (CC_SWP);
-CC_MODE (CC_ZESWP); /* zero-extend LHS (but swap to make it RHS).  */
-CC_MODE (CC_SESWP); /* sign-extend LHS (but swap to make it RHS).  */
 CC_MODE (CC_NZ);/* Only N and Z bits of condition flags are valid.  */
 CC_MODE (CC_Z); /* Only Z bit of condition flags is valid.  */
 CC_MODE (CC_C); /* Only C bit of condition flags is valid.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 466712e0c1e9c99ed76cb55728e9eeb6783eaa13..5ca2ae820335e9e656fb2c1b929f903bf0287f19 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4192,14 +4192,6 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && GET_CODE (x) == NEG)
 return CC_Zmode;
 
-  /* A compare of a mode narrower than SI mode against zero can be done
- by extending the value in the comparison.  */
-  if ((GET_MODE (x) == QImode || GET_MODE (x) == HImode)
-  && y == const0_rtx)
-/* Only use sign-extension if we really need it.  */
-return ((code == GT || code == GE || code == LE || code == LT)
-	? CC_SESWPmode : CC_ZESWPmode);
-
   /* A test for unsigned overflow.  */
   if ((GET_MODE (x) == DImode || GET_MODE (x) == TImode)
   && code == NE
@@ -4268,8 +4260,6 @@ aarch64_get_condition_code_1 (enum machine_mode mode, enum rtx_code comp_code)
   break;
 
 case CC_SWPmode:
-case CC_ZESWPmode:
-case CC_SESWPmode:
   switch (comp_code)
 	{
 	case NE: return AARCH64_NE;
@@ -6402,10 +6392,6 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
   /* TODO: A write to the CC flags possibly costs extra, this
 	 needs encoding in the cost tables.  */
 
-  /* CC_ZESWPmode supports zero extend for free.  */
-  if (mode == CC_ZESWPmode && GET_CODE (op0) == ZERO_EXTEND)
-op0 = XEXP (op0, 0);
-
 	  mode = GET_MODE (op0);
   /* ANDS.  */
   if (GET_CODE (op0) == AND)

[PATCH][AArch64] Simplify ashl3 expander for SHORT modes

2016-04-27 Thread Kyrill Tkachov


Hi all,

The ashl3 expander for QI and HI modes is needlessly obfuscated.
The 2nd operand predicate accepts nonmemory_operand but the expand code
FAILs if it's not a CONST_INT. We can just demand a const_int_operand in
the predicate and remove the extra CONST_INT check.

Looking at git blame, it seems it was written that way as a result of some
other refactoring a few years back for an unrelated change.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2016-04-27  Kyrylo Tkachov  

* config/aarch64/aarch64.md (ashl3, SHORT modes):
Use const_int_operand for operand 2 predicate.  Simplify expand code
as a result.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d7a669e40f9d4ae863c3e48b73f0eebdecea340d..c08e89bc4eb7b51dbb1e5f893238824caeb5f317 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3770,22 +3770,16 @@ (define_expand "3"
 (define_expand "ashl3"
   [(set (match_operand:SHORT 0 "register_operand")
 	(ashift:SHORT (match_operand:SHORT 1 "register_operand")
-		  (match_operand:QI 2 "nonmemory_operand")))]
+		  (match_operand:QI 2 "const_int_operand")))]
   ""
   {
-if (CONST_INT_P (operands[2]))
-  {
-operands[2] = GEN_INT (INTVAL (operands[2])
-   & (GET_MODE_BITSIZE (mode) - 1));
+operands[2] = GEN_INT (INTVAL (operands[2]) & GET_MODE_MASK (mode));
 
-if (operands[2] == const0_rtx)
-  {
-	emit_insn (gen_mov (operands[0], operands[1]));
-	DONE;
-  }
+if (operands[2] == const0_rtx)
+  {
+	emit_insn (gen_mov (operands[0], operands[1]));
+	DONE;
   }
-else
-  FAIL;
   }
 )

[PATCH][AArch64] Define WORD_REGISTER_OPERATIONS to zero and comment why

2016-04-27 Thread Kyrill Tkachov


Hi all,

WORD_REGISTER_OPERATIONS is currently cryptically commented out in aarch64.h.
In reality, we cannot define it to 1 for aarch64 because operations narrower
than word_mode (DImode for aarch64) don't behave like word_mode if they use the
W-form of the registers. They'll be performed in SImode in that case.
This patch adds a comment to that effect.
Longer term, I think it should be possible to teach the midend about this
behaviour on aarch64 (maybe re-define WORD_REGISTER_OPERATIONS to something like
NARROWEST_NATURAL_INT_MODE?) to take advantage of these semantics,
but in the meantime this should clear up the current situation.

Bootstrapped and tested on aarch64.
This patch shouldn't have any functional changes.
Does the wording look ok for trunk?

Thanks,
Kyrill
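
The difference described above can be illustrated with a small model of the two register forms. This is a sketch in plain integer arithmetic, assuming the usual AArch64 behaviour that a W-register operation works on 32 bits and leaves the upper half of the 64-bit register zero; the helper names `add_w` and `add_x` are made up for the example.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add_w(a, b):
    """Model a 32-bit add on W registers: the sum wraps at 2**32 and the
    result is zero-extended into the 64-bit register."""
    return ((a & MASK32) + (b & MASK32)) & MASK32


def add_x(a, b):
    """Model the same add done as a full 64-bit (word_mode) operation."""
    return (a + b) & MASK64


a, b = 0xFFFFFFFF, 1
w_result = add_w(a, b)   # carry out of bit 31 is discarded
x_result = add_x(a, b)   # carry lands in bit 32
```

Since the two results differ in the upper bits, the midend cannot assume that a narrower operation was carried out on the whole 64-bit register, which is the assumption WORD_REGISTER_OPERATIONS encodes.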

2016-04-27  Kyrylo Tkachov  

* config/aarch64/aarch64.h (WORD_REGISTER_OPERATIONS): Define to 0
and explain why in a comment.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index e2ead511076a2192eb79b79ec0a72777f82af35c..61c56b17efc09b65eeaa5441ab916ab7e0c8a969 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -708,7 +708,12 @@ do {	 \
 #define USE_STORE_PRE_INCREMENT(MODE)   0
 #define USE_STORE_PRE_DECREMENT(MODE)   0
 
-/* ?? #define WORD_REGISTER_OPERATIONS  */
+/* WORD_REGISTER_OPERATIONS does not hold for AArch64.
+   The assigned word_mode is DImode but operations narrower than SImode
+   behave as 32-bit operations if using the W-form of the registers rather
+   than as word_mode (64-bit) operations as WORD_REGISTER_OPERATIONS
+   expects.  */
+#define WORD_REGISTER_OPERATIONS 0
 
 /* Define if loading from memory in MODE, an integral mode narrower than
BITS_PER_WORD will either zero-extend or sign-extend.  The value of this

Re: [Patch AArch64] Set TARGET_OMIT_STRUCT_RETURN_REG to true.

2016-04-27 Thread James Greenhalgh

On Tue, Apr 26, 2016 at 02:22:58PM +0100, Ramana Radhakrishnan wrote:
> As $SUBJECT. The reason this caught my eye on aarch64 is because
> the return value register (x0) is not identical to the register in which
> the hidden parameter for AArch64 is set (x8). Thus setting this to true
> seems to be quite reasonable and shaves off 100 odd mov x0, x8's from
> cc1 in a bootstrap build.
> 
> I don't expect this to make a huge impact on performance but as they say
> every little counts.  The AAPCS64 is quite explicit about not requiring that
> the contents of x8 be kept live.
> 
> Bootstrapped and regression tested on aarch64.
> 
> Ok to apply ?

OK.

Thanks,
James

Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-27 Thread Richard Biener

On Tue, Apr 26, 2016 at 2:29 PM, Ilya Enkovich  wrote:
> 2016-04-22 10:13 GMT+03:00 Richard Biener :
>> On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> Currently when loop is vectorized we adjust its nb_iterations_upper_bound
>>> by dividing it by VF.  This is incorrect since nb_iterations_upper_bound
>>> is an upper bound for (number of loop iterations - 1) and therefore simply
>>> dividing it by VF in many cases gives us bounds greater than the real one.
>>> Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).
>>
>> Yeah, that seems correct.
>>
>>> Also decrement due to peeling for gaps should happen before we scale it
>>> by VF because peeling applies to a scalar loop, not vectorized one.
>>
>> That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration
>> is peeled as scalar operations.  We do not account for the amount
>> of known prologue peeling (if peeling for alignment and the misalignment
>> is known at compile-time) - that would be peeling of scalar iterations.
>
> My initial patch didn't change anything for PEELING_FOR_GAP and it caused
> a runfail for one of SPEC2006 benchmarks.  My investigation showed number
> of vector iterations calculation doesn't match nb_iterations_upper_bound
> adjustment in a way PEELING_FOR_GAP is accounted.
>
> Looking into vect_generate_tmps_on_preheader I see:
>
> /* If epilogue loop is required because of data accesses with gaps, we
>subtract one iteration from the total number of iterations here for
>correct calculation of RATIO.  */
>
> And then we decrement loop counter before dividing it by VF to compute
> ratio and ratio_mult_vf.  This doesn't match nb_iterations_upper_bound
> update and that's why I fixed it.  This resolved runfail for me.
>
> Thus ratio_mult_vf computation conflicts with your statement we peel a
> vector iteration.

Hum.  I stand corrected.  So yes, we remove the last vector iteration if
there are not already epilogue iterations.

>>
>> But it would be interesting to know why we need the != 0 check - static
>> cost modelling should have disabled vectorization if the vectorized body
>> isn't run.
>>
>>> This patch modifies nb_iterations_upper_bound computation to resolve
>>> these issues.
>>
>> You do not adjust the ->nb_iterations_estimate accordingly.
>>
>>> Running regression testing I got one fail due to optimized loop. Here
>>> is the loop:
>>>
>>> foo (signed char s)
>>> {
>>>   signed char i;
>>>   for (i = 0; i < s; i++)
>>> yy[i] = (signed int) i;
>>> }
>>>
>>> Here we vectorize for AVX512 using VF=64.  Original loop has max 127
>>> iterations and therefore vectorized loop may be executed only once.
>>> With the patch applied compiler detects it and transforms loop into
>>> BB with just stores of constants vectors into yy.  Test was adjusted
>>> to increase number of possible iterations.  A copy of test was added
>>> to check we can optimize out the original loop.
>>>
>>> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?
>>
>> I'd like to see testcases covering the corner-cases - have them have
>> upper bound estimates by adjusting known array sizes and also cover
>> the case of peeling for gaps.
>
> OK, I'll make more tests.

Thanks,
Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
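
The arithmetic discussed above (subtract one iteration when peeling for gaps, then divide by VF) can be sketched in plain C. This is a minimal model and not the vectorizer's code: the function name and parameters are hypothetical, and it counts loop iterations rather than latch executions.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on how many times the vectorized
   loop body can run.
   max_scalar_iters: upper bound on iterations of the original loop.
   vf: vectorization factor (must be nonzero).
   peel_for_gaps: nonzero if one iteration is reserved for the epilogue
   because of data accesses with gaps.  */
unsigned long
vector_iters_upper_bound (unsigned long max_scalar_iters, unsigned vf,
                          int peel_for_gaps)
{
  assert (vf > 0);
  /* Mirror the decrement done before the division by VF.  */
  if (peel_for_gaps && max_scalar_iters > 0)
    max_scalar_iters -= 1;
  return max_scalar_iters / vf;
}
```

With the numbers from the signed char example, at most 127 iterations and VF=64, the model gives a bound of one vector iteration, which is why the vectorized loop can be turned into straight-line code.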

New template for 'gcc' made available

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

http://translationproject.org/POT-files/gcc-6.1.0.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

ftp://ftp.gnu.org/gnu/gcc/gcc-6.1.0/gcc-6.1.0.tar.bz2

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-04-27 Thread Richard Biener

On Sun, Apr 24, 2016 at 12:02 AM, kugan
 wrote:
> Hi Richard,
>
> As you have said in the other email, I tried implementing with the
> add_repeat_to_ops_vec but the whole repeat vector is designed for
> MULT_EXPR chain. I tried changing it but it turned out to be not
> straightforward without lots of re-write. Therefore I tried to implement
> based on your review here. Please tell me what you think.

Hmm, ok.

>>> +/* Transoform repeated addition of same values into multiply with
>>> +   constant.  */
>>>
>>> Transform
>
>
> Done.
>
>>>
>>> +static void
>>> +transform_add_to_multiply (gimple_stmt_iterator *gsi, gimple *stmt,
>>> vec *ops)
>>>
>>> split the long line
>
>
> Done.
>
>>>
>>> op_list looks redundant - ops[start]->op gives you the desired value
>>> already and if you
>>> use a vec> you can have a more C++ish start,end pair.
>>>
>>> +  tree tmp = make_temp_ssa_name (TREE_TYPE (op), NULL,
>>> "reassocmul");
>>> +  gassign *mul_stmt = gimple_build_assign (tmp, MULT_EXPR,
>>> +  op, build_int_cst
>>> (TREE_TYPE(op), count));
>>>
>>> this won't work for floating point or complex numbers - you need to use
>>> sth like
>>> fold_convert (TREE_TYPE (op), build_int_cst (integer_type_node, count));
>
>
> Done.
>
>>>
>>> For FP types you need to guard the transform with
>>> flag_unsafe_math_optimizations
>
>
> Done.
>
>>>
>>> +  gimple_set_location (mul_stmt, gimple_location (stmt));
>>> +  gimple_set_uid (mul_stmt, gimple_uid (stmt));
>>> +  gsi_insert_before (gsi, mul_stmt, GSI_SAME_STMT);
>>>
>>> I think you do not want to set the stmt uid
>
>
> assert in reassoc_stmt_dominates_p (gcc_assert (gimple_uid (s1) &&
> gimple_uid (s2))) is failing. So I tried to add the uid of the adjacent stmt
> and it seems to work.

Hmm, yes, other cases seem to do the same.

>>> and you want to insert the
>>> stmt right
>>> after the def of op (or at the original first add - though you can't
>>> get your hands at
>
>
> Done.

maybe insert_stmt_after will help here, I don't think you got the insertion
logic correct, thus insert_stmt_after (mul_stmt, def_stmt) which I think
misses GIMPLE_NOP handling.  At least

+  if (SSA_NAME_VAR (op) != NULL

huh?  I suppose you could have tested SSA_NAME_IS_DEFAULT_DEF
but just the GIMPLE_NOP def-stmt test should be enough.

+ && gimple_code (def_stmt) == GIMPLE_NOP)
+   {
+ gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+ stmt = gsi_stmt (gsi);
+ gsi_insert_before (&gsi, mul_stmt, GSI_NEW_STMT);

not sure if that is the best insertion point choice, it un-does all
code-sinking done
(and no further sinking is run after the last reassoc pass).  We do know we
are handling all uses of op in our chain so inserting before the plus-expr
chain root should work here (thus 'stmt' in the caller context).  I'd
use that here instead.
I think I'd use that unconditionally even if it works and not bother
finding something
more optimal.

Apart from this, this now looks ok to me.

But the testcases need some work


--- a/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr63586-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
...
+
+/* { dg-final { scan-tree-dump-times "\\\*" 4 "reassoc1" } } */

I would have expected 3.  Also please check for \\\* 5 for example
to be more specific (and change the cases so you get different constants
for the different functions).

That said, please make the scans more specific.

Thanks,
Richard.


>>> that easily).  You also don't want to set the location to the last stmt
>>> of the
>>> whole add sequence - simply leave it unset.
>>>
>>> +  oe = operand_entry_pool.allocate ();
>>> +  oe->op = tmp;
>>> +  oe->rank = get_rank (op) * count;
>>>
>>> ?  Why that?  oe->rank should be get_rank (tmp).
>>>
>>> +  oe->id = 0;
>>>
>>> other places use next_operand_entry_id++.  I think you want to simply
>>> use add_to_ops_vec (oe, tmp); here for all of the above.
>
>
> Done.
>
>>>
>>> Please return whether you did any optimization and do the
>>> qsort of the operand vector only if you did sth.
>
>
> Done.
>
>
>>> Testcase with FP math missing.  Likewise with complex or vector math.
>>
>>
>> Btw, does it handle associating
>>
>>x + 3 * x + x
>>
>> to
>>
>>5 * x
>>
>> ?
>
>
> Added this to the testcase and verified it is working.
>
> Regression tested and bootstrapped on x86-64-linux-gnu with no new
> regressions.
>
> Is this OK for trunk?
>
> Thanks,
> Kugan
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-04-24  Kugan Vivekanandarajah  
>
> PR middle-end/63586
> * gcc.dg/tree-ssa/pr63586-2.c: New test.
> * gcc.dg/tree-ssa/pr63586.c: New test.
> * gcc.dg/tree-ssa/reassoc-14.c: Adjust multiplication count.
>
> gcc/ChangeLog:
>
> 2016-04-24  Kugan Vivekanandarajah  
>
>
> PR middle-end/63586
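
The idea behind the transform, including the x + 3 * x + x case asked about above, can be modelled outside the compiler. The sketch below is not the reassoc pass: `struct term` and `collapse_repeats` are hypothetical names, operands are plain integer ids, and the input is assumed to be sorted so that equal operands are adjacent (the pass sorts its operand vector for the same reason).

```c
#include <assert.h>
#include <stddef.h>

/* One addend of a PLUS chain, modelled as coeff * op.  */
struct term
{
  int op;      /* hypothetical operand id; equal ids mean the same value */
  long coeff;  /* 1 for a plain "x", 3 for "3 * x" */
};

/* Collapse runs of equal operands in place by summing their
   coefficients.  Returns the new number of terms.  */
size_t
collapse_repeats (struct term *ops, size_t n)
{
  assert (ops != NULL || n == 0);
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (out > 0 && ops[out - 1].op == ops[i].op)
        ops[out - 1].coeff += ops[i].coeff;  /* same operand: merge */
      else
        ops[out++] = ops[i];                 /* new operand: keep */
    }
  return out;
}
```

In this model, the terms {x,1}, {x,3}, {x,1} collapse to the single term {x,5}. For floating-point operands the real transform changes rounding, which is why the review asks for it to be guarded by flag_unsafe_math_optimizations.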

New template for 'cpplib' made available

2016-04-27 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'cpplib' has been made available
to the language teams for translation.  It is archived as:

http://translationproject.org/POT-files/cpplib-6.1.0.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

ftp://ftp.gnu.org/gnu/gcc/gcc-6.1.0/gcc-6.1.0.tar.bz2

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] PR target/70155: Use SSE for TImode load/store

2016-04-27 Thread H.J. Lu

On Wed, Apr 27, 2016 at 6:28 AM, Uros Bizjak  wrote:
> On Wed, Apr 27, 2016 at 2:51 PM, H.J. Lu  wrote:
>> On Wed, Apr 27, 2016 at 5:03 AM, Uros Bizjak  wrote:
>>> On Tue, Apr 26, 2016 at 9:50 PM, H.J. Lu  wrote:
>>>
> Here is the updated patch which does that.  Ok for trunk if there
> is no regressions on x86-64?
>

 CSE works with SSE constants now.  Here is the updated patch.
 OK for trunk if there are no regressions on x86-64?
>>>
>>> +static bool
>>> +timode_scalar_to_vector_candidate_p (rtx_insn *insn)
>>> +{
>>> +  rtx def_set = single_set (insn);
>>> +
>>> +  if (!def_set)
>>> +return false;
>>> +
>>> +  if (has_non_address_hard_reg (insn))
>>> +return false;
>>> +
>>> +  rtx src = SET_SRC (def_set);
>>> +  rtx dst = SET_DEST (def_set);
>>> +
>>> +  /* Only TImode load and store are allowed.  */
>>> +  if (GET_MODE (dst) != TImode)
>>> +return false;
>>> +
>>> +  if (MEM_P (dst))
>>> +{
>>> +  /* Check for store.  Only support store from register or standard
>>> + SSE constants.  */
>>> +  switch (GET_CODE (src))
>>> + {
>>> + default:
>>> +  return false;
>>> +
>>> + case REG:
>>> +  /* For store from register, memory must be aligned or both
>>> + unaligned load and store are optimal.  */
>>> +  return (!misaligned_operand (dst, TImode)
>>> +  || (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
>>> +  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL));
>>>
>>> Why check TARGET_SSE_UNALIGNED_LOAD_OPTIMAL here? We are moving from a
>>> register here.
>>>
>>> + case CONST_INT:
>>> +  /* For store from standard SSE constant, memory must be
>>> + aligned or unaligned store is optimal.  */
>>> +  return (standard_sse_constant_p (src, TImode)
>>> +  && (!misaligned_operand (dst, TImode)
>>> +  || TARGET_SSE_UNALIGNED_STORE_OPTIMAL));
>>> + }
>>> +}
>>> +  else if (MEM_P (src))
>>> +{
>>> +  /* Check for load.  Memory must be aligned or both unaligned
>>> + load and store are optimal.  */
>>> +  return (GET_CODE (dst) == REG
>>> +  && (!misaligned_operand (src, TImode)
>>> +  || (TARGET_SSE_UNALIGNED_LOAD_OPTIMAL
>>> +  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL)));
>>>
>>> Also here. We are loading a register, no point to check
>>> TARGET_SSE_UNALIGNED_STORE_OPTIMAL.
>>>
>>> +}
>>> +
>>> +  return false;
>>> +}
>>> +
>>>
>>> +/* Convert INSN from TImode to V1TImode.  */
>>> +
>>> +void
>>> +timode_scalar_chain::convert_insn (rtx_insn *insn)
>>> +{
>>> +  rtx def_set = single_set (insn);
>>> +  rtx src = SET_SRC (def_set);
>>> +  rtx tmp;
>>> +  rtx dst = SET_DEST (def_set);
>>>
>>> No need for tmp declaration above ...
>>>
>>> +  switch (GET_CODE (dst))
>>> +{
>>> +case REG:
>>> +  tmp = find_reg_equal_equiv_note (insn);
>>>
>>> ... if you declare it here ...
>>>
>>> +  if (tmp)
>>> + PUT_MODE (XEXP (tmp, 0), V1TImode);
>>>
>>> /* FALLTHRU */
>>>
>>> +case MEM:
>>> +  PUT_MODE (dst, V1TImode);
>>> +  break;
>>>
>>> +case CONST_INT:
>>> +  switch (standard_sse_constant_p (src, TImode))
>>> + {
>>> + case 1:
>>> +  src = CONST0_RTX (GET_MODE (dst));
>>> +  tmp = gen_reg_rtx (V1TImode);
>>> +  break;
>>> + case 2:
>>> +  src = CONSTM1_RTX (GET_MODE (dst));
>>> +  tmp = gen_reg_rtx (V1TImode);
>>> +  break;
>>> + default:
>>> +  gcc_unreachable ();
>>> + }
>>> +  if (NONDEBUG_INSN_P (insn))
>>> + {
>>>
>>> ... and here. Please generate temp register here.
>>>
>>> +  /* Since there are no instructions to store standard SSE
>>> + constant, temporary register usage is required.  */
>>> +  emit_conversion_insns (gen_rtx_SET (dst, tmp), insn);
>>> +  dst = tmp;
>>> + }
>>>
>>>
>>>/* This needs to be done at start up.  It's convenient to do it here.  */
>>>register_pass (_vzeroupper_info);
>>> -  register_pass (_info);
>>> +  register_pass (TARGET_64BIT ? _info_64 : _info_32);
>>>  }
>>>
>>> stv_info_timode and stv_info_dimode?
>>>
>>
>> Here is the updated patch.  OK for trunk if there is no regression?
>
> OK with a small improvement:
>
> if (MEM_P (dst))
>   {
>     /* Check for store.  Destination must be aligned or unaligned
>        store is optimal.  */
>     if (misaligned_operand (dst, TImode)
>         && !TARGET_SSE_UNALIGNED_STORE_OPTIMAL)
>       return false;
>
>     /* Only support store from register or standard SSE constants.  */
>     switch (GET_CODE (src))
>       {
>       default:
>         return false;
>
>       case REG:
>         return true;
>
>       case CONST_INT:
>         return standard_sse_constant_p (src, TImode);
>       }
>   }
>
> +  else if (MEM_P (src))
> +{
> +  /* Check for load.  Memory must be aligned or unaligned load is
> + optimal.  */
> +  return (GET_CODE (dst) == REG
>
> REG_P
>
> +  && (!misaligned_operand (src, TImode)
> +  || TARGET_SSE_UNALIGNED_LOAD_OPTIMAL));
>
> Thanks,
> Uros.
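
The decision logic agreed on in this review can be summarized as a small standalone model. It is only a sketch: the enum, the `struct tuning` flags and `timode_candidate_p` are hypothetical stand-ins for the RTL predicates and the TARGET_SSE_UNALIGNED_*_OPTIMAL tuning macros, and the checks for single_set and hard registers are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of an operand.  */
enum operand_kind { OP_REG, OP_MEM, OP_STD_SSE_CONST, OP_OTHER };

/* Hypothetical stand-ins for the two tuning flags.  */
struct tuning
{
  bool unaligned_load_optimal;
  bool unaligned_store_optimal;
};

/* Return true if a TImode move from SRC to DST may be converted.
   MEM_MISALIGNED describes the memory operand, if there is one.  */
bool
timode_candidate_p (enum operand_kind dst, enum operand_kind src,
                    bool mem_misaligned, const struct tuning *t)
{
  assert (t != NULL);
  if (dst == OP_MEM)
    {
      /* Store: only the store tuning flag matters.  */
      if (mem_misaligned && !t->unaligned_store_optimal)
        return false;
      return src == OP_REG || src == OP_STD_SSE_CONST;
    }
  if (src == OP_MEM)
    /* Load: only the load tuning flag matters.  */
    return dst == OP_REG
           && (!mem_misaligned || t->unaligned_load_optimal);
  return false;
}
```

The point of the review comments is visible in the two branches: a store consults only the store flag and a load only the load flag, instead of requiring both flags in each case.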

This is the patch I will check in.

Thanks.

-- 
H.J.
From 0568f8a300588a8cc8fda4b0f212666714299c32 Mon Sep 17 00:00:00 2001
From:
