date:20160613

Re: [regex, libstdc++/71500, patch] Fix icase on bracket expression

2016-06-13 Thread Jonathan Wakely


On 11/06/16 11:46 -0700, Tim Shen wrote:

On Sat, Jun 11, 2016 at 5:01 AM, Jonathan Wakely wrote:

N.B. The "typename" and "::type" are redundant here, because it names
the same type as the integral_constant itself, and you could
use __bool_constant<__collate> instead:

return _M_transform_impl(_M_translate(__ch),
__bool_constant<__collate>());

OK for trunk without the redundant typename ...::type, your choice
whether to use __bool_constant or not.


Thanks! I was looking at std::bool_constant but that's in C++17.
__bool_constant is even better. :)



Will this fix apply cleanly to the branches too?



For gcc6 yes; gcc5 needs more work. I guess it's OK to
backport to gcc6?


Yes, OK for trunk and gcc-6-branch, thanks.
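
The tag-dispatch idiom discussed in this thread can be sketched as follows.
This is only an illustration, not the libstdc++ code: bool_tag, translator
and transform_impl are hypothetical names, and std::integral_constant is
assumed here as a stand-in for the internal __bool_constant alias.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for libstdc++'s internal __bool_constant alias.
template <bool B>
using bool_tag = std::integral_constant<bool, B>;

// An integral_constant names itself as its nested ::type, which is why
// spelling out "typename ...::type" on top of it adds nothing.
static_assert(std::is_same<bool_tag<true>::type, bool_tag<true>>::value,
              "::type is the integral_constant itself");

// Hypothetical class: selects an implementation at compile time from a
// bool template parameter by passing a tag object to an overload set.
template <bool Collate>
struct translator {
  std::string transform(char ch) const {
    return transform_impl(ch, bool_tag<Collate>());
  }

private:
  // Chosen when Collate is false.
  std::string transform_impl(char ch, std::false_type) const {
    return std::string(1, ch);
  }
  // Chosen when Collate is true.
  std::string transform_impl(char ch, std::true_type) const {
    return "collate:" + std::string(1, ch);
  }
};
```

Passing the tag by value keeps the choice in overload resolution, so no
runtime branch on the template parameter is needed.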

[Committed] S/390: Fix MAX_ARGS value.

2016-06-13 Thread Andreas Krebbel

Committed to GCC 5 and mainline branches.

gcc/ChangeLog:

2016-06-13  Andreas Krebbel  

PR target/71379
* config/s390/s390.c (s390_expand_builtin): Increase MAX_ARGS by
one.
---
 gcc/config/s390/s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 48b8222..ee0187c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -791,7 +791,7 @@ s390_expand_builtin (tree exp, rtx target, rtx subtarget 
ATTRIBUTE_UNUSED,
 machine_mode mode ATTRIBUTE_UNUSED,
 int ignore ATTRIBUTE_UNUSED)
 {
-#define MAX_ARGS 5
+#define MAX_ARGS 6
 
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
-- 
1.9.1

Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-06-13 Thread Richard Biener

On Fri, 10 Jun 2016, Richard Biener wrote:

> 
> With the proposed cost change for vector construction we will end up
> vectorizing the testcase in PR68961 again (on x86_64 and likely
> on ppc64le as well after that target gets adjustments).  Currently
> we can't optimize that away again noticing the direct overlap of
> argument and return registers.  The obstacle is
> 
> (insn 7 4 8 2 (set (reg:V2DF 93)
> (vec_concat:V2DF (reg/v:DF 91 [ a ])
> (reg/v:DF 92 [ aa ]))) 
> ...
> (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> 
> which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> 
> First of all simplify_subreg doesn't handle the subregs of a vec_concat
> (easy fix below).
> 
> Then combine doesn't like to simplify the multi-use (it tries some
> parallel it seems).  So I went to forwprop which eventually manages
> to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> because it is not a constant.  Thus I allow arbitrary simplification
> results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> to be a magic flag to tell it to restrict to the case where all
> uses can be simplified or so, nor to restrict simplifications to a REG.
> But I don't see any undesirable simplifications of (subreg 
> ([vec_]concat)).
> 
> For the testcase I'm not sure if I have to exclude some ABIs (mingw?).
> 
> Bootstrap and regtest in progress on x86_64-unknown-linux-gnu; I'll
> install the simplify-rtx.c change if that succeeds but would like opinions
> on the fwprop.c change.

So the bootstrap exposes a latent issue in simplify-rtx.c in the changed
hunk via gcc.target/i386/mmx-8.c on i?86 which ends up with a 

(vec_concat:V2SI (reg:SI 103)
(const_int 0 [0]))

and thus a VOIDmode 2nd operand (I'm sure this can happen for
complex integer concat as well, thus latent).  I am adjusting the
simplify_subreg hunk to always pass GET_MODE_INNER (innermode)
(that should exercise it a bit more than using it only when
GET_MODE (part) == VOIDmode, and the two modes should always
agree).

Re-bootstrap / regtest running on x86_64-unknown-linux-gnu.

Comments still welcome.

Thanks,
Richard.
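
For readers following along, the part selection that the CONCAT /
VEC_CONCAT case performs can be modelled roughly as below.  This is a toy
sketch with hypothetical names (part_ref, select_concat_part), not the
simplify_subreg code; it assumes both operands of the concat have the same
size in bytes.

```cpp
#include <cassert>

// Result of looking through a two-part concat (toy model).
struct part_ref {
  bool ok;          // false when the access straddles both parts
  int part;         // 0 = first operand, 1 = second operand
  unsigned offset;  // byte offset inside the selected part
};

// Assumes both parts are part_size bytes wide, e.g. a V2DF built from
// two DF values has part_size == 8.
inline part_ref select_concat_part(unsigned byte, unsigned outer_size,
                                   unsigned part_size) {
  assert(part_size > 0 && byte < 2 * part_size);
  part_ref r;
  r.part = byte < part_size ? 0 : 1;
  r.offset = byte < part_size ? byte : byte - part_size;
  // The requested bytes must lie entirely inside one part.
  r.ok = r.offset + outer_size <= part_size;
  return r;
}
```

With part_size 8, a DFmode access at byte 0 maps to the first operand and
one at byte 8 to the second, while an 8-byte access at byte 4 is rejected.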

2016-06-13  Richard Biener  

PR rtl-optimization/68961
* simplify-rtx.c (simplify_subreg): Handle VEC_CONCAT like CONCAT.
* fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
to simplify to a non-constant.

* gcc.target/i386/pr68961.c: New testcase.

Index: gcc/simplify-rtx.c
===
*** gcc/simplify-rtx.c  (revision 237286)
--- gcc/simplify-rtx.c  (working copy)
*** simplify_subreg (machine_mode outermode,
*** 6108,6116 
&& GET_MODE_SIZE (outermode) <= GET_MODE_SIZE (GET_MODE (op)))
  return adjust_address_nv (op, outermode, byte);
  
!   /* Handle complex values represented as CONCAT
!  of real and imaginary part.  */
!   if (GET_CODE (op) == CONCAT)
  {
unsigned int part_size, final_offset;
rtx part, res;
--- 6108,6117 
&& GET_MODE_SIZE (outermode) <= GET_MODE_SIZE (GET_MODE (op)))
  return adjust_address_nv (op, outermode, byte);
  
!   /* Handle complex or vector values represented as CONCAT or VEC_CONCAT
!  of two parts.  */
!   if (GET_CODE (op) == CONCAT
!   || GET_CODE (op) == VEC_CONCAT)
  {
unsigned int part_size, final_offset;
rtx part, res;
*** simplify_subreg (machine_mode outermode,
*** 6130,6139 
if (final_offset + GET_MODE_SIZE (outermode) > part_size)
return NULL_RTX;
  
!   res = simplify_subreg (outermode, part, GET_MODE (part), final_offset);
if (res)
return res;
!   if (validate_subreg (outermode, GET_MODE (part), part, final_offset))
return gen_rtx_SUBREG (outermode, part, final_offset);
return NULL_RTX;
  }
--- 6131,6141 
if (final_offset + GET_MODE_SIZE (outermode) > part_size)
return NULL_RTX;
  
!   enum machine_mode part_mode = GET_MODE_INNER (innermode);
!   res = simplify_subreg (outermode, part, part_mode, final_offset);
if (res)
return res;
!   if (validate_subreg (outermode, part_mode, part, final_offset))
return gen_rtx_SUBREG (outermode, part, final_offset);
return NULL_RTX;
  }
Index: gcc/fwprop.c
===
*** gcc/fwprop.c(revision 237286)
--- gcc/fwprop.c(working copy)
*** propagate_rtx (rtx x, machine_mode mode,
*** 664,670 
|| (GET_CODE (new_rtx) == SUBREG
  && REG_P (SUBREG_REG (new_rtx))
  && (GET_MODE_SIZE (mode)
! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))
  flags |= PR_CAN_APPEAR;
if (!varying_mem_p

Re: [Committed] S/390: Fix MAX_ARGS value.

2016-06-13 Thread Jakub Jelinek

On Mon, Jun 13, 2016 at 10:51:16AM +0200, Andreas Krebbel wrote:
> On 06/13/2016 10:43 AM, Jakub Jelinek wrote:
> > On Mon, Jun 13, 2016 at 10:38:22AM +0200, Andreas Krebbel wrote:
> >> Committed to GCC 5 and mainline branches.
> > 
> > What about gcc-6-branch?  It also has MAX_ARGS 5, and case for arity 6.
> 
> Done.

Also, it isn't clear to me, are there any s390 builtins right now that
actually have 6 arguments (my reading is that you don't count the return
value into that)?  I.e. beyond the bootstrap issues, is the change actually
fixing expansion of any builtins (there is if (arity >= MAX_ARGS) check),
or is the arity 6 case there just for potential further builtins?
My confusion comes from s390-builtin*.def using e.g. DEF_FN_TYPE_6
which looks to me like actually 5 argument builtin type where the first type
is the return type.  Wouldn't e.g. gcc/builtin-types.def call it
DEF_FUNCTION_TYPE_5 (rather than _6)?
Also, where is e.g. __builtin_s390_vstrcbs (as randomly chosen builtin
using DEF_FN_TYPE_6) covered in the testsuite?

Jakub

Re: [PATCH, i386]: Implement PR 71246, Missing built-in functions for float128 NaNs

2016-06-13 Thread Uros Bizjak

On Mon, Jun 13, 2016 at 10:01 AM, Richard Biener  wrote:
> On Fri, 10 Jun 2016, Uros Bizjak wrote:
>
>> Hello!
>>
>> Attached patch implements __builtin_nanq and __builtin_nansq
>> __float128 functions.
>>
>> 2016-06-10  Uros Bizjak  
>>
>> PR target/71241
>> * config/i386/i386.i386-builtin-types.def (CONST_STRING):
>> New primitive type.
>> (FLOAT128_FTYPE_CONST_STRING): New function type.
>> * config/i386/i386.c (enum ix86_builtins) [IX86_BUILTIN_NANQ]: New.
>> [IX86_BUILTIN_NANSQ]: Ditto.
>> (ix86_fold_builtin): Handle IX86_BUILTIN_NANQ and IX86_BUILTIN_NANSQ.
>> (ix86_init_builtin_types) Declare const_string_type_node.
>> Add __builtin_nanq and __builtin_nansq builtin functions.
>> (ix86_expand_builtin): Handle IX86_BUILTIN_NANQ and IX86_BUILTIN_NANSQ.
>> * doc/extend.texi (x86 Built-in Functions): Document
>> __builtin_nanq and __builtin_nansq.
>>
>> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>>
>> Joseph, does it look OK to you? Richi, I hope I got tree stuff
>> implemented correctly.
>
> Hmm, as we already have BUILT_IN_NAN[S] why not add NAN128 and NANS128
> variants in the middle-end as we already have NAND128 (for decimal float
> 128)?
>
> I don't see why we need target specific builtins for this given you
> simply use middle-end functionality to construct the result.

This goes together with __builtin_infq. These functions are not
standardized yet, so we have to resort to target-dependent semi-hacks.
Once _f128 functions are standardized, this functionality can be moved
to the middle end as a generic expander.

Uros.

[Ada] Fix ICE on renaming of 'Pred or 'Succ attribute

2016-06-13 Thread Eric Botcazou

This is a regression present on the mainline and 6 branch: the compiler stops 
on the renaming of the 'Pred or 'Succ attribute of a fixed-point type.

Tested on x86_64-suse-linux, applied on the mainline and 6 branch.


2016-06-13  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Deal with
PLUS_EXPR in the expression of a renaming.


2016-06-13  Eric Botcazou  

* gnat.dg/renaming10.ad[sb]: New test.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 237324)
+++ gcc-interface/decl.c	(working copy)
@@ -1003,6 +1003,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		 && !call_is_atomic_load (inner))
 		|| TREE_CODE (inner) == ADDR_EXPR
 		|| TREE_CODE (inner) == NULL_EXPR
+		|| TREE_CODE (inner) == PLUS_EXPR
 		|| TREE_CODE (inner) == CONSTRUCTOR
 		|| CONSTANT_CLASS_P (inner)
 		/* We need to detect the case where a temporary is created to
package Renaming10 is

   type Rec is record
  Position : Natural;
   end record;

   function F (Input : Rec) return Natural;

end Renaming10;
-- { dg-do compile }

package body Renaming10 is

   function F (Input : Rec) return Natural is
  Position : Natural renames Input.Position;
  Index : Natural renames Natural'Succ(Position);
   begin
  return Index;
   end;

end Renaming10;

Re: [Committed] S/390: Fix MAX_ARGS value.

2016-06-13 Thread Andreas Krebbel

On 06/13/2016 10:43 AM, Jakub Jelinek wrote:
> On Mon, Jun 13, 2016 at 10:38:22AM +0200, Andreas Krebbel wrote:
>> Committed to GCC 5 and mainline branches.
> 
> What about gcc-6-branch?  It also has MAX_ARGS 5, and case for arity 6.

Done.

-Andreas-

[PATCH GCC]Improve alias check code generation in vectorizer

2016-06-13 Thread Bin Cheng

Hi,
Take subroutine "DACOP" from spec2k/200.sixtrack as an example: the loop needs
to be versioned for vectorization because of possible aliasing.  The alias checks
for the data-references look like:

//pair 1
dr_a:
(Data Ref: 
  bb: 8 
  stmt: _92 = da.cc[_27];
  ref: da.cc[_27];
)
dr_b:
(Data Ref: 
  bb: 8 
  stmt: da.cc[_93] = _92;
  ref: da.cc[_93];
)
//pair 2
dr_a:
(Data Ref: 
  bb: 8 
  stmt: pretmp_29 = da.i2[_27];
  ref: da.i2[_27];
)
dr_b:
(Data Ref: 
  bb: 8 
  stmt: da.i2[_93] = pretmp_29;
  ref: da.i2[_93];
)
//pair 3
dr_a:
(Data Ref: 
  bb: 8 
  stmt: pretmp_28 = da.i1[_27];
  ref: da.i1[_27];
)
dr_b:
(Data Ref: 
  bb: 8 
  stmt: da.i1[_93] = pretmp_28;
  ref: da.i1[_93];
)

The code generated for the alias checks is as below:

  :
  # iy_186 = PHI <_413(22), 2(2)>
  # ivtmp_1050 = PHI 
  _155 = iy_186 + -2;
  _156 = _155 * 516;
  _241 = iy_186 + -1;
  _242 = _241 * 516;
  _328 = iy_186 * 516;
  _413 = iy_186 + 1;
  _414 = _413 * 516;
  _499 = iy_186 + 2;
  _500 = _499 * 516;
  _998 = iy_186 * 516;
  _997 = (sizetype) _998;
  _996 = _997 + 6;
  _995 = _996 * 4;
  _994 = global_Output.2_16 + _995;
  _993 = iy_186 * 516;
  _992 = (long unsigned int) _993;
  _991 = _992 * 4;
  _990 = _991 + 18446744073709547488;
  _989 = global_Input.0_153 + _990;
  _886 = _989 >= _994;
  _885 = iy_186 * 516;
  _884 = (sizetype) _885;
  _883 = _884 + 1040;
  _882 = _883 * 4;
  _881 = global_Input.0_153 + _882;
  _880 = (sizetype) _998;
  _879 = _880 + 2;
  _878 = _879 * 4;
  _877 = global_Output.2_16 + _878;
  _876 = _877 >= _881;
  _875 = _876 | _886;
  _874 = iy_186 * 516;
  _873 = (sizetype) _874;
  _872 = _873 + 514;
  _871 = _872 * 4;
  _870 = global_Output.2_16 + _871;
  _869 = local_Filter_33 >= _870;
  _868 = local_Filter_33 + 100;
  _867 = (sizetype) _874;
  _866 = _867 + 2;
  _865 = _866 * 4;
  _864 = global_Output.2_16 + _865;
  _863 = _864 >= _868;
  _862 = _863 | _869;
  _861 = _862 & _875;
  if (_861 != 0)
goto ;
  else
goto ;

It contains quite a lot of redundant computation.  The root cause is that the
vectorizer simply translates alias checks into comparisons of full address
expressions, and CSE opportunities are covered by the folder.  This patch improves
function vect_create_cond_for_alias_checks by simplifying the comparison: it
compares DR_BASE_ADDRESS/DR_INIT of both data-references at compilation time.
It also simplifies conditions of the form:
  (addr_a_min + addr_a_length) <= addr_b_min || (addr_b_min + addr_b_length) <= addr_a_min
into the form below:
  cond_expr = addr_b_min - addr_a_min
  cond_expr >= addr_a_length || cond_expr <= -addr_b_length
if the comparison is done in a signed type.  This can be further simplified
by the folder if addr_a_length and addr_b_length are equal/const, which is quite
common.
I looked into the generated assembly; this patch does introduce a small regression
in some cases, but overall I think it's good.  Bootstrapped and tested on x86_64 and
AArch64.  Is it OK?
Thanks,
bin
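
As a sanity check of the rewritten condition, the two forms can be compared
exhaustively on small signed values.  This is a standalone sketch with
hypothetical function names; it assumes non-negative segment lengths and
values small enough that the signed arithmetic cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Original form: the segments [min, min + length) do not overlap.
inline bool no_overlap_original(std::int64_t a_min, std::int64_t a_len,
                                std::int64_t b_min, std::int64_t b_len) {
  return (a_min + a_len) <= b_min || (b_min + b_len) <= a_min;
}

// Simplified form: one signed subtraction followed by two comparisons.
inline bool no_overlap_simplified(std::int64_t a_min, std::int64_t a_len,
                                  std::int64_t b_min, std::int64_t b_len) {
  const std::int64_t diff = b_min - a_min;
  return diff >= a_len || diff <= -b_len;
}

// Compares both forms for every start in [-limit, limit] and every
// length in [0, limit]; returns false on the first disagreement.
inline bool forms_agree(std::int64_t limit) {
  assert(limit >= 0);
  for (std::int64_t a_min = -limit; a_min <= limit; ++a_min)
    for (std::int64_t b_min = -limit; b_min <= limit; ++b_min)
      for (std::int64_t a_len = 0; a_len <= limit; ++a_len)
        for (std::int64_t b_len = 0; b_len <= limit; ++b_len)
          if (no_overlap_original(a_min, a_len, b_min, b_len)
              != no_overlap_simplified(a_min, a_len, b_min, b_len))
            return false;
  return true;
}
```

When both lengths are the same constant, the simplified form compares one
difference against +length and -length, which is what lets later folding
shrink it further.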

2016-06-08  Bin Cheng  

* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): New
Parameter.  Simplify alias check conditions at compilation time.
(vect_loop_versioning): Pass new argument to above function.

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 438458e..b38a6e4 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2211,17 +2211,22 @@ vect_create_cond_for_align_checks (loop_vec_info 
loop_vinfo,
 
Output:
COND_EXPR - conditional expression.
+   COND_EXPR_STMT_LIST - statements needed to construct the conditional
+ expression.
 
The returned COND_EXPR is the conditional expression to be used in the if
statement that controls which version of the loop gets executed at runtime.
 */
 
 void
-vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
+vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo,
+  tree * cond_expr,
+  gimple_seq *cond_expr_stmt_list)
 {
   vec comp_alias_ddrs =
 LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
   tree part_cond_expr;
+  gimple_seq seq;
 
   /* Create expression
  ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
@@ -2237,21 +2242,17 @@ vect_create_cond_for_alias_checks (loop_vec_info 
loop_vinfo, tree * cond_expr)
 
   for (size_t i = 0, s = comp_alias_ddrs.length (); i < s; ++i)
 {
+  enum tree_code code;
+  tree type_a, type_b, type_offset = ssizetype;
   const dr_with_seg_len& dr_a = comp_alias_ddrs[i].first;
   const dr_with_seg_len& dr_b = comp_alias_ddrs[i].second;
   tree segment_length_a = dr_a.seg_len;
   tree segment_length_b = dr_b.seg_len;
   tree addr_base_a = DR_BASE_ADDRESS (dr_a.dr);
   tree addr_base_b = DR_BASE_ADDRESS (dr_b.dr);
+  tree init_a = DR_INIT (dr_a.dr), init_b = DR_INIT (dr_b.dr);
   tree offset_a = DR_OFFSET (dr_a.dr), offset_b = DR_OFFSET

Re: [PATCH GCC]Resolve compilation time known alias checks in vectorizer

2016-06-13 Thread Bin Cheng

And below is the ChangeLog entry for the test cases:

gcc/testsuite/ChangeLog
2016-06-07  Bin Cheng  

* gcc.dg/vect/vect-35-big-array.c: Refine comment and test.
* gcc.dg/vect/vect-35.c: Ditto.

BTW, this patch also makes gcc.dg/vect/vect-mask-store-move-1.c fail, but I
think it just exposes an existing issue in PR65206.  The vectorizer's dependence
analyzer should be fixed for this.

Thanks,
bin


> From: gcc-patches-ow...@gcc.gnu.org  on behalf 
> of Bin Cheng 
> Sent: 13 June 2016 11:01
> To: gcc-patches@gcc.gnu.org
> Cc: nd
> Subject: [PATCH GCC]Resolve compilation time known alias checks in vectorizer
>
> Hi,
> The GCC vectorizer generates many unnecessary runtime alias checks whose
> result is known at compilation time.  For some data-reference pairs, the
> alias relation can be resolved at compilation time; in this case, we can
> simply skip the alias check.  For some other data-reference pairs, the
> alias relation can be realized at compilation time; in this case, we should
> return false and simply skip vectorizing the loop.  For the second case,
> the corresponding loop is vectorized for nothing because the guard alias
> condition is bound to be false anyway.  Vectorizing it not only wastes
> compilation time, but also slows down the generated code because GCC fails
> to resolve these "false" alias checks after vectorization.  Even in cases
> where they can be resolved (by VRP), GCC fails to clean up all the mess
> generated in loop versioning.
> This looks like a common issue in spec2k6.  For example, in
> 434.zeusmp/ggen.f, there are three loops vectorized but never executed; in
> 464.h264ref, there are loops in which all runtime alias checks are
> resolved at compilation time, thus loop versioning is proven unnecessary.
> Statistics also show that more than 100 loops are falsely vectorized
> currently in my build of spec2k6.
> 
> This patch is based on  
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00399.html, bootstrap and test 
> on x86_64 and AArch64 (ongoing), is it OK?
> 
> Thanks,
> bin
> 
> 2016-06-07  Bin Cheng  
> 
>     * tree-vect-data-refs.c (vect_no_alias_p): New function.
>     (vect_prune_runtime_alias_test_list): Call vect_no_alias_p to
>     resolve alias checks which are known at compilation time.
>     Truncate vector LOOP_VINFO_MAY_ALIAS_DDRS(loop_vinfo) if all
>     alias checks are resolved at compilation time.
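
The three outcomes described above can be sketched as a small classifier.
This is a toy model under stated assumptions, not vect_no_alias_p: toy_ref,
alias_answer and classify_pair are hypothetical names, references are
compared only when they share a base, and segment lengths are assumed to be
positive.

```cpp
#include <cassert>

// Toy description of a data reference (not GCC's data_reference).
struct toy_ref {
  int base_id;    // references with equal ids share a base address
  bool constant;  // true if start and seg_len are compile-time constants
  long start;     // constant byte offset from the base
  long seg_len;   // constant, positive segment length in bytes
};

enum alias_answer {
  KNOWN_NO_ALIAS,      // the runtime check can be dropped
  KNOWN_ALIAS,         // the guarded loop would never run: do not vectorize
  NEEDS_RUNTIME_CHECK  // nothing can be decided at compilation time
};

inline alias_answer classify_pair(const toy_ref &a, const toy_ref &b) {
  if (a.base_id != b.base_id || !a.constant || !b.constant)
    return NEEDS_RUNTIME_CHECK;
  assert(a.seg_len > 0 && b.seg_len > 0);
  // Same base and constant extents: the overlap test folds to a constant.
  const bool disjoint = a.start + a.seg_len <= b.start
                        || b.start + b.seg_len <= a.start;
  return disjoint ? KNOWN_NO_ALIAS : KNOWN_ALIAS;
}
```

Only the last outcome leaves a check in the generated code; the first two
are decided before any loop versioning happens.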

Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-06-13 Thread James Greenhalgh

On Fri, Jun 03, 2016 at 04:50:24PM -0500, Evandro Menezes wrote:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 73a3fb8..4d7bcb7 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -577,6 +577,7 @@ Objective-C and Objective-C++ Dialects}.
>  -mfix-cortex-a53-843419  -mno-fix-cortex-a53-843419 @gol
>  -mlow-precision-recip-sqrt -mno-low-precision-recip-sqrt@gol
>  -mlow-precision-sqrt -mno-low-precision-sqrt@gol
> +-mlow-precision-div -mno-low-precision-div @gol
>  -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
>  
>  @emph{Adapteva Epiphany Options}
> @@ -13032,6 +13033,15 @@ precision of square root results to about 16 bits for
>  single precision and to 32 bits for double precision.
>  If enabled, it implies @option{-mlow-precision-recip-sqrt}.
>  
> +@item -mlow-precision-div
> +@item -mno-low-precision-div
> +@opindex -mlow-precision-div
> +@opindex -mno-low-precision-div
> +When calculating the division approximation,
> +uses one less step than otherwise, thus reducing latency and precision.

s/uses/use/

Otherwise, this is ok for trunk.

Thanks for your patience on this patch series.

Thanks,
James

> From d791090aae6a29fa94d8fc10894ee1053b05bcc2 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 4 Apr 2016 14:02:24 -0500
> Subject: [PATCH 3/3] [AArch64] Emit division using the Newton series
> 
> 2016-04-04  Evandro Menezes  
> Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h
>   (cpu_approx_modes): Add new member "division".
>   (aarch64_emit_approx_div): Declare new function.
>   * config/aarch64/aarch64.c
>   (generic_approx_modes): New member "division".
>   (exynosm1_approx_modes): Likewise.
>   (xgene1_approx_modes): Likewise.
>   (aarch64_emit_approx_div): Define new function.
>   * config/aarch64/aarch64.md ("div3"): New expansion.
>   * config/aarch64/aarch64-simd.md ("div3"): Likewise.
>   * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
>   * doc/invoke.texi (-mlow-precision-div): Describe new option.
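
To illustrate why dropping one step trades precision for latency, here is a
minimal sketch of division through a Newton-refined reciprocal.  It is not
the AArch64 expansion: reciprocal_estimate is a hypothetical stand-in for a
hardware estimate instruction (a textbook linear approximation is assumed
instead), approx_div is a made-up name, and only positive divisors are
handled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate.  Uses the
// classic linear approximation of 1/m for m in [0.5, 1), whose relative
// error is at most 1/17.
inline double reciprocal_estimate(double d) {
  assert(d > 0.0);
  int exp = 0;
  const double m = std::frexp(d, &exp);  // d = m * 2^exp, 0.5 <= m < 1
  const double est = 48.0 / 17.0 - 32.0 / 17.0 * m;
  return std::ldexp(est, -exp);          // 1/d is about est * 2^-exp
}

// Computes n / d as n * (1/d), refining the estimate with the given
// number of Newton steps x = x * (2 - d * x).  Each step roughly squares
// the relative error of the reciprocal.
inline double approx_div(double n, double d, int steps) {
  assert(steps >= 0);
  double x = reciprocal_estimate(d);
  for (int i = 0; i < steps; ++i)
    x = x * (2.0 - d * x);
  return n * x;
}
```

Because the error is squared at every step, removing the final step costs
about half of the correct bits in the result.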

Re: Move optimize_minmax_comparison to match.pd

2016-06-13 Thread Richard Biener

On Sun, Jun 12, 2016 at 10:30 AM, Marc Glisse  wrote:
> Hello,
>
> this move is pretty straightforward. The transformation would probably work
> just fine for any type (floats, vectors), but I didn't want the headache of
> checking the behavior for NaN.
>
> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

Ok.

Thanks,
Richard.
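
The integer identities behind these comparisons can be checked by brute
force.  The sketch below is not taken from match.pd; minmax_identities_hold
is a hypothetical helper, and the check is integer-only, so the NaN
question mentioned in the quoted message does not arise.

```cpp
#include <algorithm>
#include <cassert>

// Verifies the identities for all x, y, c in [-limit, limit].
inline bool minmax_identities_hold(int limit) {
  assert(limit >= 0);
  for (int x = -limit; x <= limit; ++x)
    for (int y = -limit; y <= limit; ++y) {
      // MIN (X, Y) == X  <=>  X <= Y
      if ((std::min(x, y) == x) != (x <= y)) return false;
      // MAX (X, C) == C  <=>  X <= C  (e.g. MAX (X, 0) == 0 -> X <= 0)
      if ((std::max(x, y) == y) != (x <= y)) return false;
      for (int c = -limit; c <= limit; ++c) {
        // MIN (X, C1) < C2  <=>  X < C2 || C1 < C2  (y is C1, c is C2)
        if ((std::min(x, y) < c) != (x < c || y < c)) return false;
        // MIN (X, C1) == C2 with C1 > C2  <=>  X == C2
        // (e.g. MIN (X, 5) == 0 -> X == 0)
        if (y > c && (std::min(x, y) == c) != (x == c)) return false;
      }
    }
  return true;
}
```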

> 2016-06-13  Marc Glisse  
>
> * fold-const.c (optimize_minmax_comparison): Remove.
> (fold_comparison): Remove call to the above.
> * match.pd (MIN (X, Y) == X, MIN (X, 5) == 0, MIN (X, C1) < C2):
> New transformations.
>
> --
> Marc Glisse
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 237336)
> +++ gcc/fold-const.c(working copy)
> @@ -121,22 +121,20 @@ static tree eval_subst (location_t, tree
>  static tree optimize_bit_field_compare (location_t, enum tree_code,
> tree, tree, tree);
>  static int simple_operand_p (const_tree);
>  static bool simple_operand_p_2 (tree);
>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>  static tree range_predecessor (tree);
>  static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>  static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree,
> tree);
>  static tree unextend (tree, int, int, tree);
> -static tree optimize_minmax_comparison (location_t, enum tree_code,
> -   tree, tree, tree);
>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>  static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
>  static tree fold_binary_op_with_conditional_arg (location_t,
>  enum tree_code, tree,
>  tree, tree,
>  tree, tree, int);
>  static tree fold_div_compare (location_t, enum tree_code, tree, tree,
> tree);
>  static bool reorder_operands_p (const_tree, const_tree);
>  static tree fold_negate_const (tree, tree);
>  static tree fold_not_const (const_tree, tree);
> @@ -5972,124 +5970,20 @@ fold_truth_andor_1 (location_t loc, enum
>ll_unsignedp || rl_unsignedp, ll_reversep);
>
>ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
>if (! all_ones_mask_p (ll_mask, lnbitsize))
>  result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
>
>return build2_loc (loc, wanted_code, truth_type, result,
>  const_binop (BIT_IOR_EXPR, l_const, r_const));
>  }
>
> -/* Optimize T, which is a comparison of a MIN_EXPR or MAX_EXPR with a
> -   constant.  */
> -
> -static tree
> -optimize_minmax_comparison (location_t loc, enum tree_code code, tree type,
> -   tree op0, tree op1)
> -{
> -  tree arg0 = op0;
> -  enum tree_code op_code;
> -  tree comp_const;
> -  tree minmax_const;
> -  int consts_equal, consts_lt;
> -  tree inner;
> -
> -  STRIP_SIGN_NOPS (arg0);
> -
> -  op_code = TREE_CODE (arg0);
> -  minmax_const = TREE_OPERAND (arg0, 1);
> -  comp_const = fold_convert_loc (loc, TREE_TYPE (arg0), op1);
> -  consts_equal = tree_int_cst_equal (minmax_const, comp_const);
> -  consts_lt = tree_int_cst_lt (minmax_const, comp_const);
> -  inner = TREE_OPERAND (arg0, 0);
> -
> -  /* If something does not permit us to optimize, return the original tree.
> */
> -  if ((op_code != MIN_EXPR && op_code != MAX_EXPR)
> -  || TREE_CODE (comp_const) != INTEGER_CST
> -  || TREE_OVERFLOW (comp_const)
> -  || TREE_CODE (minmax_const) != INTEGER_CST
> -  || TREE_OVERFLOW (minmax_const))
> -return NULL_TREE;
> -
> -  /* Now handle all the various comparison codes.  We only handle EQ_EXPR
> - and GT_EXPR, doing the rest with recursive calls using logical
> - simplifications.  */
> -  switch (code)
> -{
> -case NE_EXPR:  case LT_EXPR:  case LE_EXPR:
> -  {
> -   tree tem
> - = optimize_minmax_comparison (loc,
> -   invert_tree_comparison (code,
> false),
> -   type, op0, op1);
> -   if (tem)
> - return invert_truthvalue_loc (loc, tem);
> -   return NULL_TREE;
> -  }
> -
> -case GE_EXPR:
> -  return
> -   fold_build2_loc (loc, TRUTH_ORIF_EXPR, type,
> -optimize_minmax_comparison
> -(loc, EQ_EXPR, type, arg0, comp_const),
> -optimize_minmax_comparison
> -(loc, GT_EXPR, type, arg0, comp_const));
> -
> -case EQ_EXPR:
> -  if (op_code == MAX_EXPR && consts_equal)
> -   /* MAX (X, 0) == 0  ->  X <= 0  */
> -   return fold_build2_loc (loc, LE_EXPR, type, inner, comp_const);
> -
> -  else if (op_code == MAX_EXPR && consts_lt)
> -

Re: [patch] generate_libstdcxx_web_docs: Use realpath to get absolute path

2016-06-13 Thread Jonathan Wakely


On 12/06/16 12:53 +0200, Gerald Pfeifer wrote:

Hi Jonathan,

On Thu, 28 Apr 2016, Jonathan Wakely wrote:

When I ran maintainer-scripts/generate_libstdcxx_web_docs to make the
onlinedocs/libstdc++ for 6.1 the other day it failed because I use a
relative path for the output dir argument. This would make it work,
but is relying on GNU realpath OK?


realpath also exists on other systems, though the options you used are 
not portable.


How about going with your patch, just without the -es options?  (I 
verified that this works on FreeBSD, for example.)



I could instead use a Bashism like:

DOCSDIR=$(test "${2:0:1}" = "/" && echo "$2" || echo "$PWD/$2")


That gives me major headache. :-)


If it's OK for trunk it could go on the branches too, for generating
the 4.9.4, 5.4 and 6.2 docs.


Let's go with your patch for trunk and the GCC 6 branch.  Beyond
those, I would not care, though if you feel about it a bit more
strongly, no objection.

Note, there is a second hunk in the patch you posted (cf. the
attached) that makes sense, but will need to be added to the
ChangeLog.

Gerald


I've committed this to trunk and gcc-6-branch.


commit c03949980e1177d6d71b9de0f2f8e6e1048e45f1
Author: Jonathan Wakely 
Date:   Wed Apr 27 14:52:03 2016 +0100

	* generate_libstdcxx_web_docs: Use realpath to get absolute path.

	Add comment about LaTeX errors.

diff --git a/maintainer-scripts/generate_libstdcxx_web_docs b/maintainer-scripts/generate_libstdcxx_web_docs
index 700e522..00ebcbf 100755
--- a/maintainer-scripts/generate_libstdcxx_web_docs
+++ b/maintainer-scripts/generate_libstdcxx_web_docs
@@ -3,7 +3,7 @@
 # i.e. http://gcc.gnu.org/onlinedocs/gcc-x.y.z/libstdc++*
 
 SRCDIR=${1}
-DOCSDIR=${2}
+DOCSDIR=$(realpath ${2})
 
 if ! [ $# -eq 2 -a -x "${SRCDIR}/configure" -a -d "${DOCSDIR}" ]
 then
@@ -34,6 +34,9 @@ set -x
 ${SRCDIR}/configure --enable-languages=c,c++ --disable-gcc $disabled_libs --docdir=/docs
 eval `grep '^target=' config.log`
 make configure-target
+# If the following step fails with an error like
+# ! LaTeX Error: File `xtab.sty' not found.
+# then you need to install the relevant TeX package e.g. texlive-xtab
 make -C $target/libstdc++-v3 doc-install-html doc-install-xml doc-install-pdf DESTDIR=$DESTDIR
 cd $DESTDIR/docs
 mkdir libstdc++

[PATCH] Fix PR71505

2016-06-13 Thread Richard Biener


Committed as obvious to trunk and branches.

Richard.

2016-06-13  Richard Biener  

PR tree-optimization/71505
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Make
assert match comment.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 237367)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2752,7 +2752,7 @@ vect_analyze_data_ref_accesses (vec_info
  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
- gcc_assert (init_a < init_b);
+ gcc_assert (init_a <= init_b);
 
  /* If init_b == init_a + the size of the type * k, we have an
 interleaving, and DRA is accessed before DRB.  */
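
A small sketch of why the weaker assert matches the comment: sorting only
guarantees a <= b for adjacent elements, so a strict a < b check fails as
soon as two values are equal.  The names below are hypothetical, and the
sketch assumes that two references may carry the same DR_INIT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct order_check {
  bool non_decreasing;       // every adjacent pair satisfies a <= b
  bool strictly_increasing;  // every adjacent pair satisfies a < b
};

// Sorts the offsets, then tests both orderings on adjacent pairs.
inline order_check check_sorted_offsets(std::vector<long> offsets) {
  std::sort(offsets.begin(), offsets.end());
  order_check r = {true, true};
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (!(offsets[i - 1] <= offsets[i])) r.non_decreasing = false;
    if (!(offsets[i - 1] < offsets[i])) r.strictly_increasing = false;
  }
  return r;
}
```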

Re: [PATCH]: Restore bootstrap with gcc < 4.3

2016-06-13 Thread Richard Biener

On Mon, Jun 13, 2016 at 11:23 AM, Uros Bizjak  wrote:
> Hello!
>
> The new test finalization self tests fail with gcc < 4.3 due to the
> way need_finalization_p is defined:
>
> template<typename T>
> static inline bool
> need_finalization_p ()
> {
> #if GCC_VERSION >= 4003
>   return !__has_trivial_destructor (T);
> #else
>   return true;
> #endif
> }
>
> It is obvious that checking for
>
>ASSERT_FALSE (need_finalization_p  ());
>
> will always fail. Checking need_finalization_p is meaningless with gcc < 4.3.
>
> 2016-06-13  Uros Bizjak  
>
> * ggc-tests.c (test_finalization): Only test need_finalization_p
> for GCC_VERSION >= 4003.
>
> Bootstrapped on x86_64-linux-gnu, CentOS 5.11.
>
> OK for mainline?

Ok.

Richard.
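
The reason the assertion cannot hold with the fallback can be shown with a
standalone sketch.  It is not the GCC code: std::is_trivially_destructible
is swapped in for the __has_trivial_destructor builtin, HAVE_DTOR_TRAIT is
a hypothetical macro standing in for the GCC_VERSION >= 4003 test, and
needs_finalization, plain and with_dtor are made-up names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical switch; set to 0 to see the conservative fallback.
#define HAVE_DTOR_TRAIT 1

template <typename T>
inline bool needs_finalization() {
#if HAVE_DTOR_TRAIT
  // Precise answer: only types with a non-trivial destructor need it.
  return !std::is_trivially_destructible<T>::value;
#else
  // Conservative fallback: assume every type needs a finalizer, so an
  // "is false" assertion can never pass on this path.
  return true;
#endif
}

struct plain { int x; };
struct with_dtor { ~with_dtor() {} };
```

With the trait available, plain reports false and with_dtor reports true;
on the fallback path both report true, which is why the checks are only
meaningful when the trait exists.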

> Uros.
>
> diff --git a/gcc/ggc-tests.c b/gcc/ggc-tests.c
> index 48eac03..7f97231 100644
> --- a/gcc/ggc-tests.c
> +++ b/gcc/ggc-tests.c
> @@ -190,8 +190,10 @@ int test_struct_with_dtor::dtor_call_count;
>  static void
>  test_finalization ()
>  {
> +#if GCC_VERSION >= 4003
>ASSERT_FALSE (need_finalization_p  ());
>ASSERT_TRUE (need_finalization_p  ());
> +#endif
>
>/* Create some garbage.  */
>const int count = 10;

[Ada] Fix annoying oversight in elaboration of subprograms

2016-06-13 Thread Eric Botcazou

This fixes an annoying oversight introduced in the new elaboration model for 
subprograms in gigi: calls might be generated while the type of parameters is 
still incomplete, which leads to truncation to the low part for access types
on 64-bit targets...

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-13  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_subprog_type): Build only minimal
PARM_DECL when the parameter type is dummy.
* gcc-interface/trans.c (Call_to_gnu): Translate formal types before
formal objects.

-- 
Eric Botcazou
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 237360)
+++ gcc-interface/decl.c	(working copy)
@@ -5959,8 +5959,11 @@ gnat_to_gnu_subprog_type (Entity_Id gnat
 
 	  else
 		{
+		  /* Build a minimal PARM_DECL without DECL_ARG_TYPE so that
+		 Call_to_gnu will stop if it encounters the PARM_DECL.  */
 		  gnu_param
-		= create_param_decl (gnu_param_name, gnu_param_type);
+		= build_decl (input_location, PARM_DECL, gnu_param_name,
+  gnu_param_type);
 		  associate_subprog_with_dummy_type (gnat_subprog,
 		 gnu_param_type);
 		  incomplete_profile_p = true;
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 237328)
+++ gcc-interface/trans.c	(working copy)
@@ -4341,9 +4341,9 @@ Call_to_gnu (Node_Id gnat_node, tree *gn
gnat_actual = Next_Actual (gnat_actual))
 {
   Entity_Id gnat_formal_type = Etype (gnat_formal);
+  tree gnu_formal_type = gnat_to_gnu_type (gnat_formal_type);
   tree gnu_formal = present_gnu_tree (gnat_formal)
 			? get_gnu_tree (gnat_formal) : NULL_TREE;
-  tree gnu_formal_type = gnat_to_gnu_type (gnat_formal_type);
   const bool is_true_formal_parm
 	= gnu_formal && TREE_CODE (gnu_formal) == PARM_DECL;
   const bool is_by_ref_formal_parm

Re: [Committed] S/390: Fix MAX_ARGS value.

2016-06-13 Thread Jakub Jelinek

On Mon, Jun 13, 2016 at 10:38:22AM +0200, Andreas Krebbel wrote:
> Committed to GCC 5 and mainline branches.

What about gcc-6-branch?  It also has MAX_ARGS 5, and case for arity 6.

> gcc/ChangeLog:
> 
> 2016-06-13  Andreas Krebbel  
> 
>   PR target/71379
>   * config/s390/s390.c (s390_expand_builtin): Increase MAX_ARGS by
>   one.
> ---
>  gcc/config/s390/s390.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 48b8222..ee0187c 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -791,7 +791,7 @@ s390_expand_builtin (tree exp, rtx target, rtx subtarget 
> ATTRIBUTE_UNUSED,
>machine_mode mode ATTRIBUTE_UNUSED,
>int ignore ATTRIBUTE_UNUSED)
>  {
> -#define MAX_ARGS 5
> +#define MAX_ARGS 6
>  
>tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
>unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
> -- 
> 1.9.1

Jakub

Re: move increase_alignment from simple to regular ipa pass

2016-06-13 Thread Prathamesh Kulkarni

On 10 June 2016 at 16:47, Richard Biener  wrote:
> On Fri, 10 Jun 2016, Prathamesh Kulkarni wrote:
>
>> On 10 June 2016 at 01:53, Jan Hubicka  wrote:
>> >> On 8 June 2016 at 20:38, Jan Hubicka  wrote:
>> >> >> I think it would be nice to work towards transitioning
>> >> >> flag_section_anchors to a flag on varpool nodes, thereby removing
>> >> >> the Optimization flag from common.opt:fsection-anchors
>> >> >>
>> >> >> That would simplify the walk over varpool candidates.
>> >> >
>> >> > Makes sense to me, too. There are more candidates for stuff that should 
>> >> > be
>> >> > variable specific in common.opt (such as variable alignment, 
>> >> > -fdata-sections,
>> >> > -fmerge-constants) and targets.  We may try to do it in an easy to 
>> >> > extend way
>> >> > so incrementally we can get rid of those global flags, too.
>> >> In this version I removed Optimization from fsection-anchors entry in
>> >> common.opt,
>> >> and gated the increase_alignment pass on flag_section_anchors != 0.
>> >> Cross tested on arm*-*-*, aarch64*-*-*.
>> >> Does it look OK ?
>> >
>> > If you go this way you will need to do something sane for LTO.  Here one 
>> > can compile
>> > some object files with -fsection-anchors and others without and link with a 
>> > random setting
>> > (because in traditional compilation link-time flags do not matter).
>> >
>> > For global flags we have magic in merge_and_complain that determines flags 
>> > to pass
>> > to the LTO compiler.
>> > It is not very robust though.
>> >> >
>> >> > One thing that needs to be done for LTO is sane merging, I guess in 
>> >> > this case
>> >> > it is clear that the variable should be anchored when its prevailing 
>> >> > definition
>> >> > is.
>> >> Um could we determine during WPA if symbol is a section anchor for 
>> >> merging ?
>> >> Seems to me SYMBOL_REF_ANCHOR_P is defined only on DECL_RTL and not at
>> >> tree level.
>> >> Do we have DECL_RTL info available during WPA ?
>> >
>> > We don't have anchors computed, but we can decide whether we want to 
>> > potentially
>> > anchor the variable if we can.
>> >
>> > I would say all you need is to have section_anchor flag in varpool node 
>> > itself
>> > which controls RTL production. At varpool_finalize_decl you will set it
>> > according to flag_varpool and stream it to LTO objects. At WPA when doing
>> > linking, the section_anchor flag of the prevailing decl wins.
>> Thanks for the suggestions.
>> IIUC, we want to add new section_anchor flag to varpool_node class
>> and set it in varpool_node::finalize_decl and stream it to LTO byte-code,
>> and then during WPA set section_anchor_flag during symbol merging if it is 
>> set
>> for prevailing decl.
>
> Yes.
>
>> In the increase_alignment_pass if a vnode has section_anchor flag set,
>> we will walk all functions that reference it to check if they have
>> -ftree-loop-vectorize set.
>> Is that correct ?
>
> Yes.
>
>> Could you please elaborate a bit more on "at varpool_finalize_decl you will
>> set section_anchor flag according to flag_varpool" ?
>> flag_varpool doesn't appear to be defined.
>
> flag_section_anchors.
Hi,
I have made the changes in this version.
In varpool_node::finalize_decl,
I just set vnode->section_anchor = flag_section_anchors.
Should that be sufficient?

I tried a couple of test cases, once with prevailing->section_anchor == 1
and once with entry->section_anchor == 1, and in both cases
prevailing->section_anchor
always took precedence.
So I wonder whether the change to lto_symtab_merge () in the patch is necessary?
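
As a rough illustration of the merging rule discussed above ("the section_anchor flag of the prevailing decl wins"), here is a minimal sketch. The struct toy_vnode and toy_merge are hypothetical simplifications, not the real varpool_node or lto_symtab_merge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a varpool node.  */
struct toy_vnode
{
  const char *name;
  bool section_anchor;  /* Set from the per-unit flag when finalized.  */
};

/* Merge ENTRY into PREVAILING during symbol resolution.  The setting
   of the prevailing definition is kept and ENTRY is made to agree
   with it, whatever ENTRY was compiled with.  */
static void
toy_merge (const struct toy_vnode *prevailing, struct toy_vnode *entry)
{
  assert (prevailing != NULL && entry != NULL);
  entry->section_anchor = prevailing->section_anchor;
}
```

Under this rule the result never depends on the non-prevailing entry, which matches the behaviour observed in the two test cases.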

I re-introduced flag_ipa_increase_alignment to gate the pass, so it runs only
for targets supporting section anchors.
Cross tested on aarch64*-*-*, arm*-*-*.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> > Honza
>> >> >>
>> >> >> Richard.
>> >> >>
>> >> >> > Thanks,
>> >> >> > Prathamesh
>> >> >> > >
>> >> >> > > Honza
>> >> >> > >>
>> >> >> > >> Richard.
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> --
>> >> >> Richard Biener 
>> >> >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, 
>> >> >> HRB 21284 (AG Nuernberg)
>> >
>> >> diff --git a/gcc/common.opt b/gcc/common.opt
>> >> index f0d7196..f93f26c 100644
>> >> --- a/gcc/common.opt
>> >> +++ b/gcc/common.opt
>> >> @@ -2133,7 +2133,7 @@ Common Report Var(flag_sched_dep_count_heuristic) 
>> >> Init(1) Optimization
>> >>  Enable the dependent count heuristic in the scheduler.
>> >>
>> >>  fsection-anchors
>> >> -Common Report Var(flag_section_anchors) Optimization
>> >> +Common Report Var(flag_section_anchors)
>> >>  Access data in the same section from shared anchor points.
>> >>
>> >>  fsee
>> >> diff --git a/gcc/passes.def b/gcc/passes.def
>> >> index 3647e90..3a8063c 100644
>> >> --- a/gcc/passes.def
>> >> +++ b/gcc/passes.def
>> >> @@ -138,12 +138,12 @@ along with GCC; see the file COPYING3.  If not see

[PATCH PR71347][Partial revert r235513]Compute cost for all uses in group

2016-06-13 Thread Bin Cheng

Hi,
This patch partially reverts r235513 to fix PR71347.  The original patch was 
meant to improve compilation time by a small amount.  The root cause, as 
analyzed in the bugzilla PR, is that we can't skip computing the cost for a 
sub iv_use if it has a different position from the first use in the group.  
The patch also includes a new test.
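
To make the point concrete, here is a minimal sketch with an invented cost model (the struct toy_use, its fields, and the cost numbers are all hypothetical, not the ivopts implementation). It only shows that when a use's cost depends on both its offset and its position, the group cost has to be summed use by use instead of being derived from the first use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one address use in a group.  */
struct toy_use
{
  long offset;         /* Constant offset from the group base.  */
  int in_loop_header;  /* Nonzero if the use sits in the loop header.  */
};

/* Invented cost model: a use whose offset matches the candidate is
   cheap, and a use in the loop header costs one extra unit.  */
static int
toy_use_cost (const struct toy_use *use, long cand_offset)
{
  int cost = (use->offset == cand_offset) ? 1 : 2;
  if (use->in_loop_header)
    cost += 1;
  return cost;
}

/* Sum the cost of every use in the group, with no shortcut based on
   the first use.  */
static int
toy_group_cost (const struct toy_use *uses, size_t n, long cand_offset)
{
  int sum = 0;
  assert (uses != NULL);
  for (size_t i = 0; i < n; i++)
    sum += toy_use_cost (&uses[i], cand_offset);
  return sum;
}
```

In this toy model, three uses with the same offset can still have different costs if one of them sits in a different position, so multiplying the first use's cost by the group size would underestimate the total.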

Bootstrapped and tested on x86_64.  Is it OK?

Thanks,
bin

2016-05-31  Bin Cheng  

PR tree-optimization/71347
* tree-ssa-loop-ivopts.c (determine_group_iv_cost_address): Compute
cost for all uses in group.

gcc/testsuite/ChangeLog
2016-05-31  Bin Cheng  

PR tree-optimization/71347
* gcc.dg/tree-ssa/pr71347.c: New test.
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1e8d637..25b9780 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5115,7 +5115,7 @@ determine_group_iv_cost_address (struct ivopts_data *data,
 {
   unsigned i;
   bitmap depends_on;
-  bool can_autoinc, first = true;
+  bool can_autoinc;
   iv_inv_expr_ent *inv_expr = NULL;
   struct iv_use *use = group->vuses[0];
   comp_cost sum_cost = no_cost, cost;
@@ -5142,30 +5142,11 @@ determine_group_iv_cost_address (struct ivopts_data 
*data,
 {
   struct iv_use *next = group->vuses[i];
 
-  /* Compute cost for the first use with different offset to the main
-use and add it afterwards.  Costs for these uses could be quite
-different.  Given below uses in a group:
-  use 0  : {base + A + offset_0, step}
-  use 0.1: {base + A + offset_0, step}
-  use 0.2: {base + A + offset_1, step}
-  use 0.3: {base + A + offset_2, step}
-when we need to compute costs with candidate:
-  cand 1 : {base + B + offset_0, step}
-
-The first use with different offset is use 0.2, its cost is larger
-than cost of use 0/0.1 because we need to compute:
-  A - B + offset_1 - offset_0
-  rather than:
-  A - B.  */
-  if (first && next->addr_offset != use->addr_offset)
-   {
- first = false;
- cost = get_computation_cost (data, next, cand, true,
-  NULL, &can_autoinc, NULL);
- /* Remove setup cost.  */
- if (!cost.infinite_cost_p ())
-   cost -= cost.scratch;
-   }
+  /* TODO: We could skip computing cost for sub iv_use when it has the
+same cost as the first iv_use, but the cost really depends on the
+offset and where the iv_use is.  */
+   cost = get_computation_cost (data, next, cand, true,
+NULL, &can_autoinc, NULL);
   sum_cost += cost;
 }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
new file mode 100644
index 000..7e5ad49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71347.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+double in;
+extern void Write (double);
+void foo (void)
+{
+  static double X[9];
+  int i;
+X[1] = in * in; 
+for (i = 2; i <= 8; i++)
+X[i] = X[i - 1] * X[1]; 
+Write (X[5]);
+}
+
+/* Load of X[i - i] can be omitted by reusing X[i] in previous iteration.  */
+/* { dg-final { scan-tree-dump-not ".* = MEM.*;" "optimized"} } */

Re: Vectorize 2*x as x+x if needed

2016-06-13 Thread Richard Biener

On Sun, Jun 12, 2016 at 11:19 AM, Marc Glisse  wrote:
> Hello,
>
> canonicalizing x+x to x*2 made us regress some vectorization tests on sparc.
> As suggested by Richard, this lets the vectorizer handle x*2 as x+x if that
> helps. Let me copy a few remarks I had in the PR:
>
> « We could probably also handle x*3 as x+x+x, but where to stop?
>
> I don't understand why the optab test for LSHIFT_EXPR was using
> optab_vector, as far as I understand we are creating vec<<3, so optab_scalar
> makes more sense.

I think it should test both (ok if either one is available) and the
current optab_vector makes more sense
since it is more generic.

Ok with either not changing optab_vector to optab_scalar or testing both with ||
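
A minimal sketch of the selection logic under discussion, assuming a hypothetical capability record (struct toy_target and toy_pick are illustrative only and do not use the real optab API). It prefers x + x for a multiplier of 2, as in the patch, and accepts a shift if either shift form is available, which is the "test both with ||" variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mult_strategy { TOY_NONE, TOY_ADD, TOY_SHIFT };

/* Hypothetical summary of what the target can do on vectors.  */
struct toy_target
{
  bool has_vec_add;          /* Vector addition.  */
  bool has_shift_by_vector;  /* Shift with a vector shift count.  */
  bool has_shift_by_scalar;  /* Shift with a scalar shift count.  */
};

/* Choose how to rewrite x * MULTIPLIER when vector multiply is
   missing.  Only powers of two are handled in this sketch.  */
static enum toy_mult_strategy
toy_pick (const struct toy_target *t, unsigned long multiplier)
{
  assert (t != NULL);
  if (multiplier == 0 || (multiplier & (multiplier - 1)) != 0)
    return TOY_NONE;
  /* Prefer x + x over x << 1.  */
  if (multiplier == 2 && t->has_vec_add)
    return TOY_ADD;
  /* Either kind of shift is good enough.  */
  if (multiplier > 1 && (t->has_shift_by_vector || t->has_shift_by_scalar))
    return TOY_SHIFT;
  return TOY_NONE;
}
```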

Thanks,
Richard.

> I gave priority to x+x over x<<1, not sure if that's right, it probably
> doesn't matter much as one will probably be turned into the other in later
> passes. »
>
> Rainer bootstrapped and regtested the patch on sparc. As a bonus, it now
> vectorizes one more loop in gcc.dg/vect/vect-iv-9.c, I'll let someone else
> tweak the test (which will temporarily appear as a FAIL).
>
> 2016-06-13  Marc Glisse  
>
> PR tree-optimization/70923
> * tree-vect-patterns.c (vect_recog_mult_pattern): Use optab_scalar
> for LSHIFT_EXPR. Handle 2 * X as X + X.
>
> --
> Marc Glisse
> Index: gcc/tree-vect-patterns.c
> ===
> *** gcc/tree-vect-patterns.c(revision 237336)
> --- gcc/tree-vect-patterns.c(working copy)
> *** vect_recog_vector_vector_shift_pattern (
> *** 2166,2189 
>
> * Return value: A new stmt that will be used to replace the
> multiplication
>   S1 or S2 stmt.  */
>
>   static gimple *
>   vect_recog_mult_pattern (vec<gimple *> *stmts,
>  tree *type_in, tree *type_out)
>   {
> gimple *last_stmt = stmts->pop ();
> tree oprnd0, oprnd1, vectype, itype;
> !   gimple *pattern_stmt, *def_stmt;
> optab optab;
> stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> !   int power2_val, power2_neg_val;
> tree shift;
>
> if (!is_gimple_assign (last_stmt))
>   return NULL;
>
> if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
>   return NULL;
>
> oprnd0 = gimple_assign_rhs1 (last_stmt);
> oprnd1 = gimple_assign_rhs2 (last_stmt);
> --- 2166,2189 
>
> * Return value: A new stmt that will be used to replace the
> multiplication
>   S1 or S2 stmt.  */
>
>   static gimple *
>   vect_recog_mult_pattern (vec<gimple *> *stmts,
>  tree *type_in, tree *type_out)
>   {
> gimple *last_stmt = stmts->pop ();
> tree oprnd0, oprnd1, vectype, itype;
> !   gimple *pattern_stmt;
> optab optab;
> stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> !   int power2_val;
> tree shift;
>
> if (!is_gimple_assign (last_stmt))
>   return NULL;
>
> if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
>   return NULL;
>
> oprnd0 = gimple_assign_rhs1 (last_stmt);
> oprnd1 = gimple_assign_rhs2 (last_stmt);
> *** vect_recog_mult_pattern (vec *
> *** 2203,2261 
>don't attempt to optimize this.  */
> optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
> if (optab != unknown_optab)
>   {
> machine_mode vec_mode = TYPE_MODE (vectype);
> int icode = (int) optab_handler (optab, vec_mode);
> if (icode != CODE_FOR_nothing)
> return NULL;
>   }
>
> !   /* If target cannot handle vector left shift then we cannot
> !  optimize and bail out.  */
> !   optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
> !   if (!optab
> !   || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
> ! return NULL;
> !
> !   power2_val = wi::exact_log2 (oprnd1);
> !   power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
>
> !   /* Handle constant operands that are postive or negative powers of 2.
> */
> !   if (power2_val != -1)
> ! {
> !   shift = build_int_cst (itype, power2_val);
> !   pattern_stmt
> !   = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> !  LSHIFT_EXPR, oprnd0, shift);
> ! }
> !   else if (power2_neg_val != -1)
>   {
> /* If the target cannot handle vector NEGATE then we cannot
>  do the optimization.  */
> !   optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
> if (!optab
>   || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
> return NULL;
>
> !   shift = build_int_cst (itype, power2_neg_val);
> !   def_stmt
> = gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> !  LSHIFT_EXPR, oprnd0, shift);
> !   new_pattern_def_seq (stmt_vinfo, def_stmt);
> pattern_stmt
> != gimple_build_assign (vect_recog_temp_ssa_var

Re: [BUILDROBOT] MPS430 build problem due to new enum

2016-06-13 Thread Martin Liška

On 06/12/2016 01:55 PM, Jan-Benedict Glaw wrote:
> The new `NONE' from your enum clashes with a NONE used in a MSP430
> private enum.
> 
> MfG, JBG

Hi.

Thanks for the heads-up.  I've been testing the following patch; it builds
successfully with --target=msp430-elf.

Ready to install after testing finishes?
Thanks,
Martin

>From 540c82e618ef6b38b69160c77533705d4e160895 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 13 Jun 2016 09:20:10 +0200
Subject: [PATCH] Change enum value to not to clash with a MSP430 private enum

gcc/ChangeLog:

2016-06-13  Martin Liska  

	* predict.c (enum predictor_reason): Rename NONE to VALID.
	(combine_predictions_for_insn): Likewise.
	(combine_predictions_for_bb): Likewise.
---
 gcc/predict.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index 0fa8c5b..e1d161a 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -59,7 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 
 enum predictor_reason
 {
-  NONE,
+  VALID,
   IGNORED,
   SINGLE_EDGE_DUPLICATE,
   EDGE_PAIR_DUPLICATE
@@ -739,7 +739,7 @@ invert_br_probabilities (rtx insn)
 
 static void
 dump_prediction (FILE *file, enum br_predictor predictor, int probability,
-		 basic_block bb, enum predictor_reason reason = NONE,
+		 basic_block bb, enum predictor_reason reason = VALID,
 		 edge ep_edge = NULL)
 {
   edge e = ep_edge;
@@ -864,9 +864,9 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
   else
 {
   dump_prediction (dump_file, PRED_DS_THEORY, combined_probability,
-		   bb, !first_match ? NONE : IGNORED);
+		   bb, !first_match ? VALID : IGNORED);
   dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability,
-		   bb, first_match ? NONE: IGNORED);
+		   bb, first_match ? VALID : IGNORED);
 }
 
   if (first_match)
@@ -883,7 +883,7 @@ combine_predictions_for_insn (rtx_insn *insn, basic_block bb)
 
 	  dump_prediction (dump_file, predictor, probability, bb,
 			   (!first_match || best_predictor == predictor)
-			   ? NONE : IGNORED);
+			   ? VALID : IGNORED);
 	  *pnote = XEXP (*pnote, 1);
 	}
   else
@@ -1150,9 +1150,9 @@ combine_predictions_for_bb (basic_block bb, bool dry_run)
   else
 {
   dump_prediction (dump_file, PRED_DS_THEORY, combined_probability, bb,
-		   !first_match ? NONE : IGNORED);
+		   !first_match ? VALID : IGNORED);
   dump_prediction (dump_file, PRED_FIRST_MATCH, best_probability, bb,
-		   first_match ? NONE : IGNORED);
+		   first_match ? VALID : IGNORED);
 }
 
   if (first_match)
@@ -1168,7 +1168,7 @@ combine_predictions_for_bb (basic_block bb, bool dry_run)
 
 	  dump_prediction (dump_file, predictor, probability, bb,
 			   (!first_match || best_predictor == predictor)
-			   ? NONE : IGNORED, pred->ep_edge);
+			   ? VALID : IGNORED, pred->ep_edge);
 	}
 }
   clear_bb_predictions (bb);
-- 
2.8.3

Re: [Patch, fotran] PR70673 - [5/6/7 Regression] ICE with module containing functions with allocatable character scalars

2016-06-13 Thread Paul Richard Thomas

Dear All,

Committed to trunk as revision 237358.

5- and 6- branches to follow towards the end of the week.

Cheers

Paul

On 12 June 2016 at 17:21, Thomas Koenig  wrote:
> Hi Paul,
>
>> The fix to eliminate this ICE is trivial.
>
>
> Trivial once you have found it, not so trivial before...
>
>> Bootstrapped and regtested on FC21/x86_64 - OK for 5 to 7 branches?
>
>
> OK.
>
> Thanks a lot for the patch!
>
> Thomas



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein

Re: [BUILDROBOT] MPS430 build problem due to new enum (was: [PATCH 2/2] Add edge predictions pruning)

2016-06-13 Thread Jan Hubicka

> Hi Martin,
> 
> On Thu, 2016-06-09 13:24:10 +0200, Martin Liška  wrote:
> > On 06/08/2016 02:41 PM, Jan Hubicka wrote:
> > > Adding hash for this purpose is a bit of an overkill (there are
> > > definitely cheaper ways of solving it), but it will hardly affect compile
> > > time, so the patch is OK.
> > 
> > Sending the final version where I added comments and I also changed
> > dump scanning to cover the new dump format.
> 
> I just noticed a build problem my Build Robot found
> (http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=569576):
> 
> g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   
> -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall 
> -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. 
> -I/scratch/4/jbglaw/regular/repos/gcc/gcc 
> -I/scratch/4/jbglaw/regular/repos/gcc/gcc/. 
> -I/scratch/4/jbglaw/regular/repos/gcc/gcc/../include 
> -I/scratch/4/jbglaw/regular/repos/gcc/gcc/../libcpp/include  
> -I/scratch/4/jbglaw/regular/repos/gcc/gcc/../libdecnumber 
> -I/scratch/4/jbglaw/regular/repos/gcc/gcc/../libdecnumber/dpd 
> -I../libdecnumber -I/scratch/4/jbglaw/regular/repos/gcc/gcc/../libbacktrace   
> -o predict.o -MT predict.o -MMD -MP -MF ./.deps/predict.TPo 
> /scratch/4/jbglaw/regular/repos/gcc/gcc/predict.c
> /scratch/4/jbglaw/regular/repos/gcc/gcc/predict.c:62:3: error: redeclaration 
> of ‘NONE’
>NONE,

Hmm, namespace conflict.  I guess renaming enum items to REASON_* should solve 
it easily.
Or we can add a namespace.
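
A minimal sketch of the prefix variant, for illustration only. The enumerator names REASON_NONE and so on, the toy_target_kind enum, and the reason_name helper are assumptions made up here; the thread only says "REASON_*". The first enum stands in for a target header that already owns the bare name NONE.

```c
#include <assert.h>

/* Stand-in for a target-private enum that already defines NONE.  */
enum toy_target_kind { NONE, SOME };

/* Prefixed enumerators no longer collide with the bare name above.  */
enum predictor_reason
{
  REASON_NONE,
  REASON_IGNORED,
  REASON_SINGLE_EDGE_DUPLICATE,
  REASON_EDGE_PAIR_DUPLICATE
};

/* Hypothetical helper that maps a reason to text for a dump file.  */
static const char *
reason_name (enum predictor_reason reason)
{
  switch (reason)
    {
    case REASON_NONE:
      return "none";
    case REASON_IGNORED:
      return "ignored";
    case REASON_SINGLE_EDGE_DUPLICATE:
      return "single edge duplicate";
    case REASON_EDGE_PAIR_DUPLICATE:
      return "edge pair duplicate";
    }
  assert (!"unknown predictor_reason");
  return "";
}
```

Both enums can live in the same translation unit because no enumerator name is declared twice.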

Martin, both variants of fix are pre-approved.
Honza

Re: RFC (gimplify, openmp): PATCH to is_gimple_reg to check DECL_HAS_VALUE_EXPR_P

2016-06-13 Thread Richard Biener

On Sat, Jun 11, 2016 at 9:30 PM, Jakub Jelinek  wrote:
> On Sat, Jun 11, 2016 at 08:43:06PM +0200, Richard Biener wrote:
>> On June 10, 2016 9:48:45 PM GMT+02:00, Jason Merrill  
>> wrote:
>> >While working on another issue I noticed that is_gimple_reg was happily
>> >
>> >accepting VAR_DECLs with DECL_VALUE_EXPR even when later gimplification
>> >
>> >would replace them with something that is_gimple_reg doesn't like,
>> >leading to trouble.  So I've modified is_gimple_reg to check the
>> >VALUE_EXPR.
>>
>> Can you instead try rejecting them?  I've run into similar issues lately 
>> with is_gimple_val.
>
> I'm afraid that would break OpenMP badly.
> During gimplification, outside of OpenMP contexts we always replace decls
> for their DECL_VALUE_EXPR, but inside of OpenMP contexts we do it only for
> some decls.  In particular, omp_notice_variable returns whether the
> DECL_VALUE_EXPR should be temporarily ignored (if it returns true) or not.
> If DECL_VALUE_EXPR is temporarily ignored, it is only for a short time,
> in particular until the omplower pass, which makes sure that the right thing
> is done with it and everything is regimplified.

Ugh :/  Feels like OMP lowering should happen during gimplification then.
The PR71104 fix (yes, still pending...) runs into this generally with the
change to first gimplify the RHS and then the LHS for assignments
as it affects how rhs_predicate_for works - I've adjusted rhs_predicate_for like

@@ -3771,7 +3771,9 @@ gimplify_init_ctor_eval (tree object, ve
 gimple_predicate
 rhs_predicate_for (tree lhs)
 {
-  if (is_gimple_reg (lhs))
+  if (is_gimple_reg (lhs)
+  && (! DECL_P (lhs)
+ || ! DECL_HAS_VALUE_EXPR_P (lhs)))
 return is_gimple_reg_rhs_or_call;
   else
 return is_gimple_mem_rhs_or_call;

but I don't like this very much either (it's Jason's change but rejecting
decls with value expr instead).

Richard.

> Anyway, looking at Jason's patch, I'm really surprised it didn't break far
> more, it is fine if such an ignored DECL_VALUE_EXPR is considered
> is_gimple_reg.  And I have no idea how else to express this in the IL,
> the DECL_VALUE_EXPR is often something already the FEs set, and we really
> want to replace it with the values in most uses, just can't allow it if we
> want to replace it by something different instead (e.g. privatize in some
> OpenMP/OpenACC region).
>
> Jakub

Re: RFC (gimplify, openmp): PATCH to is_gimple_reg to check DECL_HAS_VALUE_EXPR_P

2016-06-13 Thread Jakub Jelinek

On Mon, Jun 13, 2016 at 11:03:54AM +0200, Richard Biener wrote:
> > I'm afraid that would break OpenMP badly.
> > During gimplification, outside of OpenMP contexts we always replace decls
> > for their DECL_VALUE_EXPR, but inside of OpenMP contexts we do it only for
> > some decls.  In particular, omp_notice_variable returns whether the
> > DECL_VALUE_EXPR should be temporarily ignored (if it returns true) or not.
> > If DECL_VALUE_EXPR is temporarily ignored, it is only for a short time,
> > in particular until the omplower pass, which makes sure that the right thing
> > is done with it and everything is regimplified.
> 
> Ugh :/  Feels like OMP lowering should happen during gimplification then.

That is not really possible.  OMP lowering relies on all the OpenMP clauses
(including implicitly added) to be finalized before it can figure out what
to do.  And to have the OpenMP clauses finalized, you need to gimplify
everything.  So, it is impossible to do it at the same time as
gimplification, it needs to be another pass (whether a separate full pass,
or a "subpass" of the gimplification matters less; though for debugging,
dumps, etc. having it a separate full pass is better; in any case, it needs
another processing of the whole IL, and for the is_gimple_reg case doesn't
change anything, we still need to postpone the DECL_VALUE_EXPR processing
of some decls in certain uses till the second pass or subpass).

Jakub

[PATCH GCC]Resolve compilation time known alias checks in vectorizer

2016-06-13 Thread Bin Cheng

Hi,
The GCC vectorizer generates many unnecessary runtime alias checks whose 
outcome is already known at compilation time.  For some data-reference pairs 
we can prove at compilation time that they do not alias; in this case we can 
simply skip the alias check.  For other data-reference pairs we can prove at 
compilation time that they do alias; in this case we should return false and 
skip vectorizing the loop.  In the second case the loop is currently 
vectorized for nothing, because the guarding alias condition is bound to be 
false anyway.  Vectorizing it not only wastes compilation time but also slows 
down the generated code, because GCC fails to resolve these "false" alias 
checks after vectorization.  Even when they can be resolved (by VRP), GCC 
fails to clean up all the mess generated by loop versioning.
This looks like a common issue in spec2k6.  For example, in 434.zeusmp/ggen.f 
there are three loops that are vectorized but never executed; in 464.h264ref 
there are loops in which all runtime alias checks are resolved at compilation 
time, so loop versioning is proven unnecessary.  Statistics also show that 
more than 100 loops are currently falsely vectorized in my build of spec2k6.
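
As an illustration of the kind of check that can be decided at compilation time, here is a minimal sketch. It is not the vect_no_alias_p from the patch: toy_no_alias_p is a hypothetical simplification that assumes both references share a base, have constant byte offsets, and use positive steps (the patch additionally adjusts the ranges for negative steps).

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if the half-open byte ranges
   [INIT_A, INIT_A + LEN_A) and [INIT_B, INIT_B + LEN_B)
   do not overlap.  All values are constants relative to a common
   base address, so the answer is known without a runtime test.  */
static bool
toy_no_alias_p (long init_a, long len_a, long init_b, long len_b)
{
  assert (len_a > 0 && len_b > 0);
  long a_end = init_a + len_a;
  long b_end = init_b + len_b;
  return a_end <= init_b || b_end <= init_a;
}
```

When a check like this returns true the runtime alias test can be dropped, and when it returns false the versioned vector loop would never run, so vectorizing it is pointless.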

This patch is based on 
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00399.html.  Bootstrap and test 
on x86_64 and AArch64 (ongoing).  Is it OK?

Thanks,
bin

2016-06-07  Bin Cheng  

* tree-vect-data-refs.c (vect_no_alias_p): New function.
(vect_prune_runtime_alias_test_list): Call vect_no_alias_p to
resolve alias checks which are known at compilation time.
Truncate vector LOOP_VINFO_MAY_ALIAS_DDRS(loop_vinfo) if all
alias checks are resolved at compilation time.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c 
b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
index 1caca74..ca57a10 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-35-big-array.c
@@ -21,7 +21,9 @@ int main1 ()
 }
 
   /* Dependence analysis fails cause s.a and s.b may overlap.
- Use runtime aliasing test with versioning.  */
+ Try to use runtime aliasing test with versioning, and
+ later versioning/vectorization are skipped because the
+ overlap is proven at compilation time.  */
   for (i = 0; i < N; i++)
 {
   s.a[i] = s.b[i] + 1;
@@ -45,5 +47,5 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { xfail { 
ia64-*-* sparc*-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "can't determine dependence between" 1 
"vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { 
ia64-*-* sparc*-*-* } } } } */
+/* { dg-final { scan-tree-dump "can't determine dependence between" "vect" } } 
*/
diff --git a/gcc/testsuite/gcc.dg/vect/vect-35.c 
b/gcc/testsuite/gcc.dg/vect/vect-35.c
index edbeb1f..76fe32d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-35.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-35.c
@@ -21,7 +21,9 @@ int main1 ()
 }
 
   /* Dependence analysis fails cause s.a and s.b may overlap.
- Use runtime aliasing test with versioning.  */
+ Try to use runtime aliasing test with versioning, and
+ later versioning/vectorization are skipped because the
+ overlap is proven at compilation time.  */
   for (i = 0; i < N; i++)
 {
   s.a[i] = s.b[i] + 1;
@@ -45,5 +47,5 @@ int main (void)
 } 
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { xfail { 
ia64-*-* sparc*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { 
ia64-*-* sparc*-*-* } } } } */
 /* { dg-final { scan-tree-dump "can't determine dependence between" "vect" } } 
*/
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index ba4d637..c70f658 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2927,6 +2927,54 @@ vect_vfa_segment_size (struct data_reference *dr, tree 
length_factor)
   return segment_length;
 }
 
+/* Function vect_no_alias_p.
+
+   Given data references A and B whose alias relation is known at
+   compilation time, return TRUE if they do not alias to each other;
+   return FALSE if they do.  SEGMENT_LENGTH_A and SEGMENT_LENGTH_B
+   are the memory lengths accessed by A and B respectively.  */
+
+static bool
+vect_no_alias_p (struct data_reference *a, struct data_reference *b,
+ tree segment_length_a, tree segment_length_b)
+{
+  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
+ && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
+  if (wi::to_widest (DR_INIT (a)) == wi::to_widest (DR_INIT (b)))
+return false;
+
+  tree seg_a_min = DR_INIT (a);
+  tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
+   seg_a_min, segment_length_a);
+  /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
+ bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of
+ [a, a+12) */
+  if

Re: [PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation

2016-06-13 Thread James Greenhalgh

On Fri, Jun 03, 2016 at 04:50:00PM -0500, Evandro Menezes wrote:
> From 763562f829d4fec54d21555b2bfd6478d449294f Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Thu, 3 Mar 2016 18:13:46 -0600
> Subject: [PATCH 1/3] [AArch64] Add more choices for the reciprocal square root
>  approximation
> 
> Allow a target to prefer such operation depending on the operation mode.

This is OK for trunk.

Thanks for the patch, and your work getting it ready for trunk.

Thanks,
James

> 
> 2016-03-03  Evandro Menezes  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h
>   (AARCH64_APPROX_MODE): New macro.
>   (AARCH64_APPROX_{NONE,ALL}): Likewise.
>   (cpu_approx_modes): New structure.
>   (tune_params): New member "approx_modes".
>   * config/aarch64/aarch64-tuning-flags.def
>   (AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
>   * config/aarch64/aarch64.c
>   ({generic,exynosm1,xgene1}_approx_modes): New core
>   "cpu_approx_modes" structures.
>   (generic_tunings): New member "approx_modes".
>   (cortexa35_tunings): Likewise.
>   (cortexa53_tunings): Likewise.
>   (cortexa57_tunings): Likewise.
>   (cortexa72_tunings): Likewise.
>   (exynosm1_tunings): Likewise.
>   (thunderx_tunings): Likewise.
>   (xgene1_tunings): Likewise.
>   (use_rsqrt_p): New argument for the mode and use new member from
>   "tune_params".
>   (aarch64_builtin_reciprocal): Devise mode from builtin.
>   (aarch64_optab_supported_p): New argument for the mode.
>   * doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.

Re: [PATCH, AARCH64] add qdf24xx tuning structure

2016-06-13 Thread Kyrill Tkachov


Hi Jim,

On 10/06/16 23:48, Jim Wilson wrote:

This adds a tuning structure for qdf24xx.  This was tested with an
aarch64-linux bootstrap and a make check, with no regressions.  I also
tested it with an x86_64-linux C make check to verify that I didn't
break the testsuite for non aarch64 targets.


As this also changes code in the arm backend
it also needs a bootstrap and test on an arm target (arm-none-linux-gnueabihf 
for example).
Can you please confirm that this passes successfully?

This is ok from an arm perspective if testing is ok.

Thanks,
Kyrill


I had to change one testcase because it assumes that a divide by
constant will always be emitted as a multiply.  That actually depends
on the relative costs of multiply, shift, and divide instructions.  I
ended up with a divide instruction for my target, as it has reasonably
fast divide instructions.  I fixed it by adding a -mtune=cortex-a53
option for aarch64 to ensure that we always get the multiply insn.
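
For context, "divide by constant emitted as a multiply" refers to the usual reciprocal-multiplication trick. A minimal sketch for unsigned 32-bit division by 3 follows; the function name is made up here, and this is only the textbook form of the transformation, not the code any particular GCC version emits.

```c
#include <assert.h>
#include <stdint.h>

/* Compute X / 3 without a divide instruction.  0xAAAAAAAB equals
   (2^33 + 1) / 3, so the high bits of the 64-bit product give the
   exact quotient for every 32-bit X.  */
static uint32_t
div3_by_multiply (uint32_t x)
{
  uint64_t product = (uint64_t) x * UINT64_C (0xAAAAAAAB);
  assert ((product >> 33) <= UINT32_MAX);
  return (uint32_t) (product >> 33);
}
```

Whether the compiler prefers this sequence or a real divide depends on the relative instruction costs in the tuning structure, which is why the testcase needed a fixed -mtune.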

Jim

[arm-embedded][committed] patch for PR61578

2016-06-13 Thread Andre Vieira (lists)

Hi

Backported the following two patches to embedded-5-branch:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00096.html
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02028.html

Committed as revisions r237369 and r237371.


gcc/ChangeLog.arm:

2016-06-13  Andre Vieira  

Backport from Mainline
2015-09-25  Vladimir Makarov  

PR target/61578
* lra-constraints.c (match_reload): Check presence of the input pseudo
  in the output pseudo.


2016-06-13 Andre Vieira 
Backport from Mainline
2015-09-01  Vladimir Makarov  

PR target/61578
* lra-lives.c (process_bb_lives): Process move pseudos with the
  same value for copies and preferences
* lra-constraints.c (match_reload): Create match reload pseudo
  with the same value from single dying input pseudo.


Cheers,
Andre

Unreviewed^2 patch

2016-06-13 Thread Rainer Orth

The following patch has remained unreviewed for two weeks:

[build] Handle gas/gld --compress-debug-sections=type
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02325.html

It requires a build maintainer unless one wants to consider it obvious.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH, i386]: Implement PR 71246, Missing built-in functions for float128 NaNs

2016-06-13 Thread Richard Biener

On Fri, 10 Jun 2016, Uros Bizjak wrote:

> Hello!
> 
> Attached patch implements __builtin_nanq and __builtin_nansq
> __float128 functions.
> 
> 2016-06-10  Uros Bizjak  
> 
> PR target/71241
> * config/i386/i386-builtin-types.def (CONST_STRING):
> New primitive type.
> (FLOAT128_FTYPE_CONST_STRING): New function type.
> * config/i386/i386.c (enum ix86_builtins) [IX86_BUILTIN_NANQ]: New.
> [IX86_BUILTIN_NANSQ]: Ditto.
> (ix86_fold_builtin): Handle IX86_BUILTIN_NANQ and IX86_BUILTIN_NANSQ.
> (ix86_init_builtin_types): Declare const_string_type_node.
> Add __builtin_nanq and __builtin_nansq builtin functions.
> (ix86_expand_builtin): Handle IX86_BUILTIN_NANQ and IX86_BUILTIN_NANSQ.
> * doc/extend.texi (x86 Built-in Functions): Document
> __builtin_nanq and __builtin_nansq.
> 
> Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Joseph, does it look OK to you? Richi, I hope I got tree stuff
> implemented correctly.

Hmm, as we already have BUILT_IN_NAN[S] why not add NAN128 and NANS128
variants in the middle-end as we already have NAND128 (for decimal float 
128)?

I don't see why we need target specific builtins for this given you 
simply use middle-end functionality to construct the result.

Richard.

Re: [patch, avr] Fix PR67353

2016-06-13 Thread Pitchumani Sivanupandi

On Fri, 2016-06-10 at 20:08 +0200, Georg-Johann Lay wrote:
> Pitchumani Sivanupandi schrieb:
> > 
> > Hi,
> > 
> > This patch introduces a new flag for the 'misspelled interrupt/signal
> > handler' warning. -Wmisspelled-isr is enabled by default and warns
> > if an interrupt/signal handler lacks the '__vector' prefix.
> > -Wno-misspelled-isr can be passed by the user to allow custom
> > names, i.e. names without the __vector prefix.
> > 
> > // avr-gcc -c test.c
> > void custom_interruption(void) __attribute__((signal));
> > void custom_interruption(void) {}
> > 
> > Behavior after applying this patch:
> > 
> > $ avr-gcc test.c 
> > test.c: In function 'custom_interruption':
> > test.c:2:6: warning: 'custom_interruption' appears to be a misspelled signal handler
> >  void custom_interruption(void) {}
> >   ^~~
> > 
> > $ avr-gcc test.c -Wmisspelled-isr
> > test.c: In function 'custom_interruption':
> > test.c:2:6: warning: 'custom_interruption' appears to be a misspelled signal handler
> >  void custom_interruption(void) {}
> >   ^~~
> > 
> > $ avr-gcc test.c -Wno-misspelled-isr
> > $
> > 
> > If OK, could someone commit please? I do not have commit access.
> > 
> > Regards,
> > Pitchumani
> > 
> > gcc/ChangeLog
> > 
> > 2016-06-10  Pitchumani Sivanupandi  
> > 
> Missing PR target/67353
> > 
> > * config/avr/avr.c (avr_set_current_function): Warn misspelled
> > interrupt/ signal handler if warn_misspelled_isr flag is set.
> > * config/avr/avr.opt (Wmisspelled-isr): New warning flag.
> > Enabled
> > by default to warn misspelled interrupt/ signal handler.
> Shouldn't it also be documented in doc/invoke.texi?

Thanks Johann.

Updated the patch. I also updated the description of the -nodevicelib
option; the device library should be lib.a.

Regards,
Pitchumani

gcc/ChangeLog

2016-06-10  Pitchumani Sivanupandi  

    PR target/67353
* config/avr/avr.c (avr_set_current_function): Warn misspelled
interrupt/ signal handler if warn_misspelled_isr flag is set.
* config/avr/avr.opt (Wmisspelled-isr): New warning flag. Enabled
by default to warn misspelled interrupt/ signal handler.
    * doc/invoke.texi (AVR Options): Document it. Update description
    for -nodevicelib option.

diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index ba5cd91..587bdbc 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -753,7 +753,7 @@ avr_set_current_function (tree decl)
  that the name of the function is "__vector_NN" so as to catch
  when the user misspells the vector name.  */
 
-  if (!STR_PREFIX_P (name, "__vector"))
+  if ((!STR_PREFIX_P (name, "__vector")) && (avr_warn_misspelled_isr))
 warning_at (loc, 0, "%qs appears to be a misspelled %s handler",
 name, isr);
 }
diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index 8809b9b..0703f5a 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -91,6 +91,10 @@ Waddr-space-convert
 Warning C Report Var(avr_warn_addr_space_convert) Init(0)
 Warn if the address space of an address is changed.
 
+Wmisspelled-isr
+Target Warning Report Var(avr_warn_misspelled_isr) Init(1)
+Warn if the ISR is misspelled, i.e. without __vector prefix. Enabled by default.
+
 mfract-convert-truncate
 Target Report Mask(FRACT_CONV_TRUNC)
 Allow to use truncation instead of rounding towards 0 for fractional int types.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa11209..0bf39c5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -640,7 +640,8 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
 -mcall-prologues -mint8 -mn_flash=@var{size} -mno-interrupts @gol
--mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert}
+-mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert @gol
+-Wmisspelled-isr}
 
 @emph{Blackfin Options}
 @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
@@ -14554,12 +14555,17 @@ Only change the lower 8@tie{}bits of the stack pointer.
 
 @item -nodevicelib
 @opindex nodevicelib
-Don't link against AVR-LibC's device specific library @code{libdev.a}.
+Don't link against AVR-LibC's device specific library @code{lib.a}.
 
 @item -Waddr-space-convert
 @opindex Waddr-space-convert
 Warn about conversions between address spaces in the case where the
 resulting address space is not contained in the incoming address space.
+
+@item -Wmisspelled-isr
+@opindex Wmisspelled-isr
+Warn if the ISR is misspelled, i.e. without __vector prefix.
+Enabled by default.
 @end table
 
 @subsubsection @code{EIND} and Devices with More Than 128 Ki Bytes of Flash

Re: [PATCH] fold-const: Don't access bit fields with too big mode (PR71310)

2016-06-13 Thread Richard Biener

On Fri, 10 Jun 2016, Segher Boessenkool wrote:

> Currently, optimize_bit_field_compare reads the bitfield in word_mode
> if it can.  If the bit field is normally accessed in a smaller mode,
> this might be a violation of the memory model, although the "extra"
> part of the read is not used.  But also, previous stores to the bit
> field will have been done in the smaller mode, and then bigger loads
> from it cause a LHS problem.
> 
> Bootstrapped and regchecked on powerpc64-linux.  Is this okay for
> trunk?

I think you are missing a && DECL_BIT_FIELD_TYPE (TREE_OPERAND (lhs, 1))
after the COMPONENT_REF check.  Also, while this change is certainly
correct, it might not be complete, depending on how the code computes
the access offset - for the C++ memory model the access needs to
be constrained to the DECL_BIT_FIELD_REPRESENTATIVE extent.  That is,
a C++ memory model fix would be to supply the bitregion_start and
bitregion_end arguments to get_best_mode.  RTL expansion uses
expr.c:get_bit_range as a helper for this.  I guess the best thing
would be to export it and re-use it here.

Disclaimer: I think this "optimization" does not belong in
fold-const.c and is very premature at the time we invoke it
(when folding GENERIC from the frontends).

Thanks,
Richard.


> 
> Segher
> 
> 
> 2016-06-10  Segher Boessenkool  
> 
>   PR middle-end/71310
>   * fold-const.c (optimize_bit_field_compare): Don't try to use
>   word_mode unconditionally for reading the bit field, look at
>   DECL_BIT_FIELD_REPRESENTATIVE instead.
> 
> ---
>  gcc/fold-const.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 5058746..f067001 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -3903,13 +3903,24 @@ optimize_bit_field_compare (location_t loc, enum 
> tree_code code,
> return 0;
> }
>  
> +  /* Don't use a larger mode for reading the bit field than we will
> + use in other places accessing the bit field.  */
> +  machine_mode largest_mode = word_mode;
> +  if (TREE_CODE (lhs) == COMPONENT_REF)
> +{
> +  tree field = TREE_OPERAND (lhs, 1);
> +  tree repr = DECL_BIT_FIELD_REPRESENTATIVE (field);
> +  if (repr)
> + largest_mode = DECL_MODE (repr);
> +}
> +
>/* See if we can find a mode to refer to this field.  We should be able to,
>   but fail if we can't.  */
>nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
>const_p ? TYPE_ALIGN (TREE_TYPE (linner))
>: MIN (TYPE_ALIGN (TREE_TYPE (linner)),
>   TYPE_ALIGN (TREE_TYPE (rinner))),
> -  word_mode, false);
> +  largest_mode, false);
>if (nmode == VOIDmode)
>  return 0;
>  
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[PATCH]: Restore bootstrap with gcc < 4.3

2016-06-13 Thread Uros Bizjak

Hello!

The new test finalization self tests fail with gcc < 4.3 due to the
way need_finalization_p is defined:

template<typename T>
static inline bool
need_finalization_p ()
{
#if GCC_VERSION >= 4003
  return !__has_trivial_destructor (T);
#else
  return true;
#endif
}

It is obvious that checking for

   ASSERT_FALSE (need_finalization_p  ());

will always fail. Checking need_finalization_p is meaningless with gcc < 4.3.
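
To make the failure mode concrete, here is a minimal standalone sketch of
the same pattern. It is not the ggc.h code: HAVE_TRIVIAL_DTOR_QUERY,
need_finalization_p_sketch, plain_struct and struct_with_dtor are
hypothetical names, and std::is_trivially_destructible is swapped in for
__has_trivial_destructor. When the query is unavailable the helper
conservatively returns true for every type, so the assertions only make
sense behind the same guard.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the GCC_VERSION >= 4003 check in ggc.h.
// Define it to 0 to model a host compiler without the destructor query.
#ifndef HAVE_TRIVIAL_DTOR_QUERY
#define HAVE_TRIVIAL_DTOR_QUERY 1
#endif

template <typename T>
static inline bool
need_finalization_p_sketch ()
{
#if HAVE_TRIVIAL_DTOR_QUERY
  // Assumption: the standard trait replaces __has_trivial_destructor.
  return !std::is_trivially_destructible<T>::value;
#else
  // Conservative fallback: every type is treated as needing finalization.
  return true;
#endif
}

// Illustrative types, one with and one without a user-provided destructor.
struct plain_struct { int x; };
struct struct_with_dtor { ~struct_with_dtor () {} };

static void
test_finalization_sketch ()
{
  // With the fallback, the first assertion could never hold, so both
  // checks are compiled only when the query is really available.
#if HAVE_TRIVIAL_DTOR_QUERY
  assert (!need_finalization_p_sketch<plain_struct> ());
  assert (need_finalization_p_sketch<struct_with_dtor> ());
#endif
}
```

Building the sketch with -DHAVE_TRIVIAL_DTOR_QUERY=0 models the old host
compiler: the helper then returns true for plain_struct as well, which is
why an unguarded ASSERT_FALSE cannot pass there.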

2016-06-13  Uros Bizjak  

* ggc-tests.c (test_finalization): Only test need_finalization_p
for GCC_VERSION >= 4003.

Bootstrapped on x86_64-linux-gnu, CentOS 5.11.

OK for mainline?

Uros.

diff --git a/gcc/ggc-tests.c b/gcc/ggc-tests.c
index 48eac03..7f97231 100644
--- a/gcc/ggc-tests.c
+++ b/gcc/ggc-tests.c
@@ -190,8 +190,10 @@ int test_struct_with_dtor::dtor_call_count;
 static void
 test_finalization ()
 {
+#if GCC_VERSION >= 4003
   ASSERT_FALSE (need_finalization_p  ());
   ASSERT_TRUE (need_finalization_p  ());
+#endif

   /* Create some garbage.  */
   const int count = 10;

[PATCH Obvious]Check gimple seq before inserting it.

2016-06-13 Thread Bin Cheng

Hi,
This is an obvious change which checks whether the gimple seq is empty before
inserting it.  I built spec2k6 and found that the gimple seq is empty in most
(if not all) cases, so we can save a function call here.

Built on x86_64.

Thanks,
bin

2016-05-17 Bin Cheng  

* tree-vect-loop.c (vect_create_epilog_for_reduction): Only
insert gimple seq if it's not empty.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index d673c67..0aad964 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4373,7 +4373,9 @@ vect_create_epilog_for_reduction (vec vect_defs, 
gimple *stmt,
   gimple_seq stmts;
   vec_init_def = force_gimple_operand (vec_initial_defs[i], &stmts,
   true, NULL_TREE);
-  gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+  if (stmts)
+   gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+
   def = vect_defs[i];
   for (j = 0; j < ncopies; j++)
 {

Re: [BUILDROBOT] MPS430 build problem due to new enum

2016-06-13 Thread Martin Liška

On 06/13/2016 10:46 AM, Jan Hubicka wrote:
> Hmm, namespace conflict.  I guess renaming enum items to REASON_* should 
> solve it easily.
> Or we can add a namespace.
> 
> Martin, both variants of fix are pre-approved.
> Honza

OK, I've just installed (r237370) a patch that prefixes all enum values.

Martin

Re: [PATCH PR71347][Partial revert r235513]Compute cost for all uses in group

2016-06-13 Thread Richard Biener

On Mon, Jun 13, 2016 at 11:57 AM, Bin Cheng  wrote:
> Hi,
> This patch partially reverts r235513 to fix PR71347; the original
> patch was meant to improve compilation time by a small amount.  The root
> cause, as analyzed in the bugzilla PR, is that we can't skip computing the
> cost for a sub iv_use if it has a different position from the first use in
> the group.  The patch also includes a new test.
>
> Bootstrap and test on x86_64.  Is it OK?

Ok.

Richard.

> Thanks,
> bin
>
> 2016-05-31  Bin Cheng  
>
> PR tree-optimization/71347
> * tree-ssa-loop-ivopts.c (determine_group_iv_cost_address): Compute
> cost for all uses in group.
>
> gcc/testsuite/ChangeLog
> 2016-05-31  Bin Cheng  
>
> PR tree-optimization/71347
> * gcc.dg/tree-ssa/pr71347.c: New test.

Re: Container debug light mode

2016-06-13 Thread Jonathan Wakely


On 08/06/16 22:53 +0200, François Dumont wrote:

Hi

   Here is the patch I already proposed to introduce the debug light 
mode for vector and deque containers.


   It also simplifies some internal calls.


This looks great, and I'd like to see it on trunk, but could you split
it into two patches please? The simplifications to use
__iterator_category and replace insert() with _M_insert_* are good but
are unrelated to the debug mode parts so if there are two separate
commits it's easier to backport one piece separately, or to identify
any regressions that might be introduced.

Re: [PATCH][ARM] Fix gcc.target/arm/builtin-bswap16-1.c

2016-06-13 Thread Christophe Lyon

On 10 June 2016 at 11:28, Kyrill Tkachov  wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00249.html
>

I tested this patch in my usual matrix, and it's OK for me.

Christophe

> Thanks,
> Kyrill
>
>
> On 03/06/16 09:30, Kyrill Tkachov wrote:
>>
>> Hi all,
>>
>> The test gcc.target/arm/builtin-bswap16-1.c refuses to compile when
>> testing a toolchain configured with
>> --with-mode=thumb --with-float=hard and an architecture that supports
>> Thumb2.
>> This is because the test explicitly sets the -march option to armv6 and we
>> get an error complaining
>> about Thumb1 used with the hard-float ABI.
>>
>> The proposed solution in this patch is to bump the architecture to armv6t2
>> so that it uses Thumb2 when
>> -mthumb is used.
>>
>> But we don't want to lose Thumb1 test coverage. So this patch moves the
>> actual C code into a separate
>> .x file and includes it in two different tests, each testing Thumb1 or
>> Thumb2.
>>
>> The new test passes and builtin-bswap16-1.c also now passes rather than
>> complaining about the float ABI.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-06-03  Kyrylo Tkachov  
>>
>> * gcc.target/arm/builtin-bswap16-1.c: Add -mfloat-abi=soft
>> and -mthumb to dg-options.  Include builtin-bswap16.x.
>> * gcc.target/arm/builtin-bswap16.x: New file.
>> * gcc.target/arm/builtin-bswap16-2.c: New test.
>
>

[Ada] Improve alignment computation for allocators

2016-06-13 Thread Eric Botcazou

Allocators return pointer to void internally so the derived alignment is the 
minimal one for known_alignment, although we know that it's always larger.

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-13  Eric Botcazou  

* gcc-interface/utils2.c (known_alignment) <CALL_EXPR>: Deal specially
with calls to malloc.

-- 
Eric Botcazou

Index: gcc-interface/utils2.c
===
--- gcc-interface/utils2.c	(revision 237323)
+++ gcc-interface/utils2.c	(working copy)
@@ -171,6 +171,10 @@ known_alignment (tree exp)
 
 case CALL_EXPR:
   {
+	tree func = get_callee_fndecl (exp);
+	if (func && DECL_IS_MALLOC (func))
+	  return get_target_system_allocator_alignment () * BITS_PER_UNIT;
+
 	tree t = maybe_inline_call_in_expr (exp);
 	if (t)
 	  return known_alignment (t);

Re: [PATCH, AARCH64] add qdf24xx tuning structure

2016-06-13 Thread James Greenhalgh

On Fri, Jun 10, 2016 at 03:48:38PM -0700, Jim Wilson wrote:
> This adds a tuning structure for qdf24xx.  This was tested with an
> aarch64-linux bootstrap and a make check, with no regressions.  I also
> tested it with an x86_64-linux C make check to verify that I didn't
> break the testsuite for non aarch64 targets.
> 
> I had to change one testcase because it assumes that a divide by
> constant will always be emitted as a multiply.  That actually depends
> on the relative costs of multiply, shift, and divide instructions.  I
> ended up with a divide instruction for my target, as it has reasonably
> fast divide instructions.  I fixed it by adding a -mtune=cortex-a53
> option for aarch64 to ensure that we always get the multiply insn.
> 
> Index: config/arm/aarch-cost-tables.h
> ===
> --- config/arm/aarch-cost-tables.h(revision 237273)
> +++ config/arm/aarch-cost-tables.h(working copy)
> @@ -537,4 +537,107 @@ const struct cpu_cost_table xgene1_extra_costs =
>}
>  };
>  
> +const struct cpu_cost_table qdf24xx_extra_costs =
> +{

<...snip...>

> +  {
> +/* FP SFmode */
> +{
> +  COSTS_N_INSNS (6),  /* div.  */
> +  COSTS_N_INSNS (5),   /* mult.  */
> +  COSTS_N_INSNS (5),   /* mult_addsub. */
> +  COSTS_N_INSNS (5),   /* fma.  */
> +  COSTS_N_INSNS (3),   /* addsub.  */
> +  COSTS_N_INSNS (1),   /* fpconst. */
> +  COSTS_N_INSNS (1),   /* neg.  */
> +  COSTS_N_INSNS (2),   /* compare.  */
> +  COSTS_N_INSNS (4),   /* widen.  */
> +  COSTS_N_INSNS (4),   /* narrow.  */
> +  COSTS_N_INSNS (4),   /* toint.  */
> +  COSTS_N_INSNS (4),   /* fromint.  */
> +  COSTS_N_INSNS (2)/* roundint.  */
> +},
> +/* FP DFmode */
> +{
> +  COSTS_N_INSNS (11),  /* div.  */
> +  COSTS_N_INSNS (6),   /* mult.  */
> +  COSTS_N_INSNS (6),   /* mult_addsub.  */
> +  COSTS_N_INSNS (6),   /* fma.  */
> +  COSTS_N_INSNS (3),   /* addsub.  */
> +  COSTS_N_INSNS (1),   /* fpconst.  */
> +  COSTS_N_INSNS (1),   /* neg.  */
> +  COSTS_N_INSNS (2),   /* compare.  */
> +  COSTS_N_INSNS (4),   /* widen.  */
> +  COSTS_N_INSNS (4),   /* narrow.  */
> +  COSTS_N_INSNS (4),   /* toint.  */
> +  COSTS_N_INSNS (4),   /* fromint.  */
> +  COSTS_N_INSNS (2)/* roundint.  */
> +}
> +  },

Have you seen my recent patch for Cortex-A57 that changes the costs there
to be relative to the cost of a floating-point register to floating-point
register move [1]? I found that gave me a number of
improvements due to comparisons in the compiler that assume a move in a
mode is cheap, and other costs will be defined relative to it.

Did you consider that for the qdf24xx costs?

Otherwise, the AArch64 parts look good to me, but you'll want to wait for
an ARM OK too.

Thanks,
James

[1]: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00251.html

[PATCH PR71354]Require vect_cond in test gcc.dg/vect/vect-23.c

2016-06-13 Thread Bin Cheng

Hi,
This is a simple patch adding a vect_cond requirement to the test
gcc.dg/vect/vect-23.c.

Checked test behavior on sparc64. Is it OK?

Thanks,
bin
gcc/testsuite/ChangeLog
2016-05-31  Bin Cheng  

PR tree-optimization/71354
* gcc.dg/vect/vect-23.c: Add VECT_COND requirement.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-23.c b/gcc/testsuite/gcc.dg/vect/vect-23.c
index e463f1b..670e3d8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-23.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-23.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_cond } */
 
 #include 
 #include "tree-vect.h"

Re: [PATCH 2/3][AArch64] Emit square root using the Newton series

2016-06-13 Thread James Greenhalgh

On Fri, Jun 03, 2016 at 04:50:16PM -0500, Evandro Menezes wrote:
> >>+return false;
> >>-  emit_insn ((*get_rsqrte_type (mode)) (x0, xsrc));
> >>+  rtx xmsk = gen_reg_rtx (mmsk);
> >>+  if (!recp)
> >>+/* When calculating the approximate square root, compare the argument 
> >>with
> >>+   0.0 and create a mask.  */
> >>+emit_insn (gen_rtx_SET (xmsk, gen_rtx_NEG (mmsk, gen_rtx_EQ (mmsk, src,
> >>+ CONST0_RTX (mode);
> >I guess you've done it this way rather than calling gen_aarch64_cmeq
> >directly to avoid having a switch on mode? I wonder whether it is worth just
> >writing that helper function to make it explicit what instruction we want
> >to match?
> 
> I prefer to avoid calling the gen_...() functions for forward
> portability.  If a future version of the ISA can do it better than
> the explicit gen_...() function, then this just works.  Or at least
> this is the hope.  Again, this is just me.

I prefer calling the gen functions, in the hope that those patterns would
be "upgraded" to cover the new ISA versions. But, I can see your argument
so I'm happy to drop this comment.
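
For readers following along, the scheme under review can be sketched in
scalar C++ as below. This is only an illustration under stated assumptions,
not the aarch64.c code: rsqrt_estimate, rsqrt_step and approx_sqrt are
hypothetical names, the bit-trick constant merely stands in for the hardware
estimate instruction, and the mask is modelled as a plain boolean. It shows
why the argument is compared with 0.0: the estimate of 1/sqrt(0) is infinite,
and multiplying it back by 0 would give NaN unless the estimate is squashed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Stand-in for the initial reciprocal square root estimate (assumption:
// a well-known bit-trick approximation; returns +inf for 0.0 to mimic
// what a hardware estimate would produce).
static float
rsqrt_estimate (float d)
{
  if (d == 0.0f)
    return std::numeric_limits<float>::infinity ();
  std::uint32_t bits;
  std::memcpy (&bits, &d, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  float x;
  std::memcpy (&x, &bits, sizeof x);
  return x;
}

// Stand-in for the series step: (3 - a * b) / 2.
static float
rsqrt_step (float a, float b)
{
  return (3.0f - a * b) / 2.0f;
}

// Approximate sqrt (d) as d * rsqrt (d) for finite d >= 0.
static float
approx_sqrt (float d, int iterations)
{
  assert (d >= 0.0f);
  bool is_zero = (d == 0.0f);   // models the mask built from d == 0.0
  float x = rsqrt_estimate (d);
  if (is_zero)
    x = 0.0f;                   // squash the infinite estimate
  for (int i = 0; i < iterations; i++)
    x = x * rsqrt_step (d, x * x);
  return d * x;
}
```

Each step roughly squares the relative error of the estimate, so a few
iterations are enough for single precision in this sketch; the number of
steps the real expansion uses per mode is a separate tuning decision.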

> @@ -7369,10 +7372,10 @@ aarch64_builtin_reciprocal (tree fndecl)
>  
>  typedef rtx (*rsqrte_type) (rtx, rtx);
>  
> -/* Select reciprocal square root initial estimate
> -   insn depending on machine mode.  */
> +/* Select reciprocal square root initial estimate insn depending on machine
> +   mode.  */
>  
> -rsqrte_type
> +static rsqrte_type
>  get_rsqrte_type (machine_mode mode)
>  {
>switch (mode)
> @@ -7382,16 +7385,15 @@ get_rsqrte_type (machine_mode mode)
>  case V2DFmode: return gen_aarch64_rsqrte_v2df2;
>  case V2SFmode: return gen_aarch64_rsqrte_v2sf2;
>  case V4SFmode: return gen_aarch64_rsqrte_v4sf2;
> -default: gcc_unreachable ();
> +default:   gcc_unreachable ();
>}
>  }
>  
>  typedef rtx (*rsqrts_type) (rtx, rtx, rtx);
>  
> -/* Select reciprocal square root Newton-Raphson step
> -   insn depending on machine mode.  */
> +/* Select reciprocal square root series step insn depending on machine mode. 
>  */
>  
> -rsqrts_type
> +static rsqrts_type
>  get_rsqrts_type (machine_mode mode)
>  {
>switch (mode)
> @@ -7401,50 +7403,88 @@ get_rsqrts_type (machine_mode mode)
>  case V2DFmode: return gen_aarch64_rsqrts_v2df3;
>  case V2SFmode: return gen_aarch64_rsqrts_v2sf3;
>  case V4SFmode: return gen_aarch64_rsqrts_v4sf3;
> -default: gcc_unreachable ();
> +default:   gcc_unreachable ();
>}
>  }

You'll find these two hunks hit a merge conflict on trunk after Jiong's
recent changes to these pattern names. Just be careful when applying the
patch.

The patch is OK for trunk.

Thanks,
James

> From 5c5c07f38cb06507fe997a890dfc5bae1d3179f6 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 4 Apr 2016 11:23:29 -0500
> Subject: [PATCH 2/3] [AArch64] Emit square root using the Newton series
> 
> 2016-04-04  Evandro Menezes  
> Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h
>   (aarch64_emit_approx_rsqrt): Replace with new function
>   "aarch64_emit_approx_sqrt".
>   (cpu_approx_modes): New member "sqrt".
>   * config/aarch64/aarch64.c
>   (generic_approx_modes): New member "sqrt".
>   (exynosm1_approx_modes): Likewise.
>   (xgene1_approx_modes): Likewise.
>   (aarch64_emit_approx_rsqrt): Replace with new function
>   "aarch64_emit_approx_sqrt".
>   (aarch64_override_options_after_change_1): Handle new option.
>   * config/aarch64/aarch64-simd.md
>   (rsqrt2): Use new function instead.
>   (sqrt2): New expansion and insn definitions.
>   * config/aarch64/aarch64.md: Likewise.
>   * config/aarch64/aarch64.opt
>   (mlow-precision-sqrt): Add new option description.
>   * doc/invoke.texi (mlow-precision-sqrt): Likewise.

Re: [PATCH PR71354]Require vect_cond in test gcc.dg/vect/vect-23.c

2016-06-13 Thread Richard Biener

On Mon, Jun 13, 2016 at 11:58 AM, Bin Cheng  wrote:
> Hi,
> This is a simple patch adding a vect_cond requirement to the test
> gcc.dg/vect/vect-23.c.
>
> Checked test behavior on sparc64. Is it OK?

Ok.

Richard.

> Thanks,
> bin
> gcc/testsuite/ChangeLog
> 2016-05-31  Bin Cheng  
>
> PR tree-optimization/71354
> * gcc.dg/vect/vect-23.c: Add VECT_COND requirement.

Re: [Committed] S/390: Fix MAX_ARGS value.

2016-06-13 Thread Andreas Krebbel

On 06/13/2016 11:01 AM, Jakub Jelinek wrote:
> Also, it isn't clear to me, are there any s390 builtins right now that
> actually have 6 arguments (my reading is that you don't count the return
> value into that)?  I.e. beyond the bootstrap issues, is the change actually
> fixing expansion of any builtins (there is if (arity >= MAX_ARGS) check),
> or is the arity 6 case there just for potential further builtins?

No, it doesn't fix a problem with builtin expansion.  I've only backported the
mainline patch because the code was inconsistent and problems might arise with
warnings as well.  I could also have removed the arity == 6 case.

> My confusion comes from s390-builtin*.def using e.g. DEF_FN_TYPE_6
> which looks to me like actually 5 argument builtin type where the first type
> is the return type.  Wouldn't e.g. gcc/builtin-types.def call it
> DEF_FUNCTION_TYPE_5 (rather than _6)?

Yes. It is inconsistent with builtin-types.def. I'm not sure it is worth fixing.

> Also, where is e.g. __builtin_s390_vstrcbs (as randomly chosen builtin
> using DEF_FN_TYPE_6) covered in the testsuite?

I test the builtins with a script which generates the testcases from 
s390-builtins.def.  The result
are about 1 testcases I didn't want to check in.

-Andreas-

Re: Update probabilities in predict.def to match reality

2016-06-13 Thread Kyrill Tkachov


Hi Honza,

On 07/06/16 20:27, Jan Hubicka wrote:

Hello,
Martin Liska measured branch predictor hitrates on the current tree with SPEC2006.

CPU2006
HEURISTICS                        BRANCHES  (REL)            HITRATE      COVERAGE COVERAGE   (REL)
loop iv compare                         33   0.1%   20.27% /  86.24%      30630826   30.63M    0.0%
no prediction                        10406  19.5%   33.41% /  84.76%  139755242456  139.76G   14.1%
early return (on trees)               6328  11.9%   54.20% /  86.48%   33569991740   33.57G    3.4%
guessed loop iterations                112   0.2%   62.06% /  64.49%     958458522  958.46M    0.1%
fail alloc                             595   1.1%   62.18% / 100.00%           595   595.00    0.0%
opcode values positive (on trees)     4266   8.0%   64.30% /  91.28%   16931889792   16.93G    1.7%
opcode values nonequal (on trees)     6600  12.4%   66.23% /  80.60%   71483051282   71.48G    7.2%
continue                               507   0.9%   66.66% /  82.85%   10086808016   10.09G    1.0%
call                                 11351  21.3%   67.16% /  92.24%   34680666103   34.68G    3.5%
loop iterations                       2689   5.0%   67.99% /  67.99%  408309517405  408.31G   41.3%
DS theory                            26385  49.4%   68.62% /  85.44%  146974369890  146.97G   14.9%
const return                           271   0.5%   69.39% /  87.09%     301566712  301.57M    0.0%
pointer (on trees)                    6230  11.7%   69.59% /  87.18%   16667735314   16.67G    1.7%
combined                             53398 100.0%   70.31% /  80.36%  989164856862  989.16G  100.0%
goto                                    78   0.1%   70.36% /  96.96%     951041538  951.04M    0.1%
first match                          16607  31.1%   78.00% /  78.42%  702435244516  702.44G   71.0%
extra loop exit                        141   0.3%   82.80% /  88.17%    1696946942    1.70G    0.2%
null return                            393   0.7%   91.47% /  93.08%    3268678197    3.27G    0.3%
loop exit                             9909  18.6%   91.80% /  92.81%  282927773783  282.93G   28.6%
guess loop iv compare                  178   0.3%   97.81% /  97.85%    4375086453    4.38G    0.4%
negative return                        277   0.5%   97.94% /  99.23%    1062119028    1.06G    0.1%
noreturn call                         2372   4.4%  100.00% / 100.00%    8356562323    8.36G    0.8%
overflow                              1282   2.4%  100.00% / 100.00%     175074177  175.07M    0.0%
zero-sized array                       677   1.3%  100.00% / 100.00%     112723803  112.72M    0.0%
unconditional jump                     103   0.2%  100.00% / 100.00%        491001  491.00K    0.0%

We used to track SPEC2000 until 2008 but then the infrastructure broke. The
numbers show some differences to 2008 results:

HEURISTICS                        BRANCHES  (REL)            HITRATE      COVERAGE   (REL)
DS theory                            42611  57.1%   74.54% /  89.71%    9237799352   28.7%
combined                             74578 100.0%   72.88% /  90.59%   32201983315  100.0%
opcode values nonequal (on trees)    14544  19.5%   72.03% /  88.64%    3387233627   10.5%
early return (on trees)              11078  14.9%   61.23% /  89.25%    2349499033    7.3%
first match                          13249  17.8%   89.11% /  93.08%   15876522911   49.3%
guessed loop iterations               2722   3.6%   86.50% /  90.76%    7308035517   22.7%
no prediction                        18718  25.1%   34.36% /  86.14%    7087661052   22.0%
call                                 23937  32.1%   71.38% /  93.08%    3829002205   11.9%
opcode values positive (on trees)     2515   3.4%   72.77% /  86.49%     927995806    2.9%
loop branch                            378   0.5%   87.61% /  95.54%    1491510452    4.6%
loop exit                             8833  11.8%   91.43% /  94.52%    6538486043   20.3%
loop iterations                        912   1.2%   99.11% /  99.11%     396451321    1.2%
noreturn call                          890   1.2%   99.99% /  99.99%     205957905    0.6%
pointer (on trees)                    8394  11.3%   85.09% /  94.80%    1315262058    4.1%
negative return                        272   0.4%   96.47% /  99.74%      49156319    0.1%
const return                           551   0.7%   67.92% /  68.97%      96082001    0.3%
__builtin_expect                        20   0.0%       0% /      0%             0    0.0%
null return                            566   0.8%   96.58% /  98.77%      87555632    0.3%

There is some degradation in the combined heuristics hitrate (72.8->70), which
may be caused simply by the fact that the new SPEC is harder to guess. The main
decrease seems to be in opcode_positive/nonequal, which may also be attributed
to the fact that early opts now optimize out more code before we do the
statistics.

There are bugs in a few predictors - the goto predictor is dead because the FE
code was dropped, the return predictor is a bit random because the CFG is
optimized (it should probably be done in

Re: [PATCH] Add ggc-tests.c

2016-06-13 Thread David Malcolm

On Mon, 2016-06-13 at 13:36 +0200, Ulrich Weigand wrote:
> Gerald Pfeifer wrote:
> 
> > The source code of need_finalization_p in ggc.h reads
> > 
> >template<typename T>
> >static inline bool
> >need_finalization_p ()
> >{
> >#if GCC_VERSION >= 4003
> >  return !__has_trivial_destructor (T);
> >#else
> >  return true;
> >#endif
> >}
> > 
> > which means your self test is broken by design for any compiler
> > that is not GCC in at least version 4.3, isn't it?
> 
> Just to confirm that I'm seeing the same failure on my SPU
> daily build machine, which is running RHEL 5 with a host
> compiler of GCC 4.1.2.

Sorry about this.

Looks like Uros fixed this in r237381.

Re: [PATCH] Fix bootstrap when user language is not english

2016-06-13 Thread David Malcolm

On Mon, 2016-06-13 at 14:41 +, Bernd Edlinger wrote:
> Hi,
> 
> as noted in PR bootstrap/71481, comment#4 currently
> the trunk fails to bootstrap if the current language is
> not english.  A workaround is possible by setting LANG=C,
> but OTOH it is rather easy to fix, by translating the string
> in the assertion, as it is the only place that is affected by
> the language setting.
> 
> 
> Boot-strapped and reg-tested on trunk with LANG=de_DE.UTF-8.
> OK to commit?

Sorry about the breakage.

I believe I can approve this with my "libcpp"/"diagnostics" hats on, so
LGTM.

That said, should we hardcode LANG=C when running the selftests from
gcc/Makefile.in?


Dave

Fix CASE_CHAIN typos (was: [patch] Fix CASE_LABEL_EXPR documentation in tree.def and tree-cfg.c)

2016-06-13 Thread Thomas Schwinge

Hi!

On Wed, 18 Apr 2012 17:32:08 +0200, Steven Bosscher  
wrote:
> Subject says all. Will commit as obvious.
> 
> * tree.def (CASE_LABEL_EXPR): Fix documentation, mention all operands.
> * tree-cfg.c (edge_to_cases): Fix documentation.

> --- tree.def(revision 186526)
> +++ tree.def(working copy)
> @@ -876,10 +876,16 @@ DEFTREECODE (LOOP_EXPR, "loop_expr", tcc_statement
>   of all the cases.  */
>  DEFTREECODE (SWITCH_EXPR, "switch_expr", tcc_statement, 3)
> 
> -/* Used to represent a case label. The operands are CASE_LOW and
> -   CASE_HIGH, respectively. If CASE_LOW is NULL_TREE, the label is a
> -   'default' label. If CASE_HIGH is NULL_TREE, the label is a normal case
> -   label.  CASE_LABEL is the corresponding LABEL_DECL.  */
> +/* Used to represent a case label.
> +
> +   Operand 0 is CASE_LOW.  It may be NULL_TREE, in which case the label
> + is a 'default' label.
> +   Operand 1 is CASE_HIGH.  If it is NULL_TREE, the label is a simple
> + (one-value) case label.  If it is non-NULL_TREE, the case is a range.
> +   Operand 2 is CASE_LABEL, which is is the corresponding LABEL_DECL.
> +   Operand 4 is CASE_CHAIN.  This operand is only used in tree-cfg.c to
> + speed up the lookup of case labels which use a particular edge in
> + the control flow graph.  */
>  DEFTREECODE (CASE_LABEL_EXPR, "case_label_expr", tcc_statement, 4)

Typo: the last one's operand 3 not 4.  ;-)

> --- tree-cfg.c  (revision 186526)
> +++ tree-cfg.c  (working copy)
> @@ -56,7 +56,7 @@ static const int initial_cfg_capacity = 20;
> 
>  /* This hash table allows us to efficiently lookup all CASE_LABEL_EXPRs
> which use a particular edge.  The CASE_LABEL_EXPRs are chained together
> -   via their TREE_CHAIN field, which we clear after we're done with the
> +   via their CASE_CHAIN field, which we clear after we're done with the
> hash table to prevent problems with duplication of GIMPLE_SWITCHes.
> 
> Access to this list of CASE_LABEL_EXPRs allows us to efficiently

The thing doing the "clear after we're done" likewise needs to get its
documentation updated.  ;-)

As obvious, committed to trunk in r237384:

commit 00091facd9b1a23f371a11b4c48e7a106f6d1011
Author: tschwinge 
Date:   Mon Jun 13 16:10:35 2016 +

Fix CASE_CHAIN typos

gcc/
* tree-cfg.c (edge_to_cases_cleanup): Fix CASE_CHAIN typo.
* tree.def (CASE_LABEL_EXPR): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237384 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  | 5 +
 gcc/tree-cfg.c | 2 +-
 gcc/tree.def   | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index c2f0f7e..733e512 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-13  Thomas Schwinge  
+
+   * tree-cfg.c (edge_to_cases_cleanup): Fix CASE_CHAIN typo.
+   * tree.def (CASE_LABEL_EXPR): Likewise.
+
 2016-06-13  Bernd Edlinger  
 
* input.c (test_builtins): Fix an assertion.
diff --git gcc/tree-cfg.c gcc/tree-cfg.c
index 40e524b..0fac49c 100644
--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -1126,7 +1126,7 @@ make_cond_expr_edges (basic_block bb)
 /* Called for each element in the hash table (P) as we delete the
edge to cases hash table.
 
-   Clear all the TREE_CHAINs to prevent problems with copying of
+   Clear all the CASE_CHAINs to prevent problems with copying of
SWITCH_EXPRs and structure sharing rules, then free the hash table
element.  */
 
diff --git gcc/tree.def gcc/tree.def
index d16575a..2c35540 100644
--- gcc/tree.def
+++ gcc/tree.def
@@ -949,7 +949,7 @@ DEFTREECODE (SWITCH_EXPR, "switch_expr", tcc_statement, 3)
Operand 1 is CASE_HIGH.  If it is NULL_TREE, the label is a simple
  (one-value) case label.  If it is non-NULL_TREE, the case is a range.
Operand 2 is CASE_LABEL, which is is the corresponding LABEL_DECL.
-   Operand 4 is CASE_CHAIN.  This operand is only used in tree-cfg.c to
+   Operand 3 is CASE_CHAIN.  This operand is only used in tree-cfg.c to
  speed up the lookup of case labels which use a particular edge in
  the control flow graph.  */
 DEFTREECODE (CASE_LABEL_EXPR, "case_label_expr", tcc_statement, 4)


Regards
 Thomas



Re: [PATCH][vectorizer][2/2] PR 65951: Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-13 Thread Kyrill Tkachov



On 13/06/16 15:48, Marc Glisse wrote:

+  /* All synthesis algorithms require shifts, so bail out early if
+ target cannot vectorize them.  */
+  if (!target_has_vecop_for_code (LSHIFT_EXPR, vectype))
+return false;

Hmm, 2 points:

* Could you use vect_supportable_shift (or equivalent) instead? This way it will work even 
if a target/mode supports vector << scalar and not vector << vector.



Ok, will do.

* This means that we will refuse to vectorize x*2 as x+x, which was the goal of my patch (SPARC VIS has additions, no shift, and limited multiplications, IIRC). I guess it would be possible, as a follow-up (it doesn't have to block your 
patch), not to give up in the no-shift branch, but to handle some small factors with only additions and subtractions. Or to split the emission of shifts to a function that, when shifts are not supported, emulates them with additions. Or 
even emit shifts and rely on expand or vector lowering to turn them to additions (though the estimated cost might be off). Any idea on the best way to handle SPARC?




This is my first time touching the vectorizer so I don't know for sure what
would be the preferred approach.
Looks like expand_shift_1 in expmed.c already has code to expand a shift as
additions, though it's gated on rtx costs, which I suppose SPARC won't
implement accurately for vector shifts since it doesn't support them.
I suppose that code could easily be factored out to do the right thing though.

I think splitting emission of shifts into a function that synthesises them
with additions when appropriate would be best.

Kyrill
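
To make the "emulate shifts with additions" idea concrete, here is a minimal scalar sketch in plain C. It is not GCC code: shift_left_by_adds and times_two are hypothetical helper names, and a real implementation would emit vector PLUS_EXPR statements rather than run a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): compute X << N using only
   additions, by doubling the value N times.  */
static uint32_t
shift_left_by_adds (uint32_t x, unsigned int n)
{
  assert (n < 32);
  for (unsigned int i = 0; i < n; i++)
    x = x + x;		/* One doubling is equivalent to a shift by 1.  */
  return x;
}

/* Multiplication by 2 needs no shift at all: x * 2 == x + x.  */
static uint32_t
times_two (uint32_t x)
{
  return x + x;
}
```

The cost grows linearly with the shift count, which is why this only looks attractive for small factors on targets without vector shifts.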

[PATCH, Fortran] PR71523 - Static variables given automatic initializers with -finit-* and -fmax-stack-var-size

2016-06-13 Thread Fritz Reese

RE: https://gcc.gnu.org/ml/fortran/2016-06/msg00023.html

On Thu, Jun 9, 2016 at 2:01 PM, Fritz Reese  wrote:
> It looks like when -fautomatic and -finit-local-zero are set with
> -fmax-stack-var-size=X, an automatic initializer is generated even for
> variables larger than X which are given static storage, causing such
> static variables to have their value re-initialized upon each entry to
> their namespace.
> ...

After doing more research I noticed PR41860
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41860) was very similar
to this issue, so I've decided this is a bug and created PR71523
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71523). Here's a patch
for it.

The bug seems to be due to an oversight - since the size of a variable
is not known at resolution time when initializer expressions are
applied, -finit-* is too greedy in the case that the variable is large
enough to be removed from the stack according to -fmax-stack-var-size.
This patch removes automatic initializers at translation time which
were inserted by -finit-* (and inserts the appropriate static
initializer) according to -fmax-stack-var-size.

The patch passes all regression tests (on x86_64-redhat-linux),
including the two additional tests of its own demonstrating the issue.
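
As a rough analogy in C (not Fortran, and not taken from the patch), the difference between re-running an initializer on every entry and applying a static initializer once looks like the sketch below. Both function names are made up for illustration.

```c
#include <assert.h>

/* Analogy for the bug: the variable has static storage, but an
   assignment playing the role of the automatic initializer runs on
   every entry, so the saved value is lost each time.  */
static int
count_calls_reinit (void)
{
  static int calls;
  calls = 0;			/* Re-executed on each call.  */
  ++calls;
  assert (calls == 1);
  return calls;
}

/* Analogy for the fix: a static initializer is applied exactly once,
   so the value persists across calls.  */
static int
count_calls_static (void)
{
  static int calls = 0;
  ++calls;
  assert (calls > 0);
  return calls;
}
```

Calling count_calls_reinit repeatedly always yields 1, while count_calls_static counts upward, which is the behavior one expects from a saved variable.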

---
Fritz Reese

0001-PR-Fortran-71523.patch
Description: Binary data

Re: [patch, avr] Fix PR67353

2016-06-13 Thread Georg-Johann Lay


Pitchumani Sivanupandi wrote:

Hi,

This patch introduces new flags for warning 'misspelled interrupt/
signal handler'. Flag -Wmisspelled-isr is enabled by default and it
will warn user if the interrupt/ signal handler is without '__vector'
prefix. Flag -Wno-misspelled-isr shall be enabled by user to allow
custom names, i.e. without __vector prefix.

// avr-gcc -c test.c
void custom_interruption(void) __attribute__((signal));
void custom_interruption(void) {}

Behavior after applying this patch:

$ avr-gcc test.c 
test.c: In function 'custom_interruption':

test.c:2:6: warning: 'custom_interruption' appears to be a misspelled
signal handler
 void custom_interruption(void) {}
  ^~~

$ avr-gcc test.c -Wmisspelled-isr
test.c: In function 'custom_interruption':
test.c:2:6: warning: 'custom_interruption' appears to be a misspelled
signal handler
 void custom_interruption(void) {}
  ^~~

$ avr-gcc test.c -Wno-misspelled-isr
$


What about -Werror=misspelled-isr?

> [...]

diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index ba5cd91..587bdbc 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -753,7 +753,7 @@ avr_set_current_function (tree decl)
  that the name of the function is "__vector_NN" so as to catch
  when the user misspells the vector name.  */
 
-  if (!STR_PREFIX_P (name, "__vector"))

+  if ((!STR_PREFIX_P (name, "__vector")) && (avr_warn_misspelled_isr))
 warning_at (loc, 0, "%qs appears to be a misspelled %s handler",


If the respective OPT_... enum is passed to warning_at instead of the "0",
-Werror= should work as expected (and the explicit "&& 
avr_warn_misspelled_isr" is no longer needed).


Johann

[libiberty][PATCH] Avoid zero-length VLAs.

2016-06-13 Thread Brooks Moses

Zero-length variable-length-arrays are not allowed in standard C99,
and perhaps more importantly, they cause ASAN to complain.  (See,
e.g., https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00917.html.)

With this patch, the libiberty tests, including demangler-fuzzer, are
ASAN-clean.

- Brooks



 libiberty/ChangeLog 
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-12  Brooks Moses  
+
+   * cp-demangle.c (cplus_demangle_print_callback): Avoid zero-length
+   VLAs.
+
 2016-05-31  Alan Modra  

* xmemdup.c (xmemdup): Use xmalloc rather than xcalloc.
 libiberty/cp-demangle.c 
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -4120,8 +4120,10 @@

   {
 #ifdef CP_DYNAMIC_ARRAYS
-__extension__ struct d_saved_scope scopes[dpi.num_saved_scopes];
-__extension__ struct d_print_template temps[dpi.num_copy_templates];
+__extension__ struct d_saved_scope scopes[(dpi.num_saved_scopes > 0)
+ ? dpi.num_saved_scopes : 1];
+__extension__ struct d_print_template temps[(dpi.num_copy_templates > 0)
+   ? dpi.num_copy_templates : 1];

 dpi.saved_scopes = scopes;
 dpi.copy_templates = temps;
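
The same clamping pattern can be shown outside libiberty. This is a minimal sketch with hypothetical names (vla_length, sum_via_scratch); it only shows that a requested length of zero is mapped to one element so the VLA declaration stays valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a possibly-zero element count to a legal
   VLA length, mirroring the "(n > 0) ? n : 1" expression above.  */
static size_t
vla_length (int requested)
{
  return requested > 0 ? (size_t) requested : 1;
}

/* Sum the first COUNT entries of VALUES after copying them into a VLA
   scratch buffer.  COUNT may be zero; the buffer then has one unused
   element instead of zero elements.  */
static int
sum_via_scratch (const int *values, int count)
{
  assert (count >= 0);
  int scratch[vla_length (count)];
  int total = 0;
  for (int i = 0; i < count; i++)
    {
      scratch[i] = values[i];
      total += scratch[i];
    }
  return total;
}
```

The price is one wasted element of stack space in the empty case, which seems a fair trade for code that is valid C99 and quiet under ASAN.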

Re: [PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

2016-06-13 Thread Thomas Schwinge

Hi!

On Mon, 13 Jun 2016 16:48:56 +0200, Jakub Jelinek  wrote:
> On Mon, Jun 13, 2016 at 04:43:25PM +0200, Thomas Schwinge wrote:
> > On Wed, 01 Jun 2016 17:06:42 +0200, Thomas Schwinge 
> >  wrote:
> > > Here are the OpenACC bits of .
> > 
> > In the PR, Jakub clarified that all the missing other OMP_CLAUSE_* are in
> > fact all unreachable here.

> > The "anything else" default case in fact now is just the non-clause
> > OMP_CLAUSE_ERROR, so when adding a case for that one, we could then
> > remove the default case, and thus get a compiler warning when new clauses
> > are added in the future, without handling them here.  That makes sense to
> > me (would have made apparent much earlier the original problem of missing
> > handling for certain OMP_CLAUSE_*), but based on feedback received, it
> > feels as if I'm the only supporter of such "defensive" programming
> > paradigms?

Any thoughts about that,
?
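
For readers unfamiliar with the idiom, here is a small stand-alone C sketch of the "no default case" approach. The enum and function are invented for illustration and are not the real OMP_CLAUSE_* codes. GCC's -Wswitch (enabled by -Wall) warns when a switch over an enumerated type lacks a case for some enumerator, but only if there is no default label.

```c
#include <assert.h>

/* Made-up clause codes standing in for OMP_CLAUSE_*.  */
enum clause_kind
{
  CLAUSE_ERROR,
  CLAUSE_PRIVATE,
  CLAUSE_SHARED,
  CLAUSE_TILE
};

/* Return 1 if the clause needs processing, 0 if there is nothing to
   do.  There is deliberately no default label: if a new enumerator is
   added without a matching case, -Wswitch reports it at compile
   time.  */
static int
clause_needs_work (enum clause_kind kind)
{
  switch (kind)
    {
    case CLAUSE_PRIVATE:
    case CLAUSE_SHARED:
      return 1;
    case CLAUSE_TILE:
      return 0;
    case CLAUSE_ERROR:
      break;
    }
  /* Stand-in for gcc_unreachable ().  */
  assert (!"unexpected clause kind");
  return -1;
}
```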

> > [PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

> Ok, [...]

As posted, committed to trunk in r237386:

commit be2a5a8e8ffd13c099d372c4fcc363d5cd3c83c2
Author: tschwinge 
Date:   Mon Jun 13 16:37:29 2016 +

[PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

gcc/
PR middle-end/71373
* tree-nested.c (convert_nonlocal_omp_clauses)
(convert_local_omp_clauses): Document missing OMP_CLAUSE_*.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237386 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |  4 
 gcc/tree-nested.c | 60 ++-
 2 files changed, 46 insertions(+), 18 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index ff685b1..89098e7 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,5 +1,9 @@
 2016-06-13  Thomas Schwinge  
 
+   PR middle-end/71373
+   * tree-nested.c (convert_nonlocal_omp_clauses)
+   (convert_local_omp_clauses): Document missing OMP_CLAUSE_*.
+
* tree-cfg.c (edge_to_cases_cleanup): Fix CASE_CHAIN typo.
* tree.def (CASE_LABEL_EXPR): Likewise.
 
diff --git gcc/tree-nested.c gcc/tree-nested.c
index 812f619..62cb01f 100644
--- gcc/tree-nested.c
+++ gcc/tree-nested.c
@@ -1203,17 +1203,29 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
case OMP_CLAUSE_AUTO:
  break;
 
+ /* OpenACC tile clauses are discarded during gimplification.  */
case OMP_CLAUSE_TILE:
- /* OpenACC tile clauses are discarded during gimplification, so we
-don't expect to see anything here.  */
- gcc_unreachable ();
-
+ /* The following clause belongs to the OpenACC cache directive, which
+is discarded during gimplification.  */
case OMP_CLAUSE__CACHE_:
- /* These clauses belong to the OpenACC cache directive, which is
-discarded during gimplification, so we don't expect to see
-anything here.  */
- gcc_unreachable ();
-
+ /* The following clauses are only allowed in the OpenMP declare simd
+directive, so not seen here.  */
+   case OMP_CLAUSE_UNIFORM:
+   case OMP_CLAUSE_INBRANCH:
+   case OMP_CLAUSE_NOTINBRANCH:
+ /* The following clauses are only allowed on OpenMP cancel and
+cancellation point directives, which at this point have already
+been lowered into a function call.  */
+   case OMP_CLAUSE_FOR:
+   case OMP_CLAUSE_PARALLEL:
+   case OMP_CLAUSE_SECTIONS:
+   case OMP_CLAUSE_TASKGROUP:
+ /* The following clauses are only added during OMP lowering; nested
+function decomposition happens before that.  */
+   case OMP_CLAUSE__LOOPTEMP_:
+   case OMP_CLAUSE__SIMDUID_:
+   case OMP_CLAUSE__GRIDDIM_:
+ /* Anything else.  */
default:
  gcc_unreachable ();
}
@@ -1899,17 +1911,29 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
case OMP_CLAUSE_AUTO:
  break;
 
+ /* OpenACC tile clauses are discarded during gimplification.  */
case OMP_CLAUSE_TILE:
- /* OpenACC tile clauses are discarded during gimplification, so we
-don't expect to see anything here.  */
- gcc_unreachable ();
-
+ /* The following clause belongs to the OpenACC cache directive, which
+is discarded during gimplification.  */
case OMP_CLAUSE__CACHE_:
- /* These clauses belong to the OpenACC cache directive, which is
-discarded during gimplification, so we don't expect to see
-anything here.  */
- gcc_unreachable ();
-
+ /* The following clauses are only allowed in the

Re: [PATCH][1/2] Move mult synthesis definitions into a separate file

2016-06-13 Thread Kyrill Tkachov


Hi Richard,

On 13/06/16 15:07, Richard Biener wrote:

On Mon, Jun 13, 2016 at 2:23 PM, Kyrill Tkachov
 wrote:

Hi all,

There are other places besides expand where we might want to synthesize an
integer multiplication by a constant.  Thankfully the algorithm selection code
in expmed.c is already quite well separated from the RTL implementation, so if
we can just factor out the prototype of choose_mult_variant and some enums and
structs that it needs into a separate header file we can reuse them from other
parts of the compiler.

I need this for patch 2/2 which hooks up the vectorizer to synthesize vector
multiplications using sequences of shifts and other arithmetic ops when
appropriate.

The new header is called mult-synthesis.h. Should I add it to some makefile?
grepping around for a bit I'm not sure what to do about it.

Possibly PLUGIN_HEADERS.


Ok.


You could have included expmed.h from the vectorizer, no?  After all this
patch now breaks that things declared in A.h are defined in A.c as you
didn't move choose_mult_variant itself.


I think including expmed.h would work. I thought it defined too many
irrelevant RTL-specific things that you wouldn't want in the vectoriser.
If you don't mind I'm happy to just include expmed.h.
Do we have a rule for defining things declared in A.h in A.c?
I notice we declare various extern things in rtl.h that aren't defined in
rtl.c, though I suppose that would be an exception...

Thanks,
Kyrill


Thanks,
Richard.


Bootstrapped and tested on arm, aarch64, x86_64.

Thanks,
Kyrill

2016-06-13  Kyrylo Tkachov  

 * mult-synthesis.h: New file.  Add choose_mult_variant prototype.
 * expmed.h: Include mult-synthesis.h
 (enum alg_code): Move to mult-synthesis.h
 (struct mult_cost): Likewise.
 (struct algorithm): Likewise.
 * expmed.c (enum mult_variant): Move to mult-synthesis.h
 (choose_mult_variant): Delete prototype.  Remove static qualifier.
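
As background on what "synthesizing a multiplication by a constant" means, the sketch below hand-writes two such sequences in plain C. The function names are hypothetical and the sequences are chosen by hand; this is not the output of choose_mult_variant, only an illustration of the kind of shift/add/subtract sequence such an algorithm selects.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 == (x << 3) + (x << 1), since 10 == 8 + 2.  */
static uint32_t
mul10_shift_add (uint32_t x)
{
  return (x << 3) + (x << 1);
}

/* x * 7 == (x << 3) - x, since 7 == 8 - 1.  */
static uint32_t
mul7_shift_sub (uint32_t x)
{
  return (x << 3) - x;
}

/* Check both sequences against a real multiplication.  Unsigned
   arithmetic wraps modulo 2^32 in both forms, so they agree for every
   input.  */
static void
check_synthesis (uint32_t x)
{
  assert (mul10_shift_add (x) == x * 10u);
  assert (mul7_shift_sub (x) == x * 7u);
}
```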

Re: RFC (gimplify, openmp): PATCH to is_gimple_reg to check DECL_HAS_VALUE_EXPR_P

2016-06-13 Thread Jason Merrill

On Mon, Jun 13, 2016 at 5:03 AM, Richard Biener
 wrote:
> On Sat, Jun 11, 2016 at 9:30 PM, Jakub Jelinek  wrote:
>> On Sat, Jun 11, 2016 at 08:43:06PM +0200, Richard Biener wrote:
>>> On June 10, 2016 9:48:45 PM GMT+02:00, Jason Merrill  
>>> wrote:
>>> >While working on another issue I noticed that is_gimple_reg was happily
>>> >
>>> >accepting VAR_DECLs with DECL_VALUE_EXPR even when later gimplification
>>> >
>>> >would replace them with something that is_gimple_reg doesn't like,
>>> >leading to trouble.  So I've modified is_gimple_reg to check the
>>> >VALUE_EXPR.
>>>
>>> Can you instead try rejecting them?  I've run into similar issues lately 
>>> with is_gimple_val.
>>
>> I'm afraid that would break OpenMP badly.
>> During gimplification, outside of OpenMP contexts we always replace decls
>> for their DECL_VALUE_EXPR, but inside of OpenMP contexts we do it only for
>> some decls.  In particular, omp_notice_variable returns whether the
>> DECL_VALUE_EXPR should be temporarily ignored (if it returns true) or not.
>> If DECL_VALUE_EXPR is temporarily ignored, it is only for a short time,
>> in particular until the omplower pass, which makes sure that the right thing
>> is done with it and everything is regimplified.
>
> Ugh :/  Feels like OMP lowering should happen during gimplification then.
> The PR71104 fix (yes, still pending...) runs into this generally with the
> change to first gimplify the RHS and then the LHS for assignments

Yep, that's what led me here, too.

Jason

> as it affects how rhs_predicate_for works - I've adjusted rhs_predicate_for 
> like

> @@ -3771,7 +3771,9 @@ gimplify_init_ctor_eval (tree object, ve
>  gimple_predicate
>  rhs_predicate_for (tree lhs)
>  {
> -  if (is_gimple_reg (lhs))
> +  if (is_gimple_reg (lhs)
> +  && (! DECL_P (lhs)
> + || ! DECL_HAS_VALUE_EXPR_P (lhs)))
>  return is_gimple_reg_rhs_or_call;
>else
>  return is_gimple_mem_rhs_or_call;
>
> but I don't like this very much either (it's Jasons change but rejecting
> decls with value expr instead).
>
> Richard.
>
>> Anyway, looking at Jason's patch, I'm really surprised it didn't break far
>> more, it is fine if such an ignored DECL_VALUE_EXPR is considered
>> is_gimple_reg.  And I have no idea how else to express this in the IL,
>> the DECL_VALUE_EXPR is often something already the FEs set, and we really
>> want to replace it with the values in most uses, just can't allow it if we
>> want to replace it by something different instead (e.g. privatize in some
>> OpenMP/OpenACC region).
>>
>> Jakub

Re: [PATCH] c/69507 - bogus warning: ISO C does not allow ‘alignof (expression)’

2016-06-13 Thread Joseph Myers

On Fri, 27 May 2016, Martin Sebor wrote:

> The patch below adjusts the C alignof pedantic warning to avoid
> diagnosing the GCC extension (__alignof__) and only diagnose
> _Alignof in C99 and prior modes.  This is consistent with how
> __attribute__ ((aligned)) and _Alignas is handled (among other
> extensions vs standard features).

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
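
For context, a short sketch of the two spellings under discussion, assuming a GCC-compatible compiler: _Alignof applied to a type is standard C11, while __alignof__ applied to an expression is the GNU extension that should not draw the pedantic warning. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct sample
{
  char tag;
  double value;
};

/* Standard C11 form: the operand is a type name.  */
static size_t
standard_alignof_type (void)
{
  return _Alignof (struct sample);
}

/* GNU extension: the operand is an expression (here an object).  */
static size_t
gnu_alignof_expr (void)
{
  struct sample s = { 'x', 1.0 };
  assert (s.tag == 'x');
  return __alignof__ (s);
}
```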

Re: [PATCH 1/3] selftest: show values when ASSERT_STREQ fails

2016-06-13 Thread Jeff Law


On 06/09/2016 12:42 PM, David Malcolm wrote:

Rework ASSERT_STREQ so that it prints the actual and expected values
to stderr when it fails (by moving it to a helper function).

gcc/ChangeLog:
* selftest.c (selftest::fail_formatted): New function.
(selftest::assert_streq): New function.
* selftest.h (selftest::fail_formatted): New decl.
(selftest::assert_streq): New decl.
(ASSERT_STREQ): Reimplement in terms of selftest::assert_streq.

OK.
jeff

Re: [PATCH 3/3] pretty-print.c: skip color selftests if GCC_COLORS is set

2016-06-13 Thread Jeff Law


On 06/09/2016 12:42 PM, David Malcolm wrote:

gcc/ChangeLog:
* pretty-print.c (assert_pp_format_colored): Skip the test if
GCC_COLORS is set.
(test_pp_format): Remove comment about GCC_COLORS.

OK.
jeff

[PATCH] Fix SOURCE_DATE_EPOCH handling with -E (PR preprocessor/71183)

2016-06-13 Thread Jakub Jelinek

Hi!

The SOURCE_DATE_EPOCH env var is ignored during -E, which is undesirable
and inconsistent.  The problem is that the appropriate callback for
libcpp is only installed when compiling and not when preprocessing only.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-06-13  Jakub Jelinek  

PR preprocessor/71183
* c-ppoutput.c (init_pp_output): Set cb->get_source_date_epoch
to cb_get_source_date_epoch.

* gcc.dg/cpp/source_date_epoch-3.c: New test.

--- gcc/c-family/c-ppoutput.c.jj	2016-01-04 14:55:58.0 +0100
+++ gcc/c-family/c-ppoutput.c   2016-06-12 19:49:50.932112947 +0200
@@ -150,6 +150,7 @@ init_pp_output (FILE *out_stream)
 }
 
   cb->has_attribute = c_common_has_attribute;
+  cb->get_source_date_epoch = cb_get_source_date_epoch;
 
   /* Initialize the print structure.  */
   print.src_line = 1;
--- gcc/testsuite/gcc.dg/cpp/source_date_epoch-3.c.jj	2016-06-12 19:56:49.988696438 +0200
+++ gcc/testsuite/gcc.dg/cpp/source_date_epoch-3.c	2016-06-12 19:57:36.648093343 +0200
@@ -0,0 +1,9 @@
+/* PR preprocessor/71183 */
+/* { dg-do preprocess } */
+/* { dg-set-compiler-env-var SOURCE_DATE_EPOCH "630333296" } */
+
+const char *date = __DATE__;
+const char *time = __TIME__;
+
+/* { dg-final { scan-file source_date_epoch-3.i "Dec 22 1989" } } */
+/* { dg-final { scan-file source_date_epoch-3.i "12:34:56" } } */

Jakub
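
To see where the expected strings in the new test come from, the following stand-alone C sketch converts an epoch value to the "Mmm dd yyyy" and "hh:mm:ss" layouts used by __DATE__ and __TIME__, interpreting the value as UTC. The helper names format_epoch and epoch_matches are hypothetical and are not part of libcpp.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format EPOCH (seconds since 1970-01-01 00:00:00 UTC) in the layout
   of __DATE__ ("Mmm dd yyyy") and __TIME__ ("hh:mm:ss").  Returns 0
   on success, -1 if the conversion fails.  */
static int
format_epoch (long long epoch, char date[12], char tod[9])
{
  static const char months[12][4] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  };
  assert (date != NULL && tod != NULL);
  time_t t = (time_t) epoch;
  struct tm *tm = gmtime (&t);
  if (tm == NULL)
    return -1;
  /* The day of the month is space-padded to two columns.  */
  snprintf (date, 12, "%s %2d %d", months[tm->tm_mon], tm->tm_mday,
	    tm->tm_year + 1900);
  snprintf (tod, 9, "%02d:%02d:%02d", tm->tm_hour, tm->tm_min,
	    tm->tm_sec);
  return 0;
}

/* Return 1 if EPOCH formats to the expected date and time strings.  */
static int
epoch_matches (long long epoch, const char *want_date,
	       const char *want_time)
{
  char date[12], tod[9];
  if (format_epoch (epoch, date, tod) != 0)
    return 0;
  return strcmp (date, want_date) == 0 && strcmp (tod, want_time) == 0;
}
```

With the value from the test, epoch_matches (630333296, "Dec 22 1989", "12:34:56") holds, which matches the two dg-final scan patterns.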

Re: [PATCH], PowerPC: Allow DImode in Altivec registers

2016-06-13 Thread Michael Meissner

It would help if I included the patch.

On Mon, Jun 13, 2016 at 01:28:16PM -0400, Michael Meissner wrote:
> This patch goes through the PowerPC compiler and adds support to allow DImode
> (64-bit integers) into Altivec registers for VSX systems.  It also adds some
> support to allow loading some DImode constants via either ISA 2.07 or ISA 3.0
> instructions.
> 
> I have bootstrapped this with no regressions on both a big endian power7
> system and a little endian power8 system.
> 
> I have run the SPEC 2006 INT tests with these changes, and the run times were
> comparable between the original compiler and the compiler with the changes.
> 
> Are these changes ok to install in the trunk?  Assuming they go in the trunk,
> can I install them in the 6.2 branch if they cause no regression?
> 
> Note, I will be away from the office, starting Thursday afternoon (June 16th,
> 2016) and I will return on Monday (June 20th, 2016).  I will not have easy
> access to email during this time.

[gcc]
2016-06-13  Michael Meissner  

* config/rs6000/vsx.md (VSINT_84): Add DImode to enable loading
DImode constants with XXSPLTIB in vector registers.
(vsx_extract_, V2DImode/V2DFmode): Combine both
vsx_extract__internal{1,2} into a single insn that handles
direct move (both ISA 2.07 and ISA 3.0 versions), and optimizes
extraction of the element at the top of the register as a scalar
value.
(vsx_extract__internal1): Likewise.
(vsx_extract__internal2): Likewise.
* config/rs6000/constraints.md (wi constraint): Remove a comment
about DImode not being allowed in Altivec registers.
(wB constraint): New constraint for constants that can be
generated in Altivec registers with VSPLTISW/VUPKHSW.
* config/rs6000/predicates.md (xxspltib_constant_split): Update
comments.
(xxspltib_constant_nosplit): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Add
support for -mupper-regs-di to enable DImode to go into Altivec
registers.
(POWERPC_MASKS): Likewise.
(power7 cpu): Likewise.
* config/rs6000/rs6000.opt (-mupper-regs-di): Likewise.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Add support
for DImode being allowed in Altivec registers.  Update wi/wj
constraints.  Set scalar_in_vmx_p flag.
(rs6000_option_override_internal): Add checks for -mupper-regs-di.
(xxspltib_constant_p): Allow CONST_INT's with VOIDmode.  Don't
return true if we could use VSPLTISW/VUPKHSW instead of XXSPLTIB.
(rs6000_opt_masks): Add -mupper-regs-di.
* config/rs6000/rs6000.md (lfiwax): Update clobbers that don't use
direct move to use wi and now wj.
(lfiwzx): Likewise.
(floatsi2_lfiwax_mem): Combine alternatives into a single
alternative.
(floatunssi2_lfiwzx_mem): Likewise.
(fix_truncdi2_fctidz): Change second alternative to allow
any VSX register, instead of just Altivec registers, to allow
either operand to be an Altivec register or both.
(fixuns_truncdi2_fctiduz): Likewise.
(movdi_internal32): Add support for -mupper-regs-di.  Add support
to load constants via XXSPLTIB or VSPLTISW.  Add spacing to allow
the alternatives and attributes to be lined up to be easier to
read.
(movdi_internal64): Likewise.
(64-bit DImode splitters): Change predicates to only split loading
up GPR registers.  Add splits for using XXSPLTIB or VSPLTISW to
load constants in ISA 3.0 or ISA 2.07 respectively.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document
-mupper-regs-di.  Update -mupper-regs-df and -mupper-regs-sf to
mention -mcpu=power9 sets these options.
* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
wB constraint.

[gcc/testsuite]
2016-06-13  Michael Meissner  

* gcc.target/powerpc/p9-dimode1.c: New test.
* gcc.target/powerpc/p9-dimode2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md	(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 237222)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -260,7 +260,7 @@ (define_mode_attr VS_64reg [(V2DF   "ws")
(V2DI   "wi")])
 
 ;; Iterators for loading constants with xxspltib
-(define_mode_iterator VSINT_84  [V4SI V2DI])
+(define_mode_iterator VSINT_84  [V4SI V2DI DI])
 (define_mode_iterator VSINT_842 [V8HI V4SI V2DI])
 
 ;; Constants for creating unspecs
@@ -2095,77 +2095,69 @@

Re: Fix pure/const discovery WRT interposition part 2

2016-06-13 Thread H.J. Lu

On Sat, Apr 16, 2016 at 9:47 AM, Jan Hubicka  wrote:
> Hi,
> this patch updates ipa-pure-const.c to only propagate the PURE flag across
> calls that do not bind to local defs and are not explicitly declared const.
> This keeps the memory state in a shape that is safe even if the callee is
> produced by another compiler and still accesses memory.
>
> We need similar logic for -fnon-call-exceptions which I will do incrementally.
> We also want to track if the original unoptimized body did access memory but
> that needs frontend changes because memory accesses may get folded away during
> parsing.
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>
> Honza
>
> PR ipa/70018
> * cgraph.c (cgraph_set_const_flag_1): Only set as pure if
> function does not bind to current def.
> * ipa-pure-const.c (worse_state): Add FROM and TO parameters;
> handle conservatively calls to functions that does not need to bind
> to current def.
> (check_call): Update call of worse_state.
> (ignore_edge_for_nothrow): Update.
> (ignore_edge_for_pure_const): Likewise.
> (propagate_pure_const): Update calls to worse_state.
> (skip_function_for_local_pure_const): Reformat comments.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71524

H.J.

Re: [patch, avr] Fix PR67353

2016-06-13 Thread Bernhard Reutner-Fischer

On June 13, 2016 5:48:43 PM GMT+02:00, Georg-Johann Lay  wrote:
>Pitchumani Sivanupandi wrote:
>> Hi,
>> 
>> This patch introduces new flags for warning 'misspelled interrupt/
>> signal handler'. Flag -Wmisspelled-isr is enabled by default and it
>> will warn user if the interrupt/ signal handler is without '__vector'
>> prefix. Flag -Wno-misspelled-isr shall be enabled by user to allow
>> custom names, i.e. without __vector prefix.
>> 
>> // avr-gcc -c test.c
>> void custom_interruption(void) __attribute__((signal));
>> void custom_interruption(void) {}
>> 
>> Behavior after applying this patch:
>> 
>> $ avr-gcc test.c 
>> test.c: In function 'custom_interruption':
>> test.c:2:6: warning: 'custom_interruption' appears to be a misspelled
>> signal handler
>>  void custom_interruption(void) {}
>>   ^~~
>> 
>> $ avr-gcc test.c -Wmisspelled-isr
>> test.c: In function 'custom_interruption':
>> test.c:2:6: warning: 'custom_interruption' appears to be a misspelled
>> signal handler
>>  void custom_interruption(void) {}
>>   ^~~
>> 
>> $ avr-gcc test.c -Wno-misspelled-isr
>> $
>
>What about -Werror=misspelled-isr?
>
> > [...]
>> diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
>> index ba5cd91..587bdbc 100644
>> --- a/gcc/config/avr/avr.c
>> +++ b/gcc/config/avr/avr.c
>> @@ -753,7 +753,7 @@ avr_set_current_function (tree decl)
>>   that the name of the function is "__vector_NN" so as to catch
>>   when the user misspells the vector name.  */
>>  
>> -  if (!STR_PREFIX_P (name, "__vector"))
>> +  if ((!STR_PREFIX_P (name, "__vector")) && (avr_warn_misspelled_isr))
>>  warning_at (loc, 0, "%qs appears to be a misspelled %s handler",
>
>If the respective OPT_... enum is passed to warning_at instead of the "0",
>-Werror= should work as expected (and the explicit "&& 
>avr_warn_misspelled_isr" is no longer needed).

And maybe even mention __vector in the message?
thanks,

Re: [PATCH 3/8] nvptx -muniform-simt

2016-06-13 Thread Alexander Monakov

On Sun, 12 Jun 2016, Sandra Loosemore wrote:
> On 06/09/2016 10:53 AM, Alexander Monakov wrote:
> > +@item -muniform-simt
> > +@opindex muniform-simt
> > +Generate code that allows to keep all lanes in each warp active, even when
> 
> Allows *what* to keep?  E.g. what is doing the keeping here?  If it is the
> generated code itself, please rephrase as
> 
> Generate code that keeps

Let me try to expand and rephrase what I meant:

Allows the compiler to emit code that, at run time, may have all lanes active,
particularly in those regions of the program where observable effects from
execution must happen as if one lane is active (outside of SIMD loops).

But nevertheless generated code can run just like conventionally generated
code does: with each lane being active/inactive independently, and side
effects happening from each active lane (inside of SIMD loops).

Whether it actually runs in the former (let's call it "uniform") or the latter
("conventional") way is switchable at run time. The compiler itself is
responsible for emitting mode changes at SIMD region boundaries.

Does this help? Below I went with your suggestion, but changed "keeps" to "may
keep" because that's generally true only outside of SIMD regions.

> > +observable effects from execution should appear as if only one lane was
> 
> s/was/is/
> 
> > +active. This is achieved by instrumenting syscalls and atomic instructions
> > in
> > +a lightweight way that allows to switch behavior at runtime. This code
> 
> Same issue here: allows *what* to switch behavior?  (And how would you
> select which run-time behavior you want?)

Sorry. This gives the compiler itself a way to emit code that will switch the
behavior of the subsequently running code.

> Also, in the snippet above where it is used as a noun, please
> s/runtime/run time/

Thanks. Does the following look better?

@item -muniform-simt
@opindex muniform-simt
Generate code that may keep all lanes in each warp active, even when
observable effects from execution must appear as if only one lane is active.
This is achieved by instrumenting syscalls and atomic instructions in a
lightweight way, allowing the compiler to emit code that can switch at run
time between this and conventional execution modes. This code generation
variant is used for OpenMP offloading, but the option is exposed on its own
for the purpose of testing the compiler; to generate code suitable for linking
into programs using OpenMP offloading, use option @option{-mgomp}.

Alexander

Fix oversight in vn_reference_lookup_3

2016-06-13 Thread Eric Botcazou

The second test on shared_lookup_references in the block:

  /* We need to pre-pend vr->operands[0..i] to rhs.  */
  vec old = vr->operands;
  if (i + 1 + rhs.length () > vr->operands.length ())
{
  vr->operands.safe_grow (i + 1 + rhs.length ());
  if (old == shared_lookup_references)
shared_lookup_references = vr->operands;
}
  else
vr->operands.truncate (i + 1 + rhs.length ());
  FOR_EACH_VEC_ELT (rhs, j, vro)
vr->operands[i + 1 + j] = *vro;
  vr->operands = valueize_refs (vr->operands);
  if (old == shared_lookup_references)
shared_lookup_references = vr->operands;

is bypassed when the first test is true because "old" contains a stale value 
of shared_lookup_references.  This may result in either memory corruption 
(when checking is disabled) or in the failure of one of the assertions:

  gcc_checking_assert (vr1.operands == shared_lookup_references);

in vn_reference_lookup_pieces or vn_reference_lookup.  This was caught on a 
big proprietary Ada application in LTO mode.

Tested on x86_64-suse-linux, approved privately by Richard B., applied on the 
mainline and 6 branch.


2016-06-13  Eric Botcazou  

* tree-ssa-sccvn.c (vn_reference_lookup_3): Use a uniform test and
update shared_lookup_references only once after changing operands.

-- 
Eric Botcazou

Index: tree-ssa-sccvn.c
===
--- tree-ssa-sccvn.c	(revision 237323)
+++ tree-ssa-sccvn.c	(working copy)
@@ -2089,11 +2089,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   /* We need to pre-pend vr->operands[0..i] to rhs.  */
   vec old = vr->operands;
   if (i + 1 + rhs.length () > vr->operands.length ())
-	{
-	  vr->operands.safe_grow (i + 1 + rhs.length ());
-	  if (old == shared_lookup_references)
-	shared_lookup_references = vr->operands;
-	}
+	vr->operands.safe_grow (i + 1 + rhs.length ());
   else
 	vr->operands.truncate (i + 1 + rhs.length ());
   FOR_EACH_VEC_ELT (rhs, j, vro)
@@ -2244,8 +2240,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	{
 	  vec old = vr->operands;
 	  vr->operands.safe_grow_cleared (2);
-	  if (old == shared_lookup_references
-	  && vr->operands != old)
+	  if (old == shared_lookup_references)
 	shared_lookup_references = vr->operands;
 	}
   else

[PATCH], PowerPC: Allow DImode in Altivec registers

2016-06-13 Thread Michael Meissner

This patch goes through the PowerPC compiler and adds support to allow DImode
(64-bit integers) into Altivec registers for VSX systems.  It also adds some
support to allow loading some DImode constants via either ISA 2.07 or ISA 3.0
instructions.

I have bootstrapped this with no regressions on both a big endian power7 system
and a little endian power8 system.

I have run the SPEC 2006 INT tests with these changes, and the run times were
comparable between the original compiler and the compiler with the changes.

Are these changes ok to install in the trunk?  Assuming they go in the trunk,
can I install them in the 6.2 branch if they cause no regression?

Note, I will be away from the office, starting Thursday afternoon (June 16th,
2016) and I will return on Monday (June 20th, 2016).  I will not have easy
access to email during this time.

[gcc]
2016-06-13  Michael Meissner  

* config/rs6000/vsx.md (VSINT_84): Add DImode to enable loading
DImode constants with XXSPLTIB in vector registers.
(vsx_extract_, V2DImode/V2DFmode): Combine both
vsx_extract__internal{1,2} into a single insn that handles
direct move (both ISA 2.07 and ISA 3.0 versions), and optimizes
extraction of the element at the top of the register as a scalar
value.
(vsx_extract__internal1): Likewise.
(vsx_extract__internal2): Likewise.
* config/rs6000/constraints.md (wi constraint): Remove a comment
about DImode not being allowed in Altivec registers.
(wB constraint): New constraint for constants that can be
generated in Altivec registers with VSPLTISW/VUPKHSW.
* config/rs6000/predicates.md (xxspltib_constant_split): Update
comments.
(xxspltib_constant_nosplit): Likewise.
* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Add
support for -mupper-regs-di to enable DImode to go into Altivec
registers.
(POWERPC_MASKS): Likewise.
(power7 cpu): Likewise.
* config/rs6000/rs6000.opt (-mupper-regs-di): Likewise.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Add support
for DImode being allowed in Altivec registers.  Update wi/wj
constraints.  Set scalar_in_vmx_p flag.
(rs6000_option_override_internal): Add checks for -mupper-regs-di.
(xxspltib_constant_p): Allow CONST_INT's with VOIDmode.  Don't
return true if we could use VSPLTISW/VUPKHSW instead of XXSPLTIB.
(rs6000_opt_masks): Add -mupper-regs-di.
* config/rs6000/rs6000.md (lfiwax): Update clobbers that don't use
direct move to use wi and now wj.
(lfiwzx): Likewise.
(floatsi2_lfiwax_mem): Combine alternatives into a single
alternative.
(floatunssi2_lfiwzx_mem): Likewise.
(fix_truncdi2_fctidz): Change second alternative to allow
any VSX register, instead of just Altivec registers, to allow
either operand to be an Altivec register or both.
(fixuns_truncdi2_fctiduz): Likewise.
(movdi_internal32): Add support for -mupper-regs-di.  Add support
to load constants via XXSPLTIB or VSPLTISW.  Add spacing to allow
the alternatives and attributes to be lined up to be easier to
read.
(movdi_internal64): Likewise.
(64-bit DImode splitters): Change predicates to only split loading
up GPR registers.  Add splits for using XXSPLTIB or VSPLTISW to
load constants in ISA 3.0 or ISA 2.07 respectively.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document
-mupper-regs-di.  Update -mupper-regs-df and -mupper-regs-sf to
mention -mcpu=power9 sets these options.
* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
wB constraint.

[gcc/testsuite]
2016-06-13  Michael Meissner  

* gcc.target/powerpc/p9-dimode1.c: New test.
* gcc.target/powerpc/p9-dimode2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[PATCH PING] boehm-gc: check for execinfo.h directly

2016-06-13 Thread Mike Frysinger

The current header depends on glibc version checks to determine whether
execinfo.h exists which breaks uClibc.  Instead, add an explicit configure
check for it.
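
As a rough illustration of what the new check enables (a sketch, not code
taken from boehm-gc): once configure defines HAVE_EXECINFO_H, a source file
can gate its use of backtrace() on that macro instead of on glibc version
numbers.  The helper name collect_frames is hypothetical, and the macro is
assumed to arrive via a configure-generated header or the command line.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_EXECINFO_H is assumed to be provided by configure; nothing in
   this sketch defines it.  */
#ifdef HAVE_EXECINFO_H
# include <execinfo.h>
#endif

/* Hypothetical helper: store up to MAX return addresses in BUF and
   return how many were stored, or 0 when execinfo.h is unavailable.  */
static int
collect_frames (void **buf, int max)
{
  assert (buf != NULL && max >= 0);
#ifdef HAVE_EXECINFO_H
  return backtrace (buf, max);
#else
  (void) buf;
  return 0;
#endif
}
```

On a libc without execinfo.h the helper simply reports zero frames, so
callers need no libc-specific conditionals of their own.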

2015-08-29  Mike Frysinger  

* configure.ac: Call AC_CHECK_HEADERS([execinfo.h]).
* configure: Regenerated.
* include/gc.h [HAVE_EXECINFO_H]: Define GC_HAVE_BUILTIN_BACKTRACE.
* include/gc_config.h.in: Regenerated.
---
 boehm-gc/configure  | 105 +++-
 boehm-gc/configure.ac   |   3 ++
 boehm-gc/include/gc.h   |   2 +-
 boehm-gc/include/gc_config.h.in |   3 ++
 4 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/boehm-gc/configure b/boehm-gc/configure
index a8e11dab41b3..7d2b1f7401f7 100755
--- a/boehm-gc/configure
+++ b/boehm-gc/configure
@@ -1945,6 +1945,93 @@ $as_echo "$ac_res" >&6; }
   eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
 
 } # ac_fn_c_check_member
+
+# ac_fn_c_check_header_mongrel LINENO HEADER VAR INCLUDES
+# ---
+# Tests whether HEADER exists, giving a warning if it cannot be compiled using
+# the include files in INCLUDES and setting the cache variable VAR
+# accordingly.
+ac_fn_c_check_header_mongrel ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+fi
+eval ac_res=\$$3
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+else
+  # Is the header compilable?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 usability" >&5
+$as_echo_n "checking $2 usability... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+#include <$2>
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_header_compiler=yes
+else
+  ac_header_compiler=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_compiler" >&5
+$as_echo "$ac_header_compiler" >&6; }
+
+# Is the header present?
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 presence" >&5
+$as_echo_n "checking $2 presence... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <$2>
+_ACEOF
+if ac_fn_c_try_cpp "$LINENO"; then :
+  ac_header_preproc=yes
+else
+  ac_header_preproc=no
+fi
+rm -f conftest.err conftest.$ac_ext
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_preproc" >&5
+$as_echo "$ac_header_preproc" >&6; }
+
+# So?  What about this header?
+case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in #((
+  yes:no: )
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&5
+$as_echo "$as_me: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&2;}
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
+$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
+;;
+  no:yes:* )
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: present but cannot be compiled" >&5
+$as_echo "$as_me: WARNING: $2: present but cannot be compiled" >&2;}
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: check for missing prerequisite headers?" >&5
+$as_echo "$as_me: WARNING: $2: check for missing prerequisite headers?" >&2;}
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: see the Autoconf documentation" >&5
+$as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;}
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&5
+$as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&2;}
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5
+$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;}
+;;
+esac
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if { as_var=$3; eval "test \"\${$as_var+set}\" = set"; }; then :
+  $as_echo_n "(cached) " >&6
+else
+  eval "$3=\$ac_header_compiler"
+fi
+eval ac_res=\$$3
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+fi
+  eval $as_lineno_stack; test "x$as_lineno_stack" = x && { as_lineno=; unset as_lineno;}
+
+} # ac_fn_c_check_header_mongrel
 cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
@@ -11322,7 +11409,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1;

[PATCH] Improve tree-ssa-tail-merge for switches (PR tree-optimization/71520)

2016-06-13 Thread Jakub Jelinek

Hi!

Cross-jumping at GIMPLE level gives up e.g. because there are any labels
at the beginning of the block (which is always the case for bbs referenced
from switches).  While labels for non-local goto as well as computed goto
are hard to handle, after all the edges are then EDGE_ABNORMAL that can't be
redirected anyway, other labels can be handled very easily.
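
To make the situation concrete, here is a small sketch (not the testcase from
the patch) of the shape of code involved: two groups of case labels whose
bodies are identical.  Each body begins with labels, which is what used to
make the GIMPLE-level cross-jumping give up.  The recording helper bar and
the arrays are hypothetical scaffolding so the behaviour can be observed.

```c
#include <assert.h>

static int calls[8];
static int ncalls;

/* Record each argument so a caller can compare the two case groups.  */
static void
bar (int v)
{
  assert (ncalls < 8);
  calls[ncalls++] = v;
}

static void
baz (int x)
{
  switch (x)
    {
    case 3:
    case 7:
      bar (10);
      bar (11);
      break;
    case 8:
    case 13:
      /* Same statements as the previous group; only the labels differ.  */
      bar (10);
      bar (11);
      break;
    default:
      break;
    }
}
```

Both groups perform the same calls in the same order, so merging the two
blocks into one would not change what a caller observes.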

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

In the PR, beyond this I'm also talking about switchconv pass not being able
to know that some cases could be cross-jumped and thus use better conversion
sequences.  Wonder if we shouldn't schedule either a full, or a limited
version of tailmerging before switchconv, perhaps just use the
infrastructure from tree-ssa-tail-merge.c to handle the easiest cases
where the cross-jumping in the end would end up simplifying some of the
switches.  Thoughts on this?

2016-06-13  Jakub Jelinek  

PR tree-optimization/71520
* tree-ssa-tail-merge.c (find_duplicate): Handle labels.
(replace_block_by): Move user labels from bb1 to bb2.

* gcc.dg/tree-ssa/pr71520.c: New test.

--- gcc/tree-ssa-tail-merge.c.jj   2016-06-10 20:23:55.196164390 +0200
+++ gcc/tree-ssa-tail-merge.c   2016-06-13 12:08:34.691985005 +0200
@@ -1265,6 +1265,10 @@ find_duplicate (same_succ *same_succ, ba
   gimple *stmt1 = gsi_stmt (gsi1);
   gimple *stmt2 = gsi_stmt (gsi2);
 
+  if (gimple_code (stmt1) == GIMPLE_LABEL
+ && gimple_code (stmt2) == GIMPLE_LABEL)
+   break;
+
   if (!gimple_equal_p (same_succ, stmt1, stmt2))
return;
 
@@ -1277,6 +1281,20 @@ find_duplicate (same_succ *same_succ, ba
   gsi_advance_bw_nondebug_nonlocal (, , _escaped);
 }
 
+  while (!gsi_end_p (gsi1) && gimple_code (gsi_stmt (gsi1)) == GIMPLE_LABEL)
+{
+  tree label = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi1)));
+  if (DECL_NONLOCAL (label) || FORCED_LABEL (label))
+   return;
+  gsi_prev (&gsi1);
+}
+  while (!gsi_end_p (gsi2) && gimple_code (gsi_stmt (gsi2)) == GIMPLE_LABEL)
+{
+  tree label = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi2)));
+  if (DECL_NONLOCAL (label) || FORCED_LABEL (label))
+   return;
+  gsi_prev (&gsi2);
+}
   if (!(gsi_end_p (gsi1) && gsi_end_p (gsi2)))
 return;
 
@@ -1555,6 +1573,23 @@ replace_block_by (basic_block bb1, basic
   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
 }
 
+  /* Move over any user labels from bb1 after the bb2 labels.  */
+  gimple_stmt_iterator gsi1 = gsi_start_bb (bb1);
+  if (!gsi_end_p (gsi1) && gimple_code (gsi_stmt (gsi1)) == GIMPLE_LABEL)
+{
+  gimple_stmt_iterator gsi2 = gsi_after_labels (bb2);
+  while (!gsi_end_p (gsi1)
+&& gimple_code (gsi_stmt (gsi1)) == GIMPLE_LABEL)
+   {
+ tree label = gimple_label_label (as_a <glabel *> (gsi_stmt (gsi1)));
+ gcc_assert (!DECL_NONLOCAL (label) && !FORCED_LABEL (label));
+ if (DECL_ARTIFICIAL (label))
+   gsi_next (&gsi1);
+ else
+   gsi_move_before (&gsi1, &gsi2);
+   }
+}
+
   /* Clear range info from all stmts in BB2 -- this transformation
  could make them out of date.  */
   reset_flow_sensitive_info_in_bb (bb2);
--- gcc/testsuite/gcc.dg/tree-ssa/pr71520.c.jj  2016-06-13 12:26:55.251630020 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr71520.c 2016-06-13 12:26:31.0 +0200
@@ -0,0 +1,90 @@
+/* PR tree-optimization/71520 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void bar (int);
+
+void
+foo (int x)
+{
+  switch (x)
+{
+case 1:
+case 12:
+case 28:
+case 174:
+  bar (1);
+  bar (2);
+  break;
+case 3:
+case 7:
+case 78:
+case 96:
+case 121:
+default:
+  bar (3);
+  bar (4);
+  bar (5);
+  bar (6);
+  break;
+case 8:
+case 13:
+case 27:
+case 19:
+case 118:
+  bar (3);
+  bar (4);
+  bar (5);
+  bar (6);
+  break;
+case 4:
+  bar (7);
+  break;
+}
+}
+
+void
+baz (int x)
+{
+  switch (x)
+{
+case 1:
+case 12:
+case 28:
+case 174:
+  bar (8);
+  bar (9);
+  break;
+case 3:
+case 7:
+case 78:
+case 96:
+case 121:
+default:
+lab1:
+lab2:
+  bar (10);
+  bar (11);
+  bar (12);
+  bar (13);
+  break;
+case 8:
+case 13:
+case 27:
+case 19:
+case 118:
+lab3:
+lab4:
+  bar (10);
+  bar (11);
+  bar (12);
+  bar (13);
+  break;
+case 4:
+  bar (14);
+  break;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\\(3\\\);" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bar \\\(10\\\);" 1 "optimized" } } */

Jakub

Re: [PATCH 2/3] selftests: improve reported failure locations

2016-06-13 Thread Jeff Law


On 06/09/2016 12:42 PM, David Malcolm wrote:

This patch introduces a selftest::location struct to wrap up __FILE__
and __LINE__ information (and __FUNCTION__) throughout the selftests,
allowing location information to be passed around.

It updates the helper functions in pretty-print.c to pass through
the precise location of each test, so that if a failure occurs, the
correct line number is printed, rather than a line within a helper
function.
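
The idea can be sketched in plain C (the real patch is C++ and uses
selftest::location; every name below, including SKETCH_LOCATION,
check_streq_at and CHECK_GREETING, is hypothetical).  The location is
captured by a macro at the call site and threaded through the helper, so a
failure is reported against the line of the test rather than a line inside
the helper.

```c
#include <stdio.h>
#include <string.h>

/* Bundles the source position of a test so it can be passed around.  */
struct location
{
  const char *file;
  int line;
  const char *function;
};

/* Expands at the point of use, so it captures the caller's position.  */
#define SKETCH_LOCATION ((struct location) { __FILE__, __LINE__, __func__ })

static int failures;

/* Compare two strings, reporting any mismatch against LOC.  */
static void
check_streq_at (struct location loc, const char *expected, const char *actual)
{
  if (strcmp (expected, actual) == 0)
    return;
  failures++;
  fprintf (stderr, "%s:%d: %s: FAIL: expected \"%s\", got \"%s\"\n",
           loc.file, loc.line, loc.function, expected, actual);
}

/* A helper that formats something and checks it; it forwards LOC
   instead of using its own __LINE__.  */
static void
check_greeting (struct location loc, const char *name, const char *expected)
{
  char buf[64];
  snprintf (buf, sizeof buf, "hello %s", name);
  check_streq_at (loc, expected, buf);
}

#define CHECK_GREETING(NAME, EXPECTED) \
  check_greeting (SKETCH_LOCATION, (NAME), (EXPECTED))
```

A failing CHECK_GREETING call prints the file, line and function of the call
itself, which is the behaviour the patch description asks for.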

gcc/ChangeLog:
* input.c (test_reading_source_line): Use SELFTEST_LOCATION.
* pretty-print.c (assert_pp_format_va): Add location param and use
it with ASSERT_STREQ_AT.
(assert_pp_format): Add location param and pass it to
assert_pp_format_va.
(assert_pp_format_colored): Likewise.
(ASSERT_PP_FORMAT_1): New.
(ASSERT_PP_FORMAT_2): New.
(ASSERT_PP_FORMAT_3): New.
(test_pp_format): Provide SELFTEST_LOCATION throughout, either
explicitly, or implicitly via the above macros.
* selftest.c (selftest::pass): Use a selftest::location rather
than file and line.
(selftest::fail): Likewise.  Print the function name.
(selftest::fail_formatted): Likewise.
(selftest::assert_streq): Use a selftest::location rather than
file and line.
* selftest.h (selftest::location): New struct.
(SELFTEST_LOCATION): New macro.
(selftest::pass): Accept a const location & rather than file
and line.
(selftest::fail): Likewise.
(selftest::fail_formatted): Likewise.
(selftest::assert_streq): Likewise.
(ASSERT_TRUE): Update for above changes, using SELFTEST_LOCATION.
(ASSERT_FALSE): Likewise.
(ASSERT_EQ): Likewise.
(ASSERT_NE): Likewise.
(ASSERT_STREQ): Likewise.
(ASSERT_PRED1): Likewise.
(ASSERT_STREQ_AT): New macro.

OK.
jeff

Re: [PATCH] PR bootstrap/71481: fix input.c selftest

2016-06-13 Thread Jeff Law


On 06/09/2016 03:58 PM, David Malcolm wrote:

input.c's selftest::test_reading_source_line attempted to read from
__FILE__, which doesn't work if the binary is run from a different
location than the build dir.

Fix it by rewriting the test to write out a tempfile, and read from
that, rather than from __FILE__.

I used make_temp_file to create the name for the temporary file, on
the grounds that that's what the driver uses for that purpose.
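
A minimal sketch of the same approach in portable C: write known content to
a temporary file and read a given line back, so the test no longer depends
on where the sources live.  This uses the standard tmpfile() rather than
libiberty's make_temp_file, and the function name read_line_via_tempfile is
hypothetical.  It assumes each line fits in the caller's buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write CONTENT to an anonymous temporary file, then read back line
   LINE_NUM (1-based) into BUF without its trailing newline.
   Returns 1 on success, 0 on any failure or if the line is missing.  */
static int
read_line_via_tempfile (const char *content, int line_num,
                        char *buf, size_t size)
{
  assert (content != NULL && buf != NULL && size > 1);

  FILE *f = tmpfile ();
  if (f == NULL)
    return 0;
  if (fputs (content, f) == EOF)
    {
      fclose (f);
      return 0;
    }
  rewind (f);

  int ok = 0;
  for (int i = 1; i <= line_num; i++)
    {
      if (fgets (buf, (int) size, f) == NULL)
        break;
      if (i == line_num)
        {
          buf[strcspn (buf, "\n")] = '\0';
          ok = 1;
        }
    }
  fclose (f);
  return ok;
}
```

Because the file content is chosen by the test, the expected line is known
exactly regardless of the directory the binary runs from.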

This is on top of the patch kit posted as:
  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00735.html

Successfully bootstrapped on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0

OK for trunk?

gcc/ChangeLog:
PR bootstrap/71481
* input.c (selftest::test_reading_source_line): Avoid reading from
__FILE__ by creating a tempfile with known content and reading
from that instead.

OK.

FWIW, I think the LANG_C vs translating is your call to make.  I can see 
arguments for both directions.


Jeff

[PATCH] Fix ubsan handling of BIND_EXPR (PR sanitizer/71498)

2016-06-13 Thread Jakub Jelinek

Hi!

As has been discussed in the original -fsanitize=bounds submission,
walk_tree for BIND_EXPR walks the body and
DECL_INITIAL/DECL_SIZE/DECL_SIZE_UNIT of all the BIND_EXPR_VARS.
For -fsanitize=bounds instrumentation, we want to avoid walking DECL_INITIAL
of TREE_STATIC vars, so should set *walk_subtrees to 0 and walk it all
ourselves.  But, what the committed code actually does is that for
BIND_EXPRs that contain no TREE_STATIC vars, it walks
DECL_INITIAL/DECL_SIZE/DECL_SIZE_UNIT of all the BIND_EXPR_VARS, and then
walks subtrees normally, which means walking the body (good) and all the
DECL_INITIAL/DECL_SIZE/DECL_SIZE_UNIT exprs again (waste of time, we use
hash_set for duplicates, so just inefficiency).
But if any TREE_STATIC var appears, we set *walk_subtrees to 0 and
forget to walk the body (the primary bug).
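
The control flow can be modelled with a toy walker (a sketch only; the node
layout and the names visit and walk are hypothetical and far simpler than
GCC's walk_tree).  The callback always clears walk_subtrees for a binding
node, skips the initializer of a static variable, and walks the body itself,
which is the behaviour the fix aims for.

```c
#include <stddef.h>

enum kind { LEAF, BIND };

struct node
{
  enum kind kind;
  int visits;            /* how many times the callback saw this node */
  struct node *var_init; /* BIND only: initializer of the bound variable */
  int var_is_static;     /* BIND only: nonzero for a static variable */
  struct node *body;     /* BIND only: body of the binding */
};

static void walk (struct node *n);

/* Callback: for a binding node, take over the walk completely.  */
static void
visit (struct node *n, int *walk_subtrees)
{
  n->visits++;
  if (n->kind == BIND)
    {
      *walk_subtrees = 0;
      if (!n->var_is_static)
        walk (n->var_init);
      /* The body must be walked here, since the generic walker
         will no longer descend into it.  */
      walk (n->body);
    }
}

/* Generic walker: descends only if the callback left the flag set.  */
static void
walk (struct node *n)
{
  if (n == NULL)
    return;
  int walk_subtrees = 1;
  visit (n, &walk_subtrees);
  if (walk_subtrees && n->kind == BIND)
    {
      walk (n->var_init);
      walk (n->body);
    }
}
```

With a static variable the initializer is never visited while the body is
visited exactly once; with a non-static variable both are visited once.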

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-06-13  Jakub Jelinek  

PR sanitizer/71498
* c-gimplify.c (ubsan_walk_array_refs_r): Set *walk_subtrees = 0 on
all BIND_EXPRs, and on all BIND_EXPRs recurse also on BIND_EXPR_BODY.

* c-c++-common/ubsan/bounds-13.c: New test.

--- gcc/c-family/c-gimplify.c.jj   2016-01-27 19:47:27.0 +0100
+++ gcc/c-family/c-gimplify.c   2016-06-13 13:27:06.531549561 +0200
@@ -67,23 +67,23 @@ ubsan_walk_array_refs_r (tree *tp, int *
 {
   hash_set<tree> *pset = (hash_set<tree> *) data;
 
-  /* Since walk_tree doesn't call the callback function on the decls
- in BIND_EXPR_VARS, we have to walk them manually.  */
   if (TREE_CODE (*tp) == BIND_EXPR)
 {
+  /* Since walk_tree doesn't call the callback function on the decls
+in BIND_EXPR_VARS, we have to walk them manually, so we can avoid
+instrumenting DECL_INITIAL of TREE_STATIC vars.  */
+  *walk_subtrees = 0;
   for (tree decl = BIND_EXPR_VARS (*tp); decl; decl = DECL_CHAIN (decl))
{
  if (TREE_STATIC (decl))
-   {
- *walk_subtrees = 0;
- continue;
-   }
+   continue;
  walk_tree (&DECL_INITIAL (decl), ubsan_walk_array_refs_r, pset,
 pset);
  walk_tree (&DECL_SIZE (decl), ubsan_walk_array_refs_r, pset, pset);
  walk_tree (&DECL_SIZE_UNIT (decl), ubsan_walk_array_refs_r, pset,
 pset);
}
+  walk_tree (&BIND_EXPR_BODY (*tp), ubsan_walk_array_refs_r, pset, pset);
 }
   else if (TREE_CODE (*tp) == ADDR_EXPR
   && TREE_CODE (TREE_OPERAND (*tp, 0)) == ARRAY_REF)
--- gcc/testsuite/c-c++-common/ubsan/bounds-13.c.jj 2016-06-13 13:36:25.698316271 +0200
+++ gcc/testsuite/c-c++-common/ubsan/bounds-13.c2016-06-13 13:39:57.240586520 +0200
@@ -0,0 +1,31 @@
+/* PR sanitizer/71498 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds -Wno-array-bounds" } */
+
+struct S { int a[100]; int b, c; } s;
+
+__attribute__((noinline, noclone)) int
+foo (int x)
+{
+  return s.a[x];
+}
+
+__attribute__((noinline, noclone)) int
+bar (int x)
+{
+  static int *d = &s.a[99];
+  asm volatile ("" : : "r" (&d));
+  return s.a[x];
+}
+
+int
+main ()
+{
+  volatile int a = 0;
+  a += foo (100);
+  a += bar (100);
+  return 0;
+}
+
+/* { dg-output "index 100 out of bounds for type 'int \\\[100\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 100 out of bounds for type 'int \\\[100\\\]'\[^\n\r]*(\n|\r\n|\r)" } */

Jakub

Re: [PATCH][1/2] Move mult synthesis definitions into a separate file

2016-06-13 Thread Richard Biener

On Mon, Jun 13, 2016 at 2:23 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> There are other places besides expand where we might want to synthesize an
> integer
> multiplication by a constant.  Thankfully the algorithm selection code in
> expmed.c
> is already quite well separated from the RTL implementation, so if we can
> just factor
> out the prototype of choose_mult_variant and some enums and structs that it
> needs into
> a separate header file we can reuse them from other parts of the compiler.
>
> I need this for patch 2/2 which hooks up the vectorizer to synthesize vector
> multiplications using sequences of shifts and other arithmetic ops when
> appropriate.
>
> The new header is called mult-synthesis.h. Should I add it to some makefile?
> After grepping around for a bit, I'm not sure what to do about it.

Possibly PLUGIN_HEADERS.

You could have included expmed.h from the vectorizer, no?  After all, this
patch now breaks the convention that things declared in A.h are defined in
A.c, as you didn't move choose_mult_variant itself.

Thanks,
Richard.

>
> Bootstrapped and tested on arm, aarch64, x86_64.
>
> Thanks,
> Kyrill
>
> 2016-06-13  Kyrylo Tkachov  
>
> * mult-synthesis.h: New file.  Add choose_mult_variant prototype.
> * expmed.h: Include mult-synthesis.h
> (enum alg_code): Move to mult-synthesis.h
> (struct mult_cost): Likewise.
> (struct algorithm): Likewise.
> * expmed.c (enum mult_variant): Move to mult-synthesis.h
> (choose_mult_variant): Delete prototype.  Remove static qualifier.
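
For readers unfamiliar with the multiplication synthesis discussed above,
here is a hand-written sketch of the kind of shift-and-add sequence it
produces.  The decompositions below were chosen by hand for illustration;
they are not the output of choose_mult_variant, and the function names are
hypothetical.

```c
#include <stdint.h>

/* x * 10 == x * 8 + x * 2, i.e. two shifts and one add.  */
static uint32_t
mul_by_10 (uint32_t x)
{
  return (x << 3) + (x << 1);
}

/* x * 7 == x * 8 - x, i.e. one shift and one subtract.  */
static uint32_t
mul_by_7 (uint32_t x)
{
  return (x << 3) - x;
}
```

Unsigned arithmetic wraps modulo 2^32, so these sequences agree with the
plain multiplication for every input, including values that overflow.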

[PATCH] Fix bootstrap when user language is not english

2016-06-13 Thread Bernd Edlinger

Hi,

As noted in PR bootstrap/71481, comment #4, trunk currently
fails to bootstrap if the current language is
not English.  A workaround is possible by setting LANG=C,
but OTOH it is rather easy to fix by translating the string
in the assertion, as it is the only place that is affected by
the language setting.


Boot-strapped and reg-tested on trunk with LANG=de_DE.UTF-8.
OK to commit?


Thanks
Bernd.

2016-06-13  Bernd Edlinger

	* input.c (test_builtins): Fix an assertion.

Index: gcc/input.c
===
--- gcc/input.c	(Revision 237379)
+++ gcc/input.c	(Arbeitskopie)
@@ -1210,7 +1210,7 @@ test_unknown_location ()
 static void
 test_builtins ()
 {
-  assert_loceq ("<built-in>", 0, 0, BUILTINS_LOCATION);
+  assert_loceq (_("<built-in>"), 0, 0, BUILTINS_LOCATION);
   ASSERT_PRED1 (is_location_from_builtin_token, BUILTINS_LOCATION);
 }

[PATCH] Fix code emission for FAIL_ALLOC predictor

2016-06-13 Thread Martin Liška

Hello.

The following patch fixes the Fortran FAIL_ALLOC predictor by introducing a new
predictor (PRED_FORTRAN_REALLOC) and by properly marking returned values, as
shown in the following two examples:

A) allocate_allocatable

original annotation:

if ((logical(kind=4)) __builtin_expect ((integer(kind=8)) (overflow.343 != 
0), 0, 33)) // overflow
  {
stat.341 = 5014;
  }
else
  {
if ((logical(kind=4)) __builtin_expect ((integer(kind=8)) (bx_ilow.data 
!= 0B), 0, 34)) // fail alloc
  {
stat.341 = 5014;
  }
else
  {
stat.341 = 0;
bx_ilow.data = (void * restrict) __builtin_malloc (MAX_EXPR 
);
if (bx_ilow.data == 0B)
  {
stat.341 = 5014;
  }
  }
  }  
if ((logical(kind=4)) __builtin_expect ((integer(kind=8)) (stat.341 == 0), 
1, 34)) // fail alloc
  {
bx_ilow.dtype = 539;
bx_ilow.dim[0].lbound = (integer(kind=8)) xstart;
bx_ilow.dim[0].ubound = 1;
bx_ilow.dim[0].stride = 1;
bx_ilow.dim[1].lbound = (integer(kind=8)) ystart;
bx_ilow.dim[1].ubound = D.5342;
bx_ilow.dim[1].stride = D.5341;
bx_ilow.dim[2].lbound = (integer(kind=8)) zstart;
bx_ilow.dim[2].ubound = D.5346;
bx_ilow.dim[2].stride = D.5345;
bx_ilow.offset = D.5352;
  }


I changed it to:

if ((logical(kind=4)) __builtin_expect ((integer(kind=8)) (overflow.343 != 
0), 0, 33)) // overflow
  {
stat.341 = 5014;
  }
else
  {
if ((logical(kind=4)) __builtin_expect ((integer(kind=8)) (bx_ilow.data 
!= 0B), 0, 35)) // repeated allocation/deallocation
  {
stat.341 = 5014;
  }
else
  {
stat.341 = 0;
bx_ilow.data = (void * restrict) __builtin_malloc (MAX_EXPR 
);
if ((logical(kind=4)) __builtin_expect ((integer(kind=8)) 
(bx_ilow.data == 0B), 0, 34)) // fail alloc
  {
stat.341 = 5014;
  }
  }
  }
if (stat.341 == 0) // no expectation
  {
bx_ilow.dtype = 539;
bx_ilow.dim[0].lbound = (integer(kind=8)) xstart;
bx_ilow.dim[0].ubound = 1;
bx_ilow.dim[0].stride = 1;
bx_ilow.dim[1].lbound = (integer(kind=8)) ystart;
bx_ilow.dim[1].ubound = D.5342;
bx_ilow.dim[1].stride = D.5341;
bx_ilow.dim[2].lbound = (integer(kind=8)) zstart;
bx_ilow.dim[2].ubound = D.5346;
bx_ilow.dim[2].stride = D.5345;
bx_ilow.offset = D.5352;
  }
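
In user-level C the same pattern of hints looks roughly like the sketch
below.  It uses the two-argument __builtin_expect available to user code in
GCC and Clang; the predictor argument seen in the dumps is internal to the
compiler.  The function name allocate_once is hypothetical, the status value
5014 is taken from the dump, and clamping the size to at least 1 is an
assumption about what the MAX_EXPR in the dump does.

```c
#include <stdlib.h>

/* Allocate SIZE bytes into *DATA unless it is already allocated.
   Returns 0 on success and 5014 otherwise.  */
static int
allocate_once (void **data, size_t size)
{
  /* Allocating an already allocated object is expected to be rare.  */
  if (__builtin_expect (*data != NULL, 0))
    return 5014;

  *data = malloc (size > 0 ? size : 1);

  /* malloc failure is expected to be rare as well.  */
  if (__builtin_expect (*data == NULL, 0))
    return 5014;

  return 0;
}
```

The final test of the status value is left without a hint, matching the
"no expectation" comment in the modified dump.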

B) array allocation

  :
  # size.1478_3210 = PHI <0(7), size.1478_3743(8)>
  _21 = _3740 != 0;
  _22 = (integer(kind=8)) _21;
  _23 = BUILTIN_EXPECT (_22, 0, 33); // overflow
  _24 = (logical(kind=4)) _23;
  if (_24 != 0)
goto ;
  else
goto ;

  :
  _25 = hrval.data;
  _26 = _25 != 0B;
  _27 = (integer(kind=8)) _26;
  _28 = BUILTIN_EXPECT (_27, 0, 34); // fail malloc
  _29 = (logical(kind=4)) _28;
  if (_29 != 0)
goto ;
  else
goto ;

  :
  _30 = MAX_EXPR ;
  _31 = __builtin_malloc (_30);
  hrval.data = _31;
  if (_31 == 0B)
goto ;
  else
goto ;

  :

  :
  # stat.1477_3202 = PHI <5014(9), 5014(10), 0(11), 5014(12)>
  _33 = stat.1477_3202 == 0;
  _34 = (integer(kind=8)) _33;
  _35 = BUILTIN_EXPECT (_34, 1, 34); // fail malloc
  _36 = (logical(kind=4)) _35;
  if (_36 != 0)
goto ;
  else
goto ;

with the patch applied, it looks as follows:

  :
  # size.1478_3210 = PHI <0(7), size.1478_3743(8)>
  _21 = _3740 != 0;
  _22 = (integer(kind=8)) _21;
  _23 = BUILTIN_EXPECT (_22, 0, 33); // overflow
  _24 = (logical(kind=4)) _23;
  if (_24 != 0)
goto ;
  else
goto ;

  :
  _25 = hrval.data;
  _26 = _25 != 0B;
  _27 = (integer(kind=8)) _26;
  _28 = BUILTIN_EXPECT (_27, 0, 35); // repeated allocation/deallocation
  _29 = (logical(kind=4)) _28;
  if (_29 != 0)
goto ;
  else
goto ;

  :
  _30 = MAX_EXPR ;
  _31 = __builtin_malloc (_30);
  hrval.data = _31;
  _33 = _31 == 0B;
  _34 = (integer(kind=8)) _33;
  _35 = BUILTIN_EXPECT (_34, 0, 34); // fail alloc
  _36 = (logical(kind=4)) _35;
  if (_36 != 0)
goto ;
  else
goto ;

  :

  :
  # stat.1477_3202 = PHI <5014(9), 5014(10), 0(11), 5014(12)>
  if (stat.1477_3202 == 0) // no prediction
goto ;
  else
goto ;

I get the following numbers with the patch applied:

1) polyhedron benchmark (aermod.f90.061i.profile):
HEURISTICS                        BRANCHES  (REL)  HITRATE            COVERAGE  COVERAGE  (REL)
repeated allocation/deallocation       194   4.1%  100.00% / 100.00%       194    194.00   0.0%
fail alloc                             377   7.9%  100.00% / 100.00%       377    377.00   0.0%

2) 459.GemsFDTD SPEC2006 benchmark:
HEURISTICS                        BRANCHES  (REL)  HITRATE            COVERAGE  COVERAGE  (REL)
repeated allocation/deallocation

Re: [PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

2016-06-13 Thread Jakub Jelinek

On Mon, Jun 13, 2016 at 04:43:25PM +0200, Thomas Schwinge wrote:
> On Wed, 01 Jun 2016 17:06:42 +0200, Thomas Schwinge  
> wrote:
> > Here are the OpenACC bits of .
> 
> In the PR, Jakub clarified that all the missing other OMP_CLAUSE_* are in
> fact all unreachable here.  OK to document this as follows, in trunk?
> 
> The "anything else" default case in fact now is just the non-clause
> OMP_CLAUSE_ERROR, so when adding a case for that one, we could then
> remove the default case, and thus get a compiler warning when new clauses
> are added in the future, without handling them here.  That makes sense to
> me (would have made apparent much earlier the original problem of missing
> handling for certain OMP_CLAUSE_*), but based on feedback received, it
> feels as if I'm the only supporter of such "defensive" programming
> paradigms?
> 
> commit c6b10a9bc1437395c4931d43f30e778152a28cb2
> Author: Thomas Schwinge 
> Date:   Mon Jun 13 16:29:37 2016 +0200
> 
> [PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c
> 
>   gcc/
>   * tree-nested.c (convert_nonlocal_omp_clauses):
>   (convert_local_omp_clauses): Document missing OMP_CLAUSE_*.

Ok, but please mention the PR line above the ChangeLog entry.  Thanks.

Jakub

Re: [Patch AArch64] Fixup to fcvt patterns added in r237200

2016-06-13 Thread Kyrill Tkachov



On 10/06/16 13:29, James Greenhalgh wrote:

Hi,

My autotester picked up some issues with the vcvt{ds}_n_* intrinsics
added in r237200.

The iterators in this pattern do not resolve, as they have not been
explicitly tied to the mode iterator (rather than the code iterator)
used by the pattern.

This fixup adds the attribute tags, allowing the patterns to work
correctly.

Additionally, the types assigned to these instructions were wrong, and
would permit the immediate operand to be in a register. This will then
develop into an ICE as the patterns require an immediate operand, and so
won't match. The ICE can be exposed by writing a wrapping function around
the vcvtd_n_* intrinsics, which forces the immediate operand to a register.
We have the infrastructure to error to the user rather than ICEing, but it
needs some different types, which this patch adds.

I've checked this with an aarch64-none-elf test run, and run it through
several rounds of my autotester for aarch64-none-elf and
aarch64_be-none-elf.

OK?

Thanks,
James

---
2016-06-10  James Greenhalgh  

* config/aarch64/aarch64.md
(3): Add attributes to
iterators.
(3): Likewise.  Correct
attributes.
* config/aarch64/aarch64-builtins.c
(aarch64_types_binop_uss_qualifiers): Delete.
(TYPES_BINOP_USS): Likewise.
(aarch64_types_binop_sus_qualifiers): Likewise.
(TYPES_BINOP_SUS): Likewise.
(aarch64_types_fcvt_from_unsigned_qualifiers): New.
(TYPES_FCVTIMM_SUS): Likewise.
* config/aarch64/aarch64-simd-builtins.def (scvtf): Use SHIFTIMM
rather than BINOP.
(ucvtf): Use FCVTIMM_SUS rather than BINOP_SUS.
(fcvtzs): Use SHIFTIMM rather than BINOP.
(fcvtzu): Use SHIFTIMM_USS rather than BINOP_USS.



LGTM (but I can't approve).

Kyrill

Re: [PATCH][C] Avoid reading from FUNCTION_DECL with atomics

2016-06-13 Thread Jakub Jelinek

On Mon, Jun 13, 2016 at 01:25:35PM +0200, Richard Biener wrote:
> The following avoids creating IL that accesses a FUNCTION_DECLs memory
> directly rather than indirectly through an address based on it.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?

I think the problem is that for these generic builtins we perform no sanity
checking, except for checking TYPE_SIZE_UNIT equality.  But as you show
in the PR, even that doesn't work, as while we check for VLAs on the first
argument, we don't check for that on the second and following arguments.

The question is what all should we reject.

We accept:

void foo (void);
void bar (void);
void baz (void);
void
test (void)
{
  __atomic_exchange (&foo, &bar, &baz, __ATOMIC_RELAXED);
}

which IMHO we definitely should not, what does it mean to exchange
functions?
So, at least diagnose if any of the arguments is pointer to
FUNCTION_TYPE/METHOD_TYPE and handle gracefully VLAs in 2nd+ argument.
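
For contrast, a sketch of how the generic builtin is meant to be used, with
all three pointers referring to objects of the same type.  The wrapper name
exchange_int is hypothetical.

```c
/* Atomically store DESIRED into *TARGET and return the previous value.
   All three pointer arguments refer to int objects.  */
static int
exchange_int (int *target, int desired)
{
  int previous;
  __atomic_exchange (target, &desired, &previous, __ATOMIC_RELAXED);
  return previous;
}
```

Here the operation has an obvious meaning, which is not the case when the
arguments are addresses of functions.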

Should we perform some further type checking though, like e.g.
complain if one pointer is pointer to integral type and another to
floating, or one to struct, another to union, or do we just keep the
builtins very forgiving and assume that on the C++ side the templates
make sure the arguments are type compatible and for C _Atomic handling is
done differently anyway?

Jakub

Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-06-13 Thread Richard Biener

On Mon, 13 Jun 2016, Richard Biener wrote:

> On Fri, 10 Jun 2016, Richard Biener wrote:
> 
> > 
> > With the proposed cost change for vector construction we will end up
> > vectorizing the testcase in PR68961 again (on x86_64 and likely
> > on ppc64le as well after that target gets adjustments).  Currently
> > we can't optimize that away again noticing the direct overlap of
> > argument and return registers.  The obstacle is
> > 
> > (insn 7 4 8 2 (set (reg:V2DF 93)
> > (vec_concat:V2DF (reg/v:DF 91 [ a ])
> > (reg/v:DF 92 [ aa ]))) 
> > ...
> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> > (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> > (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> > 
> > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> > 
> > First of all simplify_subreg doesn't handle the subregs of a vec_concat
> > (easy fix below).
> > 
> > Then combine doesn't like to simplify the multi-use (it tries some
> > parallel it seems).  So I went to forwprop which eventually manages
> > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> > because it is not a constant.  Thus I allow arbitrary simplification
> > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> > to be a magic flag to tell it to restrict to the case where all
> > uses can be simplified or so, nor to restrict simplifications to a REG.
> > But I don't see any undesirable simplifications of (subreg 
> > ([vec_]concat)).
> > 
> > For the testcase I'm not sure if I have to exclude some ABIs (mingw?).
> > 
> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu, I'll
> > install the simplify-rtx.c if that succeeds but like to have opinions
> > on the fwprop.c change.
> 
> So the bootstrap exposes a latent issue in simplify-rtx.c in the changed
> hunk via gcc.target/i386/mmx-8.c on i?86 which ends up with a 
> 
> (vec_concat:V2SI (reg:SI 103)
> (const_int 0 [0]))
> 
> and thus a VOIDmode 2nd operand (I'm sure this can happen for
> complex integer concat as well, thus latent).  I am adjusting the
> simplify_subreg hunk to always pass GET_MODE_INNER (innermode)
> (that hopefully exercises it a bit more than just using that
> if GET_MODE (part) == VOIDmode - and hopefully they should always
> agree).
> 
> Re-bootstrap / regtest running on x86_64-unknown-linux-gnu.

That works worse given that vec_concat can be

(vec_concat:V16QI (us_truncate:V8QI (reg:V8HI 159))
(us_truncate:V8QI (reg:V8HI 160)))

... now I think the VOIDmode case can only happen for scalar vec_concat
and thus

  enum machine_mode part_mode = GET_MODE (part);
  if (part_mode == VOIDmode)
part_mode = GET_MODE_INNER (GET_MODE (op));

should work.  Re-testing with that... (ok, I know it has coverage of
exactly one testcase on x86_64 as it would otherwise ICE).
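
To make the part selection and the VOIDmode fallback above concrete, here is a toy model in plain C. It is not GCC code: `toy_mode`, `toy_part`, `toy_pick` and `pick_part` are hypothetical names, and the way the half is chosen from the byte offset is an assumption built around the `final_offset + size > part_size` check visible in the patch. It only illustrates why a modeless operand such as `(const_int 0)` needs a mode borrowed from the enclosing concat.

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes; TOY_VOID models a constant operand
   such as (const_int 0) that carries no mode of its own.  */
enum toy_mode { TOY_VOID, TOY_SI, TOY_DF };

struct toy_part { enum toy_mode mode; };

struct toy_pick {
  bool ok;               /* false models returning NULL_RTX */
  int index;             /* 0 = first operand, 1 = second operand */
  unsigned offset;       /* byte offset inside the chosen part */
  enum toy_mode mode;    /* mode to use when simplifying the part */
};

/* Select the half of a two-part concat that a subreg of OUTER_SIZE bytes
   at byte offset BYTE refers to.  PART_SIZE is the size of one half and
   INNER is the element mode of the concat as a whole.  */
static struct toy_pick
pick_part (const struct toy_part parts[2], unsigned part_size,
           enum toy_mode inner, unsigned byte, unsigned outer_size)
{
  struct toy_pick r = { false, 0, 0, TOY_VOID };

  r.index = byte < part_size ? 0 : 1;
  r.offset = r.index == 0 ? byte : byte - part_size;

  /* A subreg that straddles both halves cannot be simplified here.  */
  if (r.offset + outer_size > part_size)
    return r;

  /* The fallback from the mail: a modeless part inherits the element
     mode of the enclosing concat.  */
  r.mode = parts[r.index].mode;
  if (r.mode == TOY_VOID)
    r.mode = inner;

  r.ok = true;
  return r;
}
```

In this model, asking for the upper four bytes of a concat whose second operand is modeless yields index 1, offset 0 and the borrowed element mode, while a request that spans both halves is rejected.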

Richard.


2016-06-13  Richard Biener  

PR rtl-optimization/68961
* simplify-rtx.c (simplify_subreg): Handle VEC_CONCAT like CONCAT.
* fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT
to simplify to a non-constant.

* gcc.target/i386/pr68961.c: New testcase.

Index: gcc/simplify-rtx.c
===
*** gcc/simplify-rtx.c  (revision 237372)
--- gcc/simplify-rtx.c  (working copy)
*** simplify_subreg (machine_mode outermode,
*** 6108,6116 
&& GET_MODE_SIZE (outermode) <= GET_MODE_SIZE (GET_MODE (op)))
  return adjust_address_nv (op, outermode, byte);
  
!   /* Handle complex values represented as CONCAT
!  of real and imaginary part.  */
!   if (GET_CODE (op) == CONCAT)
  {
unsigned int part_size, final_offset;
rtx part, res;
--- 6108,6117 
&& GET_MODE_SIZE (outermode) <= GET_MODE_SIZE (GET_MODE (op)))
  return adjust_address_nv (op, outermode, byte);
  
!   /* Handle complex or vector values represented as CONCAT or VEC_CONCAT
!  of two parts.  */
!   if (GET_CODE (op) == CONCAT
!   || GET_CODE (op) == VEC_CONCAT)
  {
unsigned int part_size, final_offset;
rtx part, res;
*** simplify_subreg (machine_mode outermode,
*** 6130,6139 
if (final_offset + GET_MODE_SIZE (outermode) > part_size)
return NULL_RTX;
  
!   res = simplify_subreg (outermode, part, GET_MODE (part), final_offset);
if (res)
return res;
!   if (validate_subreg (outermode, GET_MODE (part), part, final_offset))
return gen_rtx_SUBREG (outermode, part, final_offset);
return NULL_RTX;
  }
--- 6131,6143 
if (final_offset + GET_MODE_SIZE (outermode) > part_size)
return NULL_RTX;
  
!   enum machine_mode part_mode = GET_MODE (part);
!   if (part_mode == VOIDmode)
!   part_mode = GET_MODE_INNER (GET_MODE (op));
!   res = simplify_subreg (outermode, part, part_mode, final_offset);

Re: [PATCH][vectorizer][2/2] PR 65951: Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-13 Thread Kyrill Tkachov



On 13/06/16 14:58, Marc Glisse wrote:

On Mon, 13 Jun 2016, Kyrill Tkachov wrote:

This patch allows the vectoriser to synthesize multiplications by an integer constant using the algorithms determined by choose_mult_variant from expmed.c. choose_mult_variant returns an algorithm structure that is a linked list of steps 
describing how to synthesize an integer multiplication by any constant using shifts, adds, subs, and negation.


The new function vect_synth_mult_by_constant that does all the hard work is 
very similar in structure to expand_mult_const from expmed.c but it operates on 
gimple SSA rather than RTL.

Note that we synthesize the multiplications if the target does not support a vector multiplication in the current vector mode we're processing. So, for aarch64 this effectively means V2DI (aarch64 has a vector multiply instruction for 
narrower inner modes).


I guess I should drop my patch 
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00881.html

This one seems much better.



Sorry for the conflict. I had actually worked on this back in November as a
quick prototype, then got swamped with bug fixing for GCC 6, and only just
got around to taking this up again.

Kyrill

Re: [PATCH][vectorizer][2/2] PR 65951: Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-13 Thread Marc Glisse


On Mon, 13 Jun 2016, Kyrill Tkachov wrote:

This patch allows the vectoriser to synthesize multiplications by an 
integer constant using the algorithms determined by choose_mult_variant 
from expmed.c. choose_mult_variant returns an algorithm structure that 
is a linked list of steps describing how to synthesize an integer 
multiplication by any constant using shifts, adds, subs, and negation.


The new function vect_synth_mult_by_constant that does all the hard work 
is very similar in structure to expand_mult_const from expmed.c but it 
operates on gimple SSA rather than RTL.


Note that we synthesize the multiplications if the target does not 
support a vector multiplication in the current vector mode we're 
processing. So, for aarch64 this effectively means V2DI (aarch64 has a 
vector multiply instruction for narrower inner modes).


I guess I should drop my patch 
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00881.html


This one seems much better.

--
Marc Glisse

Re: [PATCH][vectorizer][2/2] PR 65951: Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-13 Thread Marc Glisse


+  /* All synthesis algorithms require shifts, so bail out early if
+ target cannot vectorize them.  */
+  if (!target_has_vecop_for_code (LSHIFT_EXPR, vectype))
+return false;

Hmm, 2 points:

* Could you use vect_supportable_shift (or equivalent) instead? This way 
it will work even if a target/mode supports vector << scalar and not 
vector << vector.


* This means that we will refuse to vectorize x*2 as x+x, which was the 
goal of my patch (SPARC VIS has additions, no shift, and limited 
multiplications, IIRC). I guess it would be possible, as a follow-up (it 
doesn't have to block your patch), not to give up in the no-shift branch, 
but to handle some small factors with only additions and subtractions. Or 
to split the emission of shifts to a function that, when shifts are not 
supported, emulates them with additions. Or even emit shifts and rely on 
expand or vector lowering to turn them to additions (though the estimated 
cost might be off). Any idea on the best way to handle SPARC?


--
Marc Glisse

Re: [PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

2016-06-13 Thread Thomas Schwinge

Hi!

On Mon, 13 Jun 2016 16:43:25 +0200, Thomas Schwinge  
wrote:
> On Wed, 01 Jun 2016 17:06:42 +0200, Thomas Schwinge  
> wrote:
> > Here are the OpenACC bits of .
> 
> In the PR, Jakub clarified that all the missing other OMP_CLAUSE_* are in
> fact all unreachable here.  [...]
> 
> The "anything else" default case in fact now is just the non-clause
> OMP_CLAUSE_ERROR, so when adding a case for that one, we could then
> remove the default case, and thus get a compiler warning when new clauses
> are added in the future, without handling them here.  That makes sense to
> me (would have made apparent much earlier the original problem of missing
> handling for certain OMP_CLAUSE_*), but based on feedback received, it
> feels as if I'm the only supporter of such "defensive" programming
> paradigms?

That is, something like that:

--- gcc/tree-nested.c
+++ gcc/tree-nested.c
@@ -1225,8 +1225,9 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE__LOOPTEMP_:
case OMP_CLAUSE__SIMDUID_:
case OMP_CLAUSE__GRIDDIM_:
- /* Anything else.  */
-   default:
+ /* This non-clause should never be seen outside of the front
+ends.  */
+   case OMP_CLAUSE_ERROR:
  gcc_unreachable ();
}
 }
@@ -1933,8 +1934,9 @@ convert_local_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE__LOOPTEMP_:
case OMP_CLAUSE__SIMDUID_:
case OMP_CLAUSE__GRIDDIM_:
- /* Anything else.  */
-   default:
+ /* This non-clause should never be seen outside of the front
+ends.  */
+   case OMP_CLAUSE_ERROR:
  gcc_unreachable ();
}
 }


Grüße
 Thomas

[PATCH, i386]: Use ix86_expand_setcc some more

2016-06-13 Thread Uros Bizjak

No functional changes.

2016-06-13  Uros Bizjak  

* config/i386/i386.md (paritydi2): Use ix86_expand_setcc.
(paritysi2): Ditto.
(isinfxf2): Ditto.
(isinf2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 237381)
+++ config/i386/i386.md (working copy)
@@ -13458,15 +13458,12 @@
   "! TARGET_POPCNT"
 {
   rtx scratch = gen_reg_rtx (QImode);
-  rtx cond;
 
   emit_insn (gen_paritydi2_cmp (NULL_RTX, NULL_RTX,
NULL_RTX, operands[1]));
 
-  cond = gen_rtx_fmt_ee (ORDERED, QImode,
-gen_rtx_REG (CCmode, FLAGS_REG),
-const0_rtx);
-  emit_insn (gen_rtx_SET (scratch, cond));
+  ix86_expand_setcc (scratch, ORDERED,
+gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
 
   if (TARGET_64BIT)
 emit_insn (gen_zero_extendqidi2 (operands[0], scratch));
@@ -13486,14 +13483,11 @@
   "! TARGET_POPCNT"
 {
   rtx scratch = gen_reg_rtx (QImode);
-  rtx cond;
 
   emit_insn (gen_paritysi2_cmp (NULL_RTX, NULL_RTX, operands[1]));
 
-  cond = gen_rtx_fmt_ee (ORDERED, QImode,
-gen_rtx_REG (CCmode, FLAGS_REG),
-const0_rtx);
-  emit_insn (gen_rtx_SET (scratch, cond));
+  ix86_expand_setcc (scratch, ORDERED,
+gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
 
   emit_insn (gen_zero_extendqisi2 (operands[0], scratch));
   DONE;
@@ -16159,8 +16153,6 @@
   rtx mask = GEN_INT (0x45);
   rtx val = GEN_INT (0x05);
 
-  rtx cond;
-
   rtx scratch = gen_reg_rtx (HImode);
   rtx res = gen_reg_rtx (QImode);
 
@@ -16168,10 +16160,8 @@
 
   emit_insn (gen_andqi_ext_0 (scratch, scratch, mask));
   emit_insn (gen_cmpqi_ext_3 (scratch, val));
-  cond = gen_rtx_fmt_ee (EQ, QImode,
-gen_rtx_REG (CCmode, FLAGS_REG),
-const0_rtx);
-  emit_insn (gen_rtx_SET (res, cond));
+  ix86_expand_setcc (res, EQ,
+gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
   emit_insn (gen_zero_extendqisi2 (operands[0], res));
   DONE;
 })
@@ -16186,8 +16176,6 @@
   rtx mask = GEN_INT (0x45);
   rtx val = GEN_INT (0x05);
 
-  rtx cond;
-
   rtx scratch = gen_reg_rtx (HImode);
   rtx res = gen_reg_rtx (QImode);
 
@@ -16204,10 +16192,8 @@
 
   emit_insn (gen_andqi_ext_0 (scratch, scratch, mask));
   emit_insn (gen_cmpqi_ext_3 (scratch, val));
-  cond = gen_rtx_fmt_ee (EQ, QImode,
-gen_rtx_REG (CCmode, FLAGS_REG),
-const0_rtx);
-  emit_insn (gen_rtx_SET (res, cond));
+  ix86_expand_setcc (res, EQ,
+gen_rtx_REG (CCmode, FLAGS_REG), const0_rtx);
   emit_insn (gen_zero_extendqisi2 (operands[0], res));
   DONE;
 })

[PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

2016-06-13 Thread Thomas Schwinge

Hi!

On Wed, 01 Jun 2016 17:06:42 +0200, Thomas Schwinge  
wrote:
> Here are the OpenACC bits of .

In the PR, Jakub clarified that all the missing other OMP_CLAUSE_* are in
fact all unreachable here.  OK to document this as follows, in trunk?

The "anything else" default case in fact now is just the non-clause
OMP_CLAUSE_ERROR, so when adding a case for that one, we could then
remove the default case, and thus get a compiler warning when new clauses
are added in the future, without handling them here.  That makes sense to
me (would have made apparent much earlier the original problem of missing
handling for certain OMP_CLAUSE_*), but based on feedback received, it
feels as if I'm the only supporter of such "defensive" programming
paradigms?
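
As a small illustration of the "defensive" pattern described above, here is a self-contained C sketch. The enumeration and function are hypothetical and have nothing to do with the real OMP_CLAUSE_* codes. The point is that a switch over an enumerated type with no default label lets GCC's -Wswitch (enabled by -Wall) report any enumerator that is added later without a matching case.

```c
#include <stdlib.h>

/* Hypothetical clause codes; TOY_CLAUSE_ERROR plays the role of the
   non-clause OMP_CLAUSE_ERROR.  */
enum toy_clause { TOY_CLAUSE_ERROR, TOY_CLAUSE_PRIVATE, TOY_CLAUSE_TILE };

/* Return 1 if the clause needs work here, 0 if it can be skipped.  */
static int
toy_clause_needs_work (enum toy_clause c)
{
  /* No default label: adding a new enumerator without a case below
     makes -Wswitch warn at this switch.  */
  switch (c)
    {
    case TOY_CLAUSE_PRIVATE:
      return 1;
    case TOY_CLAUSE_TILE:
      return 0;
    case TOY_CLAUSE_ERROR:
      /* Never expected here; stands in for gcc_unreachable ().  */
      abort ();
    }
  /* Only reached if C holds a value outside the enumeration.  */
  abort ();
}
```

Adding, say, a fourth enumerator to `toy_clause` and recompiling with -Wall would flag the switch, which is the early notice the paragraph above is asking for.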

commit c6b10a9bc1437395c4931d43f30e778152a28cb2
Author: Thomas Schwinge 
Date:   Mon Jun 13 16:29:37 2016 +0200

[PR middle-end/71373] Document missing OMP_CLAUSE_* in gcc/tree-nested.c

gcc/
* tree-nested.c (convert_nonlocal_omp_clauses):
(convert_local_omp_clauses): Document missing OMP_CLAUSE_*.
---
 gcc/tree-nested.c | 60 ++-
 1 file changed, 42 insertions(+), 18 deletions(-)

diff --git gcc/tree-nested.c gcc/tree-nested.c
index 812f619..62cb01f 100644
--- gcc/tree-nested.c
+++ gcc/tree-nested.c
@@ -1203,17 +1203,29 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE_AUTO:
  break;
 
+ /* OpenACC tile clauses are discarded during gimplification.  */
case OMP_CLAUSE_TILE:
- /* OpenACC tile clauses are discarded during gimplification, so we
-don't expect to see anything here.  */
- gcc_unreachable ();
-
+ /* The following clause belongs to the OpenACC cache directive, which
+is discarded during gimplification.  */
case OMP_CLAUSE__CACHE_:
- /* These clauses belong to the OpenACC cache directive, which is
-discarded during gimplification, so we don't expect to see
-anything here.  */
- gcc_unreachable ();
-
+ /* The following clauses are only allowed in the OpenMP declare simd
+directive, so not seen here.  */
+   case OMP_CLAUSE_UNIFORM:
+   case OMP_CLAUSE_INBRANCH:
+   case OMP_CLAUSE_NOTINBRANCH:
+ /* The following clauses are only allowed on OpenMP cancel and
+cancellation point directives, which at this point have already
+been lowered into a function call.  */
+   case OMP_CLAUSE_FOR:
+   case OMP_CLAUSE_PARALLEL:
+   case OMP_CLAUSE_SECTIONS:
+   case OMP_CLAUSE_TASKGROUP:
+ /* The following clauses are only added during OMP lowering; nested
+function decomposition happens before that.  */
+   case OMP_CLAUSE__LOOPTEMP_:
+   case OMP_CLAUSE__SIMDUID_:
+   case OMP_CLAUSE__GRIDDIM_:
+ /* Anything else.  */
default:
  gcc_unreachable ();
}
@@ -1899,17 +1911,29 @@ convert_local_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE_AUTO:
  break;
 
+ /* OpenACC tile clauses are discarded during gimplification.  */
case OMP_CLAUSE_TILE:
- /* OpenACC tile clauses are discarded during gimplification, so we
-don't expect to see anything here.  */
- gcc_unreachable ();
-
+ /* The following clause belongs to the OpenACC cache directive, which
+is discarded during gimplification.  */
case OMP_CLAUSE__CACHE_:
- /* These clauses belong to the OpenACC cache directive, which is
-discarded during gimplification, so we don't expect to see
-anything here.  */
- gcc_unreachable ();
-
+ /* The following clauses are only allowed in the OpenMP declare simd
+directive, so not seen here.  */
+   case OMP_CLAUSE_UNIFORM:
+   case OMP_CLAUSE_INBRANCH:
+   case OMP_CLAUSE_NOTINBRANCH:
+ /* The following clauses are only allowed on OpenMP cancel and
+cancellation point directives, which at this point have already
+been lowered into a function call.  */
+   case OMP_CLAUSE_FOR:
+   case OMP_CLAUSE_PARALLEL:
+   case OMP_CLAUSE_SECTIONS:
+   case OMP_CLAUSE_TASKGROUP:
+ /* The following clauses are only added during OMP lowering; nested
+function decomposition happens before that.  */
+   case OMP_CLAUSE__LOOPTEMP_:
+   case OMP_CLAUSE__SIMDUID_:
+   case OMP_CLAUSE__GRIDDIM_:
+ /* Anything else.  */
default:
  gcc_unreachable ();
}


Grüße
 Thomas

Re: [PATCH] Fix bootstrap when user language is not english

2016-06-13 Thread Bernd Edlinger

On 06/13/16 17:27, David Malcolm wrote:
> On Mon, 2016-06-13 at 14:41 +, Bernd Edlinger wrote:
>> Hi,
>>
>> as noted in PR bootstrap/71481, comment#4 currently
>> the trunk fails to bootstrap if the current language is
>> not English.  A workaround is possible by setting LANG=C,
>> but OTOH it is rather easy to fix, by translating the string
>> in the assertion, as it is the only place that is affected by
>> the language setting.
>>
>>
>> Boot-strapped and reg-tested on trunk with LANG=de_DE.UTF-8.
>> OK to commit?
>
> Sorry about the breakage.
>
> I believe I can approve this with my "libcpp"/"diagnostics" hats on, so
> LGTM.
>

Thanks.

> That said, should we hardcode LANG=C when running the selftests from
> gcc/Makefile.in?
>

Honestly, I am glad to see that there is some sort of unit test which
runs in a different LANG setting than the rest of the testsuite.
Because as this incident clearly shows, there _can_ be bugs that do not
show up in the default locale.

I would put the question this way: could it be possible to run also
some tests in the testsuite with a LANG setting different from "C"?

Bernd.

Re: [PATCH] Fix bootstrap when user language is not english

2016-06-13 Thread Jakub Jelinek

On Mon, Jun 13, 2016 at 03:39:21PM +, Bernd Edlinger wrote:
> On 06/13/16 17:27, David Malcolm wrote:
> > On Mon, 2016-06-13 at 14:41 +, Bernd Edlinger wrote:
> >> Hi,
> >>
> >> as noted in PR bootstrap/71481, comment#4 currently
> >> the trunk fails to bootstrap if the current language is
> > >> not English.  A workaround is possible by setting LANG=C,
> >> but OTOH it is rather easy to fix, by translating the string
> >> in the assertion, as it is the only place that is affected by
> >> the language setting.
> >>
> >>
> >> Boot-strapped and reg-tested on trunk with LANG=de_DE.UTF-8.
> >> OK to commit?
> >
> > Sorry about the breakage.
> >
> > I believe I can approve this with my "libcpp"/"diagnostics" hats on, so
> > LGTM.
> >
> 
> Thanks.

Please put PR bootstrap/71481 into the ChangeLog entry though.

> > That said, should we hardcode LANG=C when running the selftests from
> > gcc/Makefile.in?
> >
> 
> Honestly, I am glad to see that there is some sort of unit test which
> runs in a different LANG setting than the rest of the testsuite.
> Because as this incident clearly shows, there _can_ be bugs that do not
> show up in the default locale.
> 
> I would put the question this way: could it be possible to run also
> some tests in the testsuite with a LANG setting different from "C"?

I think running the s-selftest in C locale is a good idea, but maybe
we should have some test in gcc.dg or where that would run -fself-tests
in some other locale.  I think right now we force LC_ALL=C for all tests,
but perhaps /* { dg-set-compiler-env-var LC_ALL "something" } */
would work.  But perhaps we'd need some tcl test for whether the locale is
supported by the system.

Jakub

[Committed] S/390: vecintrin.h fix file description in comment

2016-06-13 Thread Andreas Krebbel

gcc/ChangeLog:

2016-06-13  Andreas Krebbel  

* config/s390/vecintrin.h: Fix file description in comment.
---
 gcc/ChangeLog   | 4 
 gcc/config/s390/vecintrin.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e14decf..7bb5d5d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2016-06-13  Andreas Krebbel  
 
+   * config/s390/vecintrin.h: Fix file description in comment.
+
+2016-06-13  Andreas Krebbel  
+
* config/s390/s390-builtin-types.def: Change builtin type naming
scheme to match builtin-types.def.
 
diff --git a/gcc/config/s390/vecintrin.h b/gcc/config/s390/vecintrin.h
index ab82e7a..2bd35d6 100644
--- a/gcc/config/s390/vecintrin.h
+++ b/gcc/config/s390/vecintrin.h
@@ -1,4 +1,4 @@
-/* GNU compiler hardware transactional execution intrinsics
+/* GNU compiler vector extension intrinsics
Copyright (C) 2015-2016 Free Software Foundation, Inc.
Contributed by Andreas Krebbel (andreas.kreb...@de.ibm.com)
 
-- 
1.9.1

[Committed] S/390: Change builtin type naming scheme to match builtin-types.def.

2016-06-13 Thread Andreas Krebbel

gcc/ChangeLog:

2016-06-13  Andreas Krebbel  

* config/s390/s390-builtin-types.def: Change builtin type naming
scheme to match builtin-types.def.
---
 gcc/ChangeLog  |   5 +
 gcc/config/s390/s390-builtin-types.def | 420 -
 2 files changed, 215 insertions(+), 210 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fbd985b..e14decf 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-13  Andreas Krebbel  
+
+   * config/s390/s390-builtin-types.def: Change builtin type naming
+   scheme to match builtin-types.def.
+
 2016-06-13  Richard Biener  
 
PR tree-optimization/71505
diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index 6179b04..3d90d41 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -19,29 +19,29 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
-#define DEF_FN_TYPE_1(FN_TYPE, FLAGS, T1)  \
+#define DEF_FN_TYPE_0(FN_TYPE, FLAGS, T1)  \
   DEF_FN_TYPE (FN_TYPE,\
   FLAGS,   \
   s390_builtin_types[T1])
-#define DEF_FN_TYPE_2(FN_TYPE, FLAGS, T1, T2)  \
+#define DEF_FN_TYPE_1(FN_TYPE, FLAGS, T1, T2)  \
   DEF_FN_TYPE (FN_TYPE,\
   FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2])
-#define DEF_FN_TYPE_3(FN_TYPE, FLAGS, T1, T2, T3)  \
+#define DEF_FN_TYPE_2(FN_TYPE, FLAGS, T1, T2, T3)  \
   DEF_FN_TYPE (FN_TYPE,\
   FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2],  \
   s390_builtin_types[T3])
-#define DEF_FN_TYPE_4(FN_TYPE, FLAGS, T1, T2, T3, T4)  \
+#define DEF_FN_TYPE_3(FN_TYPE, FLAGS, T1, T2, T3, T4)  \
   DEF_FN_TYPE (FN_TYPE,\
   FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2],  \
   s390_builtin_types[T3],  \
   s390_builtin_types[T4])
-#define DEF_FN_TYPE_5(FN_TYPE, FLAGS, T1, T2, T3, T4, T5)  \
+#define DEF_FN_TYPE_4(FN_TYPE, FLAGS, T1, T2, T3, T4, T5)  \
   DEF_FN_TYPE (FN_TYPE,\
   FLAGS,   \
   s390_builtin_types[T1],  \
@@ -49,7 +49,7 @@
   s390_builtin_types[T3],  \
   s390_builtin_types[T4],  \
   s390_builtin_types[T5])
-#define DEF_FN_TYPE_6(FN_TYPE, FLAGS, T1, T2, T3, T4, T5, T6)  \
+#define DEF_FN_TYPE_5(FN_TYPE, FLAGS, T1, T2, T3, T4, T5, T6)  \
   DEF_FN_TYPE (FN_TYPE,\
   FLAGS,   \
   s390_builtin_types[T1],  \
@@ -126,210 +126,210 @@ DEF_OPAQUE_VECTOR_TYPE (BT_OUV4SI, B_VX, BT_UINT, 4)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV4SI, B_VX, BT_BINT, 4)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV2DI, B_VX, BT_BLONGLONG, 2)
 DEF_OPAQUE_VECTOR_TYPE (BT_BV8HI, B_VX, BT_BSHORT, 8)
-DEF_FN_TYPE_1 (BT_FN_INT, B_HTM, BT_INT)
-DEF_FN_TYPE_1 (BT_FN_UINT, 0, BT_UINT)
-DEF_FN_TYPE_2 (BT_FN_INT_INT, B_VX, BT_INT, BT_INT)
-DEF_FN_TYPE_2 (BT_FN_INT_VOIDPTR, B_HTM, BT_INT, BT_VOIDPTR)
-DEF_FN_TYPE_2 (BT_FN_OV4SI_INT, B_VX, BT_OV4SI, BT_INT)
-DEF_FN_TYPE_2 (BT_FN_OV4SI_INTCONSTPTR, B_VX, BT_OV4SI, BT_INTCONSTPTR)
-DEF_FN_TYPE_2 (BT_FN_OV4SI_OV4SI, B_VX, BT_OV4SI, BT_OV4SI)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHAR, B_VX, BT_UV16QI, BT_UCHAR)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_UCHARCONSTPTR, B_VX, BT_UV16QI, BT_UCHARCONSTPTR)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_USHORT, B_VX, BT_UV16QI, BT_USHORT)
-DEF_FN_TYPE_2 (BT_FN_UV16QI_UV16QI, B_VX, BT_UV16QI, BT_UV16QI)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_ULONGLONG, B_VX, BT_UV2DI, BT_ULONGLONG)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_ULONGLONGCONSTPTR, B_VX, BT_UV2DI, 
BT_ULONGLONGCONSTPTR)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_USHORT, B_VX, BT_UV2DI, BT_USHORT)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_UV2DI, B_VX, BT_UV2DI, BT_UV2DI)
-DEF_FN_TYPE_2 (BT_FN_UV2DI_UV4SI, B_VX, BT_UV2DI, BT_UV4SI)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_UINT, B_VX, BT_UV4SI, BT_UINT)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_UINTCONSTPTR, B_VX, BT_UV4SI, BT_UINTCONSTPTR)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_USHORT, B_VX, BT_UV4SI, BT_USHORT)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_UV4SI, B_VX, BT_UV4SI, BT_UV4SI)
-DEF_FN_TYPE_2 (BT_FN_UV4SI_UV8HI, B_VX, BT_UV4SI, BT_UV8HI)
-DEF_FN_TYPE_2 (BT_FN_UV8HI_USHORT, B_VX, BT_UV8HI, BT_USHORT)

Re: [Committed] S/390: Fix MAX_ARGS value.

2016-06-13 Thread Andreas Krebbel

On 06/13/2016 12:49 PM, Jakub Jelinek wrote:
...
>> Yes. It is inconsistent to builtin-types.def. Not sure if it is worth fixing 
>> it.
> 
> I think it wouldn't hurt, it would improve code readability.
> And it affects just the single source file.

Done. https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00948.html

...
> AFAIK other targets test all builtins at least some way.
> 1 testcases is excessive (though e.g. for i386 there are almost 5000
> testcases, and I think about half of them might be testing the builtins),
> but i386 e.g. has testcases that include all the intrinsics headers and
> force all builtins inlines to be actually not inlines where possible;
> or you could e.g. generate a testcase that just tries to use all the
> builtins with all the possible argument types, without actually trying
> to run it, just try to assemble it.
> Or 100 of testcases which each test 100 of cases?

I'll have a look whether I can rework the generator a bit.

> BTW, looking at vecintrin.h, the
> GNU compiler hardware transactional execution intrinsics
> comment looks like pasto from htmintrin.h.

Fixed. https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00947.html

-Andreas-

[PATCH] Fix PR71521

2016-06-13 Thread Richard Biener


Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-06-13  Richard Biener  

PR tree-optimization/71521
* tree-vrp.c (extract_range_from_binary_expr_1): Guard
division int_const_binop against zero divisor.

* gcc.dg/tree-ssa/vrp101.c: New testcase.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c  (revision 237372)
--- gcc/tree-vrp.c  (working copy)
*** extract_range_from_binary_expr_1 (value_
*** 2938,2944 
 and divisor are available.  */
  if (vr1.type == VR_RANGE
  && !symbolic_range_p ()
! && !symbolic_range_p ())
min = int_const_binop (code, vr0.min, vr1.max);
  else
min = zero;
--- 2944,2951 
 and divisor are available.  */
  if (vr1.type == VR_RANGE
  && !symbolic_range_p ()
! && !symbolic_range_p ()
! && compare_values (vr1.max, zero) != 0)
min = int_const_binop (code, vr0.min, vr1.max);
  else
min = zero;
Index: gcc/testsuite/gcc.dg/tree-ssa/vrp101.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/vrp101.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/vrp101.c  (working copy)
***
*** 0 
--- 1,13 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fdump-tree-optimized" } */
+ 
+ int x = 1;
+ 
+ int main ()
+ {
+   int t = (1/(1>=x))>>1;
+   if (t != 0) __builtin_abort();
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump ":\[\n\r \]*return 0;" "optimized" } } */

[PATCH][vectorizer][2/2] PR 65951: Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-06-13 Thread Kyrill Tkachov


Hi all,

This patch allows the vectoriser to synthesize multiplications by an integer 
constant using
the algorithms determined by choose_mult_variant from expmed.c.
choose_mult_variant returns an algorithm structure that is a linked list of 
steps describing
how to synthesize an integer multiplication by any constant using shifts, adds, 
subs, and negation.

The new function vect_synth_mult_by_constant that does all the hard work is 
very similar in structure
to expand_mult_const from expmed.c but it operates on gimple SSA rather than 
RTL.

Note that we synthesize the multiplications if the target does not support a 
vector multiplication
in the current vector mode we're processing. So, for aarch64 this effectively 
means V2DI (aarch64
has a vector multiply instruction for narrower inner modes).

This allows us to vectorise more 64-bit multiplications on aarch64. For example:
foo (long long *arr)
{
  for (int i = 0; i < N; i++)
arr[i] *= 5;
}

will now generate:
.L3:
ldr q0, [x3, x1]
add w2, w2, 1
cmp w2, w4
shl v1.2d, v0.2d, 2
add v0.2d, v0.2d, v1.2d
str q0, [x3, x1]
add x1, x1, 16
bcc .L3

in the vectorised loop whereas before we would not vectorise this and generate:
.L2:
ldr x1, [x0]
add x1, x1, x1, lsl 2
str x1, [x0], 8
cmp x0, x2
bne .L2


A multiplication with a more complex immediate also works.
For example:
foo (long long *arr)
{
  for (int i = 0; i < N; i++)
arr[i] *= 19594LL;
}

produces 9 add/shift vector instructions but is rejected by the vector cost 
model.
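
For readers unfamiliar with the idea, the scalar arithmetic behind such sequences can be sketched as follows. This is only an illustration: `mul5` mirrors the shl/add pair shown for the multiplication by 5, while `mul_shift_add` is a naive binary expansion and is not the algorithm that choose_mult_variant selects (which also uses subtractions and factoring to get shorter sequences). Both function names are made up for this sketch, and unsigned arithmetic is used to keep the shifts well defined.

```c
#include <stdint.h>

/* x * 5 as one shift and one add: (x << 2) + x.  */
static uint64_t
mul5 (uint64_t x)
{
  return (x << 2) + x;
}

/* Naive binary method: add x << k for every set bit k of C.
   Results wrap modulo 2^64, exactly like an ordinary unsigned multiply.  */
static uint64_t
mul_shift_add (uint64_t x, uint64_t c)
{
  uint64_t total = 0;

  for (unsigned k = 0; c != 0; k++, c >>= 1)
    if (c & 1)
      total += x << k;

  return total;
}
```

For example, 19594 is 2^14 + 2^11 + 2^10 + 2^7 + 2^3 + 2^1, so the naive method sums six shifted copies of x.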

I've added a couple of execute testcases but added a target check for aarch64 
for the
scan-dump checks because I expect these testcases to synthesize the 64-bit 
vector
multiplication on targets that don't have a 64-bit vector multiply. I can't use 
! vect_int_mult
because aarch64 does have vector multiplication, just not 64-bit.
The testcases don't vectorise on arm with NEON because V2DI shifts are disabled 
in neon.md:
; TODO: V2DI shifts are current disabled because there are bugs in the
; generic vectorizer code.  It ends up creating a V2DI constructor with
; SImode elements.

(define_insn "vashl3"

That's something separate to look at another time.

What do you think of this approach?

Bootstrapped and tested on arm, aarch64, x86_64.

This code didn't trigger much in SPEC2006 as the place where it was important 
to convert mults
into shifts was in hmmer and Venkat had already implemented that transformation 
for constants
that are powers of two.  This patch just extends that functionality to 
arbitrary constants.

Thanks,
Kyrill

2016-06-13  Kyrylo Tkachov  

PR target/65951
* tree-vect-patterns.c: Include mult-synthesis.h
(target_supports_mult_synth_alg): New function.
(apply_binop_and_append_stmt): Likewise.
(vect_synth_mult_by_constant): Likewise.
(target_has_vecop_for_code): Likewise.
(vect_recog_mult_pattern): Use above functions to synthesize vector
multiplication by integer constants.

2016-06-13  Kyrylo Tkachov  

* gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index ..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+{
+  data[i] = i;
+  __asm__ volatile ("");
+}
+
+  foo (data);
+  for (i = 0; i < N; i++)
+{
+  if (data[i] / 123 != i)
+  __builtin_abort ();
+  __asm__ volatile ("");
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect"  { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index ..83019c96910b866e364a7c2e00261a1ded13cb53
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i =

[PATCH][1/2] Move mult synthesis definitions into a separate file

2016-06-13 Thread Kyrill Tkachov


Hi all,

There are other places besides expand where we might want to synthesize an 
integer
multiplication by a constant.  Thankfully the algorithm selection code in 
expmed.c
is already quite well separated from the RTL implementation, so if we can just 
factor
out the prototype of choose_mult_variant and some enums and structs that it 
needs into
a separate header file we can reuse them from other parts of the compiler.

I need this for patch 2/2 which hooks up the vectorizer to synthesize vector
multiplications using sequences of shifts and other arithmetic ops when 
appropriate.

The new header is called mult-synthesis.h. Should I add it to some makefile?
grepping around for a bit I'm not sure what to do about it.

Bootstrapped and tested on arm, aarch64, x86_64.

Thanks,
Kyrill

2016-06-13  Kyrylo Tkachov  

* mult-synthesis.h: New file.  Add choose_mult_variant prototype.
* expmed.h: Include mult-synthesis.h
(enum alg_code): Move to mult-synthesis.h
(struct mult_cost): Likewise.
(struct algorithm): Likewise.
* expmed.c (enum mult_variant): Move to mult-synthesis.h
(choose_mult_variant): Delete prototype.  Remove static qualifier.
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 1a32e9f1b664f250c5092022eb965237ed0342fc..304ce02d78a9e3e024c13caee7869d67dfdab65c 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -21,35 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define EXPMED_H 1
 
 #include "insn-codes.h"
-
-enum alg_code {
-  alg_unknown,
-  alg_zero,
-  alg_m, alg_shift,
-  alg_add_t_m2,
-  alg_sub_t_m2,
-  alg_add_factor,
-  alg_sub_factor,
-  alg_add_t2_m,
-  alg_sub_t2_m,
-  alg_impossible
-};
-
-/* This structure holds the "cost" of a multiply sequence.  The
-   "cost" field holds the total rtx_cost of every operator in the
-   synthetic multiplication sequence, hence cost(a op b) is defined
-   as rtx_cost(op) + cost(a) + cost(b), where cost(leaf) is zero.
-   The "latency" field holds the minimum possible latency of the
-   synthetic multiply, on a hypothetical infinitely parallel CPU.
-   This is the critical path, or the maximum height, of the expression
-   tree which is the sum of rtx_costs on the most expensive path from
-   any leaf to the root.  Hence latency(a op b) is defined as zero for
-   leaves and rtx_cost(op) + max(latency(a), latency(b)) otherwise.  */
-
-struct mult_cost {
-  short cost; /* Total rtx_cost of the multiplication sequence.  */
-  short latency;  /* The latency of the multiplication sequence.  */
-};
+#include "mult-synthesis.h"
 
 /* This macro is used to compare a pointer to a mult_cost against an
single integer "rtx_cost" value.  This is equivalent to the macro
@@ -65,38 +37,6 @@ struct mult_cost {
  || ((X)->cost == (Y)->cost	\
  && (X)->latency < (Y)->latency))
 
-/* This structure records a sequence of operations.
-   `ops' is the number of operations recorded.
-   `cost' is their total cost.
-   The operations are stored in `op' and the corresponding
-   logarithms of the integer coefficients in `log'.
-
-   These are the operations:
-   alg_zero		total := 0;
-   alg_m		total := multiplicand;
-   alg_shift		total := total * coeff
-   alg_add_t_m2		total := total + multiplicand * coeff;
-   alg_sub_t_m2		total := total - multiplicand * coeff;
-   alg_add_factor	total := total * coeff + total;
-   alg_sub_factor	total := total * coeff - total;
-   alg_add_t2_m		total := total * coeff + multiplicand;
-   alg_sub_t2_m		total := total * coeff - multiplicand;
-
-   The first operand must be either alg_zero or alg_m.  */
-
-struct algorithm
-{
-  struct mult_cost cost;
-  short ops;
-  /* The size of the OP and LOG fields are not directly related to the
- word size, but the worst-case algorithms will be if we have few
- consecutive ones or zeros, i.e., a multiplicand like 10101010101...
- In that case we will generate shift-by-2, add, shift-by-2, add,...,
- in total wordsize operations.  */
-  enum alg_code op[MAX_BITS_PER_WORD];
-  char log[MAX_BITS_PER_WORD];
-};
-
 /* The entry for our multiplication cache/hash table.  */
 struct alg_hash_entry {
   /* The number we are multiplying by.  */
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 6645a535b3eef9624e6f3ce61d2fcf864d1cf574..22564fa423aec52febef6220d3f59a82e09b118a 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2482,16 +2482,9 @@ expand_variable_shift (enum tree_code code, machine_mode mode, rtx shifted,
 }
 
 
-/* Indicates the type of fixup needed after a constant multiplication.
-   BASIC_VARIANT means no fixup is needed, NEGATE_VARIANT means that
-   the result should be negated, and ADD_VARIANT means that the
-   multiplicand should be added to the result.  */
-enum mult_variant {basic_variant, negate_variant, add_variant};
 
 static void synth_mult (struct algorithm *, unsigned HOST_WIDE_INT,
 			const struct mult_cost *, machine_mode mode);
-static bool choose_mult_variant (machine_mode,

Re: [PATCH] Add ggc-tests.c

2016-06-13 Thread Ulrich Weigand

Gerald Pfeifer wrote:

> The source code of need_finalization_p in ggc.h reads
> 
>template
>static inline bool
>need_finalization_p ()
>{
>#if GCC_VERSION >= 4003
>  return !__has_trivial_destructor (T);
>#else
>  return true;
>#endif
>}
> 
> which means your self test is broken by design for any compiler
> that is not GCC in at least version 4.3, isn't it?

Just to confirm that I'm seeing the same failure on my SPU
daily build machine, which is running RHEL 5 with a host
compiler of GCC 4.1.2.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

[PATCH, i386]: Introduce __builtin_signbitq to use SSE4.1 PTEST insn

2016-06-13 Thread Uros Bizjak

Hello!

Attached patch introduces __builtin_signbitq built-in function, so the
compiler will be able to use SSE4.1 PTEST instruction to determine
sign bit of __float128 value.

The patch introduces complete infrastructure, including fallback to
__signbittf2 libgcc function for non-SSE4.1 targets.

I have changed libquadmath to use __builtin_signbitq, and there were
numerous places where the call to signbitq + test + conditional jump
was reduced to, e.g.:

e0d8:  66 0f 38 17 35 4f a6 01 00   ptest  0x1a64f(%rip),%xmm6   # 28730 <_fini+0x24>
e0e1:  74 19                        je     e0fc <__quadmath_kernel_sincosq+0x24c>

2016-06-13  Uros Bizjak  

* config/i386/i386-builtin-types.def (INT_FTYPE_FLOAT128):
New function type.
* config/i386/i386.c (enum ix86_builtins) [IX86_BUILTIN_SIGNBITQ]: New.
(ix86_init_builtins): Add __builtin_signbitq function.
(ix86_expand_args_builtin): Handle INT_FTYPE_FLOAT128.
(ix86_expand_builtin): Handle IX86_BUILTIN_SIGNBITQ.
* config/i386/i386.md (signbittf2): New expander.
* config/i386/sse.md (ptesttf2): New insn pattern.
* doc/extend.texi (x86 Built-in Functions): Document
__builtin_signbitq.

libgcc/ChangeLog:

2016-06-13  Uros Bizjak  

* config.host (i[34567]86-*-* | x86_64-*-*): Always include
i386/${host_address}/t-softfp in tmake_file.
* config/i386/32/t-softfp: Update comment for __builtin_copysignq.
* config/i386/32/tf-signs.c: Add __signbittf2 fallback function.
* config/i386/64/t-softfp: New file.
* config/i386/64/tf-signs.c: Ditto.
* config/i386/libgcc-bsd.ver: Add __signbittf2.
* config/i386/libgcc-glibc.ver: Ditto.
* config/i386/libgcc-sol2.ver: Ditto.

testsuite/ChangeLog:

2016-06-13  Uros Bizjak  

* gcc.target/i386/float128-3.c: New test.
* gcc.target/i386/quad-sse4.c: Ditto.
* gcc.target/i386/quad-sse.c: Use -msse instead of -msse2.
Update scan strings.

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32} with and without a "--with-arch=corei7 --with-cpu=corei7"
configured compiler. The functionality was also tested with a
libquadmath library amended to use __builtin_signbitq, where both
ptest insn generation and the fallback to the __signbittf2 support
function were exercised.

Committed to mainline SVN.

Uros.
Index: gcc/config/i386/i386-builtin-types.def
===
--- gcc/config/i386/i386-builtin-types.def  (revision 237380)
+++ gcc/config/i386/i386-builtin-types.def  (working copy)
@@ -202,6 +202,7 @@ DEF_FUNCTION_TYPE (INT, V8QI)
 DEF_FUNCTION_TYPE (INT, V8SF)
 DEF_FUNCTION_TYPE (INT, V32QI)
 DEF_FUNCTION_TYPE (INT, PCCHAR)
+DEF_FUNCTION_TYPE (INT, FLOAT128)
 DEF_FUNCTION_TYPE (INT64, INT64)
 DEF_FUNCTION_TYPE (INT64, V2DF)
 DEF_FUNCTION_TYPE (INT64, V4SF)
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 237380)
+++ gcc/config/i386/i386.c  (working copy)
@@ -32722,6 +32722,7 @@ enum ix86_builtins
   IX86_BUILTIN_NANSQ,
   IX86_BUILTIN_FABSQ,
   IX86_BUILTIN_COPYSIGNQ,
+  IX86_BUILTIN_SIGNBITQ,
 
   /* Vectorizer support builtins.  */
   IX86_BUILTIN_CPYSGNPS,
@@ -33983,6 +33984,8 @@ static const struct builtin_description bdesc_args
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mulv2siv2di3, "__builtin_ia32_pmuldq128", IX86_BUILTIN_PMULDQ128, UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_mulv4si3, "__builtin_ia32_pmulld128", IX86_BUILTIN_PMULLD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI },
 
+  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_signbittf2, 0, IX86_BUILTIN_SIGNBITQ, UNKNOWN, (int) INT_FTYPE_FLOAT128 },
+
   /* SSE4.1 */
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_roundpd", IX86_BUILTIN_ROUNDPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_INT },
   { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_roundps", IX86_BUILTIN_ROUNDPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_INT },
@@ -38299,6 +38302,13 @@ ix86_init_builtins (void)
   TREE_READONLY (decl) = 1;
   ix86_builtins[(int) IX86_BUILTIN_COPYSIGNQ] = decl;
 
+  ftype = ix86_get_builtin_func_type (INT_FTYPE_FLOAT128);
+  decl = add_builtin_function ("__builtin_signbitq", ftype,
+  IX86_BUILTIN_SIGNBITQ, BUILT_IN_MD,
+  "__signbittf2", NULL_TREE);
+  TREE_READONLY (decl) = 1;
+  ix86_builtins[(int) IX86_BUILTIN_SIGNBITQ] = decl;
+
   ix86_init_tm_builtins ();
   ix86_init_mmx_sse_builtins ();
   ix86_init_mpx_builtins ();
@@ -39128,6 +39138,7 @@ ix86_expand_args_builtin (const struct builtin_des
 case INT_FTYPE_V4SF:
 case INT_FTYPE_V2DF:
 case INT_FTYPE_V32QI:
+case INT_FTYPE_FLOAT128:
 case V16QI_FTYPE_V16QI:
 case V8SI_FTYPE_V8SF:
 case V8SI_FTYPE_V4SI:
@@ -42638,17 +42649,27 @@ rdseed_step:
i < ARRAY_SIZE (bdesc_args);
i++, d++)

[Patch AArch64] Add some more missing intrinsics

2016-06-13 Thread James Greenhalgh


Hi,

Inspired by Jiong's recent work, here are some more missing intrinsics,
and a smoke test for each of them.

This patch covers:

  vcvt_n_f64_s64
  vcvt_n_f64_u64
  vcvt_n_s64_f64
  vcvt_n_u64_f64
  vcvt_f64_s64
  vrecpe_f64
  vcvt_f64_u64
  vrecps_f64

Tested on aarch64-none-elf, and on an internal testsuite for Neon
intrinsics.

Note that the new tests will ICE without the fixups in
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00805.html

OK?

Thanks,
James

---
gcc/ChangeLog

2016-06-10  James Greenhalgh  

* config/aarch64/arm_neon.h (vcvt_n_f64_s64): New.
(vcvt_n_f64_u64): Likewise.
(vcvt_n_s64_f64): Likewise.
(vcvt_n_u64_f64): Likewise.
(vcvt_f64_s64): Likewise.
(vrecpe_f64): Likewise.
(vcvt_f64_u64): Likewise.
(vrecps_f64): Likewise.

gcc/testsuite/ChangeLog

2016-06-10  James Greenhalgh  

* gcc.target/aarch64/vcvt_f64_1.c: New.
* gcc.target/aarch64/vcvt_n_f64_1.c: New.
* gcc.target/aarch64/vrecp_f64_1.c: New.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index f70b6d3..2f90938 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -12447,6 +12447,20 @@ vcvt_n_f32_u32 (uint32x2_t __a, const int __b)
   return __builtin_aarch64_ucvtfv2si_sus (__a, __b);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vcvt_n_f64_s64 (int64x1_t __a, const int __b)
+{
+  return (float64x1_t)
+{ __builtin_aarch64_scvtfdi (vget_lane_s64 (__a, 0), __b) };
+}
+
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vcvt_n_f64_u64 (uint64x1_t __a, const int __b)
+{
+  return (float64x1_t)
+{ __builtin_aarch64_ucvtfdi_sus (vget_lane_u64 (__a, 0), __b) };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvtq_n_f32_s32 (int32x4_t __a, const int __b)
 {
@@ -12509,6 +12523,20 @@ vcvt_n_u32_f32 (float32x2_t __a, const int __b)
   return __builtin_aarch64_fcvtzuv2sf_uss (__a, __b);
 }
 
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vcvt_n_s64_f64 (float64x1_t __a, const int __b)
+{
+  return (int64x1_t)
+{ __builtin_aarch64_fcvtzsdf (vget_lane_f64 (__a, 0), __b) };
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vcvt_n_u64_f64 (float64x1_t __a, const int __b)
+{
+  return (uint64x1_t)
+{ __builtin_aarch64_fcvtzudf_uss (vget_lane_f64 (__a, 0), __b) };
+}
+
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vcvtq_n_s32_f32 (float32x4_t __a, const int __b)
 {
@@ -12571,6 +12599,18 @@ vcvt_f32_u32 (uint32x2_t __a)
   return __builtin_aarch64_floatunsv2siv2sf ((int32x2_t) __a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vcvt_f64_s64 (int64x1_t __a)
+{
+  return (float64x1_t) { vget_lane_s64 (__a, 0) };
+}
+
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vcvt_f64_u64 (uint64x1_t __a)
+{
+  return (float64x1_t) { vget_lane_u64 (__a, 0) };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvtq_f32_s32 (int32x4_t __a)
 {
@@ -20659,6 +20699,12 @@ vrecpe_f32 (float32x2_t __a)
   return __builtin_aarch64_frecpev2sf (__a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vrecpe_f64 (float64x1_t __a)
+{
+  return (float64x1_t) { vrecped_f64 (vget_lane_f64 (__a, 0)) };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrecpeq_f32 (float32x4_t __a)
 {
@@ -20691,6 +20737,13 @@ vrecps_f32 (float32x2_t __a, float32x2_t __b)
   return __builtin_aarch64_frecpsv2sf (__a, __b);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vrecps_f64 (float64x1_t __a, float64x1_t __b)
+{
+  return (float64x1_t) { vrecpsd_f64  (vget_lane_f64 (__a, 0),
+   vget_lane_f64 (__b, 0)) };
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrecpsq_f32 (float32x4_t __a, float32x4_t __b)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/vcvt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/vcvt_f64_1.c
new file mode 100644
index 000..b7ee7af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vcvt_f64_1.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+/* For each of these intrinsics, we're mapping to a simple C cast.
+   While the compiler has some freedom in terms of choice of instruction,
+   we'd hope that for this simple case it would always pick the single
+   instruction form given in these tests.  Anything else is likely a
+   regression, so check for an exact instruction pattern and
+   register allocation decision.  */
+
+/* Test that if we have a value already in Advanced-SIMD registers, we use
+   the scalar register forms.  */
+
+float64x1_t

Re: [PATCH, i386]: Introduce __builtin_signbitq to use SSE4.1 PTEST insn

2016-06-13 Thread Uros Bizjak

On Mon, Jun 13, 2016 at 11:54 PM, Joseph Myers  wrote:

>> Attached patch introduces __builtin_signbitq built-in function, so the
>> compiler will be able to use SSE4.1 PTEST instruction to determine
>> sign bit of __float128 value.
>
> The __builtin_signbit function is type-generic from GCC 6 onwards, so I
> don't see any need for this type-specific function.  (The .md pattern may
> still be useful, of course, for better expansion of type-generic
> __builtin_signbit on float128 arguments.)
>
>> The patch introduces complete infrastructure, including fallback to
>> __signbittf2 libgcc function for non-SSE4.1 targets.
>
> I don't see any need for a libgcc fallback either.  Generic code in GCC
> should always be able to implement signbit using bit-manipulation, without
> needing any library fallback.

The problem is in fact that on x86_64 __float128 values live
exclusively in SSE registers. Apart from PTEST, there are
no convenient instructions to test bits in the high part of an SSE
register. So, we would have to move the SSE value to memory, load the
high part to an integer register, test the bit in the integer register
and set the flag in the output register to obtain the setCC -> jCC
optimization.
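
As a rough C-level picture of that memory round-trip (only a sketch, not what
the expander emits): assume the 16 bytes of the value have been stored to
memory in x86 little-endian order, so the sign is the top bit of byte 15.
The helper name signbit_from_bytes is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: BYTES holds the 16 bytes of an IEEE binary128
   value as stored in memory on little-endian x86 (an assumption of
   this sketch).  */
static int
signbit_from_bytes (const unsigned char bytes[16])
{
  uint64_t hi = 0;

  /* "Load the high part to an integer register": assemble bytes 8..15
     into one 64-bit integer, most significant byte first.  */
  for (int i = 7; i >= 0; i--)
    hi = (hi << 8) | bytes[8 + i];

  /* "Test the bit": the sign is bit 63 of the high half.  */
  return (int) (hi >> 63);
}
```

The store and reload are what PTEST against a sign-bit mask avoids, since the
test then happens directly on the SSE register.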

Also, please note that there is no generic support for __float128 or
TFmode optimizations in the compiler. Long-double functions (e.g.
signbitl) that are supported by generic functionality correspond to
80-bit XFmode. All bit manipulations involving __float128 have to be
done by hand.

For the above reasons, I have taken the path that is already
implemented in libgcc (__builtin_fabsq and __builtin_copysignq
fallbacks when SSE is not present). The fallback functions actually
implement exactly the same functionality as the fabsq, copysignq and
signbitq functions in libquadmath. *If* we really want to avoid
fallbacks, it is possible to add RTL code to the relevant expanders,
but it will be quite some work for a questionable gain.

>> I have changed libquadmath to use __builtin_signbitq, and there were
>> numerous places, where the call to signbitq + test + conditional jump
>> reduced to e.g.:
>
> Current glibc systematically uses type-generic classification macros such
> as signbit where they exist in , rather than direct calls to
> __signbitl etc. such as were formerly used.

Please note that we are dealing with __float128 types. In contrast to
float, double and long double, this type is non-standard and not known
to glibc, as evident from the code snippet below:

/* Return nonzero value if sign of X is negative.  */
# ifdef __NO_LONG_DOUBLE_MATH
#  define signbit(x) \
 (sizeof (x) == sizeof (float) ? __signbitf (x) : __signbit (x))
# else
#  define signbit(x) \
 (sizeof (x) == sizeof (float)  \
  ? __signbitf (x)  \
  : sizeof (x) == sizeof (double)  \
  ? __signbit (x) : __signbitl (x))
# endif
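
To make the limitation concrete, here is a stripped-down sketch of the same
sizeof-based dispatch. SIGNBIT_VARIANT is a hypothetical stand-in, not a glibc
macro: instead of calling a function it only reports which variant would be
picked ('f' for __signbitf, 'd' for __signbit, 'l' for __signbitl).

```c
/* Hypothetical stand-in for the macro quoted above: report which
   variant a sizeof-based dispatch selects, rather than calling it.  */
#define SIGNBIT_VARIANT(x)                 \
  (sizeof (x) == sizeof (float) ? 'f'      \
   : sizeof (x) == sizeof (double) ? 'd'   \
   : 'l')
```

Anything that is neither float-sized nor double-sized falls through to the
long double variant. On x86_64 a __float128 argument is 16 bytes, the same as
long double, so a dispatch of this shape has no way to tell the two apart.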

> Thus, I don't think changes to use __builtin_signbitq should go into
> libquadmath.  Rather, it should be updated for the past few years' changes
> in glibc (this is long overdue), with some header used in building
> libquadmath being made to define signbit, isfinite etc. to use the
> type-generic built-in functions, and such type-generic macro calls (as in
> glibc) replacing libquadmath's calls to signbitq, finiteq, isinfq etc.

I don't see any other way to instruct the compiler to overload e.g.
signbitq. This is a non-standard, made-up function name, and the
compiler has no knowledge of what to do with it. As far as the compiler
is concerned, it is just a function that happens to have TFmode
arguments.

Uros.

Re: [PATCH, i386]: Introduce __builtin_signbitq to use SSE4.1 PTEST insn

2016-06-13 Thread Joseph Myers

On Mon, 13 Jun 2016, Uros Bizjak wrote:

> Hello!
> 
> Attached patch introduces __builtin_signbitq built-in function, so the
> compiler will be able to use SSE4.1 PTEST instruction to determine
> sign bit of __float128 value.

The __builtin_signbit function is type-generic from GCC 6 onwards, so I 
don't see any need for this type-specific function.  (The .md pattern may 
still be useful, of course, for better expansion of type-generic 
__builtin_signbit on float128 arguments.)

> The patch introduces complete infrastructure, including fallback to
> __signbittf2 libgcc function for non-SSE4.1 targets.

I don't see any need for a libgcc fallback either.  Generic code in GCC 
should always be able to implement signbit using bit-manipulation, without 
needing any library fallback.

> I have changed libquadmath to use __builtin_signbitq, and there were
> numerous places, where the call to signbitq + test + conditional jump
> reduced to e.g.:

Current glibc systematically uses type-generic classification macros such 
as signbit where they exist in , rather than direct calls to 
__signbitl etc. such as were formerly used.

Thus, I don't think changes to use __builtin_signbitq should go into 
libquadmath.  Rather, it should be updated for the past few years' changes 
in glibc (this is long overdue), with some header used in building 
libquadmath being made to define signbit, isfinite etc. to use the 
type-generic built-in functions, and such type-generic macro calls (as in 
glibc) replacing libquadmath's calls to signbitq, finiteq, isinfq etc.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, i386]: Introduce __builtin_signbitq to use SSE4.1 PTEST insn

2016-06-13 Thread Uros Bizjak

On Tue, Jun 14, 2016 at 12:50 AM, Uros Bizjak  wrote:
> On Mon, Jun 13, 2016 at 11:54 PM, Joseph Myers  
> wrote:
>
>>> Attached patch introduces __builtin_signbitq built-in function, so the
>>> compiler will be able to use SSE4.1 PTEST instruction to determine
>>> sign bit of __float128 value.
>>
>> The __builtin_signbit function is type-generic from GCC 6 onwards, so I
>> don't see any need for this type-specific function.  (The .md pattern may
>> still be useful, of course, for better expansion of type-generic
>> __builtin_signbit on float128 arguments.)
>>
>>> The patch introduces complete infrastructure, including fallback to
>>> __signbittf2 libgcc function for non-SSE4.1 targets.
>>
>> I don't see any need for a libgcc fallback either.  Generic code in GCC
>> should always be able to implement signbit using bit-manipulation, without
>> needing any library fallback.

After some more head scratching, I have reverted my v1 patch and
committed the following revision. It works like magic, without any
libgcc fallbacks.

Thanks for pointing me in the right direction, and sorry for the trouble!

2016-06-13  Uros Bizjak  

* config/i386/i386.md (signbittf2): New expander.
* config/i386/sse.md (ptesttf2): New insn pattern.

testsuite/ChangeLog:

2016-06-13  Uros Bizjak  

* gcc.target/i386/float128-3.c: New test.
* gcc.target/i386/quad-sse4.c: Ditto.
* gcc.target/i386/quad-sse.c: Use -msse instead of -msse2.
Update scan strings.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 237382)
+++ config/i386/i386.md (working copy)
@@ -16198,6 +16198,22 @@
   DONE;
 })
 
+(define_expand "signbittf2"
+  [(use (match_operand:SI 0 "register_operand"))
+   (use (match_operand:TF 1 "register_operand"))]
+  "TARGET_SSE4_1"
+{
+  rtx mask = ix86_build_signbit_mask (TFmode, 0, 0);
+  rtx scratch = gen_reg_rtx (QImode);
+
+  emit_insn (gen_ptesttf2 (operands[1], mask));
+  ix86_expand_setcc (scratch, NE,
+gen_rtx_REG (CCZmode, FLAGS_REG), const0_rtx);
+
+  emit_insn (gen_zero_extendqisi2 (operands[0], scratch));
+  DONE;
+})
+
 (define_expand "signbitxf2"
   [(use (match_operand:SI 0 "register_operand"))
(use (match_operand:XF 1 "register_operand"))]
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 237380)
+++ config/i386/sse.md  (working copy)
@@ -15212,6 +15212,19 @@
  (const_string "*")))
(set_attr "mode" "")])
 
+(define_insn "ptesttf2"
+  [(set (reg:CC FLAGS_REG)
+   (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
+   (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
+  UNSPEC_PTEST))]
+  "TARGET_SSE4_1"
+  "%vptest\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecomi")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "TI")])
+
 (define_insn "_round"
   [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
(unspec:VF_128_256
Index: testsuite/gcc.target/i386/float128-3.c
===
--- testsuite/gcc.target/i386/float128-3.c  (nonexistent)
+++ testsuite/gcc.target/i386/float128-3.c  (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -msse4.1" } */
+/* { dg-require-effective-target sse4 } */
+
+#include "sse4_1-check.h"
+
+int signbit (__float128);
+
+extern void abort (void);
+
+static void
+sse4_1_test (void)
+{
+  static volatile __float128 a;
+
+  a = -1.2q;
+  if (!signbit (a))
+abort ();
+
+  a = 1.2q;
+  if (signbit (a))
+abort ();
+}
Index: gcc/testsuite/gcc.target/i386/quad-sse.c
===
--- gcc/testsuite/gcc.target/i386/quad-sse.c(revision 237380)
+++ gcc/testsuite/gcc.target/i386/quad-sse.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse2" } */
+/* { dg-options "-O2 -msse" } */
 
 __float128 x, y;
 
@@ -18,4 +18,4 @@ __float128 test_3(void)
   return __builtin_copysignq (x, y);
 }
 
-/* { dg-final { scan-assembler-not "call.*(neg|fabs|copysign)" } } */
+/* { dg-final { scan-assembler-not "neg|fabs|copysign" } } */
Index: testsuite/gcc.target/i386/quad-sse4.c
===
--- testsuite/gcc.target/i386/quad-sse4.c   (nonexistent)
+++ testsuite/gcc.target/i386/quad-sse4.c   (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.1" } */
+
+int signbit (__float128);
+
+__float128 x;
+
+int __test_1(void)
+{
+  return signbit (x);
+}
+
+/* { dg-final { scan-assembler-not "signbit" } } */

Re: PR 71181 Avoid rehash after reserve

2016-06-13 Thread François Dumont


Hi

I would eventually like to propose the attached patch.

In tr1 I made sure we use a special past-the-end iterator that makes it
safe to use the lower_bound result without a check.


PR libstdc++/71181
* include/tr1/hashtable_policy.h
(_Prime_rehash_policy::_M_next_bkt): Make past-the-end iterator
dereferenceable to avoid check on lower_bound result.
(_Prime_rehash_policy::_M_bkt_for_elements): Call latter.
(_Prime_rehash_policy::_M_need_rehash): Likewise.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
Always return a value greater than input value. Set _M_next_resize to
max value when reaching highest prime number.
* src/shared/hashtable-aux.cc (__prime_list): Add comment that sentinel
is useless.
* testsuite/23_containers/unordered_set/hash_policy/71181.cc: New.
* testsuite/23_containers/unordered_set/hash_policy/prime_rehash.cc: New.

* testsuite/23_containers/unordered_set/hash_policy/rehash.cc:
Fix indentation.

Tested under Linux x86_64.

François


On 25/05/2016 22:48, François Dumont wrote:

On 25/05/2016 16:01, Jonathan Wakely wrote:

On 22/05/16 17:16 +0200, François Dumont wrote:

Hi

   To fix the 71181 problem I propose to change how we deal with reserve
called with pivot values, that is to say prime numbers. Now
_M_next_bkt always returns a value higher than the input value. This
way when reserve(97) is called we end up with 199 buckets and so
enough space to store 97 values without rehashing.
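
The difference between "first prime not less than n" and "first prime
greater than n" can be sketched in plain C as below. This is only an
illustration: the table is a hypothetical, shortened stand-in for
__prime_list, the function names are made up, and the rest of the rehash
policy (load factor, growth) is not modelled, so it does not reproduce the
bucket counts quoted above. Searching only up to the last entry, and falling
back to it, mirrors the idea of keeping the result always dereferenceable.

```c
#include <stddef.h>

/* Hypothetical, shortened prime table; not the real __prime_list.  */
static const unsigned long primes[] = {
  2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
  53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 103, 109, 113
};
static const size_t n_primes = sizeof primes / sizeof primes[0];

/* First prime >= n (lower_bound behaviour), clamped to the last entry.  */
static unsigned long
next_bkt_at_least (unsigned long n)
{
  size_t lo = 0, hi = n_primes - 1;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (primes[mid] < n)
        lo = mid + 1;
      else
        hi = mid;
    }
  return primes[lo];
}

/* First prime > n (upper_bound behaviour), clamped to the last entry.  */
static unsigned long
next_bkt_greater (unsigned long n)
{
  size_t lo = 0, hi = n_primes - 1;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (primes[mid] <= n)
        lo = mid + 1;
      else
        hi = mid;
    }
  return primes[lo];
}
```

With this table, asking for 97 gives 97 from the first function and 103 from
the second; the second form is the one that moves past a prime input.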


   I have integrated in this patch several other enhancements on the
same subject: improved _M_next_resize management when reaching the
highest bucket number, and removal of the sentinel value in
__prime_list, since we just need to limit the range when calling
lower_bound.


I don't think the change to __prime_list is safe. If you compile some
code with GCC 5 and then used a libstdc++.so with this change the old
code would still be looking for the sentinel in the array, and would
not find it.

I think it would be safe to leave the old __prime_list unchanged (and
then not need to change anything in tr1/hashtable_policy.h?) and add a
new array with a different name. Existing code compiled with older
versions of GCC would still find __prime_list, but the new code would
use a different array.




What about this version? tr1 mode still limits the search range, as it
should, to make sure it doesn't need to check the lower_bound result. The
sentinel is only kept for backward compatibility, and is commented to make
that clear. Maybe there is a clearer way to express that the sentinel can
be removed in a future version that breaks the ABI?


François


diff --git a/libstdc++-v3/include/tr1/hashtable_policy.h b/libstdc++-v3/include/tr1/hashtable_policy.h
index 4ee6d45..24d1a59 100644
--- a/libstdc++-v3/include/tr1/hashtable_policy.h
+++ b/libstdc++-v3/include/tr1/hashtable_policy.h
@@ -420,8 +420,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Prime_rehash_policy::
   _M_next_bkt(std::size_t __n) const
   {
-const unsigned long* __p = std::lower_bound(__prime_list, __prime_list
-		+ _S_n_primes, __n);
+// Past-the-end iterator is made dereferenceable to avoid check on
+// lower_bound result.
+const unsigned long* __p
+  = std::lower_bound(__prime_list, __prime_list + _S_n_primes - 1, __n);
 _M_next_resize = 
   static_cast(__builtin_ceil(*__p * _M_max_load_factor));
 return *__p;
@@ -434,11 +436,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_bkt_for_elements(std::size_t __n) const
   {
 const float __min_bkts = __n / _M_max_load_factor;
-const unsigned long* __p = std::lower_bound(__prime_list, __prime_list
-		+ _S_n_primes, __min_bkts);
-_M_next_resize =
-  static_cast(__builtin_ceil(*__p * _M_max_load_factor));
-return *__p;
+return _M_next_bkt(__builtin_ceil(__min_bkts));
   }
 
   // Finds the smallest prime p such that alpha p > __n_elt + __n_ins.
@@ -462,12 +460,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	if (__min_bkts > __n_bkt)
 	  {
 	__min_bkts = std::max(__min_bkts, _M_growth_factor * __n_bkt);
-	const unsigned long* __p =
-	  std::lower_bound(__prime_list, __prime_list + _S_n_primes,
-			   __min_bkts);
-	_M_next_resize = static_cast
-	  (__builtin_ceil(*__p * _M_max_load_factor));
-	return std::make_pair(true, *__p);
+	return std::make_pair(true,
+  _M_next_bkt(__builtin_ceil(__min_bkts)));
 	  }
 	else 
 	  {
diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index a5e6520..7cbd364 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,22 +46,36 @@ namespace __detail
   {
 // Optimize lookups involving the first elements of __prime_list.
 // (useful to speed-up, eg, constructors)
-static const unsigned char __fast_bkt[12]
-  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
+static const unsigned char __fast_bkt[13]
+  = { 2, 2, 3, 5, 5,

[C++ PATCH] Fix incomplete type error recovery (PR c++/71516)

2016-06-13 Thread Jakub Jelinek

Hi!

On the following testcase we ICE during error recovery, because
a is first added to the incomplete vars vector, but then an attempt is
made to initialize it, which results in an error and sets its type to
error_mark_node (as the type has been incomplete).
When we try to complete vars, we ICE because TYPE_MAIN_VARIANT expects to
see a type, rather than error_mark_node (with tree checking).

Ok for trunk?  Would this be reasonable to backport too (I mean, it
shouldn't break anything and accessing TYPE_MAIN_VARIANT (error_mark_node)
can crash miserably)?

Bootstrapped/regtested on x86_64-linux and i686-linux.

2016-06-13  Jakub Jelinek  

PR c++/71516
* decl.c (complete_vars): Handle gracefully type == error_mark_node.

* g++.dg/init/pr71516.C: New test.

--- gcc/cp/decl.c.jj2016-06-09 22:45:57.0 +0200
+++ gcc/cp/decl.c   2016-06-13 17:05:37.742493834 +0200
@@ -15029,8 +15029,9 @@ complete_vars (tree type)
  tree var = iv->decl;
  tree type = TREE_TYPE (var);
 
- if (TYPE_MAIN_VARIANT (strip_array_types (type))
- == iv->incomplete_type)
+ if (type != error_mark_node
+ && (TYPE_MAIN_VARIANT (strip_array_types (type))
+ == iv->incomplete_type))
{
  /* Complete the type of the variable.  The VAR_DECL itself
 will be laid out in expand_expr.  */
--- gcc/testsuite/g++.dg/init/pr71516.C.jj  2016-06-13 17:08:07.734548282 +0200
+++ gcc/testsuite/g++.dg/init/pr71516.C 2016-06-13 17:07:20.0 +0200
@@ -0,0 +1,10 @@
+// PR c++/71516
+// { dg-do compile }
+
+struct A;  // { dg-message "forward declaration of" }
+struct B
+{ 
+  static A a;
+};
+A B::a = A();  // { dg-error "has initializer but incomplete type|invalid use of incomplete type" }
+struct A {};

Jakub

Re: [PATCH 3/3][AArch64] Emit division using the Newton series

2016-06-13 Thread Evandro Menezes


On 06/13/16 05:15, James Greenhalgh wrote:
Thanks for your patience on this patch series. 


Just checked the series in.

Thank y'all for your assistance and patience.

Cheers,

--
Evandro Menezes
