date:20151120

On Thu, Nov 19, 2015 at 08:26:45AM -0800, Cesar Philippidis wrote:
>   (gfc_oacc_routine_name): New struct;

Full stop instead of semicolon.

> diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
> index 1f6311c..e321072 100644
> --- a/gcc/tree-nested.c
> +++ b/gcc/tree-nested.c
> @@ -1106,6 +1106,9 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
> walk_stmt_info *wi)
>   case OMP_CLAUSE_NUM_TASKS:
>   case OMP_CLAUSE_HINT:
>   case OMP_CLAUSE__CILK_FOR_COUNT_:
> + case OMP_CLAUSE_NUM_GANGS:
> + case OMP_CLAUSE_NUM_WORKERS:
> + case OMP_CLAUSE_VECTOR_LENGTH:
> wi->val_only = true;
> wi->is_lhs = false;
> convert_nonlocal_reference_op (_CLAUSE_OPERAND (clause, 0),
> @@ -1173,6 +1176,10 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
> walk_stmt_info *wi)
>   case OMP_CLAUSE_THREADS:
>   case OMP_CLAUSE_SIMD:
>   case OMP_CLAUSE_DEFAULTMAP:
> + case OMP_CLAUSE_GANG:
> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_VECTOR:

This looks wrong.  OMP_CLAUSE_GANG has 2 arguments, OMP_CLAUSE_WORKER and
OMP_CLAUSE_VECTOR one argument, if you use a non-local decl or local decl
that is referenced by a nested routine in those operands, it won't be
handled properly.

> @@ -1830,6 +1840,10 @@ convert_local_omp_clauses (tree *pclauses, struct 
> walk_stmt_info *wi)
>   case OMP_CLAUSE_THREADS:
>   case OMP_CLAUSE_SIMD:
>   case OMP_CLAUSE_DEFAULTMAP:
> + case OMP_CLAUSE_GANG:
> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_VECTOR:
> + case OMP_CLAUSE_SEQ:

Ditto.

Otherwise LGTM.

Jakub

Re: [PATCH] Add clang-format config to contrib folder

2015-11-20 Thread Pedro Alves

On 11/20/2015 11:33 AM, Martin Liška wrote:

> Hi Pedro.

Hi Martin.

> Fully agree with you, there's suggested patch.
> Hope I can install the patch for trunk?

I'd call it obvious.  :-)

Thanks,
Pedro Alves

Re: [PATCH, VECTOR ABI] Add attribute((simd)) to GCC.


Hi Kirill,

On 18/11/15 14:11, Kirill Yukhin wrote:

Hello Andreas, Devid.

On 18 Nov 10:45, Andreas Schwab wrote:

Kirill Yukhin  writes:


diff --git a/gcc/testsuite/c-c++-common/attr-simd.c 
b/gcc/testsuite/c-c++-common/attr-simd.c
new file mode 100644
index 000..b4eda34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-simd.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-optimized" } */
+
+__attribute__((__simd__))
+extern
+int simd_attr (void)
+{
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" 
} } */

On ia64:

FAIL: c-c++-common/attr-simd.c  -Wc++-compat   scan-tree-dump optimized "simd_attr[ 
\\t]simdclone|vector"
FAIL: c-c++-common/attr-simd.c  -Wc++-compat   scan-tree-dump optimized "simd_attr2[ 
\\t]simdclone|vector"

$ grep simd_attr attr-simd.c.194t.optimized
;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389, cgraph_uid=0, 
symbol_order=0)
simd_attr ()
;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392, cgraph_uid=1, 
symbol_order=1)
simd_attr2 ()

As far as vABI is supported on x86_64/i?86 only, I am going to enable mentioned 
`scan-tree-dump' only
for these targets. This should cure both IA64 and Power.

Concerning attr-simd-3.c. It is known issue: PR68158.
And I believe this test should work everywhere as far as PR is resolved.
I'll put xfail into the test.
Which will lead to (in g++.log):
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute 
does not apply to types\
  [-Wattributes]^M
output is:
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute 
does not apply to types\
  [-Wattributes]^M

XFAIL: c-c++-common/attr-simd-3.c  -std=gnu++98 PR68158 (test for errors, line 
5)
FAIL: c-c++-common/attr-simd-3.c  -std=gnu++98 (test for excess errors)
Excess errors:
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute 
does not apply to types\
  [-Wattributes]

Patch in the bottom.

gcc/tessuite/
* c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error.


This test fails on bare-metal targets that don't support -fcilkplus or -pthread.
Would you consider moving them to the cilkplus testing directory or adding an 
appropriate
effective target check?

Thanks,
Kyrill


* c-c++-common/attr-simd.c: Limit scan of dump to x86_64/i?86.


Andreas.

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c 
b/gcc/testsuite/c-c++-common/attr-simd-3.c
index 2bbdf04..35dd4c0 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -2,4 +2,4 @@
  /* { dg-options "-fcilkplus" } */
  /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not 
declared in this scope" } */

-void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same 
function marked as a Cilk Plus" } */
+void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same function marked 
as a Cilk Plus" "PR68158" { xfail c++ } } */
diff --git a/gcc/testsuite/c-c++-common/attr-simd.c 
b/gcc/testsuite/c-c++-common/attr-simd.c
index 61974e3..7674588 100644
--- a/gcc/testsuite/c-c++-common/attr-simd.c
+++ b/gcc/testsuite/c-c++-common/attr-simd.c
@@ -11,7 +11,7 @@ int simd_attr (void)
return 0;
  }

-/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" 
} } */
+/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" 
{ target { i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
@@ -29,7 +29,7 @@ int simd_attr2 (void)
return 0;
  }

-/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" 
"optimized" } } */
+/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" 
"optimized" { target { i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr2:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr2:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */
  /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr2:" 1 { target { 
i?86-*-* x86_64-*-* } } } } */

Go patch committed: add receiver type to specific type function name

This patch to the Go frontend fixes the case where two different
methods on different types with the same method name both define a
type internally with the same name where the type requires a specific
type hash or equality function.  Before this patch those functions
would get the same, causing a compilation error.  This patch gives
them different names using the same approach as is done for the type
descriptor.  I sent out a test case to the master Go testsuite in
https://golang.org/cl/17081.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline and GCC 5 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 230463)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e3aef41ce0c5be81e2589e60d9cb0db1516e9e2d
+dfa74d975884f363c74d6a66a37b1703093fdba6
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 230463)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -1769,7 +1769,16 @@ Type::specific_type_functions(Gogo* gogo
   const Named_object* in_function = name->in_function();
   if (in_function != NULL)
{
- base_name += '$' + Gogo::unpack_hidden_name(in_function->name());
+ base_name.append(1, '$');
+ const Typed_identifier* rcvr =
+   in_function->func_value()->type()->receiver();
+ if (rcvr != NULL)
+   {
+ Named_type* rcvr_type = rcvr->type()->deref()->named_type();
+ base_name.append(Gogo::unpack_hidden_name(rcvr_type->name()));
+ base_name.append(1, '$');
+   }
+ base_name.append(Gogo::unpack_hidden_name(in_function->name()));
  if (index > 0)
{
  char buf[30];

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def


On 20/11/15 11:37, Richard Biener wrote:

   I'd rather make loop_optimizer_init do nothing
if requested flags are already set and no fixup is needed



Thus sth like

Index: gcc/loop-init.c
===
--- gcc/loop-init.c (revision 230649)
+++ gcc/loop-init.c (working copy)
@@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags)
calculate_dominance_info (CDI_DOMINATORS);

if (!needs_fixup)
-   checking_verify_loop_structure ();
+   {
+ checking_verify_loop_structure ();
+ if (loops_state_satisfies_p (flags))
+   goto out;


What about flags that are present in the loops state, but not requested 
in flags? Should we try to clear those flags?


Thanks,
- Tom


+   }

/* Clear all flags.  */
if (recorded_exits)
@@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags)
/* Apply flags to loops.  */
apply_loop_flags (flags);

+  checking_verify_loop_structure ();
+
+out:
/* Dump loops.  */
flow_loops_dump (dump_file, NULL, 1);

-  checking_verify_loop_structure ();
-
timevar_pop (TV_LOOP_INIT);
  }

Re: [PATCH, VECTOR ABI] Add attribute((simd)) to GCC.

2015-11-20 Thread Kirill Yukhin

Hello Kyrill,
On 20 Nov 12:15, Kyrill Tkachov wrote:
> >gcc/tessuite/
> > * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error.
> 
> This test fails on bare-metal targets that don't support -fcilkplus or 
> -pthread.
> Would you consider moving them to the cilkplus testing directory or adding an 
> appropriate
> effective target check?
I think so. I'll commit change in the bottom.

gcc/testsuite/
* c-c++-common/attr-simd-3.c: Require Cilk Plus in effective target.

> 
> Thanks,
> Kyrill

--
Thanks, K

diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c 
b/gcc/testsuite/c-c++-common/attr-simd-3.c
index 35dd4c0..c7533f0 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -1,3 +1,4 @@
+/* { dg-require-effective-target cilkplus } */
 /* { dg-do compile } */
 /* { dg-options "-fcilkplus" } */
 /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was 
not declared in this scope" } */

Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-20 Thread Bernd Schmidt


On 11/19/2015 12:49 AM, Jeff Law wrote:

On 11/18/2015 12:16 PM, Bernd Schmidt wrote:

I don't think so, actually. One safe option would be to rip it out and
just stop transforming this case, but let's start by looking at the code
just a bit further down, calling noce_can_store_speculate. This was
added later than the code we're discussing, and it tries to verify that
the same memory location will unconditionally be written to at a point
later than the one we're trying to convert

And if we dig into that thread, Ian suggests this isn't terribly
important anyway.

However, if we go even further upthread, we find an assertion from
Michael that this is critical for 456.hmmer and references BZ 27313.


Cc'd in case he has additional input.


https://gcc.gnu.org/ml/gcc-patches/2007-08/msg01978.html

Sadly, no testcase was included.


BZ27313 is marked as fixed by the introduction of the tree cselim pass, 
thus the problem won't even be seen at RTL level.
I'm undecided on whether cs-elim is safe wrt the store speculation vs 
locks concerns raised in the thread discussing Ian's 
noce_can_store_speculate_p, but that's not something we have to consider 
to solve the problem at hand.



So if it weren't for the assertion that it's critical for hmmr, I'd be
convinced that just ripping out was the right thing to do.

Can you look at 27313 and hmmr and see if there's an impact.  Just maybe
the critical stuff for those is handled by the tree if converter and we
can just rip out the clearly incorrect RTL bits without regressing
anything performance-wise.  If there is an impact, then I think we have
to look at either improving the tree bits (so we can remove the rtl
bits) or we have to do real dataflow analysis in the rtl bits.


So I made this change:

   if (!set_b && MEM_P (orig_x))
-{
-  /* Disallow the "if (...) x = a;" form (implicit "else x = x;")
-for optimizations if writing to x may trap or fault,
-i.e. it's a memory other than a static var or a stack slot,
-is misaligned on strict aligned machines or is read-only.  If
-x is a read-only memory, then the program is valid only if we
-avoid the store into it.  If there are stores on both the
-THEN and ELSE arms, then we can go ahead with the conversion;
-either the program is broken, or the condition is always
-false such that the other memory is selected.  */
-  if (noce_mem_write_may_trap_or_fault_p (orig_x))
-   return FALSE;
-
-  /* Avoid store speculation: given "if (...) x = a" where x is a
-MEM, we only want to do the store if x is always set
-somewhere in the function.  This avoids cases like
-  if (pthread_mutex_trylock(mutex))
-++global_variable;
-where we only want global_variable to be changed if the mutex
-is held.  FIXME: This should ideally be expressed directly in
-RTL somehow.  */
-  if (!noce_can_store_speculate_p (test_bb, orig_x))
-   return FALSE;
-}
+return FALSE;

As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 
(if anything, hmmer was very slightly faster afterwards). The run wasn't 
super-scientific, but I wouldn't have expected anything else given the 
existence of cs-elim.


Ok to do the above, removing all the bits made unnecessary (including 
memory_must_be_modified_in_insn_p in alias.c)?



Bernd

Re: [PATCH][ARM] Do not expand movmisalign pattern if not in 32-bit mode

2015-11-20 Thread Ramana Radhakrishnan

On 11/11/15 16:10, Kyrill Tkachov wrote:
> Hi all,
> 
> The attached testcase ICEs when compiled with -march=armv6k -mthumb -Os or 
> any march
> for which -mthumb gives Thumb1:
>  error: unrecognizable insn:
>  }
>  ^
> (insn 13 12 14 5 (set (reg:SI 116 [ x ])
> (unspec:SI [
> (mem:SI (reg/v/f:SI 112 [ s ]) [0 MEM[(unsigned char 
> *)s_1(D)]+0 S4 A8])
> ] UNSPEC_UNALIGNED_LOAD)) besttry.c:9 -1
>  (nil))
> 
> The problem is that the expands a movmisalign pattern but the resulting 
> unaligned loads don't
> match any define_insn because they are gated on unaligned_access && 
> TARGET_32BIT.
> The unaligned_access expander is gated only on unaligned_access.
> 
> This small patch fixes the issue by turning off unaligned_access if 
> TARGET_32BIT is not true.
> We can then remove TARGET_32BIT from the unaligned load/store patterns 
> conditions as a cleanup.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2015-11-11  Kyrylo Tkachov  
> 
> * config/arm/arm.c (arm_option_override): Require TARGET_32BIT
> for unaligned_access.
> * config/arm/arm.md (unaligned_loadsi): Remove redundant TARGET_32BIT
> from matching condition.
> (unaligned_loadhis): Likewise.
> (unaligned_loadhiu): Likewise.
> (unaligned_storesi): Likewise.
> (unaligned_storehi): Likewise.
> 
> 2015-11-11  Kyrylo Tkachov  
> 
> * gcc.target/arm/armv6-unaligned-load-ice.c: New test.


This means we don't have unaligned access for some cores in Thumb1 for armv6. 
I'd rather not have the ICE instead of trying to find testing coverage on such 
cores now.

OK.


regards
Ramana

[Patch] sync top level configure with binutils-gdb

2015-11-20 Thread Tristan Gingold

This patch was pushed on binutils-gdb repo, so I also commit it on gcc.

Tristan.


2015-11-20  Tristan Gingold  

Sync with binutils-gdb:
2015-11-20  Tristan Gingold  

* configure.ac: Add aarch64-*-darwin* and arm-*-darwin*.
* configure: Regenerate.

Index: configure
===
--- configure   (revision 230657)
+++ configure   (working copy)
@@ -3686,6 +3686,14 @@
 case "${target}" in
   *-*-chorusos)
 ;;
+  aarch64-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
+  arm-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
   powerpc-*-darwin*)
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"
Index: configure.ac
===
--- configure.ac(revision 230657)
+++ configure.ac(working copy)
@@ -1023,6 +1023,14 @@
 case "${target}" in
   *-*-chorusos)
 ;;
+  aarch64-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
+  arm-*-darwin*)
+noconfigdirs="$noconfigdirs ld gas gdb gprof"
+noconfigdirs="$noconfigdirs sim target-rda"
+;;
   powerpc-*-darwin*)
 noconfigdirs="$noconfigdirs ld gas gdb gprof"
 noconfigdirs="$noconfigdirs sim target-rda"

Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward  wrote:
>
>
> On 20/11/2015 11:00, "Richard Biener"  wrote:
>
>>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward 
>>wrote:
>>> When vectorising a integer induction condition reduction,
>>> is_nonwrapping_integer_induction ends up with different values for base
>>> during the analysis and build phases. In the first it is an INTEGER_CST,
>>> in the second the loop has been vectorised out and the base is now a
>>> variable.
>>>
>>> This results in the analysis and build stage detecting the
>>> STMT_VINFO_VEC_REDUCTION_TYPE as different types.
>>>
>>> The easiest way to fix this is to only check for integer induction
>>> conditions on the analysis stage.
>>
>>I don't like this.  For the evolution part we have added
>>STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
>>the original initial value as well then just save it.
>>
>>Or if you really want to go with the hack then please do not call
>>is_nonwrapping_integer_induction with vec_stmt != NULL but
>>initialize cond_expr_is_nonwrapping_integer_induction from
>>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>>
>>The hack also lacks a comment.
>>
>
> Ok. I've gone for a combination of both:
>
> I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE.
>
> I've removed the vec_stmt != NULL checks.
>
> I've moved the call to is_nonwrapping_integer_induction until after the
> vect_is_simple_reduction check. I never liked that I had
> is_nonwrapping_integer_induction early in the function, and think this
> looks better.

It looks better but the comment for loop_phi_evolution_base is wrong.
The value is _not_ "correct" after prologue peeling (unless you
update it there to a non-constant expr).  It is conservatively "correct"
for is_nonwrapping_integer_induction though.  Which is why I'd
probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED).

Ok with that change.

Thanks,
Richard.

>
> Alan.
>

Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

2015-11-20 Thread Alan Hayward



On 20/11/2015 13:47, "Richard Biener"  wrote:

>On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward 
>wrote:
>>
>>
>>On 20/11/2015 11:00, "Richard Biener"  wrote:
>>
>>>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward 
>>>wrote:
When vectorising a integer induction condition reduction,
is_nonwrapping_integer_induction ends up with different values for base
during the analysis and build phases. In the first it is an
INTEGER_CST,
in the second the loop has been vectorised out and the base is now a
variable.

This results in the analysis and build stage detecting the
STMT_VINFO_VEC_REDUCTION_TYPE as different types.

The easiest way to fix this is to only check for integer induction
conditions on the analysis stage.
>>>
>>>I don't like this.  For the evolution part we have added
>>>STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
>>>the original initial value as well then just save it.
>>>
>>>Or if you really want to go with the hack then please do not call
>>>is_nonwrapping_integer_induction with vec_stmt != NULL but
>>>initialize cond_expr_is_nonwrapping_integer_induction from
>>>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>>>
>>>The hack also lacks a comment.
>>>
>>
>>Ok. I've gone for a combination of both:
>>
>>I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE.
>>
>>I've removed the vec_stmt != NULL checks.
>>
>>I've moved the call to is_nonwrapping_integer_induction until after the
>>vect_is_simple_reduction check. I never liked that I had
>>is_nonwrapping_integer_induction early in the function, and think this
>>looks better.
>
>It looks better but the comment for loop_phi_evolution_base is wrong.
>The value is _not_ "correct" after prologue peeling (unless you
>update it there to a non-constant expr).  It is conservatively "correct"
>for is_nonwrapping_integer_induction though.  Which is why I'd
>probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED
>to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED).
>
>Ok with that change.

Updated as requested and submitted.

Alan.



analysisonlycondcheck3.patch
Description: Binary data

[Patch, fortran] PRs 68237 & 66762 - submodule problems

2015-11-20 Thread Paul Richard Thomas

Dear All,

I have committed as 'obvious'  revision 230661 to fix 2/3 submodule
problems. In the case of the third, PR68243, I believe gfortran is
behaving correctly and I am awaiting confirmation from the reporter.

Thanks to Dominique for regtesting the part of the patch that fixes PR66762.

I have bootstrapped and regtested both part on FC21/x86_64.

Cheers

Paul

PS I have just noticed that I attributed the wrong part of the patch
to Steve. I will correct the ChangeLog in just a moment.


2015-11-20  Paul Thomas  

PR fortran/68237
* decl.c (gfc_match_submod_proc): Test the interface symbol
before accessing its attributes.

2015-11-20  Steven G. Kargl  

PR fortran/66762
(gfc_get_symbol_decl): Test for attr.used_in_submodule as well
as attr.use_assoc (twice).
(gfc_create_module_variable): Ditto.

2015-11-20  Paul Thomas  

PR fortran/68237
* gfortran.dg/submodule_12.f90: New test

PR fortran/66762
* gfortran.dg/submodule_6.f90: Add compile option -flto.

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

On Fri, 20 Nov 2015, Tom de Vries wrote:

> On 20/11/15 11:37, Richard Biener wrote:
> >I'd rather make loop_optimizer_init do nothing
> > if requested flags are already set and no fixup is needed
> 
> > Thus sth like
> > 
> > Index: gcc/loop-init.c
> > ===
> > --- gcc/loop-init.c (revision 230649)
> > +++ gcc/loop-init.c (working copy)
> > @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags)
> > calculate_dominance_info (CDI_DOMINATORS);
> > 
> > if (!needs_fixup)
> > -   checking_verify_loop_structure ();
> > +   {
> > + checking_verify_loop_structure ();
> > + if (loops_state_satisfies_p (flags))
> > +   goto out;
> 
> What about flags that are present in the loops state, but not requested in
> flags? Should we try to clear those flags?

No, I don't think so, that would break in-loop-pipeline LIM, dropping
loop-closed SSA for example.

I agree it's somewhat of an odd behavior but all passes should
either be placed in a sub-pipeline with an outer 
loop_optimizer_init()/finalize () call or call both themselves.

Richard.

> Thanks,
> - Tom
> 
> > +   }
> > 
> > /* Clear all flags.  */
> > if (recorded_exits)
> > @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags)
> > /* Apply flags to loops.  */
> > apply_loop_flags (flags);
> > 
> > +  checking_verify_loop_structure ();
> > +
> > +out:
> > /* Dump loops.  */
> > flow_loops_dump (dump_file, NULL, 1);
> > 
> > -  checking_verify_loop_structure ();
> > -
> > timevar_pop (TV_LOOP_INIT);
> >   }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument

On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovich  wrote:
> On 19 Nov 18:19, Richard Biener wrote:
>> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt 
>>  wrote:
>> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote:
>> >> Currently we fold all memcpy/memmove calls with a known data size.
>> >> It causes two problems when used with Pointer Bounds Checker.
>> >> The first problem is that we may copy pointers as integer data
>> >> and thus loose bounds.  The second problem is that if we inline
>> >> memcpy, we also have to inline bounds copy and this may result
>> >> in a huge amount of code and significant compilation time growth.
>> >> This patch disables folding for functions we want to instrument.
>> >>
>> >> Does it look reasonable for trunk and GCC5 branch?  Bootstrapped
>> >> and regtested on x86_64-unknown-linux-gnu.
>> >
>> >Can't see anything wrong with it. Ok.
>>
>> But for small sizes this can have a huge impact on optimization.  Which is 
>> why we have the code in the first place.  I'd make the check less broad, for 
>> example inlining copies of size less than a pointer shouldn't be affected.
>
> Right.  We also may inline in case we know no pointers are copied.  Below is 
> a version with extended condition and a couple more tests.  Bootstrapped and 
> regtested on x86_64-unknown-linux-gnu.  Does it OK for trunk and gcc-5-branch?
>
>>
>> Richard.
>>
>> >
>> >Bernd
>>
>>
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-20  Ilya Enkovich  
>
> * gimple-fold.c (gimple_fold_builtin_memory_op): Don't
> fold call if we are going to instrument it and it may
> copy pointers.
>
> gcc/testsuite/
>
> 2015-11-20  Ilya Enkovich  
>
> * gcc.target/i386/mpx/pr68337-1.c: New test.
> * gcc.target/i386/mpx/pr68337-2.c: New test.
> * gcc.target/i386/mpx/pr68337-3.c: New test.
>
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 1ab20d1..dd9f80b 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gomp-constants.h"
>  #include "optabs-query.h"
>  #include "omp-low.h"
> +#include "tree-chkp.h"
> +#include "ipa-chkp.h"
>
>
>  /* Return true when DECL can be referenced from current unit.
> @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
>unsigned int src_align, dest_align;
>tree off0;
>
> +  /* Inlining of memcpy/memmove may cause bounds lost (if we copy
> +pointers as wide integer) and also may result in huge function
> +size because of inlined bounds copy.  Thus don't inline for
> +functions we want to instrument in case pointers are copied.  */
> +  if (flag_check_pointer_bounds
> + && chkp_instrumentable_p (cfun->decl)
> + /* Even if data may contain pointers we can inline if copy
> +less than a pointer size.  */
> + && (!tree_fits_uhwi_p (len)
> + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0)

|| tree_to_uhwi (len) >= POINTER_SIZE_UNITS

> + /* Check data type for pointers.  */
> + && (!TREE_TYPE (src)
> + || !TREE_TYPE (TREE_TYPE (src))
> + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src)))
> + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src)

I don't think you can in any way rely on the pointer type of the src argument
as all pointer conversions are useless and memcpy and friends take void *
anyway.

Note that you also disable memmove to memcpy simplification with this
early check.

Where is pointer transfer handled for MPX?  I suppose it's not done
transparently
for all memory move instructions but explicitely by instrumented block copy
routines in libmpx?  In which case how does that identify pointers vs.
non-pointers?

Richard.

> +   return false;
> +
>/* Build accesses at offset zero with a ref-all character type.  */
>off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
>  ptr_mode, true), 0);
> diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c 
> b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
> new file mode 100644
> index 000..3f8d79d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run } */
> +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
> +
> +#include "mpx-check.h"
> +
> +#define N 2
> +
> +extern void abort ();
> +
> +static int
> +mpx_test (int argc, const char **argv)
> +{
> +  char ** src = (char **)malloc (sizeof (char *) * N);
> +  char ** dst = (char **)malloc (sizeof (char *) * N);
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1);
> +
> +  __builtin_memcpy(dst, src, sizeof (char *) * N);
> +
> +  for (i = 0; i < N; i++)
> +{
> +  char *p = dst[i];
>

Re: [PATCH][ARM] PR 68149 Fix ICE in unaligned_loaddi split

2015-11-20 Thread Ramana Radhakrishnan

On 10/11/15 17:32, Kyrill Tkachov wrote:
> Hi all,
> 
> This ICE in this PR occurs when we're trying to split unaligned_loaddi into 
> two SImode unaligned loads.
> The problem is in the addressing mode.  When reload was picking the 
> addressing mode we accepted an offset of
> -256 because the mode in the pattern is advertised as DImode and that was 
> accepted by the legitimate address
> hooks because they thought it was a NEON load (DImode is in 
> VALID_NEON_DREG_MODE). However, the splitter wants
> to generate two normal SImode unaligned loads using that address, for which 
> -256 is not valid, so we ICE
> in gen_lowpart.
> 
> The only way unaligned_loaddi could be generated was through the 
> gen_movmem_ldrd_strd expansion that implements
> a memmove using LDRD and STRD sequences. If the memmove source is not aligned 
> we can't use LDRDs so the code
> generates unaligned_loaddi patterns and expects them to be split into two 
> normal loads after reload. Similarly
> for unaligned store destinations.
> 
> This patch just explicitly generates the two unaligned SImode loads or stores 
> when appropriate inside
> gen_movmem_ldrd_strd.  This makes the unaligned_loaddi and unaligned_storedi 
> patterns unused, so we can remove them.
> 
> This patch fixes the ICe in gcc.target/aarch64/advsimd-intrinsics/vldX.c seen 
> with
> -mthumb -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee
> so no new testcase is added.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2015-11-10  Kyrylo Tkachov  
> 
> PR target/68149
> * config/arm/arm.md (unaligned_loaddi): Delete.
> (unaligned_storedi): Likewise.
> * config/arm/arm.c (gen_movmem_ldrd_strd): Don't generate
> unaligned DImode memory ops.  Instead perform two back-to-back
> unalgined SImode ops.


s/unalgined/unaligned.


Ok.


Ramana

Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument

On 20 Nov 14:54, Richard Biener wrote:
> On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovich  wrote:
> > On 19 Nov 18:19, Richard Biener wrote:
> >> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt 
> >>  wrote:
> >> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote:
> >> >> Currently we fold all memcpy/memmove calls with a known data size.
> >> >> It causes two problems when used with Pointer Bounds Checker.
> >> >> The first problem is that we may copy pointers as integer data
> >> >> and thus loose bounds.  The second problem is that if we inline
> >> >> memcpy, we also have to inline bounds copy and this may result
> >> >> in a huge amount of code and significant compilation time growth.
> >> >> This patch disables folding for functions we want to instrument.
> >> >>
> >> >> Does it look reasonable for trunk and GCC5 branch?  Bootstrapped
> >> >> and regtested on x86_64-unknown-linux-gnu.
> >> >
> >> >Can't see anything wrong with it. Ok.
> >>
> >> But for small sizes this can have a huge impact on optimization.  Which is 
> >> why we have the code in the first place.  I'd make the check less broad, 
> >> for example inlining copies of size less than a pointer shouldn't be 
> >> affected.
> >
> > Right.  We also may inline in case we know no pointers are copied.  Below 
> > is a version with extended condition and a couple more tests.  Bootstrapped 
> > and regtested on x86_64-unknown-linux-gnu.  Does it OK for trunk and 
> > gcc-5-branch?
> >
> >>
> >> Richard.
> >>
> >> >
> >> >Bernd
> >>
> >>
> >
> > Thanks,
> > Ilya
> > --
> > gcc/
> >
> > 2015-11-20  Ilya Enkovich  
> >
> > * gimple-fold.c (gimple_fold_builtin_memory_op): Don't
> > fold call if we are going to instrument it and it may
> > copy pointers.
> >
> > gcc/testsuite/
> >
> > 2015-11-20  Ilya Enkovich  
> >
> > * gcc.target/i386/mpx/pr68337-1.c: New test.
> > * gcc.target/i386/mpx/pr68337-2.c: New test.
> > * gcc.target/i386/mpx/pr68337-3.c: New test.
> >
> >
> > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> > index 1ab20d1..dd9f80b 100644
> > --- a/gcc/gimple-fold.c
> > +++ b/gcc/gimple-fold.c
> > @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gomp-constants.h"
> >  #include "optabs-query.h"
> >  #include "omp-low.h"
> > +#include "tree-chkp.h"
> > +#include "ipa-chkp.h"
> >
> >
> >  /* Return true when DECL can be referenced from current unit.
> > @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator 
> > *gsi,
> >unsigned int src_align, dest_align;
> >tree off0;
> >
> > +  /* Inlining of memcpy/memmove may cause bounds lost (if we copy
> > +pointers as wide integer) and also may result in huge function
> > +size because of inlined bounds copy.  Thus don't inline for
> > +functions we want to instrument in case pointers are copied.  */
> > +  if (flag_check_pointer_bounds
> > + && chkp_instrumentable_p (cfun->decl)
> > + /* Even if data may contain pointers we can inline if copy
> > +less than a pointer size.  */
> > + && (!tree_fits_uhwi_p (len)
> > + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0)
> 
> || tree_to_uhwi (len) >= POINTER_SIZE_UNITS
> 
> > + /* Check data type for pointers.  */
> > + && (!TREE_TYPE (src)
> > + || !TREE_TYPE (TREE_TYPE (src))
> > + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src)))
> > + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src)
> 
> I don't think you can in any way rely on the pointer type of the src argument
> as all pointer conversions are useless and memcpy and friends take void *
> anyway.

This check is looking for cases when we have type information indicating
no pointers are copied.  In case of 'void *' we have to assume pointers
are copied and inlining is undesired.  Test pr68337-2.c checks pointer
type allows to enable inlining.  Looks like this check misses
|| !COMPLETE_TYPE_P(TREE_TYPE (TREE_TYPE (src)))?

> 
> Note that you also disable memmove to memcpy simplification with this
> early check.

Doesn't matter for MPX which uses the same implementation for both cases.

> 
> Where is pointer transfer handled for MPX?  I suppose it's not done
> transparently
> for all memory move instructions but explicitely by instrumented block copy
> routines in libmpx?  In which case how does that identify pointers vs.
> non-pointers?

It is handled by instrumentation pass.  Compiler checks type of stored data to
find pointer stores.  Each pointer store is instrumented with bndstx call.

MPX versions of memcpy, memmove etc. don't make any assumptions about
type of copied data and just copy whole chunk of bounds metadata corresponding
to copied block.

Thanks,
Ilya

> 
> Richard.
>

[ptx] overrride anchor hook

2015-11-20 Thread Nathan Sidwell

Jim discovered that he needed to override the anchoring hook when using a PPC 
host-side compiler, but didn't figure out why this was needed.  Digging into it, 
I discovered that flag_section_anchors is cleared in toplev.c by the command 
line option machinery, if there are no anchor target hooks.  However, that's 
done in the host compiler and the LTO machinery simply copies the value over 
into the LTO compiler.   Usually that's fine, as the LTO compiler's for the same 
target architecture.  Except when it's an offload compiler.  The flags are not 
resanitized (perhaps that should be done?)


Anyway, that led to the PTX accelerator compiler trying to do section anchory 
things.  Fixed by overriding TARGET_USE_ANCHORS_FOR_SYMBOL_P to say 'no'.


I guess at the next instance of an offload compiler seeing a 'surprising' 
combination of optimization flags, we'll need a resanitize hook?  Jakub?


nathan
2015-11-20  Nathan Sidwell  
	James Norris  

	* config/nvptx/nvptx.c (nvptx_use_anchors_for_symbol_p): New.
	(TARGET_USE_ANCHORS_FOR_SYMBOL_P): Override.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 230657)
+++ config/nvptx/nvptx.c	(working copy)
@@ -3895,6 +3895,19 @@ nvptx_cannot_copy_insn_p (rtx_insn *insn
   return false;
 }
 }
+
+/* Section anchors do not work.  Initialization for flag_section_anchor
+   probes the existence of the anchoring target hooks and prevents
+   anchoring if they don't exist.  However, we may be being used with
+   a host-side compiler that does support anchoring, and hence see
+   the anchor flag set (as it's not recalculated).  So provide an
+   implementation denying anchoring.  */
+
+static bool
+nvptx_use_anchors_for_symbol_p (const_rtx ARG_UNUSED (a))
+{
+  return false;
+}
 
 /* Record a symbol for mkoffload to enter into the mapping table.  */
 
@@ -4914,6 +4927,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef TARGET_CANNOT_COPY_INSN_P
 #define TARGET_CANNOT_COPY_INSN_P nvptx_cannot_copy_insn_p
 
+#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
+#define TARGET_USE_ANCHORS_FOR_SYMBOL_P nvptx_use_anchors_for_symbol_p
+
 #undef TARGET_INIT_BUILTINS
 #define TARGET_INIT_BUILTINS nvptx_init_builtins
 #undef TARGET_EXPAND_BUILTIN

Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models



On 20/11/15 12:27, James Greenhalgh wrote:

On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote:

On 11/12/2015 09:39 AM, Evandro Menezes wrote:
2015-11-12  Evandro Menezes 

[AArch64] Add attribute for compatibility with ARM pipeline models

gcc/

* config/aarch64/aarch64.md (predicated): Copy attribute from
"arm.md".
* config/arm/arm.md (predicated): Added description.


The arm part is ok too. It's just a comment.
In the ChangeLog entry for arm.md I'd say "Add description."

Thanks,
Kyrill


Please, commit if it's alright.

The AArch64 part of this is OK.


 From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
  pipeline models

gcc/
* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
* config/arm/arm.md (predicated): Added description.
---
  gcc/config/aarch64/aarch64.md | 4 
  gcc/config/arm/arm.md | 3 +++
  2 files changed, 7 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1586256..d46f837 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,10 @@
  ;; 1 :=: yes
  (define_attr "far_branch" "" (const_int 0))
  
+;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has

+;; no predicated insns.
+(define_attr "predicated" "yes,no" (const_string "no"))
+
  ;; ---
  ;; Pipeline descriptions and scheduling
  ;; ---
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 73c3088..6bda491 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -105,6 +105,9 @@
  (define_attr "fpu" "none,vfp"
(const (symbol_ref "arm_fpu_attr")))
  
+; Predicated means that the insn form is conditionally executed based on a

+; predicate.  We default to 'no' because no Thumb patterns match this rule
+; and not all ARM insns do.

s/is conditionally executed/can be conditionally executed/ in the first
sentence. Otherwise, this looks OK to me but I can't approve the ARM part,
so you'll need to wait for a review from someone who can.

Thanks,
James


  (define_attr "predicated" "yes,no" (const_string "no"))
  
  ; LENGTH of an instruction (in bytes)

--
2.1.0.243.g30d45f7

Re: [Committed] S/390: Add bswaphi2 pattern

2015-11-20 Thread Andreas Krebbel

On 11/20/2015 01:23 PM, Richard Henderson wrote:
> On 11/20/2015 12:52 PM, Andreas Krebbel wrote:
>> +(define_insn "bswaphi2"
>> +  [(set (match_operand:HI 0   "register_operand" "=d")
>> +(bswap:HI (match_operand:HI 1 "memory_operand"   "RT")))]
>> +  "TARGET_CPU_ZARCH"
>> +  "lrvh\t%0,%1"
>> +  [(set_attr "type" "load")
>> +   (set_attr "op_type" "RXY")
>> +   (set_attr "z10prop" "z10_super")])
> 
> Surely it's better to arrange so that you can use STRVH as well.
> And providing a fallback for the reg-reg case (e.g. LRVR+SRL).
> 
> Although I suppose I don't see support for STRV in bswap32/64 either...

Right, I totally forgot about the stores. I'll have a look.

We even have a mem-mem variant (mvcin). But I found it rather difficult to use 
since the pattern
would have to make sure that source and destination do not overlap.

-Andreas-

Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF

On 20 Nov 14:31, Ilya Enkovich wrote:
> 2015-11-20 14:28 GMT+03:00 Richard Biener :
> > On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich  
> > wrote:
> >> 2015-11-18 16:44 GMT+03:00 Richard Biener :
> >>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich  
> >>> wrote:
>  Hi,
> 
>  When we compute vectypes we skip non-relevant phi nodes.  But we process 
>  non-relevant alive statements and thus may need vectype of non-relevant 
>  live phi node to compute mask vectype.  This patch enables vectype 
>  computation for live phi nodes.  Botostrapped and regtested on 
>  x86_64-unknown-linux-gnu.  OK for trunk?
> >>>
> >>> Hmm.  What breaks if you instead skip all !relevant stmts and not
> >>> compute vectype for life but not relevant ones?  We won't ever
> >>> "vectorize" !relevant ones, that is, we don't need their vector type.
> >>
> >> I tried it and got regression in SLP.  It expected non-null vectype
> >> for non-releveant but live statement. Regression was in
> >> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90
> >
> > Because somebody put a vector type check before
> >
> >   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > return false;
> >
> > @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g
> >tree mask_type;
> >tree mask;
> >
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > +return false;
> > +
> >if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> >  return false;
> >
> > @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g
> >  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
> >
> >gcc_assert (ncopies >= 1);
> > -  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> > -return false;
> >
> >if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
> >&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
> >
> > fixes this particular fallout for me.
> 
> I'll try it.

With this fix it works fine, thanks!  Bootstrapped and regtested on 
x86_64-unknown-linux-gnu.  OK for trunk?

Ilya
--
gcc/

2015-11-20  Ilya Enkovich  
Richard Biener  

* tree-vect-loop.c (vect_determine_vectorization_factor): Don't
compute vectype for non-relevant mask producers.
* gcc/tree-vect-stmts.c (vectorizable_comparison): Check stmt
relevance earlier.

gcc/testsuite/

2015-11-20  Ilya Enkovich  

* gcc.dg/pr68327.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr68327.c b/gcc/testsuite/gcc.dg/pr68327.c
new file mode 100644
index 000..c3e6a94
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68327.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, d;
+char b, c;
+
+void
+fn1 ()
+{
+  int i = 0;
+  for (; i < 1; i++)
+d = 1;
+  for (; b; b++)
+a = 1 && (d & b);
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 80937ec..592372d 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -439,7 +439,8 @@ vect_determine_vectorization_factor (loop_vec_info 
loop_vinfo)
 compute a factor.  */
  if (TREE_CODE (scalar_type) == BOOLEAN_TYPE)
{
- mask_producers.safe_push (stmt_info);
+ if (STMT_VINFO_RELEVANT_P (stmt_info))
+   mask_producers.safe_push (stmt_info);
  bool_result = true;
 
  if (gimple_code (stmt) == GIMPLE_ASSIGN
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 0f64aaf..3723b26 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7546,6 +7546,9 @@ vectorizable_comparison (gimple *stmt, 
gimple_stmt_iterator *gsi,
   tree mask_type;
   tree mask;
 
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
   if (!VECTOR_BOOLEAN_TYPE_P (vectype))
 return false;
 
@@ -7558,9 +7561,6 @@ vectorizable_comparison (gimple *stmt, 
gimple_stmt_iterator *gsi,
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
 
   gcc_assert (ncopies >= 1);
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-return false;
-
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
   && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
   && reduc_def))

Re: [PATCH] fix vectorizer performance problem on cygwin hosted cross compiler

On Fri, Nov 20, 2015 at 8:21 AM, Jim Wilson  wrote:
> A cygwin hosted cross compiler to aarch64-linux, compiling a C version
> of linpack with -Ofast, produces code that runs 17% slower than a
> linux hosted compiler.  The problem shows up in the vect dump, where
> some different vectorization optimization decisions were made by the
> cygwin compiler than the linux compiler.  That happened because
> tree-vect-data-refs.c calls qsort in vect_analyze_data_ref_accesses,
> and the newlib and glibc qsort routines sort the list differently.  I
> can reproduce the same problem on linux by adding the newlib qsort
> sources to a gcc build.  For an x86_64 target, I see about a 30%
> performance loss using the newlib qsort.
>
> The qsort trouble turns out to be a problem in the qsort comparison
> function, dr_group_sort_cmp.  It does this
>   if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0))
> {
>   cmp = compare_tree (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb));
>   if (cmp != 0)
> return cmp;
> }
> operand_equal_p calls STRIP_NOPS, so it will consider two trees to be
> the same even if they have NOP_EXPR.  However, compare_tree is not
> calling STRIP_NOPS, so it handles trees with NOP_EXPRs differently
> than trees without.  The result is that depending on which array entry
> gets used as the qsort pivot point, you can get very different sorts.
> The newlib qsort happens to be accidentally choosing a bad pivot for
> this testcase.  The glibc qsort happens to be accidentally choosing a
> good pivot for this testcase.  This then triggers a cascading problem
> in vect_analyze_data_ref_accesses which assumes that array entries
> that pass the operand_equal_p test for the base address will end up
> adjacent, and will only vectorize in that case.
>
> For a contrived example, suppose we have four entries to sort: (plus Y
> 8), (mult A 4), (pointer_plus Z 16), and (nop (mult A 4)).  Suppose we
> choose the mult as the pivot point. The plus sorts before because
> tree_code plus is less than mult. The pointer_plus sorts after for the
> same reason. The nop sorts equal. So we end up with plus, mult, nop,
> pointer_plus. The mult and nop are then combined into the same
> vectorization group.  Now suppose we choose the pointer_plus as the
> pivot point. The plus and mult sort before. The nop sorts after. The
> final result is plus, mult, pointer_plus, nop. And we fail to
> vectorize as the mult and nop are not adjacent as they should be.
>
> When I modify compare_tree to call STRIP_NOPS, this problem goes away.
> I get the same sort from both the newlib and glibc qsort functions,
> and I get the same linpack performance from a cygwin hosted compiler
> and a linux hosted compiler.
>
> This patch was tested with an x86_64 bootstrap and make check.  There
> were no regressions.  I've also done a SPEC CPU2000 run with and
> without the patch on aarch64-linux, there is no performance change.
> And I've verified it by building linpack for aarch64-linux with cygwin
> hosted cross compiler, x86_64 hosted cross compiler, and an aarch64
> native compiler.

Ok.

Thanks,
Richard.

> Jim

Re: Add uaddv4_optab, usubv4_optab

> Eric has just submitted a documentation path that documented the
> {add,sub,mul,umul}v4 and negv3 patterns, so this should be
> applied on top of that.

OK, I'm going to apply it, thanks.  Note that the comment at the beginning 
of expand_addsub_overflow describing the overall strategy ought to be adjusted 
if new patterns for the jump on carry are added.

-- 
Eric Botcazou

[committed, trivial] Fix typo and trailing whitespace in dump-file strings in parloops


[ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ]

On 18/11/15 17:22, Bernhard Reutner-Fischer wrote:

Bonus points for fixing the dump_file to parse in:


>Parloops will fail because:
>...
>phi is n_2 = PHI 
>arg of phi to exit: value n_4(D) used outside loop
>checking if it a part of reduction pattern:

s/it a/it is/



This patch fixes a typo and trailing whitespace in dump-file strings in 
parloops.


Build for c and fortran, tested -fdump-tree-parloops testcases.

Committed to trunk as trivial.

Thanks,
- Tom
Fix typo and trailing whitespace in dump-file strings in parloops

2015-11-19  Tom de Vries  

	* tree-parloops.c (build_new_reduction): Fix trailing whitespace in
	dump-file string.
	(try_create_reduction_list): Same.  Fix typo in dump-file string.

---
 gcc/tree-parloops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 8d7912d..aca2370 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2383,7 +2383,7 @@ build_new_reduction (reduction_info_table_type *reduction_list,
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file,
-	   "Detected reduction. reduction stmt is: \n");
+	   "Detected reduction. reduction stmt is:\n");
   print_gimple_stmt (dump_file, reduc_stmt, 0, 0);
   fprintf (dump_file, "\n");
 }
@@ -2564,7 +2564,7 @@ try_create_reduction_list (loop_p loop,
 	  print_generic_expr (dump_file, val, 0);
 	  fprintf (dump_file, " used outside loop\n");
 	  fprintf (dump_file,
-		   "  checking if it a part of reduction pattern:  \n");
+		   "  checking if it is part of reduction pattern:\n");
 	}
 	  if (reduction_list->elements () == 0)
 	{

Add uaddv4_optab, usubv4_optab


Toward fixing PR68385.  I'm just starting a full round of testing, but

extern void underflow(void) __attribute__((noreturn));
unsigned sub1(unsigned a, unsigned b)
{
unsigned r = a - b;
if (r > a) underflow();
return r;
}

unsigned sub2(unsigned a, unsigned b)
{
unsigned r;
if (__builtin_sub_overflow(a, b, )) underflow();
return r;
}


sub1:
movl%edi, %eax
subl%esi, %eax
cmpl%eax, %edi
jb  .L7
rep ret
...
sub2:
movl%edi, %eax
subl%esi, %eax
jb  .L16
rep ret
...


r~
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4c5e22a..10a004e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6156,6 +6156,22 @@
  (const_string "4")]
  (const_string "")))])
 
+(define_expand "uaddv4"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(plus:SWI (match_dup 1) (match_dup 2))
+(match_dup 1)))
+ (set (match_dup 0)
+  (plus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ne (reg:CCC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  ""
+{
+  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
+})
+
 ;; The lea patterns for modes less than 32 bits need to be matched by
 ;; several insns converted to real lea by splitters.
 
@@ -6461,6 +6477,20 @@
  (const_string "4")]
  (const_string "")))])
 
+(define_expand "usubv4"
+  [(parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0)
+  (minus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ltu (reg:CC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  ""
+{
+  ix86_fixup_binary_operands_no_copy (MINUS, mode, operands);
+})
+
 (define_insn "*sub_3"
   [(set (reg FLAGS_REG)
(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8b2deaa..9386e62 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4912,6 +4912,25 @@ address calculations.  @code{add@var{m}3} is used if
 @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
 Similar, for other arithmetic operations.
 
+@cindex @code{addv@var{m}4} instruction pattern
+@item @samp{addv@var{m}4}
+Add operand 2 and operand 1, storing the result in operand 0.  If signed
+overflow occurs during the addition, jump to the label in operand 3.
+
+@cindex @code{subv@var{m}4} instruction pattern
+@cindex @code{mulv@var{m}4} instruction pattern
+@item @samp{subv@var{m}4}, @samp{mulv@var{m}4}
+Similar, for other signed arithmetic operations.
+
+@cindex @code{uaddv@var{m}4} instruction pattern
+@item @samp{uaddv@var{m}4}
+Like @code{addv@var{m}4}, except jump on unsigned overflow.
+
+@cindex @code{usubv@var{m}4} instruction pattern
+@cindex @code{umulv@var{m}4} instruction pattern
+@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
+Similar, for other unsigned arithmetic operations.
+
 @cindex @code{fma@var{m}4} instruction pattern
 @item @samp{fma@var{m}4}
 Multiply operand 2 and operand 1, then add operand 3, storing the
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index bc77bdc..b15657f 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -546,6 +546,33 @@ expand_addsub_overflow (location_t loc, tree_code code, 
tree lhs,
   /* u1 +- u2 -> ur  */
   if (uns0_p && uns1_p && unsr_p)
 {
+  insn_code icode = optab_handler (code == PLUS_EXPR ? uaddv4_optab
+   : usubv4_optab, mode);
+  if (icode != CODE_FOR_nothing)
+   {
+ struct expand_operand ops[4];
+ rtx_insn *last = get_last_insn ();
+
+ res = gen_reg_rtx (mode);
+ create_output_operand ([0], res, mode);
+ create_input_operand ([1], op0, mode);
+ create_input_operand ([2], op1, mode);
+ create_fixed_operand ([3], do_error);
+ if (maybe_expand_insn (icode, 4, ops))
+   {
+ last = get_last_insn ();
+ if (profile_status_for_fn (cfun) != PROFILE_ABSENT
+ && JUMP_P (last)
+ && any_condjump_p (last)
+ && !find_reg_note (last, REG_BR_PROB, 0))
+   add_int_reg_note (last, REG_BR_PROB, PROB_VERY_UNLIKELY);
+ emit_jump (done_label);
+ goto do_error_label;
+   }
+
+ delete_insns_since (last);
+   }
+
   /* Compute the operation.  On RTL level, the addition is always
 unsigned.  */
   res = expand_binop (mode, code == PLUS_EXPR ? add_optab : sub_optab,
@@ -737,92 +764,88 @@ expand_addsub_overflow (location_t loc, tree_code code, 
tree lhs,
   gcc_assert (!uns0_p && !uns1_p && !unsr_p);
 
   /* s1 +- s2 ->

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

On Thu, 19 Nov 2015, Tom de Vries wrote:

> On 17/11/15 15:53, Tom de Vries wrote:
> > > And the above LIM example
> > > is none for why you need two LIM passes...
> > 
> > Indeed. I'm planning a separate reply to explain in more detail the need
> > for the two pass_lims.
> 
> I.
> 
> I managed to get rid of the two pass_lims for the motivating example that I
> used until now (goacc/kernels-double-reduction.c). I found that by adding a
> pass_dominator instance after pass_ch, I could get rid of the second pass_lim
> (and pass_copyprop as well).
> 
> But... then I wrote a counter example (goacc/kernels-double-reduction-n.c),
> and I'm back at two pass_lims (and two pass_dominators).
> Also I've split the pass group into a bit before and after pass_fre.
> 
> So, the current pass group looks like:
> ...
> NEXT_PASS (pass_build_ealias);
> 
> /* Pass group that runs when the function is an offloaded function
>containing oacc kernels loops.  Part 1.  */
> NEXT_PASS (pass_oacc_kernels);
> PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> /* We need pass_ch here, because pass_lim has no effect on
>exit-first loops (PR65442).  Ideally we want to remove both
>this pass instantiation, and the reverse transformation
>transform_to_exit_first_loop_alt, which is done in
>pass_parallelize_loops_oacc_kernels. */
> NEXT_PASS (pass_ch);
> POP_INSERT_PASSES ()
> 
> NEXT_PASS (pass_fre);
> 
> /* Pass group that runs when the function is an offloaded function
>containing oacc kernels loops.  Part 2.  */
> NEXT_PASS (pass_oacc_kernels2);
> PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels2)
> /* We use pass_lim to rewrite in-memory iteration and reduction
>variable accesses in loops into local variables accesses.  */
> NEXT_PASS (pass_lim);
> NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> NEXT_PASS (pass_lim);
> NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
> NEXT_PASS (pass_dce);
> NEXT_PASS (pass_parallelize_loops_oacc_kernels);
> NEXT_PASS (pass_expand_omp_ssa);
> POP_INSERT_PASSES ()
> NEXT_PASS (pass_merge_phi);
> ...
> 
> 
> II.
> 
> The motivating test-case kernels-double-reduction-n.c:
> ...
> #include 
> 
> #define N 500
> 
> unsigned int a[N][N];
> 
> void  __attribute__((noinline,noclone))
> foo (unsigned int n)
> {
>   int i, j;
>   unsigned int sum = 1;
> 
> #pragma acc kernels copyin (a[0:n]) copy (sum)
>   {
> for (i = 0; i < n; ++i)
>   for (j = 0; j < n; ++j)
> sum += a[i][j];
>   }
> 
>   if (sum != 5001)
> abort ();
> }
> ...
> 
> 
> III.
> 
> Before first pass_lim. Note no phis on inner or outer loop header for
> iteration varables or reduction variable:
> ...
>   :
>   _5 = *.omp_data_i_4(D).i;
>   *_5 = 0;
>   _44 = *.omp_data_i_4(D).n;
>   _45 = *_44;
>   if (_45 != 0)
> goto ;
>   else
> goto ;
> 
>   : outer loop header
>   _12 = *.omp_data_i_4(D).j;
>   *_12 = 0;
>   if (_45 != 0)
> goto ;
>   else
> goto ;
> 
>   : inner loop header, latch
>   _19 = *.omp_data_i_4(D).a;
>   _21 = *_5;
>   _23 = *_12;
>   _24 = *_19[_21][_23];
>   _25 = *.omp_data_i_4(D).sum;
>   sum.0_26 = *_25;
>   sum.1_27 = _24 + sum.0_26;
>   *_25 = sum.1_27;
>   _33 = _23 + 1;
>   *_12 = _33;
>   j.2_16 = (unsigned int) _33;
>   if (j.2_16 < _45)
> goto ;
>   else
> goto ;
> 
>   : outer loop latch
>   _36 = *_5;
>   _38 = _36 + 1;
>   *_5 = _38;
>   i.3_9 = (unsigned int) _38;
>   if (i.3_9 < _45)
> goto ;
>   else
> goto ;
> 
>   :
>   return;
> ...
> 
> 
> IV.
> 
> After first pass_lim/pass_dom pair. Note there are phis on the inner loop
> header for the reduction and the iteration variable, but not on the outer loop
> header:
> ...
>   :
>   _5 = *.omp_data_i_4(D).i;
>   *_5 = 0;
>   _44 = *.omp_data_i_4(D).n;
>   _45 = *_44;
>   if (_45 != 0)
> goto ;
>   else
> goto ;
> 
>   :
>   _12 = *.omp_data_i_4(D).j;
>   _19 = *.omp_data_i_4(D).a;
>   D__lsm.10_50 = *_12;
>   D__lsm.11_51 = 0;
>   _25 = *.omp_data_i_4(D).sum;
> 
>   : outer loop header
>   D__lsm.10_20 = 0;
>   D__lsm.11_22 = 1;
>   _21 = *_5;
>   D__lsm.12_28 = *_25;
>   D__lsm.13_30 = 0;
>   goto ;
> 
>   : inner loop header, latch
>   # D__lsm.10_47 = PHI <0(5), _33(7)>
>   # D__lsm.12_49 = PHI 
>   _23 = D__lsm.10_47;
>   _24 = *_19[_21][D__lsm.10_47];
>   sum.0_26 = D__lsm.12_49;
>   sum.1_27 = _24 + D__lsm.12_49;
>   D__lsm.12_31 = sum.1_27;
>   D__lsm.13_32 = 1;
>   _33 = D__lsm.10_47 + 1;
>   D__lsm.10_14 = _33;
>   D__lsm.11_15 = 1;
>   j.2_16 = (unsigned int) _33;
>   if (j.2_16 < _45)
> goto ;
>   else
> goto ;
> 
>   : outer loop latch
>   # D__lsm.10_35 = PHI <_33(7)>
>   # D__lsm.11_37 = PHI <1(7)>
>   # D__lsm.12_7 = PHI 
>   # D__lsm.13_8 = PHI <1(7)>
>   *_25 = sum.1_27;
>   _36 = *_5;
>   _38 = _36 + 1;
>   *_5 = _38;
>   i.3_9 = (unsigned int) _38;
>   if (i.3_9 < _45)
> goto ;
>   else
> goto ;
> 
>   :
>   #

Re: Add uaddv4_optab, usubv4_optab

On Fri, Nov 20, 2015 at 11:27:48AM +0100, Richard Henderson wrote:
> Toward fixing PR68385.  I'm just starting a full round of testing, but
> 
> extern void underflow(void) __attribute__((noreturn));
> unsigned sub1(unsigned a, unsigned b)
> {
> unsigned r = a - b;
> if (r > a) underflow();
> return r;
> }
> 
> unsigned sub2(unsigned a, unsigned b)
> {
> unsigned r;
> if (__builtin_sub_overflow(a, b, )) underflow();
> return r;
> }
> 
> 
> sub1:
>   movl%edi, %eax
>   subl%esi, %eax
>   cmpl%eax, %edi
>   jb  .L7
>   rep ret
> ...
> sub2:
>   movl%edi, %eax
>   subl%esi, %eax
>   jb  .L16
>   rep ret
> ...

That looks good.

> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6156,6 +6156,22 @@
> (const_string "4")]
> (const_string "")))])
>  
> +(define_expand "uaddv4"
> +  [(parallel [(set (reg:CCC FLAGS_REG)
> +(compare:CCC
> +  (plus:SWI (match_dup 1) (match_dup 2))
> +  (match_dup 1)))
> +   (set (match_dup 0)
> +(plus:SWI (match_dup 1) (match_dup 2)))])
> +   (set (pc) (if_then_else
> +(ne (reg:CCC FLAGS_REG) (const_int 0))
> +(label_ref (match_operand 3))
> +(pc)))]
> +  ""
> +{
> +  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
> +})

Do we need this one on i?86?  I'm not against adding it to optabs, so that
other targets have a way to improve that, but doesn't combine handle this
case on i?86 already well?
I've been thinking of only transforming the above sub1 code (in forwprop, as
richi suggested) to sub2 internal call + REALPART/IMAGPART extraction if
the corresponding optab exists.

> +
>  ;; The lea patterns for modes less than 32 bits need to be matched by
>  ;; several insns converted to real lea by splitters.
>  
> @@ -6461,6 +6477,20 @@
> (const_string "4")]
> (const_string "")))])
>  
> +(define_expand "usubv4"
> +  [(parallel [(set (reg:CC FLAGS_REG)
> +(compare:CC (match_dup 1) (match_dup 2)))
> +   (set (match_dup 0)
> +(minus:SWI (match_dup 1) (match_dup 2)))])
> +   (set (pc) (if_then_else
> +(ltu (reg:CC FLAGS_REG) (const_int 0))
> +(label_ref (match_operand 3))
> +(pc)))]

If this works, it will be nice, I thought we'll need a new CC*mode.

> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4912,6 +4912,25 @@ address calculations.  @code{add@var{m}3} is used if
>  @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
>  Similar, for other arithmetic operations.
>  
> +@cindex @code{addv@var{m}4} instruction pattern
> +@item @samp{addv@var{m}4}
> +Add operand 2 and operand 1, storing the result in operand 0.  If signed
> +overflow occurs during the addition, jump to the label in operand 3.
> +
> +@cindex @code{subv@var{m}4} instruction pattern
> +@cindex @code{mulv@var{m}4} instruction pattern
> +@item @samp{subv@var{m}4}, @samp{mulv@var{m}4}
> +Similar, for other signed arithmetic operations.
> +
> +@cindex @code{uaddv@var{m}4} instruction pattern
> +@item @samp{uaddv@var{m}4}
> +Like @code{addv@var{m}4}, except jump on unsigned overflow.
> +
> +@cindex @code{usubv@var{m}4} instruction pattern
> +@cindex @code{umulv@var{m}4} instruction pattern
> +@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
> +Similar, for other unsigned arithmetic operations.

Eric has just submitted a documentation path that documented the
{add,sub,mul,umul}v4 and negv3 patterns, so this should be
applied on top of that.

Jakub

Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF

2015-11-20 14:28 GMT+03:00 Richard Biener :
> On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich  wrote:
>> 2015-11-18 16:44 GMT+03:00 Richard Biener :
>>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich  
>>> wrote:
 Hi,

 When we compute vectypes we skip non-relevant phi nodes.  But we process 
 non-relevant alive statements and thus may need vectype of non-relevant 
 live phi node to compute mask vectype.  This patch enables vectype 
 computation for live phi nodes.  Botostrapped and regtested on 
 x86_64-unknown-linux-gnu.  OK for trunk?
>>>
>>> Hmm.  What breaks if you instead skip all !relevant stmts and not
>>> compute vectype for life but not relevant ones?  We won't ever
>>> "vectorize" !relevant ones, that is, we don't need their vector type.
>>
>> I tried it and got regression in SLP.  It expected non-null vectype
>> for non-releveant but live statement. Regression was in
>> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90
>
> Because somebody put a vector type check before
>
>   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> return false;
>
> @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g
>tree mask_type;
>tree mask;
>
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +return false;
> +
>if (!VECTOR_BOOLEAN_TYPE_P (vectype))
>  return false;
>
> @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g
>  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
>
>gcc_assert (ncopies >= 1);
> -  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> -return false;
>
>if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
>&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
>
> fixes this particular fallout for me.

I'll try it.

Thanks,
Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
 Thanks,
 Ilya

Re: [PATCH 02/15] Add selftests to bitmap.c

On Thu, Nov 19, 2015 at 6:04 PM, David Malcolm  wrote:
> Jeff pre-approved the plugin version of this (as a new
> file unittests/test-bitmap.c):
>   https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03284.html
> with:
>> OK if/when prereqs are approved.  Minor twiddling if we end up moving it
>> elsewhere or standardizing/reducing header files is pre-approved.
>
> This version moves them to bitmap.c
>
> One issue: how to express:
>   TEST (bitmap_test, gc_alloc)
> in a ChangeLog entry.
>
> I've chosen to write it as (bitmap_test, gc_alloc) since that
> has the greatest chance of being found via grep.

I think overloading CHECKING_P for this is bogus.  Adding a new
UNIT_TEST_FILE (like GENERATOR_FILE) would be better.  Also
selftest.h should be only conditionally included (before the tests themselves?)

Richard.

> gcc/ChangeLog:
> * bitmap.c: Include "selftest.h".
> (bitmap_test, gc_alloc): New selftest.
> (bitmap_test, set_range): New selftest.
> (bitmap_test, clear_bit_in_middle): New selftest.
> (bitmap_test, copying): New selftest.
> (bitmap_test, bitmap_single_bit_set_p): New selftest.
> ---
>  gcc/bitmap.c | 92 
> 
>  1 file changed, 92 insertions(+)
>
> diff --git a/gcc/bitmap.c b/gcc/bitmap.c
> index f04b8f9..e6f772e 100644
> --- a/gcc/bitmap.c
> +++ b/gcc/bitmap.c
> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "system.h"
>  #include "coretypes.h"
>  #include "bitmap.h"
> +#include "selftest.h"
>
>  /* Memory allocation statistics purpose instance.  */
>  mem_alloc_description bitmap_mem_desc;
> @@ -2094,5 +2095,96 @@ debug (const bitmap_head *ptr)
>  fprintf (stderr, "\n");
>  }
>
> +#if CHECKING_P
> +namespace {
> +
> +/* Freshly-created bitmaps ought to be empty.  */
> +
> +TEST (bitmap_test, gc_alloc)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +  EXPECT_TRUE (bitmap_empty_p (b));
> +}
> +
> +/* Verify bitmap_set_range.  */
> +
> +TEST (bitmap_test, set_range)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +  EXPECT_TRUE (bitmap_empty_p (b));
> +
> +  bitmap_set_range (b, 7, 5);
> +  EXPECT_FALSE (bitmap_empty_p (b));
> +  EXPECT_EQ (5, bitmap_count_bits (b));
> +
> +  /* Verify bitmap_bit_p at the boundaries.  */
> +  EXPECT_FALSE (bitmap_bit_p (b, 6));
> +  EXPECT_TRUE (bitmap_bit_p (b, 7));
> +  EXPECT_TRUE (bitmap_bit_p (b, 11));
> +  EXPECT_FALSE (bitmap_bit_p (b, 12));
> +}
> +
> +/* Verify splitting a range into two pieces using bitmap_clear_bit.  */
> +
> +TEST (bitmap_test, clear_bit_in_middle)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +
> +  /* Set b to [100..200].  */
> +  bitmap_set_range (b, 100, 100);
> +  EXPECT_EQ (100, bitmap_count_bits (b));
> +
> +  /* Clear a bit in the middle.  */
> +  bool changed = bitmap_clear_bit (b, 150);
> +  EXPECT_TRUE (changed);
> +  EXPECT_EQ (99, bitmap_count_bits (b));
> +  EXPECT_TRUE (bitmap_bit_p (b, 149));
> +  EXPECT_FALSE (bitmap_bit_p (b, 150));
> +  EXPECT_TRUE (bitmap_bit_p (b, 151));
> +}
> +
> +/* Verify bitmap_copy.  */
> +
> +TEST (bitmap_test, copying)
> +{
> +  bitmap src = bitmap_gc_alloc ();
> +  bitmap_set_range (src, 40, 10);
> +
> +  bitmap dst = bitmap_gc_alloc ();
> +  EXPECT_FALSE (bitmap_equal_p (src, dst));
> +  bitmap_copy (dst, src);
> +  EXPECT_TRUE (bitmap_equal_p (src, dst));
> +
> +  /* Verify that we can make them unequal again...  */
> +  bitmap_set_range (src, 70, 5);
> +  EXPECT_FALSE (bitmap_equal_p (src, dst));
> +
> +  /* ...and that changing src after the copy didn't affect
> + the other: */
> +  EXPECT_FALSE (bitmap_bit_p (dst, 70));
> +}
> +
> +/* Verify bitmap_single_bit_set_p.  */
> +TEST (bitmap_test, bitmap_single_bit_set_p)
> +{
> +  bitmap b = bitmap_gc_alloc ();
> +
> +  EXPECT_FALSE (bitmap_single_bit_set_p (b));
> +
> +  bitmap_set_range (b, 42, 1);
> +  EXPECT_TRUE (bitmap_single_bit_set_p (b));
> +  EXPECT_EQ (42, bitmap_first_set_bit (b));
> +
> +  bitmap_set_range (b, 1066, 1);
> +  EXPECT_FALSE (bitmap_single_bit_set_p (b));
> +  EXPECT_EQ (42, bitmap_first_set_bit (b));
> +
> +  bitmap_clear_range (b, 0, 100);
> +  EXPECT_TRUE (bitmap_single_bit_set_p (b));
> +  EXPECT_EQ (1066, bitmap_first_set_bit (b));
> +}
> +
> +}  // anon namespace
> +#endif /* CHECKING_P */
>
>  #include "gt-bitmap.h"
> --
> 1.8.5.3
>

Re: [PATCH 2/4][AArch64] Increase the loop peeling limit

On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote:
> On 11/05/2015 02:51 PM, Evandro Menezes wrote:
> >2015-11-05  Evandro Menezes 
> >
> >   gcc/
> >
> >   * config/aarch64/aarch64.c (aarch64_override_options_internal):
> >   Increase loop peeling limit.
> >
> >This patch increases the limit for the number of peeled insns.
> >With this change, I noticed no major regression in either
> >Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP
> >ones, improved significantly.
> >
> >I tested this tuning on Exynos M1 and on A57.  ThunderX seems to
> >benefit from this tuning too.  However, I'd appreciate comments
> >from other stakeholders.
> 
> Ping.

I'd like to leave this for a call from the port maintainers. I can see why
this leads to more opportunities for vectorization, but I'm concerned about
the wider impact on code size. Certainly I wouldn't expect this to be our
default at -O2 and below.

My gut feeling is that this doesn't really belong in the back-end (there are
presumably good reasons why the default for this parameter across GCC has
fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd
like Marcus or Richard to make the call as to whether or not we take this
patch.

For now, I'd drop it from the series (it stands alone anyway).

Thanks,
James

Re: [PATCH, PR68373 ] Call scev_const_prop in pass_parallelize_loops::execute

On Thu, 19 Nov 2015, Tom de Vries wrote:

> On 17/11/15 23:20, Tom de Vries wrote:
> > [ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ]
> > 
> > Hi,
> > 
> > Consider test-case test.c, with a use of the final value of the
> > iteration variable (return i):
> > ...
> > unsigned int
> > foo (int *a, unsigned int n)
> > {
> >unsigned int i;
> >for (i = 0; i < n; ++i)
> >  a[i] = 1;
> > 
> >return i;
> > }
> > ...
> > 
> > Compiled with:
> > ...
> > $ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details
> > ...
> > 
> > Before parloops, we have:
> > ...
> >   :
> ># i_12 = PHI <0(3), i_10(5)>
> >_5 = (long unsigned int) i_12;
> >_6 = _5 * 4;
> >_8 = a_7(D) + _6;
> >*_8 = 1;
> >i_10 = i_12 + 1;
> >if (n_4(D) > i_10)
> >  goto ;
> >else
> >  goto ;
> > 
> >:
> >goto ;
> > 
> >:
> ># i_14 = PHI 
> > ...
> > 
> > Parloops will fail because:
> > ...
> > phi is n_2 = PHI 
> > arg of phi to exit:   value n_4(D) used outside loop
> >checking if it a part of reduction pattern:
> >FAILED: it is not a part of reduction
> > ...
> > [ note that the phi looks slightly different. In
> > gather_scalar_reductions -> vect_analyze_loop_form ->
> > vect_analyze_loop_form_1 -> split_loop_exit_edge we split the edge from
> > bb4 to bb6. ]
> > 
> > This patch uses scev_const_prop at the start of parloops.
> > scev_const_prop first also splits the exit edge, and then replaces the
> > phi with a assignment:
> > ...
> >   final value replacement:
> >n_2 = PHI 
> >with
> >n_2 = n_4(D);
> > ...
> > 
> > This allows parloops to succeed.
> > 
> > And there's a similar story when we compile with -fno-tree-scev-cprop in
> > addition.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > OK for stage3/stage1?
> 
> The patch has been updated to do the final value replacement only for the loop
> that parloops is processing, as suggested in review comment at
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02166.html .
> 
> That means the patch is now also required for the kernels patch series.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage 3 trunk?

Ok.  Please mention tree-optimization/68373 in the changelog.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: [patch] Document new overflow arithmetics patterns

On Fri, Nov 20, 2015 at 11:12:06AM +0100, Eric Botcazou wrote:
> Hi,
> 
> this documents the new overflow arithmetics patterns added recently (addv4, 
> subv4, mulv4, umulv4, negv3) and only them, i.e. the old ones are still not.
> This also fixes the description of the cbranch and jump patterns, which were 
> referring to a label_ref instead of a code_label.
> 
> Tested with 'make doc' on x86-64/Linux, OK for mainline and 5 branch?
> 
> 
> 2015-11-20  Eric Botcazou  
> 
>   * doc/md.texi (Standard Names): Move entry for addptr3 around,
>   add entries for addv4, subv4, mulv4, umulv4 and negv3, fixes
>   glitch in entries for cbranch4 and jump.

Ok, thanks.

> +@cindex @code{negv@var{m}3} instruction pattern
> +@item @samp{negv@var{m}3}
> +Like @code{neg@var{m}2}  but takes a @code{code_label} as operand 2 and
> +emits code to jump to it if signed overflow occurs during the negation.

Spurious space before "but".

Jakub

Re: Add uaddv4_optab, usubv4_optab

> Toward fixing PR68385.  I'm just starting a full round of testing, but

Do you mind if I install my doc patch?  It's slightly more thorough.

-- 
Eric Botcazou

Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.


Hi Andrew,

On 17/11/15 22:10, Andrew Pinski wrote:

Because the imp and parts are really integer rather than strings, this patch
moves the comparisons to be integer.  Also allows saving around integers are
easier than doing string comparisons.  This allows for the next change.

The way I store BIG.little is (big<<12)|little as each part num is only 12bits
long.  So it would be nice if someone could test -mpu=native on a big.little
system to make sure it works still.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski



* config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are integer 
constants.
* config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change 
implementer_id to unsigned char.
Change part_no to unsigned int.
(AARCH64_BIG_LITTLE): New define.
(INVALID_IMP): New define.
(INVALID_CORE): New define.
(cpu_data): Change the last element's implementer_id and part_no to integers.
(valid_bL_string_p): Rewrite to ..
(valid_bL_core_p): this for integers instead of strings.
(parse_field): New function.
(contains_string_p): Rewrite to ...
(contains_core_p): this for integers and only for the part_no.
(host_detect_local_cpu): Rewrite handling of implementation and part num to be 
integers;
simplifying the code.
---
  gcc/config/aarch64/aarch64-cores.def | 25 +-
  gcc/config/aarch64/driver-aarch64.c  | 90 
  2 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 0b456f7..798f3e3 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -33,25 +33,26 @@
 This need not include flags implied by the architecture.
 COSTS is the name of the rtx_costs routine to use.
 IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
-   be found in /proc/cpuinfo.
+   be found in /proc/cpuinfo.  There is a list in the ARM ARM.
 PART is the part number of the CPU.  On a GNU/Linux system it can be found
-   in /proc/cpuinfo.  For big.LITTLE systems this should have the form at of
-   ".".  */
+   in /proc/cpuinfo.  For big.LITTLE systems this should use the macro 
AARCH64_BIG_LITTLE
+   where the big part number comes as the first arugment to the macro and 
little is the
+   second.  */
  
  /* V8 Architecture Processors.  */
  
-AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")

-AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, 
cortexa57, "0x41", "0xd07")
-AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, 
cortexa72, "0x41", "0xd08")
-AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | 
AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
-AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | 
AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
-AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | 
AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
-AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, 
"0x50", "0x000")
+AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
+AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
+AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
+AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, 0x53, 0x001)
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, 0x51, 0x800)
+AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 0x0a1)
+AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000)
  
  /* V8 big.LITTLE implementations.  */
  
-AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")

-AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03")
+AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE(0xd07, 0xd03))
+AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE(0xd08, 0xd03))
  
  
  #undef AARCH64_CORE

diff --git a/gcc/config/aarch64/driver-aarch64.c 
b/gcc/config/aarch64/driver-aarch64.c

Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF

On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich  wrote:
> 2015-11-18 16:44 GMT+03:00 Richard Biener :
>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> When we compute vectypes we skip non-relevant phi nodes.  But we process 
>>> non-relevant alive statements and thus may need vectype of non-relevant 
>>> live phi node to compute mask vectype.  This patch enables vectype 
>>> computation for live phi nodes.  Botostrapped and regtested on 
>>> x86_64-unknown-linux-gnu.  OK for trunk?
>>
>> Hmm.  What breaks if you instead skip all !relevant stmts and not
>> compute vectype for life but not relevant ones?  We won't ever
>> "vectorize" !relevant ones, that is, we don't need their vector type.
>
> I tried it and got regression in SLP.  It expected non-null vectype
> for non-releveant but live statement. Regression was in
> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90

Because somebody put a vector type check before

  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
return false;

@@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g
   tree mask_type;
   tree mask;

+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+return false;
+
   if (!VECTOR_BOOLEAN_TYPE_P (vectype))
 return false;

@@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;

   gcc_assert (ncopies >= 1);
-  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
-return false;

   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
   && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle

fixes this particular fallout for me.

Richard.

> Ilya
>
>>
>> Richard.
>>
>>> Thanks,
>>> Ilya

[patch] Document new overflow arithmetics patterns

Hi,

this documents the new overflow arithmetics patterns added recently (addv4, 
subv4, mulv4, umulv4, negv3) and only them, i.e. the old ones are still not.
This also fixes the description of the cbranch and jump patterns, which were 
referring to a label_ref instead of a code_label.

Tested with 'make doc' on x86-64/Linux, OK for mainline and 5 branch?


2015-11-20  Eric Botcazou  

* doc/md.texi (Standard Names): Move entry for addptr3 around,
add entries for addv4, subv4, mulv4, umulv4 and negv3, fixes
glitch in entries for cbranch4 and jump.

-- 
Eric BotcazouIndex: doc/md.texi
===
--- doc/md.texi	(revision 230589)
+++ doc/md.texi	(working copy)
@@ -4872,17 +4872,6 @@ Add operand 2 and operand 1, storing the
 must have mode @var{m}.  This can be used even on two-address machines, by
 means of constraints requiring operands 1 and 0 to be the same location.
 
-@cindex @code{addptr@var{m}3} instruction pattern
-@item @samp{addptr@var{m}3}
-Like @code{add@var{m}3} but is guaranteed to only be used for address
-calculations.  The expanded code is not allowed to clobber the
-condition code.  It only needs to be defined if @code{add@var{m}3}
-sets the condition code.  If adds used for address calculations and
-normal adds are not compatible it is required to expand a distinct
-pattern (e.g. using an unspec).  The pattern is used by LRA to emit
-address calculations.  @code{add@var{m}3} is used if
-@code{addptr@var{m}3} is not defined.
-
 @cindex @code{ssadd@var{m}3} instruction pattern
 @cindex @code{usadd@var{m}3} instruction pattern
 @cindex @code{sub@var{m}3} instruction pattern
@@ -4912,6 +4901,35 @@ address calculations.  @code{add@var{m}3
 @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3}
 Similar, for other arithmetic operations.
 
+@cindex @code{addv@var{m}4} instruction pattern
+@item @samp{addv@var{m}4}
+Like @code{add@var{m}3} but takes a @code{code_label} as operand 3 and
+emits code to jump to it if signed overflow occurs during the addition.
+This pattern is used to implement the built-in functions performing
+signed integer addition with overflow checking.
+
+@cindex @code{subv@var{m}4} instruction pattern
+@cindex @code{mulv@var{m}4} instruction pattern
+@item @samp{subv@var{m}4}, @samp{mulv@var{m}4}
+Similar, for other signed arithmetic operations.
+
+@cindex @code{umulv@var{m}4} instruction pattern
+@item @samp{umulv@var{m}4}
+Like @code{mulv@var{m}4} but for unsigned multiplication.  That is to
+say, the operation is the same as signed multiplication but the jump
+is taken only on unsigned overflow.
+
+@cindex @code{addptr@var{m}3} instruction pattern
+@item @samp{addptr@var{m}3}
+Like @code{add@var{m}3} but is guaranteed to only be used for address
+calculations.  The expanded code is not allowed to clobber the
+condition code.  It only needs to be defined if @code{add@var{m}3}
+sets the condition code.  If adds used for address calculations and
+normal adds are not compatible it is required to expand a distinct
+pattern (e.g. using an unspec).  The pattern is used by LRA to emit
+address calculations.  @code{add@var{m}3} is used if
+@code{addptr@var{m}3} is not defined.
+
 @cindex @code{fma@var{m}4} instruction pattern
 @item @samp{fma@var{m}4}
 Multiply operand 2 and operand 1, then add operand 3, storing the
@@ -5277,6 +5295,11 @@ Reverse the order of bytes of operand 1
 @item @samp{neg@var{m}2}, @samp{ssneg@var{m}2}, @samp{usneg@var{m}2}
 Negate operand 1 and store the result in operand 0.
 
+@cindex @code{negv@var{m}3} instruction pattern
+@item @samp{negv@var{m}3}
+Like @code{neg@var{m}2}  but takes a @code{code_label} as operand 2 and
+emits code to jump to it if signed overflow occurs during the negation.
+
 @cindex @code{abs@var{m}2} instruction pattern
 @item @samp{abs@var{m}2}
 Store the absolute value of operand 1 into operand 0.
@@ -5926,13 +5949,13 @@ from the machine description.
 Conditional branch instruction combined with a compare instruction.
 Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
 first and second operands of the comparison, respectively.  Operand 3
-is a @code{label_ref} that refers to the label to jump to.
+is the @code{code_label} to jump to.
 
 @cindex @code{jump} instruction pattern
 @item @samp{jump}
 A jump inside a function; an unconditional branch.  Operand 0 is the
-@code{label_ref} of the label to jump to.  This pattern name is mandatory
-on all machines.
+@code{code_label} to jump to.  This pattern name is mandatory on all
+machines.
 
 @cindex @code{call} instruction pattern
 @item @samp{call}

Re: OpenACC declare directive updates

On Thu, Nov 19, 2015 at 10:22:16AM -0600, James Norris wrote:
> 2015-XX-XX  James Norris  
>   Cesar Philippidis  
> 
>   gcc/fortran/
>   * dump-parse-tree.c (show_namespace): Handle declares.
>   * gfortran.h (struct symbol_attribute): New fields.
>   (enum gfc_omp_map_map): Add OMP_MAP_DEVICE_RESIDENT and OMP_MAP_LINK.
>   (OMP_LIST_LINK): New enum.
>   (struct gfc_oacc_declare): New structure.
>   (gfc_get_oacc_declare): New definition.
>   (struct gfc_namespace): Change type.
>   (enum gfc_exec_op): Add EXEC_OACC_DECLARE.
>   (struct gfc_code): New field.
>   * module.c (enum ab_attribute): Add AB_OACC_DECLARE_CREATE,
>   AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
>   AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK
>   (attr_bits): Add new initializers.
>   (mio_symbol_attribute): Handle new atributes.
>   * openmp.c (gfc_free_oacc_declare_clauses): New function.
>   (gfc_match_oacc_clause_link: Likewise.
>   (OMP_CLAUSE_LINK): New definition.
>   (gfc_match_omp_clauses): Handle OMP_CLAUSE_LINK.
>   (OACC_DECLARE_CLAUSES): Add OMP_CLAUSE_LINK
>   (gfc_match_oacc_declare): Add checking and module handling.
>   (gfc_resolve_oacc_declare): Reimplement.
>   * parse.c (case_decl): Add ST_OACC_DECLARE.
>   (parse_spec): Remove handling.
>   (parse_progunit): Remove handling.
>   * parse.h (struct gfc_state_data): Change type.
>   * resolve.c (gfc_resolve_blocks): Handle EXEC_OACC_DECLARE.
>   * st.c (gfc_free_statement): Handle EXEC_OACC_DECLARE.
>   * symbol.c (check_conflict): Add conflict checks.
>   (gfc_add_oacc_declare_create, gfc_add_oacc_declare_copyin, 
>   gfc_add_oacc_declare_deviceptr, gfc_add_oacc_declare_device_resident):
>   New functions.
>   (gfc_copy_attr): Handle new symbols.
>   * trans-decl.c (add_clause, find_module_oacc_declare_clauses,
>   finish_oacc_declare): New functions.
>   (gfc_generate_function_code): Replace with call.
>   * trans-openmp.c (gfc_trans_oacc_declare): Reimplement.
>   (gfc_trans_oacc_directive): Handle EXEC_OACC_DECLARE.
>   * trans-stmt.c (gfc_trans_block_construct): Replace with call.
>   * trans-stmt.h (gfc_trans_oacc_declare): Remove argument.
>   * trans.c (trans_code): Handle EXEC_OACC_DECLARE.
> 
>   gcc/testsuite
>   * gfortran.dg/goacc/declare-1.f95: Update test.
>   * gfortran.dg/goacc/declare-2.f95: New test.
> 
>   libgomp/
>   * testsuite/libgomp.oacc-fortran/declare-1.f90: New test.
>   * testsuite/libgomp.oacc-fortran/declare-2.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-3.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-4.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.
>   * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise.

Ok.

Jakub

Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward  wrote:
> When vectorising a integer induction condition reduction,
> is_nonwrapping_integer_induction ends up with different values for base
> during the analysis and build phases. In the first it is an INTEGER_CST,
> in the second the loop has been vectorised out and the base is now a
> variable.
>
> This results in the analysis and build stage detecting the
> STMT_VINFO_VEC_REDUCTION_TYPE as different types.
>
> The easiest way to fix this is to only check for integer induction
> conditions on the analysis stage.

I don't like this.  For the evolution part we have added
STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
the original initial value as well then just save it.

Or if you really want to go with the hack then please do not call
is_nonwrapping_integer_induction with vec_stmt != NULL but
initialize cond_expr_is_nonwrapping_integer_induction from
STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)

The hack also lacks a comment.

Richard.

> gcc/
> PR tree-optimization/68413
> * tree-vect-loop.c (vectorizable_reduction): Only check for
> integer cond reduction on analysis stage.
>
>
>
>
> Thanks,
> Alan.
>

Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument

On 19 Nov 18:19, Richard Biener wrote:
> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt 
>  wrote:
> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote:
> >> Currently we fold all memcpy/memmove calls with a known data size.
> >> It causes two problems when used with Pointer Bounds Checker.
> >> The first problem is that we may copy pointers as integer data
> >> and thus loose bounds.  The second problem is that if we inline
> >> memcpy, we also have to inline bounds copy and this may result
> >> in a huge amount of code and significant compilation time growth.
> >> This patch disables folding for functions we want to instrument.
> >>
> >> Does it look reasonable for trunk and GCC5 branch?  Bootstrapped
> >> and regtested on x86_64-unknown-linux-gnu.
> >
> >Can't see anything wrong with it. Ok.
> 
> But for small sizes this can have a huge impact on optimization.  Which is 
> why we have the code in the first place.  I'd make the check less broad, for 
> example inlining copies of size less than a pointer shouldn't be affected.

Right.  We also may inline in case we know no pointers are copied.  Below is a 
version with extended condition and a couple more tests.  Bootstrapped and 
regtested on x86_64-unknown-linux-gnu.  Does it OK for trunk and gcc-5-branch?

> 
> Richard.
> 
> >
> >Bernd
> 
> 

Thanks,
Ilya
--
gcc/

2015-11-20  Ilya Enkovich  

* gimple-fold.c (gimple_fold_builtin_memory_op): Don't
fold call if we are going to instrument it and it may
copy pointers.

gcc/testsuite/

2015-11-20  Ilya Enkovich  

* gcc.target/i386/mpx/pr68337-1.c: New test.
* gcc.target/i386/mpx/pr68337-2.c: New test.
* gcc.target/i386/mpx/pr68337-3.c: New test.


diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1ab20d1..dd9f80b 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -53,6 +53,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gomp-constants.h"
 #include "optabs-query.h"
 #include "omp-low.h"
+#include "tree-chkp.h"
+#include "ipa-chkp.h"
 
 
 /* Return true when DECL can be referenced from current unit.
@@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
   unsigned int src_align, dest_align;
   tree off0;
 
+  /* Inlining of memcpy/memmove may cause bounds lost (if we copy
+pointers as wide integer) and also may result in huge function
+size because of inlined bounds copy.  Thus don't inline for
+functions we want to instrument in case pointers are copied.  */
+  if (flag_check_pointer_bounds
+ && chkp_instrumentable_p (cfun->decl)
+ /* Even if data may contain pointers we can inline if copy
+less than a pointer size.  */
+ && (!tree_fits_uhwi_p (len)
+ || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0)
+ /* Check data type for pointers.  */
+ && (!TREE_TYPE (src)
+ || !TREE_TYPE (TREE_TYPE (src))
+ || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src)))
+ || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src)
+   return false;
+
   /* Build accesses at offset zero with a ref-all character type.  */
   off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
 ptr_mode, true), 0);
diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c 
b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
new file mode 100644
index 000..3f8d79d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+#include "mpx-check.h"
+
+#define N 2
+
+extern void abort ();
+
+static int
+mpx_test (int argc, const char **argv)
+{
+  char ** src = (char **)malloc (sizeof (char *) * N);
+  char ** dst = (char **)malloc (sizeof (char *) * N);
+  int i;
+
+  for (i = 0; i < N; i++)
+src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1);
+
+  __builtin_memcpy(dst, src, sizeof (char *) * N);
+
+  for (i = 0; i < N; i++)
+{
+  char *p = dst[i];
+  if (p != argv[0] + i
+ || __bnd_get_ptr_lbound (p) != p
+ || __bnd_get_ptr_ubound (p) != p + i)
+   abort ();
+}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c 
b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c
new file mode 100644
index 000..16736b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+/* { dg-final { scan-assembler-not "memcpy" } } */
+
+void
+test1 (char *dst, char *src)
+{
+  __builtin_memcpy (dst, src, sizeof (char *) * 2);
+}
+
+void
+test2 (void *dst, void *src)
+{
+  __builtin_memcpy (dst, src, sizeof (char *) / 2);
+}
+
+struct s
+{
+  int a;
+  int b;
+};
+
+void
+test3 (struct s *dst, struct s *src)
+{
+

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

On Thu, 19 Nov 2015, Tom de Vries wrote:

> On 16/11/15 13:45, Richard Biener wrote:
> > > I've eliminated all the uses for pass_tree_loop_init/pass_tree_loop_done
> > > in
> > > >the pass group. Instead, I've added conditional loop optimizer setup in:
> > > >-  pass_lim and pass_scev_cprop (added in this patch), and
> 
> Reposting the "Add pass_oacc_kernels pass group in passes.def" patch.
> 
> pass_scev_cprop is no longer part of the pass group.
> 
> And I've dropped the scev_initialize in pass_lim.
> 
> Pass_lim is part of the pass_tree_loop pass group, where AFAIU scev info is
> initialized at the start of the pass group and updated or reset by passes in
> the pass group if necessary, such that it's always available, or can be
> recalculated on the spot.
> 
> First, pass_lim doesn't invalidate scev info. And second, AFAIU pass_lim
> doesn't use scev info. So there doesn't seem to be a need to do anything about
> scev info for using pass_lim outside pass_tree_loop.
> 
> > > >- pass_parallelize_loops_oacc_kernels (added in patch "Add
> > > >   pass_parallelize_loops_oacc_kernels").
> > You miss calling scev_finalize ().
> 
> I've added the scev_finalize () in patch "Add
> pass_parallelize_loops_oacc_kernels".

 pass_lim::execute (function *fun)
 {
+  if (!loops_state_satisfies_p (LOOPS_NORMAL
+   | LOOPS_HAVE_RECORDED_EXITS))
+loop_optimizer_init (LOOPS_NORMAL
+| LOOPS_HAVE_RECORDED_EXITS);
+

note that this will, when not in the loop pipeline, not properly
fixup loops if LOOPS_NEED_FIXUP is set (that doesn't clear other
loop flags).  I'd rather make loop_optimizer_init do nothing
if requested flags are already set and no fixup is needed and
call the above unconditionally.  Thus sth like

Index: gcc/loop-init.c
===
--- gcc/loop-init.c (revision 230649)
+++ gcc/loop-init.c (working copy)
@@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags)
   calculate_dominance_info (CDI_DOMINATORS);
 
   if (!needs_fixup)
-   checking_verify_loop_structure ();
+   {
+ checking_verify_loop_structure ();
+ if (loops_state_satisfies_p (flags))
+   goto out;
+   }
 
   /* Clear all flags.  */
   if (recorded_exits)
@@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags)
   /* Apply flags to loops.  */
   apply_loop_flags (flags);
 
+  checking_verify_loop_structure ();
+
+out:
   /* Dump loops.  */
   flow_loops_dump (dump_file, NULL, 1);
 
-  checking_verify_loop_structure ();
-
   timevar_pop (TV_LOOP_INIT);
 }
 



   if (number_of_loops (fun) <= 1)
 return 0;
 
+  if (!loops_state_satisfies_p (LOOP_CLOSED_SSA))
+rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+
   return tree_ssa_lim ();
 }

that looks bogus.  The into-loop-closed SSA rewrite should be
only done if the state _satisfies_ it.  I understand LIM doesn't
require loop-closed SSA.  But it also doesn't destroy it obviously.
So just remove that.



> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: Add uaddv4_optab, usubv4_optab


On 11/20/2015 11:56 AM, Eric Botcazou wrote:

Eric has just submitted a documentation path that documented the
{add,sub,mul,umul}v4 and negv3 patterns, so this should be
applied on top of that.


OK, I'm going to apply it, thanks.


Thanks.


Note that the comment at the beginning
of expand_addsub_overflow describing the overall strategy ought to be adjusted
if new patterns for the jump on carry are added.


Ok, I'll do that.


r~

Re: Add uaddv4_optab, usubv4_optab


On 11/20/2015 11:43 AM, Jakub Jelinek wrote:

+(define_expand "uaddv4"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(plus:SWI (match_dup 1) (match_dup 2))
+(match_dup 1)))
+ (set (match_dup 0)
+  (plus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ne (reg:CCC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  ""
+{
+  ix86_fixup_binary_operands_no_copy (PLUS, mode, operands);
+})


Do we need this one on i?86?  I'm not against adding it to optabs, so that
other targets have a way to improve that, but doesn't combine handle this
case on i?86 already well?


Perhaps combine can do the job, but in my option it's better to have the optab 
than not.  Especially when it's so easy to add in this case.



+(define_expand "usubv4"
+  [(parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 0)
+  (minus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (ltu (reg:CC FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]


If this works, it will be nice, I thought we'll need a new CC*mode.


No, we just need to re-use the existing insn that performs the low half of a 
double-word subtraction operation.



Eric has just submitted a documentation path that documented the
{add,sub,mul,umul}v4 and negv3 patterns, so this should be
applied on top of that.


Ok, I'll look out for that.


r~

Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m


Hi Andre,

On 18/11/15 09:44, Andre Vieira wrote:

On 17/11/15 10:10, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote:

On 16/11/15 12:07, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:

Hi,

   This patch changes the target support mechanism to make it
recognize any ARM 'M' profile as a non-neon supporting target. The
current check only tests for armv6 architectures and earlier, and
does not account for armv7-m.

   This is correct because there is no 'M' profile that supports neon
and the current test is not sufficient to exclude armv7-m.

   Tested by running regressions for this testcase for various ARM targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira 

 * gcc/testsuite/lib/target-supports.exp
   (check_effective_target_arm_neon_ok_nocache): Added check
   for M profile.



 From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Fri, 13 Nov 2015 11:16:34 +
Subject: [PATCH] Disable neon testing for armv7-m

---
  gcc/testsuite/lib/target-supports.exp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } {
  int dummy;
  /* Avoid the case where a test adds -mfpu=neon, but the toolchain is
 configured for -mcpu=arm926ej-s, for example.  */
-#if __ARM_ARCH < 7
+#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
  #error Architecture too old for NEON.


Could you fix this #error message while you're here?

Why we can't change this test to look for the __ARM_NEON macro from ACLE:

#if __ARM_NEON < 1
   #error NEON is not enabled
#endif

Thanks,
James



There is a check for this already:
'check_effective_target_arm_neon'. I think the idea behind
arm_neon_ok is to check whether the hardware would support neon,
whereas arm_neon is to check whether neon was enabled, i.e.
-mfpu=neon was used or a mcpu was passed that has neon enabled by
default.

The comments for 'check_effective_target_arm_neon_ok_nocache'
highlight this, though maybe the comments for
check_effective_target_arm_neon could be better.

# Return 1 if this is an ARM target supporting -mfpu=neon
# -mfloat-abi=softfp or equivalent options.  Some multilibs may be
# incompatible with these options.  Also set et_arm_neon_flags to the
# best options to add.

proc check_effective_target_arm_neon_ok_nocache
...
/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
configured for -mcpu=arm926ej-s, for example.  */
...


and

# Return 1 if this is a ARM target with NEON enabled.

proc check_effective_target_arm_neon


OK, got it - sorry for my mistake, I had the two procs confused.

I'd still like to see the error message fixed "Architecture too old for NEON."
is not an accurate description of the problem.

Thanks,
James



This OK?



This is ok,
I've committed for you with the slightly tweaked ChangeLog entry:
2015-11-20  Andre Vieira  

* lib/target-supports.exp
(check_effective_target_arm_neon_ok_nocache): Add check
for M profile.

as r230653.

Thanks,
Kyrill



Cheers,
Andre

[RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS

2015-11-20 Thread Kumar, Venkataramanan

Hi Richard,

As per Jakub suggestion in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67326, 
the below patch fixes the regression in tree if conversion.
Basically allowing if conversion to happen for a candidate DR, if we find 
similar DR with same dimensions  and that DR will not trap.

To find similar DRs using hash table to hashing the offset and DR pairs.  
Also reusing  read/written information that was stored for reference tree.  

Also. 
(1) I guard these checks for  -ftree-loop-if-convert-stores and -fno-common. 
Sometimes vectorization flags also triggers if conversion.
(2) Also hashing base DRs for writes only. 

gcc/ChangeLog
2015-11-19  Venkataramanan  

PR tree-optimization/67326
* tree-if-conv.c  (offset_DR_map): Define.
(struct ifc_dr): Add new tree base_predicate field.
(hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash offsets, 
DR pairs
and hash base ref,  DR pairs  for write type DRs.
(ifcvt_memrefs_wont_trap):  Guard checks with 
-ftree-loop-if-convert-stores flag. 
   Check for similar DR that are accessed unconditionally.
   (if_convertible_loop_p_1):  Initialize and delete offset hash maps

gcc/testsuite/ChangeLog
2015-11-19  Venkataramanan  
* gcc.dg/tree-ssa/ifc-pr67326.c:  Add new.

Regstrapped on x86_64, Ok for trunk?

Regards,
Venkat.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c
new file mode 100644
index 000..5ca11cd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-ifcvt-details -fno-common 
-ftree-loop-if-convert-stores" } */
+
+float v0[1024];
+float v1[1024];
+float v2[1024];
+float v3[1024];
+
+void condAssign1 () {
+  for (int i=0; i<1024; ++i)
+v0[i] = (v2[i]>v1[i]) ? v2[i]*v3[i] : v0[i];
+}
+
+void condAssign2 () {
+  for (int i=0; i<1024; ++i)
+v0[i] = (v2[i]>v1[i]) ? v2[i]*v1[i] : v0[i];
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 2 "ifcvt" } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 01065cb..1b4d220 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -124,6 +124,9 @@ static hash_map 
*ref_DR_map;
 /* Hash table to store base reference, DR pairs.  */
 static hash_map *baseref_DR_map;
 
+/* Hash table to store DR offset, DR pairs.  */
+static hash_map *offset_DR_map;
+
 /* Structure used to predicate basic blocks.  This is attached to the
->aux field of the BBs in the loop to be if-converted.  */
 struct bb_predicate {
@@ -589,6 +592,8 @@ struct ifc_dr {
   int rw_unconditionally;
 
   tree predicate;
+
+  tree base_predicate;
 };
 
 #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
@@ -609,11 +614,12 @@ static void
 hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a)
 {
 
-  data_reference_p *master_dr, *base_master_dr;
+  data_reference_p *master_dr, *base_master_dr, *offset_master_dr;
   tree ref = DR_REF (a);
   tree base_ref = DR_BASE_OBJECT (a);
+  tree offset = DR_OFFSET (a);
   tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
-  bool exist1, exist2;
+  bool exist1, exist2, exist3;
 
   while (TREE_CODE (ref) == COMPONENT_REF
 || TREE_CODE (ref) == IMAGPART_EXPR
@@ -636,22 +642,34 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info 
(data_reference_p a)
   if (is_true_predicate (IFC_DR (*master_dr)->predicate))
 DR_RW_UNCONDITIONALLY (*master_dr) = 1;
 
-  base_master_dr = _DR_map->get_or_insert (base_ref,);
-
-  if (!exist2)
+  if (DR_IS_WRITE (a))
 {
-  IFC_DR (a)->predicate = ca;
-  *base_master_dr = a;
+  base_master_dr = _DR_map->get_or_insert (base_ref,);
+  if (!exist2)
+   {
+ IFC_DR (a)->base_predicate = ca;
+ *base_master_dr = a;
+   }
+  else
+   IFC_DR (*base_master_dr)->base_predicate
+ = fold_or_predicates
+   (EXPR_LOCATION (IFC_DR (*base_master_dr)->base_predicate),
+ca, IFC_DR (*base_master_dr)->base_predicate);
+
+  if (is_true_predicate (IFC_DR (*base_master_dr)->base_predicate))
+   DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1;
 }
-  else
-IFC_DR (*base_master_dr)->predicate
-   = fold_or_predicates
-   (EXPR_LOCATION (IFC_DR (*base_master_dr)->predicate),
-ca, IFC_DR (*base_master_dr)->predicate);
 
-  if (DR_IS_WRITE (a)
-  && (is_true_predicate (IFC_DR (*base_master_dr)->predicate)))
-DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1;
+  if (offset)
+{
+  offset_master_dr = _DR_map->get_or_insert (offset,);
+  if (!exist3)
+   *offset_master_dr = a;
+
+  if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1)
+   DR_RW_UNCONDITIONALLY (*offset_master_dr)
+   =

Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote:
> On 11/12/2015 09:39 AM, Evandro Menezes wrote:
>2015-11-12  Evandro Menezes 
> 
>[AArch64] Add attribute for compatibility with ARM pipeline models
> 
>gcc/
> 
>* config/aarch64/aarch64.md (predicated): Copy attribute from
>"arm.md".
>* config/arm/arm.md (predicated): Added description.
> 
> Please, commit if it's alright.

The AArch64 part of this is OK.

> From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 9 Nov 2015 17:11:16 -0600
> Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
>  pipeline models
> 
> gcc/
>   * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
>   * config/arm/arm.md (predicated): Added description.
> ---
>  gcc/config/aarch64/aarch64.md | 4 
>  gcc/config/arm/arm.md | 3 +++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 1586256..d46f837 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -195,6 +195,10 @@
>  ;; 1 :=: yes
>  (define_attr "far_branch" "" (const_int 0))
>  
> +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 
> has
> +;; no predicated insns.
> +(define_attr "predicated" "yes,no" (const_string "no"))
> +
>  ;; ---
>  ;; Pipeline descriptions and scheduling
>  ;; ---
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 73c3088..6bda491 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -105,6 +105,9 @@
>  (define_attr "fpu" "none,vfp"
>(const (symbol_ref "arm_fpu_attr")))
>  
> +; Predicated means that the insn form is conditionally executed based on a
> +; predicate.  We default to 'no' because no Thumb patterns match this rule
> +; and not all ARM insns do.

s/is conditionally executed/can be conditionally executed/ in the first
sentence. Otherwise, this looks OK to me but I can't approve the ARM part,
so you'll need to wait for a review from someone who can.

Thanks,
James

>  (define_attr "predicated" "yes,no" (const_string "no"))
>  
>  ; LENGTH of an instruction (in bytes)
> -- 
> 2.1.0.243.g30d45f7
>

Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

2015-11-20 Thread Alan Hayward



On 20/11/2015 11:00, "Richard Biener"  wrote:

>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward 
>wrote:
>> When vectorising a integer induction condition reduction,
>> is_nonwrapping_integer_induction ends up with different values for base
>> during the analysis and build phases. In the first it is an INTEGER_CST,
>> in the second the loop has been vectorised out and the base is now a
>> variable.
>>
>> This results in the analysis and build stage detecting the
>> STMT_VINFO_VEC_REDUCTION_TYPE as different types.
>>
>> The easiest way to fix this is to only check for integer induction
>> conditions on the analysis stage.
>
>I don't like this.  For the evolution part we have added
>STMT_VINFO_LOOP_PHI_EVOLUTION_PART.  If you now need
>the original initial value as well then just save it.
>
>Or if you really want to go with the hack then please do not call
>is_nonwrapping_integer_induction with vec_stmt != NULL but
>initialize cond_expr_is_nonwrapping_integer_induction from
>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>
>The hack also lacks a comment.
>

Ok. I've gone for a combination of both:

I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE.

I've removed the vec_stmt != NULL checks.

I've moved the call to is_nonwrapping_integer_induction until after the
vect_is_simple_reduction check. I never liked that I had
is_nonwrapping_integer_induction early in the function, and think this
looks better.


Alan.



analysisonlycondcheck2.patch
Description: Binary data

Re: GCC 5.3 Status Report (2015-11-20)

2015-11-20 Thread David Edelsohn

On Fri, Nov 20, 2015 at 7:53 AM, Richard Biener  wrote:
>
> Status
> ==
>
> We plan to do a GCC 5.3 release candidate at the end of next week
> followed by the actual release a week after that.
>
> So now is the time to look at your regression bugs in bugzilla and
> do some backporting for things already fixed on trunk.

I'm still waiting for approval of the libtool change to support AIX
TLS symbols (PR 68192).  There has been no response on Libtool patches
mailing list.

I again request permission to apply the patches to GCC trunk and 5-branch.

Thanks, David

[PATCH][ARM] Enable fusion of AES instructions

2015-11-20 Thread Wilco Dijkstra

Enable instruction fusion of AES instructions on ARM for Cortex-A53 and
Cortex-A57. 

OK for commit?

ChangeLog:
2015-11-20  Wilco Dijkstra  

* gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion.
(arm_cortex_a57_tune): Likewise.
(aarch_macro_fusion_pair_p): Add support for AES fusion.
* gcc/config/arm/arm-protos.h (fuse_ops): Add FUSE_AES_AESMC.

---
 gcc/config/arm/arm-protos.h | 5 +++--
 gcc/config/arm/arm.c| 9 +++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f9b1276..4801bb8 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -302,8 +302,9 @@ struct tune_params
   enum fuse_ops
   {
 FUSE_NOTHING   = 0,
-FUSE_MOVW_MOVT = 1 << 0
-  } fusible_ops: 1;
+FUSE_MOVW_MOVT = 1 << 0,
+FUSE_AES_AESMC = 1 << 1
+  } fusible_ops: 2;
   /* Depth of scheduling queue to check for L2 autoprefetcher.  */
   enum {SCHED_AUTOPREF_OFF, SCHED_AUTOPREF_RANK, SCHED_AUTOPREF_FULL}
 sched_autopref: 2;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 02f5dc3..7077199 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1969,7 +1969,7 @@ const struct tune_params arm_cortex_a53_tune =
   tune_params::DISPARAGE_FLAGS_NEITHER,
   tune_params::PREF_NEON_64_FALSE,
   tune_params::PREF_NEON_STRINGOPS_TRUE,
-  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
+  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
   tune_params::SCHED_AUTOPREF_OFF
 };
 
@@ -1992,7 +1992,7 @@ const struct tune_params arm_cortex_a57_tune =
   tune_params::DISPARAGE_FLAGS_ALL,
   tune_params::PREF_NEON_64_FALSE,
   tune_params::PREF_NEON_STRINGOPS_TRUE,
-  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
+  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
   tune_params::SCHED_AUTOPREF_FULL
 };
 
@@ -29668,6 +29668,11 @@ aarch_macro_fusion_pair_p (rtx_insn* prev,
rtx_insn* curr)
   && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST
(prev_set)))
 return true;
 }
+
+  if (current_tune->fusible_ops & tune_params::FUSE_AES_AESMC
+  && aarch_crypto_can_dual_issue (prev, curr))
+return true;
+
   return false;
 }
 
-- 
1.9.1

Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c

2015-11-20 Thread Martin Liška

On 11/20/2015 03:14 AM, Bernd Schmidt wrote:
> BTW, I'm with whoever said absolutely no way to the idea of making automatic 
> changes like this as part of a commit hook.
> 
> I think the whitespace change can go in if it hasn't already, but I think the 
> other one still has enough problems that I'll say - leave it for the next 
> stage 1.
> 
>> @@ -178,8 +173,9 @@ warn_uninitialized_vars (bool 
>> warn_possibly_uninitialized)
>>
>>FOR_EACH_BB_FN (bb, cfun)
>>  {
>> -  bool always_executed = dominated_by_p (CDI_POST_DOMINATORS,
>> - single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb);
>> +  bool always_executed
>> += dominated_by_p (CDI_POST_DOMINATORS,
>> +  single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb);
>>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
>>  {
> 
> Better to pull the single_succ into its own variable perhaps?
> 
>> @@ -1057,7 +1039,8 @@ prune_uninit_phi_opnds_in_unrealizable_paths (gphi 
>> *phi,
>>   *visited_flag_phis = BITMAP_ALLOC (NULL);
>>
>> if (bitmap_bit_p (*visited_flag_phis,
>> -SSA_NAME_VERSION (gimple_phi_result (flag_arg_def
>> +SSA_NAME_VERSION (
>> +  gimple_phi_result (flag_arg_def
>>   return false;
>>
>> bitmap_set_bit (*visited_flag_phis,
> 
> Pull the gimple_phi_result into a separate variable, or just leave it 
> unchanged for now, the modified version is too ugly to live.
> 
>> bitmap_clear_bit (*visited_flag_phis,
>> -SSA_NAME_VERSION (gimple_phi_result (flag_arg_def)));
>> +SSA_NAME_VERSION (
>> +  gimple_phi_result (flag_arg_def)));
>> continue;
>>   }
> 
> Here too.
> 
>> -  all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths (phi,
>> - uninit_opnds,
>> - as_a  (flag_def),
>> - boundary_cst,
>> - cmp_code,
>> - visited_phis,
>> - _flag_phis);
>> +  all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths
>> +(phi, uninit_opnds, as_a (flag_def), boundary_cst, cmp_code,
>> + visited_phis, _flag_phis);
> 
> I'd rather shorten the name of the function, even if it goes against the 
> spirit of the novel writing month.
> 
>> -  if (gphi *use_phi = dyn_cast  (use_stmt))
>> -use_bb = gimple_phi_arg_edge (use_phi,
>> -  PHI_ARG_INDEX_FROM_USE (use_p))->src;
>> +  if (gphi *use_phi = dyn_cast (use_stmt))
>> +use_bb
>> +  = gimple_phi_arg_edge (use_phi, PHI_ARG_INDEX_FROM_USE (use_p))->src;
>> else
>>   use_bb = gimple_bb (use_stmt);
> 
> There are some changes of this nature and I'm not sure it's an improvement. 
> Leave them out for now?
> 
>> @@ -2345,8 +2309,8 @@ warn_uninitialized_phi (gphi *phi, vec 
>> *worklist,
>>   }
>>
>> /* Now check if we have any use of the value without proper guard.  */
>> -  uninit_use_stmt = find_uninit_use (phi, uninit_opnds,
>> - worklist, added_to_worklist);
>> +  uninit_use_stmt
>> += find_uninit_use (phi, uninit_opnds, worklist, added_to_worklist);
>>
>> /* All uses are properly guarded.  */
>> if (!uninit_use_stmt)
> 
> Here too.
> 
>> @@ -2396,12 +2359,24 @@ public:
>> {}
>>
>> /* opt_pass methods: */
>> -  opt_pass * clone () { return new pass_late_warn_uninitialized (m_ctxt); }
>> -  virtual bool gate (function *) { return gate_warn_uninitialized (); }
>> +  opt_pass *clone ();
>> +  virtual bool gate (function *);
> 
> This may technically violate our coding standards, but it's consistent with a 
> lot of similar cases. Since coding standards are about enforcing consistency, 
> I'd drop this change.
> 
>>   namespace {
>>
>>   const pass_data pass_data_early_warn_uninitialized =
>>   {
>> -  GIMPLE_PASS, /* type */
>> +  GIMPLE_PASS,   /* type */
>> "*early_warn_uninitialized", /* name */
>> -  OPTGROUP_NONE, /* optinfo_flags */
>> -  TV_TREE_UNINIT, /* tv_id */
>> -  PROP_ssa, /* properties_required */
>> -  0, /* properties_provided */
>> -  0, /* properties_destroyed */
>> -  0, /* todo_flags_start */
>> -  0, /* todo_flags_finish */
>> +  OPTGROUP_NONE,   /* optinfo_flags */
>> +  TV_TREE_UNINIT,   /* tv_id */
>> +  PROP_ssa,   /* properties_required */
>> +  0,   /* properties_provided */
>> +  0,   /* properties_destroyed */
>> +  0,   /* todo_flags_start */
>> +  0,   /* todo_flags_finish */
>>   };
> 
> Likewise. Seems to be done practically nowhere.
> 
>>   class pass_early_warn_uninitialized : public gimple_opt_pass
>> @@ -2519,14 +2491,23 @@ public:
>> {}
>>
>> /* opt_pass methods: */
>> -  virtual bool gate (function *) { return gate_warn_uninitialized (); }
>> -

Re: [PATCH] Add clang-format config to contrib folder

2015-11-20 Thread Martin Liška

On 11/19/2015 06:43 PM, Pedro Alves wrote:
> On 11/19/2015 12:54 PM, Martin Liška wrote:
>>  ContinuationIndentWidth: 2
>> -ForEachMacros: 
>> ['_FOR_EACH','_FOR_EACH_1','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FOR_EACH_IMM_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR!
>  _EACH_OBJE
> CT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','_FOR_EACH_X','_FOR_EACH_X_1','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR']
>> +ForEachMacros: 
>> ['FOR_ALL_BB_FN','FOR_ALL_EH_REGION','FOR_ALL_EH_REGION_AT','FOR_ALL_EH_REGION_FN','FOR_ALL_INHERITED_FIELDS','FOR_ALL_PREDICATES','FOR_BB_BETWEEN','FOR_BB_INSNS','FOR_BB_INSNS_REVERSE','FOR_BB_INSNS_REVERSE_SAFE','FOR_BB_INSNS_SAFE','FOR_BODY','FOR_COND','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FO!
>  R_EACH_IMM
> _USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR_EACH_OBJECT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','FOR_EXPR','FOR_INIT_STMT','FOR_SCOPE']
>>  IndentCaseLabels: false
> 
> I don't know the tool's syntax here, but it's usually much better to
> keep entries like these one per line (or at least a few only).  Then
> changes to the list are _much_ easier to review and maintain, like:
> 
> ...
>  ForEachMacros: [
>  'FOR_ALL_BB_FN',
>  'FOR_ALL_EH_REGION',
> -'FOR_ALL_EH_REGION_AT',
> -'FOR_ALL_EH_REGION',
> +'FOR_ALL_EH_WHATNOT,
> +'FOR_ALL_EH_WHATNOT_AT,
>  'FOR_ALL_INHERITED_FIELDS',
>  'FOR_ALL_PREDICATES',
>  'FOR_BB_BETWEEN',
>  'FOR_BB_INSNS',
> ...
> 
> vs a single-line hunk that's basically unintelligible.
> 
> Thanks,
> Pedro Alves
> 

Hi Pedro.

Fully agree with you, there's suggested patch.
Hope I can install the patch for trunk?

Thanks,
Martin
From eac730138b51db897f579a961aabaee805338bdd Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 20 Nov 2015 12:29:49 +0100
Subject: [PATCH] clang-format: split content of a list to multiple lines

contrib/ChangeLog:

2015-11-20  Martin Liska  

	* clang-format: Split content of a list to multiple
	lines.
---
 contrib/clang-format | 85 +++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/contrib/clang-format b/contrib/clang-format
index 76a9d87..d734001 100644
--- a/contrib/clang-format
+++ b/contrib/clang-format
@@ -43,7 +43,90 @@ BreakBeforeTernaryOperators: true
 ColumnLimit: 80

[Committed] S/390: Add bswaphi2 pattern

2015-11-20 Thread Andreas Krebbel

Hi,

the 16 bit bswap instruction support was missing in the back-end so
far.  Added with the attached patch.

Bootstrapped on s390 and s390x. No regressions.

Bye,

-Andreas-

gcc/testsuite/ChangeLog:

2015-11-20  Andreas Krebbel  

* gcc.target/s390/bswap-1.c: New test.


gcc/ChangeLog:

2015-11-20  Andreas Krebbel  

* config/s390/s390.md ("bswaphi2"): New pattern.

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index ea65c74..280a772 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10439,6 +10439,8 @@
 ; Byte swap instructions
 ;
 
+; FIXME: There is also mvcin but we cannot use it since src and target
+; may overlap.
 (define_insn "bswap2"
   [(set (match_operand:GPR 0"register_operand" "=d, d")
(bswap:GPR (match_operand:GPR 1 "nonimmediate_operand" " d,RT")))]
@@ -10450,6 +10452,14 @@
(set_attr "op_type" "RRE,RXY")
(set_attr "z10prop" "z10_super")])
 
+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0   "register_operand" "=d")
+   (bswap:HI (match_operand:HI 1 "memory_operand"   "RT")))]
+  "TARGET_CPU_ZARCH"
+  "lrvh\t%0,%1"
+  [(set_attr "type" "load")
+   (set_attr "op_type" "RXY")
+   (set_attr "z10prop" "z10_super")])
 
 ;
 ; Population count instruction
diff --git a/gcc/testsuite/gcc.target/s390/bswap-1.c 
b/gcc/testsuite/gcc.target/s390/bswap-1.c
new file mode 100644
index 000..e1f113a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/bswap-1.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z900 -mzarch" } */
+
+#include 
+
+uint64_t u64;
+uint32_t u32;
+uint16_t u16;
+
+uint64_t
+foo64a (uint64_t a)
+{
+  return __builtin_bswap64 (a);
+}
+/* { dg-final { scan-assembler-times "lrvgr\t%r2,%r2" 1 { target lp64 } } } */
+
+uint64_t
+foo64b ()
+{
+  return __builtin_bswap64 (u64);
+}
+/* { dg-final { scan-assembler-times "lrvg\t%r2,0\\(%r\[0-9\]*\\)" 1 { target 
lp64 } } } */
+
+uint32_t
+foo32 ()
+{
+  return __builtin_bswap32 (u32);
+}
+/* { dg-final { scan-assembler-times "lrv\t%r2,0\\(%r\[0-9\]*\\)" 1 } } */
+
+uint16_t
+foo16 ()
+{
+  return __builtin_bswap16 (u16);
+}
+/* { dg-final { scan-assembler-times "lrvh\t%r2,0\\(%r\[0-9\]*\\)" 1 } } */

Re: [RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS

On Fri, Nov 20, 2015 at 12:02:07PM +, Kumar, Venkataramanan wrote:
> Also. 
> (1) I guard these checks for  -ftree-loop-if-convert-stores and -fno-common. 
> Sometimes vectorization flags also triggers if conversion.
> (2) Also hashing base DRs for writes only. 

Let me comment just on formatting, will leave actual review to Richard.
> 
> gcc/ChangeLog
> 2015-11-19  Venkataramanan  

Surname missing.
>   
>   PR tree-optimization/67326
>   * tree-if-conv.c  (offset_DR_map): Define.

Extraneous space.

>   (struct ifc_dr): Add new tree base_predicate field.

All ChangeLog lines should be indented by a single tab.

>   (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash offsets, 
> DR pairs

Too long line.

>   and hash base ref,  DR pairs  for write type DRs.

Extraneous spaces.

>   (ifcvt_memrefs_wont_trap):  Guard checks with 
> -ftree-loop-if-convert-stores flag. 

Too long line and extraneous spaces.

>Check for similar DR that are accessed unconditionally.
>(if_convertible_loop_p_1):  Initialize and delete offset hash maps

Extraneous space, missing full stop at the end.
> 
> gcc/testsuite/ChangeLog
> 2015-11-19  Venkataramanan  
>   * gcc.dg/tree-ssa/ifc-pr67326.c:  Add new.

Extraneous space.

> +  if (DR_IS_WRITE (a))
>  {
> -  IFC_DR (a)->predicate = ca;
> -  *base_master_dr = a;
> +  base_master_dr = _DR_map->get_or_insert (base_ref,);

Missing space before );

> +  offset_master_dr = _DR_map->get_or_insert (offset,);
> +  if (!exist3)
> + *offset_master_dr = a;
> +
> +  if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1)
> + DR_RW_UNCONDITIONALLY (*offset_master_dr)
> + = DR_RW_UNCONDITIONALLY (*master_dr);

Wrong indentation, the = should be below the first underscore in
DR_RW_UNCONDITIONALLY.

> -  if (DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) == 1)
> +  if ((base_master_dr
> +&& DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) == 1))

Extraneous () pair.

> +  else if (DR_OFFSET (a))
> +{
> +  offset_dr = offset_DR_map->get (DR_OFFSET (a));
> +  if ((DR_RW_UNCONDITIONALLY (*offset_dr) == 1)

Likewise.

Jakub

Re: [Committed] S/390: Add bswaphi2 pattern


On 11/20/2015 12:52 PM, Andreas Krebbel wrote:

+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0   "register_operand" "=d")
+   (bswap:HI (match_operand:HI 1 "memory_operand"   "RT")))]
+  "TARGET_CPU_ZARCH"
+  "lrvh\t%0,%1"
+  [(set_attr "type" "load")
+   (set_attr "op_type" "RXY")
+   (set_attr "z10prop" "z10_super")])


Surely it's better to arrange so that you can use STRVH as well.
And providing a fallback for the reg-reg case (e.g. LRVR+SRL).

Although I suppose I don't see support for STRV in bswap32/64 either...


r~

GCC 5.3 Status Report (2015-11-20)


Status
==

We plan to do a GCC 5.3 release candidate at the end of next week
followed by the actual release a week after that.

So now is the time to look at your regression bugs in bugzilla and
do some backporting for things already fixed on trunk.


Quality Data


Priority  #   Change from last report
---   ---
P10
P2  121+  30
P3   20-   8
P4   87+   2
P5   32+   2
---   ---
Total P1-P3 141+  22
Total   260+  24


Previous Report
===

https://gcc.gnu.org/ml/gcc/2015-07/msg00197.html

Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-20 Thread Evandro Menezes


On 11/20/2015 08:34 AM, Kyrill Tkachov wrote:


On 20/11/15 12:27, James Greenhalgh wrote:

On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote:

On 11/12/2015 09:39 AM, Evandro Menezes wrote:
2015-11-12  Evandro Menezes 

[AArch64] Add attribute for compatibility with ARM pipeline models

gcc/

* config/aarch64/aarch64.md (predicated): Copy attribute from
"arm.md".
* config/arm/arm.md (predicated): Added description.


The arm part is ok too. It's just a comment.
In the ChangeLog entry for arm.md I'd say "Add description."

Thanks,
Kyrill


Please, commit if it's alright.

The AArch64 part of this is OK.


 From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
  pipeline models

gcc/
* config/aarch64/aarch64.md (predicated): Copy attribute from 
"arm.md".

* config/arm/arm.md (predicated): Added description.
---
  gcc/config/aarch64/aarch64.md | 4 
  gcc/config/arm/arm.md | 3 +++
  2 files changed, 7 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md 
b/gcc/config/aarch64/aarch64.md

index 1586256..d46f837 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,10 @@
  ;; 1 :=: yes
  (define_attr "far_branch" "" (const_int 0))
  +;; Strictly for compatibility with AArch32 in pipeline models, 
since AArch64 has

+;; no predicated insns.
+(define_attr "predicated" "yes,no" (const_string "no"))
+
  ;; 
---

  ;; Pipeline descriptions and scheduling
  ;; 
---

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 73c3088..6bda491 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -105,6 +105,9 @@
  (define_attr "fpu" "none,vfp"
(const (symbol_ref "arm_fpu_attr")))
  +; Predicated means that the insn form is conditionally executed 
based on a
+; predicate.  We default to 'no' because no Thumb patterns match 
this rule

+; and not all ARM insns do.

s/is conditionally executed/can be conditionally executed/ in the first
sentence. Otherwise, this looks OK to me but I can't approve the ARM 
part,

so you'll need to wait for a review from someone who can.

Thanks,
James


  (define_attr "predicated" "yes,no" (const_string "no"))
; LENGTH of an instruction (in bytes)
--
2.1.0.243.g30d45f7


Can you please commit it with this change in the Changelog?

Thank you,

--
Evandro Menezes

[PATCH, PR68460] Always call free_stmt_vec_info_vec in gather_scalar_reductions


[ was: Re: [PATCH] Fix parloops gimple_uid usage ]

On 09/10/15 23:09, Tom de Vries wrote:

@@ -2392,6 +2397,9 @@ gather_scalar_reductions (loop_p loop, 
reduction_info_table_type *reduction_list
loop_vec_info simple_inner_loop_info = NULL;
bool allow_double_reduc = true;

+  if (!stmt_vec_info_vec.exists ())
+init_stmt_vec_info_vec ();
+
simple_loop_info = vect_analyze_loop_form (loop);
if (simple_loop_info == NULL)
  return;
@@ -2453,9 +2461,16 @@ gather_scalar_reductions (loop_p loop, 
reduction_info_table_type *reduction_list
destroy_loop_vec_info (simple_loop_info, true);
destroy_loop_vec_info (simple_inner_loop_info, true);

+  /* Release the claim on gimple_uid.  */
+  free_stmt_vec_info_vec ();
+


With the src/libgomp/testsuite/libgomp.c/pr46886.c testcase, compiled in 
addition with -ftree-vectorize, I ran into an ICE:

...
src/libgomp/testsuite/libgomp.c/pr46886.c:8:5: internal compiler error: 
in init_stmt_vec_info_vec, at tree-vect-stmts.c:8250

 int foo (void)
 ^~~

0x1196082 init_stmt_vec_info_vec()
src/gcc/tree-vect-stmts.c:8250
0x11c3ed4 vectorize_loops()
src/gcc/tree-vectorizer.c:510
0x10a7ea5 execute
src/gcc/tree-ssa-loop.c:276
...

The ICE is caused by the fact that init_stmt_vec_info_vec is called at 
the start of vectorize_loops, while stmt_vec_info_vec is not empty. I 
traced this back to gather_scalar_reduction, where we call 
init_stmt_vec_info_vec, but we skip free_stmt_vec_info_vec if we take 
the early-out for simple_loop_info == NULL.


This patch fixes the ICE by making sure we always call 
free_stmt_vec_info_vec in gather_scalar_reduction.


Passes cc1/f951 rebuild and autopar testing.

OK for stage3 trunk if bootstrap and regtest succeeds?

Thanks,
- Tom
Always call free_stmt_vec_info_vec in gather_scalar_reductions

2015-11-20  Tom de Vries  

	PR tree-optimization/68460
	* tree-parloops.c (gather_scalar_reductions): Also call
	free_stmt_vec_info_vec if simple_loop_info == NULL.

	* gcc.dg/autopar/pr68460.c: New test.

---
 gcc/testsuite/gcc.dg/autopar/pr68460.c | 35 ++
 gcc/tree-parloops.c|  6 +-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/autopar/pr68460.c b/gcc/testsuite/gcc.dg/autopar/pr68460.c
new file mode 100644
index 000..0c00065
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr68460.c
@@ -0,0 +1,35 @@
+/* { dg-do "compile" } */
+/* { dg-options "-O -ftree-parallelize-loops=2 -ftree-vectorize -fno-tree-ch -fno-tree-dominator-opts" } */
+
+void abort (void);
+
+int d[1024], e[1024];
+
+int
+foo (void)
+{
+  int s = 0;
+  int i;
+
+  for (i = 0; i < 1024; i++)
+s += d[i] - e[i];
+
+  return s;
+}
+
+int
+main ()
+{
+  int i;
+
+  for (i = 0; i < 1024; i++)
+{
+  d[i] = i * 2;
+  e[i] = i;
+}
+
+  if (foo () != 1023 * 1024 / 2)
+abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 8d7912d..85cc78d 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2433,7 +2433,7 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
 
   simple_loop_info = vect_analyze_loop_form (loop);
   if (simple_loop_info == NULL)
-return;
+goto gather_done;
 
   for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next ())
 {
@@ -2492,9 +2492,13 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
   destroy_loop_vec_info (simple_loop_info, true);
   destroy_loop_vec_info (simple_inner_loop_info, true);
 
+ gather_done:
   /* Release the claim on gimple_uid.  */
   free_stmt_vec_info_vec ();
 
+  if (reduction_list->elements () == 0)
+return;
+
   /* As gimple_uid is used by the vectorizer in between vect_analyze_loop_form
  and free_stmt_vec_info_vec, we can set gimple_uid of reduc_phi stmts only
  now.  */

Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def


On 20/11/15 14:29, Richard Biener wrote:

I agree it's somewhat of an odd behavior but all passes should
either be placed in a sub-pipeline with an outer
loop_optimizer_init()/finalize () call or call both themselves.


Hmm, but adding loop_optimizer_finalize at the end of pass_lim breaks 
the loop pipeline.


We could use the style used in pass_slp_vectorize::execute:
...
pass_slp_vectorize::execute (function *fun)
{
  basic_block bb;

  bool in_loop_pipeline = scev_initialized_p ();
  if (!in_loop_pipeline)
{
  loop_optimizer_init (LOOPS_NORMAL);
  scev_initialize ();
}

  ...

  if (!in_loop_pipeline)
{
  scev_finalize ();
  loop_optimizer_finalize ();
}
...

Although that doesn't strike me as particularly clean.

Thanks,
- Tom

Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m

2015-11-20 Thread Andre Vieira


Hi Kyrill
On 20/11/15 11:51, Kyrill Tkachov wrote:

Hi Andre,

On 18/11/15 09:44, Andre Vieira wrote:

On 17/11/15 10:10, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote:

On 16/11/15 12:07, James Greenhalgh wrote:

On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote:

Hi,

   This patch changes the target support mechanism to make it
recognize any ARM 'M' profile as a non-neon supporting target. The
current check only tests for armv6 architectures and earlier, and
does not account for armv7-m.

   This is correct because there is no 'M' profile that supports neon
and the current test is not sufficient to exclude armv7-m.

   Tested by running regressions for this testcase for various ARM
targets.

   Is this OK to commit?

   Thanks,
   Andre Vieira

gcc/testsuite/ChangeLog:
2015-11-06  Andre Vieira 

 * gcc/testsuite/lib/target-supports.exp
   (check_effective_target_arm_neon_ok_nocache): Added check
   for M profile.



 From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00
2001
From: Andre Simoes Dias Vieira 
Date: Fri, 13 Nov 2015 11:16:34 +
Subject: [PATCH] Disable neon testing for armv7-m

---
  gcc/testsuite/lib/target-supports.exp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index
75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6
100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2854,7 +2854,7 @@ proc
check_effective_target_arm_neon_ok_nocache { } {
  int dummy;
  /* Avoid the case where a test adds -mfpu=neon, but the
toolchain is
 configured for -mcpu=arm926ej-s, for example.  */
-#if __ARM_ARCH < 7
+#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M'
  #error Architecture too old for NEON.


Could you fix this #error message while you're here?

Why we can't change this test to look for the __ARM_NEON macro from
ACLE:

#if __ARM_NEON < 1
   #error NEON is not enabled
#endif

Thanks,
James



There is a check for this already:
'check_effective_target_arm_neon'. I think the idea behind
arm_neon_ok is to check whether the hardware would support neon,
whereas arm_neon is to check whether neon was enabled, i.e.
-mfpu=neon was used or a mcpu was passed that has neon enabled by
default.

The comments for 'check_effective_target_arm_neon_ok_nocache'
highlight this, though maybe the comments for
check_effective_target_arm_neon could be better.

# Return 1 if this is an ARM target supporting -mfpu=neon
# -mfloat-abi=softfp or equivalent options.  Some multilibs may be
# incompatible with these options.  Also set et_arm_neon_flags to the
# best options to add.

proc check_effective_target_arm_neon_ok_nocache
...
/* Avoid the case where a test adds -mfpu=neon, but the toolchain is
configured for -mcpu=arm926ej-s, for example.  */
...


and

# Return 1 if this is a ARM target with NEON enabled.

proc check_effective_target_arm_neon


OK, got it - sorry for my mistake, I had the two procs confused.

I'd still like to see the error message fixed "Architecture too old
for NEON."
is not an accurate description of the problem.

Thanks,
James



This OK?



This is ok,
I've committed for you with the slightly tweaked ChangeLog entry:
2015-11-20  Andre Vieira  

 * lib/target-supports.exp
 (check_effective_target_arm_neon_ok_nocache): Add check
 for M profile.

as r230653.

Thanks,
Kyrill



Cheers,
Andre




Thank you. Would there be any objections to backporting this to 
gcc-5-branch? I checked, it applies cleanly and its a simple enough way 
of preventing a lot of FAILS for armv7-m.


Best Regards,
Andre

Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-20 Thread Evandro Menezes


On 11/20/2015 06:27 AM, James Greenhalgh wrote:

On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote:

On 11/12/2015 09:39 AM, Evandro Menezes wrote:
2015-11-12  Evandro Menezes 

[AArch64] Add attribute for compatibility with ARM pipeline models

gcc/

* config/aarch64/aarch64.md (predicated): Copy attribute from
"arm.md".
* config/arm/arm.md (predicated): Added description.

Please, commit if it's alright.

The AArch64 part of this is OK.


 From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
  pipeline models

gcc/
* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
* config/arm/arm.md (predicated): Added description.
---
  gcc/config/aarch64/aarch64.md | 4 
  gcc/config/arm/arm.md | 3 +++
  2 files changed, 7 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1586256..d46f837 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,10 @@
  ;; 1 :=: yes
  (define_attr "far_branch" "" (const_int 0))
  
+;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has

+;; no predicated insns.
+(define_attr "predicated" "yes,no" (const_string "no"))
+
  ;; ---
  ;; Pipeline descriptions and scheduling
  ;; ---
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 73c3088..6bda491 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -105,6 +105,9 @@
  (define_attr "fpu" "none,vfp"
(const (symbol_ref "arm_fpu_attr")))
  
+; Predicated means that the insn form is conditionally executed based on a

+; predicate.  We default to 'no' because no Thumb patterns match this rule
+; and not all ARM insns do.

s/is conditionally executed/can be conditionally executed/ in the first
sentence. Otherwise, this looks OK to me but I can't approve the ARM part,
so you'll need to wait for a review from someone who can.


  (define_attr "predicated" "yes,no" (const_string "no"))
  
  ; LENGTH of an instruction (in bytes)

--
2.1.0.243.g30d45f7


Actually, that would then be the existing description for the 
"predicable" attribute.  I think that this proposed description is 
correct for the "predicated" attribute.


Thank you,

--
Evandro Menezes

Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

On Fri, Nov 20, 2015 at 09:55:29AM -0600, Evandro Menezes wrote:
> On 11/20/2015 06:27 AM, James Greenhalgh wrote:
> >On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote:
> >>On 11/12/2015 09:39 AM, Evandro Menezes wrote:
> >>2015-11-12  Evandro Menezes 
> >>
> >>[AArch64] Add attribute for compatibility with ARM pipeline models
> >>
> >>gcc/
> >>
> >>* config/aarch64/aarch64.md (predicated): Copy attribute from
> >>"arm.md".
> >>* config/arm/arm.md (predicated): Added description.
> >>
> >>Please, commit if it's alright.
> >The AArch64 part of this is OK.
> >
> >> From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001
> >>From: Evandro Menezes 
> >>Date: Mon, 9 Nov 2015 17:11:16 -0600
> >>Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
> >>  pipeline models
> >>
> >>gcc/
> >>* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
> >>* config/arm/arm.md (predicated): Added description.
> >>---
> >>  gcc/config/aarch64/aarch64.md | 4 
> >>  gcc/config/arm/arm.md | 3 +++
> >>  2 files changed, 7 insertions(+)
> >>
> >>diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> >>index 1586256..d46f837 100644
> >>--- a/gcc/config/aarch64/aarch64.md
> >>+++ b/gcc/config/aarch64/aarch64.md
> >>@@ -195,6 +195,10 @@
> >>  ;; 1 :=: yes
> >>  (define_attr "far_branch" "" (const_int 0))
> >>+;; Strictly for compatibility with AArch32 in pipeline models, since 
> >>AArch64 has
> >>+;; no predicated insns.
> >>+(define_attr "predicated" "yes,no" (const_string "no"))
> >>+
> >>  ;; ---
> >>  ;; Pipeline descriptions and scheduling
> >>  ;; ---
> >>diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> >>index 73c3088..6bda491 100644
> >>--- a/gcc/config/arm/arm.md
> >>+++ b/gcc/config/arm/arm.md
> >>@@ -105,6 +105,9 @@
> >>  (define_attr "fpu" "none,vfp"
> >>(const (symbol_ref "arm_fpu_attr")))
> >>+; Predicated means that the insn form is conditionally executed based on a
> >>+; predicate.  We default to 'no' because no Thumb patterns match this rule
> >>+; and not all ARM insns do.
> >s/is conditionally executed/can be conditionally executed/ in the first
> >sentence. Otherwise, this looks OK to me but I can't approve the ARM part,
> >so you'll need to wait for a review from someone who can.
> >
> >>  (define_attr "predicated" "yes,no" (const_string "no"))
> >>  ; LENGTH of an instruction (in bytes)
> >>-- 
> >>2.1.0.243.g30d45f7
> 
> Actually, that would then be the existing description for the
> "predicable" attribute.  I think that this proposed description is
> correct for the "predicated" attribute.

Right, understood, that will teach me to pattern match up to pred and stop
reading the word.

This is fine, I've committed it on your behalf as r230666

Thanks,
James

Re: PATCH to shorten_compare -Wtype-limits handling

2015-11-20 Thread Martin Sebor


Hmm, it looks like using expansion_point_if_in_system_header might avoid
the first issue you mention.


Thus.


Great, thanks! (I'll have to remember the trick for my own use!)

Martin

Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode

2015-11-20 Thread Senthil Kumar Selvaraj

On Thu, Nov 19, 2015 at 10:31:41AM -0700, Jeff Law wrote:
> On 11/18/2015 11:20 PM, Senthil Kumar Selvaraj wrote:
> >On Wed, Nov 18, 2015 at 09:36:21AM +0100, Richard Biener wrote:
> >>
> >>Otherwise ok.
> >
> >See modified patch below. If you think vrp98.c is unnecessary, feel free
> >to dump it :).
> >
> >If ok, could you commit it for me please? I don't have commit access.
> >
> >Regards
> >Senthil
> >
> >gcc/ChangeLog
> >2015-11-19  Senthil Kumar Selvaraj  
> >
> > * tree.h (desired_pro_or_demotion_p): New function.
> > * tree-vrp.c (simplify_cond_using_ranges): Call it.
> >
> >gcc/testsuite/ChangeLog
> >2015-11-19  Senthil Kumar Selvaraj  
> >
> > * gcc.dg/tree-ssa/vrp98.c: New testcase.
> > * gcc.target/avr/uint8-single-reg.c: New testcase.
> I went ahead and committed this as-is.
> 
> I do think the vrp98 testcase is useful as it verifies that VRP is doing
> what we want in a target independent way.  It's a good complement to the AVR
> specific testcase.

I see the same problem on gcc-5-branch as well. Would it be ok to
backport the fix to that branch as well?

Regards
Senthil
> 
> Thanks,
> Jeff
>

Re: PATCH to shorten_compare -Wtype-limits handling


On 11/19/2015 05:16 PM, Jason Merrill wrote:

On 11/19/2015 02:44 PM, Martin Sebor wrote:

On 11/18/2015 09:26 PM, Jason Merrill wrote:

The rs6000 target was hitting a bootstrap failure due to
-Werror=type-limits.  Since warn_tautological_cmp and other warnings
avoid warning if one of the operands comes from a macro, I thought it
would make sense to do that here as well.


The also disables the warning for functions that are shadowed by
macros such as C atomic_load et al. For example, in the program
below. Is that an acceptable compromise or is there a way to avoid
this?


I think it's an acceptable compromise, but see below.


At the same time, the change doesn't suppress the warning in other
cases where I would have expected it to suppress it based on your
description. For instance here:

   unsigned short bar (unsigned short x)
   {
   #define X x

 if (x > 0x)


Yes, this is missed because the front end doesn't remember the location
of the use of x; that's one of many location tracking issues.  David
Malcolm is working on this stuff.


I noticed there is code elsewhere in c-common.c that avoids
issuing the same warning for system headers (that's the code
that responsible for issuing the warning for the second test
case above).


Hmm, it looks like using expansion_point_if_in_system_header might avoid
the first issue you mention.


Thus.


commit fb71bf4de520cc4bd11eefb57e50748b4679996f
Author: Jason Merrill 
Date:   Thu Nov 19 15:21:47 2015 -0500

	* c-common.c (shorten_compare): But look through macros from
	system headers.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 068a0bc..fe0a235 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -4651,8 +4651,10 @@ shorten_compare (location_t loc, tree *op0_ptr, tree *op1_ptr,
 	}
 
   if (TREE_CODE (primop0) != INTEGER_CST
-	  /* Don't warn if it's from a macro.  */
-	  && !from_macro_expansion_at (EXPR_LOCATION (primop0)))
+	  /* Don't warn if it's from a (non-system) macro.  */
+	  && !(from_macro_expansion_at
+	   (expansion_point_location_if_in_system_header
+		(EXPR_LOCATION (primop0)
 	{
 	  if (val == truthvalue_false_node)
 	warning_at (loc, OPT_Wtype_limits,
diff --git a/gcc/testsuite/gcc.dg/Wtype-limits2.c b/gcc/testsuite/gcc.dg/Wtype-limits2.c
new file mode 100644
index 000..92151aa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wtype-limits2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-Wtype-limits" } */
+/* { dg-require-effective-target sync_char_short } */
+
+#include 
+
+unsigned foo (unsigned char *x)
+{
+  if (atomic_load (x) > 1000) /* { dg-warning "comparison is always false due to limited range of data type" } */
+return 0;
+  return 1;
+}

Re: [PATCH] Fix PR68067

2015-11-20 Thread Alan Lawrence

On 6 November 2015 at 10:39, Richard Biener  wrote:
>> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
>> references block not in block tree
>> l1_279 = PHI <1(28), l1_299(33)>
>
> ^^^
>
> this is the error to look at!  It means that the GC heap will be corrupted
> quite easily.
>

This looked very similar to PR68117 - the invalid phi arg, and block
not in  block-tree, even if not the invalid tree code - and as the
posters there were having success with valgrind, whereas I wasn't, I
watched and waited. First observation is that it triggers the asserts
you suggested in comment 27
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D68117#c27). Indeed, it
fails those asserts, even after the patch in comment 25 (committed as
r230594) to tree-ssa.c (delete_tree_ssa), and the patch in comment#35
to function.c (set_cfun), and the patch in comment#30 (committed as
r230424) to cfgexpand.c (pass_expand::execute).

The patch in comment#29 (which replaces the asserts in comment#27 with
empties), however, fixes the problem - although I can't rule out, that
that's just by changing the memory allocation pattern.

Moreover, if I take those patches and rebase onto a recent trunk (onto
which the delete_tree_ssa and pass_expand::execute patches have
already been committed), i.e. just adding the assertions from
comment#27 and the call in function.c (set_cfun) - the assertions are
still failing on my testcase, whereas the original (assertionless)
failure was very erratic, and had since disappeared/been hidden on
trunk. Indeed those same assertions break in a few other places (even
in a --disable-bootstrap build after gcc/xgcc is built), so I feel I
have a good chance of producing a reasonable assertion-breaking
testcase.

So I have to ask, how sure are you that those assertions are(/should
be!) "correct"? :)

--Alan

Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode


On 11/20/2015 10:04 AM, Senthil Kumar Selvaraj wrote:

On Thu, Nov 19, 2015 at 10:31:41AM -0700, Jeff Law wrote:

On 11/18/2015 11:20 PM, Senthil Kumar Selvaraj wrote:

On Wed, Nov 18, 2015 at 09:36:21AM +0100, Richard Biener wrote:


Otherwise ok.


See modified patch below. If you think vrp98.c is unnecessary, feel free
to dump it :).

If ok, could you commit it for me please? I don't have commit access.

Regards
Senthil

gcc/ChangeLog
2015-11-19  Senthil Kumar Selvaraj  

* tree.h (desired_pro_or_demotion_p): New function.
* tree-vrp.c (simplify_cond_using_ranges): Call it.

gcc/testsuite/ChangeLog
2015-11-19  Senthil Kumar Selvaraj  

* gcc.dg/tree-ssa/vrp98.c: New testcase.
* gcc.target/avr/uint8-single-reg.c: New testcase.

I went ahead and committed this as-is.

I do think the vrp98 testcase is useful as it verifies that VRP is doing
what we want in a target independent way.  It's a good complement to the AVR
specific testcase.


I see the same problem on gcc-5-branch as well. Would it be ok to
backport the fix to that branch as well?
That's a call for the release managers.  I typically don't backport 
anything expect ICE or incorrect code generation fixes as I tend to be 
very conservative on what goes onto a release branch.


Jakub, Richi or Joseph would need to ack into a release branch.

jeff

Re: PATCH to shorten_compare -Wtype-limits handling


On 11/19/2015 12:02 PM, Manuel López-Ibáñez wrote:

On 19 November 2015 at 17:54, Jeff Law  wrote:

But there were a couple of patches from you some time ago, for
example: http://permalink.gmane.org/gmane.comp.gcc.patches/343476

What happened with those?


On hold pending fixing the type-limits warning placement.  Essentially that
has to be untangled first.


Could you elaborate on this? Or point me to some previous email
thread? (I don't have enough free time to follow the mailing list
anymore, sorry).
I don't have the thread handy, but essentially the operand shortening 
code has warning bits inside it.  As a result pulling out functionality 
(such as operand canonicalization) and putting into match.pd results in 
missing a warning -- ie a regression.


So we have to detangle the operand shortening from warning detection. 
Kai's idea was to first make the shortening code "pure" in the sense 
that it would have no side effects other than to generate the warnings. 
 Canonicalization and other transformations would still occur 
internally, but not be reflected in the IL.


That was the overall plan and he posted a patch for that.  But that 
patch didn't do the due diligence to verify that once the shortening 
code was made "pure" that we didn't regress on the quality of the code 
we generated.


jeff

Re: PATCH to shorten_compare -Wtype-limits handling

2015-11-20 Thread Manuel López-Ibáñez

On 20 November 2015 at 17:42, Jeff Law  wrote:
> So we have to detangle the operand shortening from warning detection. Kai's
> idea was to first make the shortening code "pure" in the sense that it would
> have no side effects other than to generate the warnings.  Canonicalization
> and other transformations would still occur internally, but not be reflected
> in the IL.
>
> That was the overall plan and he posted a patch for that.  But that patch
> didn't do the due diligence to verify that once the shortening code was made
> "pure" that we didn't regress on the quality of the code we generated.

I thought that the original plan was to make the warning code also use
match.pd. That is, that all folding, including FE folding, will be
match.pd-based. Has this changed?

Cheers,

Manuel.

Merge from trunk to gccgo branch

I merged trunk revision 230657 to the gccgo branch.

Ian

Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1

On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote:
>2015-11-10  Evandro Menezes 
> 
>gcc/
> 
>* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
>* config/aarch64/aarch64.md: Include "exynos-m1.md".
>* config/arm/arm-cores.def: Use the Exynos M1 sched model.
>* config/arm/arm.md: Include "exynos-m1.md".
>* config/arm/arm-tune.md: Regenerated.
>* config/arm/exynos-m1.md: New file.
> 
> This patch adds the scheduling model for Exynos M1.  It depends on
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html
> 
> Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.
> 
> Please, commit if it's alright.


> From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001
> From: Evandro Menezes 
> Date: Mon, 9 Nov 2015 17:18:52 -0600
> Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1
> 
> gcc/
>   * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
>   * config/aarch64/aarch64.md: Include "exynos-m1.md".

These changes are fine.

>   * config/arm/arm-cores.def: Use the Exynos M1 sched model.
>   * config/arm/arm.md: Include "exynos-m1.md".
>   * config/arm/arm-tune.md: Regenerated.

These changes need an ack from an ARM reviewer.

>   * config/arm/exynos-m1.md: New file.

I have a few comments on this model.

> +;; The Exynos M1 core is modeled as a triple issue pipeline that has
> +;; the following functional units.
> +
> +(define_automaton "exynos_m1_gp")
> +(define_automaton "exynos_m1_ls")
> +(define_automaton "exynos_m1_fp")
> +
> +;; 1.  Two pipelines for simple integer operations: A, B
> +;; 2.  One pipeline for simple or complex integer operations: C
> +
> +(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp")
> +
> +(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)")
> +(define_reservation "em1_c" "em1_xc")

Is this extra reservation useful, can we not just use em1_xc directly?

> +;; 3.  Two asymmetric pipelines for Neon and FP operations: F0, F1
> +
> +(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp")
> +
> +(define_reservation "em1_fmac" "em1_f0")
> +(define_reservation "em1_fcvt" "em1_f0")
> +(define_reservation "em1_nalu" "(em1_f0 | em1_f1)")
> +(define_reservation "em1_nalu0" "em1_f0")
> +(define_reservation "em1_nalu1" "em1_f1")
> +(define_reservation "em1_nmisc" "em1_f0")
> +(define_reservation "em1_ncrypt" "em1_f0")
> +(define_reservation "em1_fadd" "em1_f1")
> +(define_reservation "em1_fvar" "em1_f1")
> +(define_reservation "em1_fst" "em1_f1")

Same comment here, does this not just obfuscate the interaction between
instruction classes in the description. I'm not against doing it this way
if you prefer, but it would seem to reduce readability to me. I think there
is also an argument that this increases readability, so it is your choice.

> +
> +;; 4.  One pipeline for branch operations: BX
> +
> +(define_cpu_unit "em1_bx" "exynos_m1_gp")
> +
> +(define_reservation "em1_br" "em1_bx")
> +

And again?

> +;; 5.  One AGU for loads: L
> +;; One AGU for stores and one pipeline for stores: S, SD
> +
> +(define_cpu_unit "em1_lx" "exynos_m1_ls")
> +(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls")
> +
> +(define_reservation "em1_ld" "em1_lx")
> +(define_reservation "em1_st" "(em1_sx + em1_sd)")
> +
> +;; Common occurrences
> +(define_reservation "em1_sfst" "(em1_fst + em1_st)")
> +(define_reservation "em1_lfst" "(em1_fst + em1_ld)")
> +
> +;; Branches
> +;;
> +;; No latency as there is no result
> +;; TODO: Unconditional branches use no units;
> +;; conditional branches add the BX unit;
> +;; indirect branches add the C unit.
> +(define_insn_reservation "exynos_m1_branch" 0
> +  (and (eq_attr "tune" "exynosm1")
> +   (eq_attr "type" "branch"))
> +  "em1_br")
> +
> +(define_insn_reservation "exynos_m1_call" 1
> +  (and (eq_attr "tune" "exynosm1")
> +   (eq_attr "type" "call"))
> +  "em1_alu")
> +
> +;; Basic ALU
> +;;
> +;; Simple ALU without shift, non-predicated
> +(define_insn_reservation "exynos_m1_alu" 1
> +  (and (eq_attr "tune" "exynosm1")
> +   (and (not (eq_attr "predicated" "yes"))

(and (eq_attr "predicated" "no")) ?

Likewise throughout the file? Again this is your choice.

This is OK from the AArch64 side, let me know if you plan to change any
of the above, otherwise I'll commit it (or someone else can commit it)
after I see an OK from an ARM reviewer.

Thanks,
James

Re: [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)

2015-11-20 Thread Jonathan Wakely


On 20/11/15 18:28 +0100, Jan Kratochvil wrote:

On Thu, 19 Nov 2015 21:21:51 +0100, Jan Kratochvil wrote:

* python/hook.in: Call register_libstdcxx_printers.
* python/libstdcxx/v6/__init__.py: Wrap it to
register_libstdcxx_printers.

[...]

-import libstdcxx.v6
+# Call a function as a plain import would not execute body of the included file
+# on repeated reloads of this object file.
+from libstdcxx.v6 import register_libstdcxx_printers
+register_libstdcxx_printers(gdb.current_objfile())


Jonathan Wakely mentioned:
[libstdc++] Refactor python/hook.in
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02518.html
From: Siva Chandra 
Message-ID: 

https://gcc.gnu.org/r215726

That change introduced this regression.  This patch does not undo it, goal of
the patch by Siva Chandra - minimizing hook.in - stays achieved.


OK, thanks for checking, Jan.

The patch is OK for trunk and gcc-5-branch.

Re: PATCH to shorten_compare -Wtype-limits handling

2015-11-20 Thread Manuel López-Ibáñez

On 20 November 2015 at 16:10, Martin Sebor  wrote:
>>> Hmm, it looks like using expansion_point_if_in_system_header might avoid
>>> the first issue you mention.
>>
>>
>> Thus.
>
>
> Great, thanks! (I'll have to remember the trick for my own use!)

I added this to  https://gcc.gnu.org/wiki/DiagnosticsGuidelines under
"Locations" for future reference. I hope others would do the same in
the future, so the info is kept up-to-date.

Cheers,

Manuel.

Re: [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)

2015-11-20 Thread Jan Kratochvil

On Thu, 19 Nov 2015 21:21:51 +0100, Jan Kratochvil wrote:
>   * python/hook.in: Call register_libstdcxx_printers.
>   * python/libstdcxx/v6/__init__.py: Wrap it to
>   register_libstdcxx_printers.
[...]
> -import libstdcxx.v6
> +# Call a function as a plain import would not execute body of the included 
> file
> +# on repeated reloads of this object file.
> +from libstdcxx.v6 import register_libstdcxx_printers
> +register_libstdcxx_printers(gdb.current_objfile())

Jonathan Wakely mentioned:
[libstdc++] Refactor python/hook.in
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02518.html
From: Siva Chandra 
Message-ID: 

https://gcc.gnu.org/r215726

That change introduced this regression.  This patch does not undo it, goal of
the patch by Siva Chandra - minimizing hook.in - stays achieved.


Jan

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-11-20 Thread Richard Earnshaw

On 20/11/15 08:31, Bin.Cheng wrote:
> On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng  wrote:
>> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh
>>  wrote:
>>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote:
 Hi,
 GIMPLE IVO needs to call backend interface to calculate costs for addr
 expressions like below:
FORM1: "r73 + r74 + 16380"
FORM2: "r73 << 2 + r74 + 16380"

 They are invalid address expression on AArch64, so will be legitimized by
 aarch64_legitimize_address.  Below are what we got from that function:

 For FORM1, the address expression is legitimized into below insn sequence
 and rtx:
r84:DI=r73:DI+r74:DI
r85:DI=r84:DI+0x3000
r83:DI=r85:DI
"r83 + 4092"

 For FORM2, the address expression is legitimized into below insn sequence
 and rtx:
r108:DI=r73:DI<<0x2
r109:DI=r108:DI+r74:DI
r110:DI=r109:DI+0x3000
r107:DI=r110:DI
"r107 + 4092"

 So the costs computed are 12/16 respectively.  The high cost prevents IVO
 from choosing right candidates.  Besides cost computation, I also think the
 legitmization is bad in terms of code generation.
 The root cause in aarch64_legitimize_address can be described by it's
 comment:
/* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST),
   where mask is selected by alignment and size of the offset.
   We try to pick as large a range for the offset as possible to
   maximize the chance of a CSE.  However, for aligned addresses
   we limit the range to 4k so that structures with different sized
   elements are likely to use the same base.  */
 I think the split of CONST is intended for REG+CONST where the const offset
 is not in the range of AArch64's addressing modes.  Unfortunately, it
 doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<>>
>>> Thanks for the description, it made the patch very easy to review. I only
>>> have a style comment.
>>>
 Bootstrap & test on AArch64 along with other patch.  Is it OK?

 2015-11-04  Bin Cheng  
   Jiong Wang  

   * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle
   address expressions like REG+REG+CONST and REG+NON_REG+CONST.
>>>
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 5c8604f..47875ac 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
 */, machine_mode mode)
  {
HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
HOST_WIDE_INT base_offset;
 +  rtx op0 = XEXP (x,0);
 +
 +  if (GET_CODE (op0) == PLUS)
 + {
 +   rtx op0_ = XEXP (op0, 0);
 +   rtx op1_ = XEXP (op0, 1);
>>>
>>> I don't see this trailing _ on a variable name in many places in the source
>>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend.
>>> Can we pick a different name for op0_ and op1_?
>>>
 +
 +   /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will
 +  reach here, the 'CONST' may be valid in which case we should
 +  not split.  */
 +   if (REG_P (op0_) && REG_P (op1_))
 + {
 +   machine_mode addr_mode = GET_MODE (op0);
 +   rtx addr = gen_reg_rtx (addr_mode);
 +
 +   rtx ret = plus_constant (addr_mode, addr, offset);
 +   if (aarch64_legitimate_address_hook_p (mode, ret, false))
 + {
 +   emit_insn (gen_adddi3 (addr, op0_, op1_));
 +   return ret;
 + }
 + }
 +   /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST)
 +  will reach here.  If (PLUS REG, NON_REG) is valid addr expr,
 +  we split it into Y=REG+CONST, Y+NON_REG.  */
 +   else if (REG_P (op0_) || REG_P (op1_))
 + {
 +   machine_mode addr_mode = GET_MODE (op0);
 +   rtx addr = gen_reg_rtx (addr_mode);
 +
 +   /* Switch to make sure that register is in op0_.  */
 +   if (REG_P (op1_))
 + std::swap (op0_, op1_);
 +
 +   rtx ret = gen_rtx_fmt_ee (PLUS,

Re: [PATCH] Add clang-format config to contrib folder


On 11/20/2015 04:33 AM, Martin Liška wrote:

On 11/19/2015 06:43 PM, Pedro Alves wrote:

On 11/19/2015 12:54 PM, Martin Liška wrote:

  ContinuationIndentWidth: 2
-ForEachMacros: 
['_FOR_EACH','_FOR_EACH_1','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FOR_EACH_IMM_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','F!

O!

  R!

  _EACH_OBJE
CT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','_FOR_EACH_X','_FOR_EACH_X_1','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR']

+ForEachMacros: 
['FOR_ALL_BB_FN','FOR_ALL_EH_REGION','FOR_ALL_EH_REGION_AT','FOR_ALL_EH_REGION_FN','FOR_ALL_INHERITED_FIELDS','FOR_ALL_PREDICATES','FOR_BB_BETWEEN','FOR_BB_INSNS','FOR_BB_INSNS_REVERSE','FOR_BB_INSNS_REVERSE_SAFE','FOR_BB_INSNS_SAFE','FOR_BODY','FOR_COND','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','!

F!

  O!

  R_EACH_IMM
_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR_EACH_OBJECT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','FOR_EXPR','FOR_INIT_STMT','FOR_SCOPE']

  IndentCaseLabels: false


I don't know the tool's syntax here, but it's usually much better to
keep entries like these one per line (or at least a few only).  Then
changes to the list are _much_ easier to review and maintain, like:

...
  ForEachMacros: [
  'FOR_ALL_BB_FN',
  'FOR_ALL_EH_REGION',
-'FOR_ALL_EH_REGION_AT',
-'FOR_ALL_EH_REGION',
+'FOR_ALL_EH_WHATNOT,
+'FOR_ALL_EH_WHATNOT_AT,
  'FOR_ALL_INHERITED_FIELDS',
  'FOR_ALL_PREDICATES',
  'FOR_BB_BETWEEN',
  'FOR_BB_INSNS',
...

vs a single-line hunk that's basically unintelligible.

Thanks,
Pedro Alves



Hi Pedro.

Fully agree with you, there's suggested patch.
Hope I can install the patch for trunk?
Yes.  In fact, I think that as long as you're moving towards coercing 
clang-format to handle GNU style that you should have a free reign to 
make changes to this config file.


jeff

Re: [PATCH 4/4][AArch64] Add cost model for Exynos M1

On Thu, Nov 19, 2015 at 04:06:17PM -0600, Evandro Menezes wrote:
> On 11/05/2015 06:09 PM, Evandro Menezes wrote:
> >2015-10-25  Evandro Menezes 
> >
> >   gcc/
> >
> >   * config/aarch64/aarch64-cores.def: Use the Exynos M1 cost model.
> >   * config/aarch64/aarch64.c (exynosm1_addrcost_table): New
> >variable.
> >   (exynosm1_regmove_cost): Likewise.
> >   (exynosm1_vector_cost): Likewise.
> >   (exynosm1_tunings): Likewise.
> >   * config/arm/aarch-cost-tables.h (exynosm1_extra_costs): Likewise.
> >   * config/arm/arm.c (arm_exynos_m1_tune): Likewise.
> >
> >This patch adds the cost model for Exynos M1.  This patch depends
> >on a couple of previous patches though,
> >https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00505.html and
> >https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00538.html
> >
> >Please, commit if it's alright.
> 
> Ping.

This is OK from an AArch64 perspective. Please wait for an OK from an
ARM reviewer.

Thanks,
James

Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers


On 11/20/2015 02:19 AM, Nick Clifton wrote:

Hi Jeff,


The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?


Sure - this is for the RL78 toolchain.  In theory the problem is
generic, but I have not tested other toolchains.
Right.  It just really helps me to have something I can poke at -- it's 
just the type of learner I am.  I tried to trip the test in that #if 
several times last year and never came up with anything.  So naturally 
if we can trip it with the rl78 I want to dig into it.





Compile the gcc.c-torture/execute/pr42833.c or
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
also specified on the command line.  Without -free these tests pass.
With -free they fail.

Perfect.  Building rl78-elf cross bits as I type...

jeff

Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class

2015-11-20 Thread H.J. Lu

On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener
 wrote:
> On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu  wrote:
>> Empty record should be returned and passed the same way in C and C++.
>> This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
>> defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is defined
>> to is_really_empty_class, which returns true for C++ empty classes.  For
>> LTO, we stream out a bit to indicate if a record is empty and we store
>> it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
>> changed to set bitsize to 0 for empty records.  Middle-end and x86
>> backend are updated to ignore empty records for parameter passing and
>> function value return.  Other targets may need similar changes.
>
> Please avoid a new langhook for this and instead claim a bit in 
> tree_type_common
> like for example restrict_flag (double-check it is unused for non-pointers).

There is no bit in tree_type_common I can overload.  restrict_flag is
checked for non-pointers to issue an error when it is used on non-pointers:

/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38:
error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’
   typedef typename T::L __restrict__ r;// { dg-error "'__restrict__'
qualifiers cannot" "" }

Should I add a bit to tree_type_common?  It may crease the size
of tree_type_common by 4 bytes since there is unused bits in
tree_type_common.

> I don't like that you need to modify targets - those checks should be done
> in the caller (which may just use a new wrapper with the logic and then
> dispatching to the actual hook).

I can do that.

> Why do you need do adjust get_ref_base_and_extent?

get_ref_base_and_extent is changed to set bitsize to 0 for empty records
so that when ref_maybe_used_by_call_p_1 calls get_ref_base_and_extent to
get 0 as the maximum size on empty record.  Otherwise, find_tail_calls
won't perform tail call optimization for functions with empty record
parameters, as in

struct dummy { };
struct true_type { struct dummy i; };

extern true_type y;
extern void xxx (true_type c);

void
yyy (void)
{
  xxx (y);
}

-- 
H.J.

Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-11-20 Thread Ilya Verbin

On Wed, Dec 10, 2014 at 01:48:21 +0300, Ilya Verbin wrote:
> On 09 Dec 14:59, Richard Biener wrote:
> > On Mon, 8 Dec 2014, Ilya Verbin wrote:
> > > Unfortunately, this fix was not general enough.
> > > There might be cases when mixed object files get into lto-wrapper, ie 
> > > some of
> > > them contain only LTO sections, some contain only offload sections, and 
> > > some
> > > contain both.  But when lto-wrapper will pass all these files to 
> > > recompilation,
> > > the compiler might crash (it depends on the order of input files), since 
> > > in
> > > read_cgraph_and_symbols it expects that *all* input files contain IR 
> > > section of
> > > given type.
> > > This patch splits input objects from argv into lto_argv and offload_argv, 
> > > so
> > > that all files in arrays contain corresponding IR.
> > > Similarly, in lto-plugin, it was bad idea to add objects, which contain 
> > > offload
> > > IR without LTO, to claimed_files, since this may corrupt a resolution 
> > > file.
> > > 
> > > Tested on various combinations of files with/without -flto and 
> > > with/without
> > > offload, using trunk ld and gold, also tested on ld without plugin 
> > > support.
> > > Bootstrap and make check passed on x86_64-linux and i686-linux.  Ok for 
> > > trunk?
> > 
> > Did you check that bootstrap-lto still works?  Ok if so.
> 
> Yes, bootstrap-lto passed.
> Committed revision 218543.

I don't know how I missed this a year ago, but mixing of LTO objects with
offloading-without-LTO objects still doesn't work :(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463 filed about that.
Any thoughts how to fix this?

Thanks,
  -- Ilya

Re: [PATCH, VECTOR ABI] Add attribute((simd)) to GCC.


On 11/20/2015 05:15 AM, Kyrill Tkachov wrote:

Hi Kirill,

On 18/11/15 14:11, Kirill Yukhin wrote:

Hello Andreas, Devid.

On 18 Nov 10:45, Andreas Schwab wrote:

Kirill Yukhin  writes:


diff --git a/gcc/testsuite/c-c++-common/attr-simd.c
b/gcc/testsuite/c-c++-common/attr-simd.c
new file mode 100644
index 000..b4eda34
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-simd.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-optimized" } */
+
+__attribute__((__simd__))
+extern
+int simd_attr (void)
+{
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector"
"optimized" } } */

On ia64:

FAIL: c-c++-common/attr-simd.c  -Wc++-compat   scan-tree-dump
optimized "simd_attr[ \\t]simdclone|vector"
FAIL: c-c++-common/attr-simd.c  -Wc++-compat   scan-tree-dump
optimized "simd_attr2[ \\t]simdclone|vector"

$ grep simd_attr attr-simd.c.194t.optimized
;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389,
cgraph_uid=0, symbol_order=0)
simd_attr ()
;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392,
cgraph_uid=1, symbol_order=1)
simd_attr2 ()

As far as vABI is supported on x86_64/i?86 only, I am going to enable
mentioned `scan-tree-dump' only
for these targets. This should cure both IA64 and Power.

Concerning attr-simd-3.c. It is known issue: PR68158.
And I believe this test should work everywhere as far as PR is resolved.
I'll put xfail into the test.
Which will lead to (in g++.log):
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__'
attribute does not apply to types\
  [-Wattributes]^M
output is:
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__'
attribute does not apply to types\
  [-Wattributes]^M

XFAIL: c-c++-common/attr-simd-3.c  -std=gnu++98 PR68158 (test for
errors, line 5)
FAIL: c-c++-common/attr-simd-3.c  -std=gnu++98 (test for excess errors)
Excess errors:
gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__'
attribute does not apply to types\
  [-Wattributes]

Patch in the bottom.

gcc/tessuite/
* c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error.


This test fails on bare-metal targets that don't support -fcilkplus or
-pthread.
Would you consider moving them to the cilkplus testing directory or
adding an appropriate
effective target check?
I think an effective-target-check is the most appropriate.  The whole 
point here is to make that attribute independent of Cilk support.


jeff

RFA: PATCH to match.pd for c++/68385

In this bug, we hit the (A & sign-bit) != 0 -> A < 0 transformation. 
Because of delayed folding, the operands aren't fully folded yet, so we 
have NOP_EXPRs around INTEGER_CSTs, and so calling wi::only_sign_bit_p 
ICEs.  We've been seeing several similar bugs, where code calls 
integer_zerop and therefore assumes that they have an INTEGER_CST, but 
in fact integer_zerop does STRIP_NOPS.


This patch changes the pattern to only match if the operand is actually 
an INTEGER_CST.  Alternatively we could call tree_strip_nop_conversions 
on the operand, but I would expect that to have issues when the 
conversion changes the signedness of the type.


OK if testing passes?
commit e7b45ed6775c88c6d48c5863738ba0db2e38fc5e
Author: Jason Merrill 
Date:   Fri Nov 20 14:40:35 2015 -0500

	PR c++/68385

	* match.pd: Don't assume that integer_pow2p implies INTEGER_CST.

diff --git a/gcc/match.pd b/gcc/match.pd
index e86cc8b..1981ae7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2232,6 +2232,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (TYPE_PRECISION (TREE_TYPE (@0))
 	   == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0
&& element_precision (@2) >= element_precision (@0)
+   && TREE_CODE (@1) == INTEGER_CST
&& wi::only_sign_bit_p (@1, element_precision (@0)))
(with { tree stype = signed_type_for (TREE_TYPE (@0)); }
 (ncmp (convert:stype @0) { build_zero_cst (stype); })

Re: [PATCH] (Partial) Implementation of simplificaiton of CSHIFT

2015-11-20 Thread Steve Kargl

On Thu, Nov 19, 2015 at 04:58:36PM -0800, Steve Kargl wrote:
> 
> 2015-11-19  Steven G. Kargl  
> 
>   * intrinsic.h: Prototype for gfc_simplify_cshift
>   * intrinsic.c (add_functions): Use gfc_simplify_cshift.
>   * simplify.c (gfc_simplify_cshift): Implement simplification of CSHIFT.
>   (gfc_simplify_spread): Remove a FIXME and add error condition.
>  
> 2015-11-19  Steven G. Kargl  
> 
>   * gfortran.dg/simplify_cshift_1.f90: New test.
> 

I've attached an updated patch.  The changes consists of
1) better/more comments
2) re-organize code to reduce copying of the array.
3) add optimization for a left/right shift that 
   returns the original array.
4) Don't leak memory.

-- 
Steve
Index: gcc/fortran/intrinsic.c
===
--- gcc/fortran/intrinsic.c	(revision 230585)
+++ gcc/fortran/intrinsic.c	(working copy)
@@ -1659,9 +1659,11 @@ add_functions (void)
 
   make_generic ("count", GFC_ISYM_COUNT, GFC_STD_F95);
 
-  add_sym_3 ("cshift", GFC_ISYM_CSHIFT, CLASS_TRANSFORMATIONAL, ACTUAL_NO, BT_REAL, dr, GFC_STD_F95,
-	 gfc_check_cshift, NULL, gfc_resolve_cshift,
-	 ar, BT_REAL, dr, REQUIRED, sh, BT_INTEGER, di, REQUIRED,
+  add_sym_3 ("cshift", GFC_ISYM_CSHIFT, CLASS_TRANSFORMATIONAL, ACTUAL_NO,
+	 BT_REAL, dr, GFC_STD_F95,
+	 gfc_check_cshift, gfc_simplify_cshift, gfc_resolve_cshift,
+	 ar, BT_REAL, dr, REQUIRED,
+	 sh, BT_INTEGER, di, REQUIRED,
 	 dm, BT_INTEGER, ii, OPTIONAL);
 
   make_generic ("cshift", GFC_ISYM_CSHIFT, GFC_STD_F95);
Index: gcc/fortran/intrinsic.h
===
--- gcc/fortran/intrinsic.h	(revision 230585)
+++ gcc/fortran/intrinsic.h	(working copy)
@@ -271,6 +271,7 @@ gfc_expr *gfc_simplify_conjg (gfc_expr *
 gfc_expr *gfc_simplify_cos (gfc_expr *);
 gfc_expr *gfc_simplify_cosh (gfc_expr *);
 gfc_expr *gfc_simplify_count (gfc_expr *, gfc_expr *, gfc_expr *);
+gfc_expr *gfc_simplify_cshift (gfc_expr *, gfc_expr *, gfc_expr *);
 gfc_expr *gfc_simplify_dcmplx (gfc_expr *, gfc_expr *);
 gfc_expr *gfc_simplify_dble (gfc_expr *);
 gfc_expr *gfc_simplify_digits (gfc_expr *);
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 230585)
+++ gcc/fortran/simplify.c	(working copy)
@@ -1789,6 +1789,99 @@ gfc_simplify_count (gfc_expr *mask, gfc_
 
 
 gfc_expr *
+gfc_simplify_cshift (gfc_expr *array, gfc_expr *shift, gfc_expr *dim)
+{
+  int dm;
+  gfc_expr *a;
+
+  /* DIM is only useful for rank > 1, but deal with it here as one can
+ set DIM = 1 for rank = 1.  */
+  if (dim)
+{
+  if (!gfc_is_constant_expr (dim))
+	  return NULL;
+  dm = mpz_get_si (dim->value.integer);
+}
+  else
+dm = 1;
+
+  /* Copy array into 'a', simplify it, and then test for a constant array.
+ Unexpected expr_type cause an ICE.   */
+  switch (array->expr_type)
+{
+  case EXPR_VARIABLE:
+  case EXPR_ARRAY:
+	a = gfc_copy_expr (array);
+	gfc_simplify_expr (a, 0);
+	if (!is_constant_array_expr (a))
+	  {
+	gfc_free_expr (a);
+	return NULL;
+	  }
+	break;
+  default:
+	gcc_unreachable ();
+}
+
+  if (a->rank == 1)
+{
+  gfc_constructor *ca, *cr;
+  gfc_expr *result;
+  mpz_t size;
+  int i, j, shft, sz;
+
+  if (!gfc_is_constant_expr (shift))
+	{
+	  gfc_free_expr (a);
+	  return NULL;
+	}
+
+  shft = mpz_get_si (shift->value.integer);
+
+  /*  Case (i):  If ARRAY has rank one, element i of the result is
+	  ARRAY (1 + MODULO (i + SHIFT  1, SIZE (ARRAY))).  */
+
+  mpz_init (size);
+  gfc_array_size (a, );
+  sz = mpz_get_si (size);
+  mpz_clear (size);
+
+  /* Special case: rank 1 array with no shift or a complete shift to
+	 the original order!  */
+  if (shft == 0 || shft == sz || shft == 1 - sz)
+	return a;
+
+  /* Adjust shft to deal with right or left shifts. */
+  shft = shft < 0 ? 1 - shft : shft;
+
+  result = gfc_copy_expr (a);
+  cr = gfc_constructor_first (result->value.constructor);
+  for (i = 0; i < sz; i++, cr = gfc_constructor_next (cr))
+	{
+	  j = (i + shft) % sz;
+	  ca = gfc_constructor_first (a->value.constructor);
+	  while (j-- > 0)
+	ca = gfc_constructor_next (ca);
+	  cr->expr = gfc_copy_expr (ca->expr);
+	}
+
+  gfc_free_expr (a);
+  return result;
+}
+  else
+{
+  /* FIXME: Deal with rank > 1 arrays.  For now, don't leak memory
+	 and exit with an error message.  */
+  gfc_free_expr (a);
+  gfc_error ("Simplification of CSHIFT with an array with rank > 1 "
+	 "no yet support");
+}
+
+  return NULL;
+}
+
+
+gfc_expr *
 gfc_simplify_dcmplx (gfc_expr *x, gfc_expr *y)
 {
   return simplify_cmplx ("DCMPLX", x, y, gfc_default_double_kind);
@@ -6089,10 +6182,11 @@ gfc_simplify_spread (gfc_expr *source, g
 	}
 }
   else
-/* FIXME: Returning here

Re: RFA: PATCH to match.pd for c++/68385


On 11/20/2015 02:58 PM, Jason Merrill wrote:

OK if testing passes?


Which it did.

Jason

[PATCH] Fix up reduction-1{1,2} testcases (PR middle-end/68221)

Hi!

If C/C++ array section reductions have non-zero (positive) bias, it is
implemented by declaring a smaller private array and subtracting the bias
from the start of the private array (because valid code may only dereference
elements from bias onwards).  But, this isn't something that is kosher in
C/C++ pointer arithmetics and the alias oracle seems to get upset on that.
So, the following patch fixes that by performing the subtraction on integral
type instead of p+ -bias.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2015-11-20  Jakub Jelinek  

PR middle-end/68221
* omp-low.c (lower_rec_input_clauses): If C/C++ array reduction
has non-zero bias, subtract it in integer type instead of
pointer plus of negated bias.

* testsuite/libgomp.c/reduction-11.c: Remove xfail.
* testsuite/libgomp.c/reduction-12.c: Likewise.
* testsuite/libgomp.c++/reduction-11.C: Likewise.
* testsuite/libgomp.c++/reduction-12.C: Likewise.

--- gcc/omp-low.c.jj2015-11-20 12:56:17.0 +0100
+++ gcc/omp-low.c   2015-11-20 13:44:29.080374051 +0100
@@ -,11 +,13 @@ lower_rec_input_clauses (tree clauses, g
 
  if (!integer_zerop (bias))
{
- bias = fold_convert_loc (clause_loc, sizetype, bias);
- bias = fold_build1_loc (clause_loc, NEGATE_EXPR,
- sizetype, bias);
- x = fold_build2_loc (clause_loc, POINTER_PLUS_EXPR,
-  TREE_TYPE (x), x, bias);
+ bias = fold_convert_loc (clause_loc, pointer_sized_int_node,
+  bias);
+ yb = fold_convert_loc (clause_loc, pointer_sized_int_node,
+x);
+ yb = fold_build2_loc (clause_loc, MINUS_EXPR,
+   pointer_sized_int_node, yb, bias);
+ x = fold_convert_loc (clause_loc, TREE_TYPE (x), yb);
  yb = create_tmp_var (ptype, name);
  gimplify_assign (yb, x, ilist);
  x = yb;
--- libgomp/testsuite/libgomp.c/reduction-11.c.jj   2015-11-05 
16:03:53.0 +0100
+++ libgomp/testsuite/libgomp.c/reduction-11.c  2015-11-20 13:38:24.448520879 
+0100
@@ -1,4 +1,4 @@
-/* { dg-do run { xfail *-*-* } } */
+/* { dg-do run } */
 
 char z[10] = { 0 };
 
--- libgomp/testsuite/libgomp.c/reduction-12.c.jj   2015-11-05 
16:03:53.0 +0100
+++ libgomp/testsuite/libgomp.c/reduction-12.c  2015-11-20 13:38:34.565378078 
+0100
@@ -1,4 +1,4 @@
-/* { dg-do run { xfail *-*-* } } */
+/* { dg-do run } */
 
 struct A { int t; };
 struct B { char t; };
--- libgomp/testsuite/libgomp.c++/reduction-11.C.jj 2015-11-05 
16:03:53.0 +0100
+++ libgomp/testsuite/libgomp.c++/reduction-11.C2015-11-20 
13:37:53.921951766 +0100
@@ -1,4 +1,4 @@
-// { dg-do run { xfail *-*-* } }
+// { dg-do run }
 
 char z[10] = { 0 };
 
--- libgomp/testsuite/libgomp.c++/reduction-12.C.jj 2015-11-05 
16:03:53.0 +0100
+++ libgomp/testsuite/libgomp.c++/reduction-12.C2015-11-20 
13:38:03.983809741 +0100
@@ -1,4 +1,4 @@
-// { dg-do run { xfail *-*-* } }
+// { dg-do run }
 
 template 
 struct A

Jakub

[PATCH] Fix debug info related ICE when inlining back fnsplit function (PR debug/66432)

Hi!

This patch fixes ICE where a parameter mentioned in a debug source bind
has its VLA type mistakenly remapped and that leads to inconsistent type
between the PARM_DECL and SSA_NAMEs derived from it.

The patch Tom posted for this can't work, because we assume that the
s=> value as well as debug_decl_args decl in that case is DECL_ORIGIN
of the PARM_DECL.  But, in this case we are replacing the DECL_ORIGIN
PARM_DECL immediately with a DEBUG_EXPR_DECL anyway, so there is no
point remapping the var we don't own (it could be in a different function
etc.).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5.3?

2015-11-20  Jakub Jelinek  

PR debug/66432
* tree-inline.c (copy_debug_stmt): If
gimple_debug_source_bind_get_value is DECL_ORIGIN of a PARM_DECL
in decl_debug_args, don't call remap_gimple_op_r on it.

* gcc.dg/debug/pr66432.c: New test.

--- gcc/tree-inline.c.jj2015-11-14 19:36:03.0 +0100
+++ gcc/tree-inline.c   2015-11-20 17:17:00.632082622 +0100
@@ -2864,8 +2864,6 @@ copy_debug_stmt (gdebug *stmt, copy_body
   else if (gimple_debug_source_bind_p (stmt))
 {
   gimple_debug_source_bind_set_var (stmt, t);
-  walk_tree (gimple_debug_source_bind_get_value_ptr (stmt),
-remap_gimple_op_r, , NULL);
   /* When inlining and source bind refers to one of the optimized
 away parameters, change the source bind into normal debug bind
 referring to the corresponding DEBUG_EXPR_DECL that should have
@@ -2889,7 +2887,10 @@ copy_debug_stmt (gdebug *stmt, copy_body
break;
  }
}
-   }  
+   }
+  if (gimple_debug_source_bind_p (stmt))
+   walk_tree (gimple_debug_source_bind_get_value_ptr (stmt),
+  remap_gimple_op_r, , NULL);
 }
 
   processing_debug_stmt = 0;
--- gcc/testsuite/gcc.dg/debug/pr66432.c.jj 2015-11-20 17:40:44.589171083 
+0100
+++ gcc/testsuite/gcc.dg/debug/pr66432.c2015-11-20 17:38:48.0 
+0100
@@ -0,0 +1,19 @@
+/* PR debug/66432 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+extern void baz (const char *, const char *) __attribute__ ((__noreturn__));
+
+void
+foo (int x, int y[x][x])
+{
+  if (x < 2)
+baz ("", "");
+}
+
+void
+bar (void)
+{
+  int z[2][2] = { { 1, 2 }, { 3, 4 } };
+  foo (2, z);
+}

Jakub

[PATCH] Fix GC ICE during simd clone creation (PR middle-end/68339)

Hi!

node->get_body () can run various IPA passes and ggc_collect in them, so
it is undesirable to hold pointers to GC memory in automatic vars over it.
While I could store those vars (clone_info, clone and id) into special GTY
vars just to avoid collecting them, it seems easier to call node->get_body
() earlier.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and 5 branch.

2015-11-20  Jakub Jelinek  

PR middle-end/68339
* omp-low.c (expand_simd_clones): Call node->get_body () before
allocating stuff in GC.

* gcc.dg/vect/pr68339.c: New test.

--- gcc/omp-low.c.jj2015-11-18 11:19:19.0 +0100
+++ gcc/omp-low.c   2015-11-20 12:56:17.075193601 +0100
@@ -18319,6 +18319,10 @@ expand_simd_clones (struct cgraph_node *
   && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
 return;
 
+  /* Call this before creating clone_info, as it might ggc_collect.  */
+  if (node->definition && node->has_gimple_body_p ())
+node->get_body ();
+
   do
 {
   /* Start with parsing the "omp declare simd" attribute(s).  */
--- gcc/testsuite/gcc.dg/vect/pr68339.c.jj  2015-11-20 13:10:47.756905395 
+0100
+++ gcc/testsuite/gcc.dg/vect/pr68339.c 2015-11-20 13:08:13.0 +0100
@@ -0,0 +1,17 @@
+/* PR middle-end/68339 */
+/* { dg-do compile } */
+/* { dg-options "--param ggc-min-heapsize=0 --param ggc-min-expand=0 
-fopenmp-simd" } */
+
+#pragma omp declare simd notinbranch
+int
+f1 (int x)
+{
+  return x;
+}
+
+#pragma omp declare simd notinbranch
+int
+f2 (int x)
+{
+  return x;
+}

Jakub

[PATCH] Fix PR objc/68438 (uninitialized source ranges)

2015-11-20 Thread David Malcolm

The attached patch fixes some more places in c/c-parser.c where the
src_range field of a c_expr was being left uninitialized, this time for
various Objective C constructs.

The source ranges are verified using the same unit-testing plugin used
for C expressions.  This leads to a wart, which is that it means having
a .m test file lurking below gcc.dg/plugin.  A workaround would be to
create an objc.dg/plugin subdirectory, with a new plugin.exp for testing
Objective C plugins, and having it reference the existing plugin below
gcc.dg/plugin.  That seemed like excessive complication, so I went for
the simpler approach of simply putting the .m file below gcc.dg/plugin.

I manually verified that this fixes the specific valgrind warnings
reported in the PR, and visually inspected the code for
objective-c-specific instances of uninitialized c_expr src_range values.

Bootstrapped on x86_64-pc-linux-gnu; adds 15 PASS results to
gcc.sum.

OK for trunk?

>From afdae8b15f71164d0d05e790078519b38bd674a4 Mon Sep 17 00:00:00 2001
From: David Malcolm 
Date: Fri, 20 Nov 2015 11:12:47 -0500
Subject: [PATCH] Fix PR objc/68438 (uninitialized source ranges)

gcc/c/ChangeLog:
	PR objc/68438
	* c-parser.c (c_parser_postfix_expression): Set up source ranges
	for various Objective-C constructs: Class.name syntax,
	@selector(), @protocol, @encode(), and [] message syntax.

gcc/testsuite/ChangeLog:
	PR objc/68438
	* gcc.dg/plugin/diagnostic-test-expressions-2.m: New test file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above file.
---
 gcc/c/c-parser.c   | 17 +++-
 .../gcc.dg/plugin/diagnostic-test-expressions-2.m  | 94 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp |  3 +-
 3 files changed, 110 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7b10764..18e9957 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7338,10 +7338,13 @@ c_parser_postfix_expression (c_parser *parser)
 		expr.value = error_mark_node;
 		break;
 	  }
-	component = c_parser_peek_token (parser)->value;
+	c_token *component_tok = c_parser_peek_token (parser);
+	component = component_tok->value;
+	location_t end_loc = component_tok->get_finish ();
 	c_parser_consume_token (parser);
 	expr.value = objc_build_class_component_ref (class_name, 
 			 component);
+	set_c_expr_source_range (, loc, end_loc);
 	break;
 	  }
 	default:
@@ -7816,9 +7819,11 @@ c_parser_postfix_expression (c_parser *parser)
 	}
 	  {
 	tree sel = c_parser_objc_selector_arg (parser);
+	location_t close_loc = c_parser_peek_token (parser)->location;
 	c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
    "expected %<)%>");
 	expr.value = objc_build_selector_expr (loc, sel);
+	set_c_expr_source_range (, loc, close_loc);
 	  }
 	  break;
 	case RID_AT_PROTOCOL:
@@ -7839,9 +7844,11 @@ c_parser_postfix_expression (c_parser *parser)
 	  {
 	tree id = c_parser_peek_token (parser)->value;
 	c_parser_consume_token (parser);
+	location_t close_loc = c_parser_peek_token (parser)->location;
 	c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
    "expected %<)%>");
 	expr.value = objc_build_protocol_expr (id);
+	set_c_expr_source_range (, loc, close_loc);
 	  }
 	  break;
 	case RID_AT_ENCODE:
@@ -7860,11 +7867,13 @@ c_parser_postfix_expression (c_parser *parser)
 	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
 	  break;
 	}
-	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
- "expected %<)%>");
 	  {
+	location_t close_loc = c_parser_peek_token (parser)->location;
+	c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+ "expected %<)%>");
 	tree type = groktypename (t1, NULL, NULL);
 	expr.value = objc_build_encode_expr (type);
+	set_c_expr_source_range (, loc, close_loc);
 	  }
 	  break;
 	case RID_GENERIC:
@@ -7907,9 +7916,11 @@ c_parser_postfix_expression (c_parser *parser)
 	  c_parser_consume_token (parser);
 	  receiver = c_parser_objc_receiver (parser);
 	  args = c_parser_objc_message_args (parser);
+	  location_t close_loc = c_parser_peek_token (parser)->location;
 	  c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE,
  "expected %<]%>");
 	  expr.value = objc_build_message_expr (receiver, args);
+	  set_c_expr_source_range (, loc, close_loc);
 	  break;
 	}
   /* Else fall through to report error.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m
new file mode 100644
index 000..ed7aca3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret" } */
+
+/* This file is similar to

Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt


On 11/20/2015 07:08 AM, Bernd Schmidt wrote:


BZ27313 is marked as fixed by the introduction of the tree cselim pass,
thus the problem won't even be seen at RTL level.

Cool.


I'm undecided on whether cs-elim is safe wrt the store speculation vs
locks concerns raised in the thread discussing Ian's
noce_can_store_speculate_p, but that's not something we have to consider
to solve the problem at hand.

I don't think cs-elim is safe WRT locks and such in multi-threaded code.

   In particular it replaces this:

 bb0:
   if (cond) goto bb2; else goto bb1;
 bb1:
   *p = RHS;
 bb2:

   with

 bb0:
   if (cond) goto bb1; else goto bb2;
 bb1:
   condtmp' = *p;
 bb2:
   condtmp = PHI 
   *p = condtmp;


If *p is a shared memory location, then there may be another writer.  If 
that writer happens to store something in that location after the load 
of *p, but before the store to *p, then that store will get lost in the 
transformed pseudo code.


That seems to introduce a data race.  Presumably one would call the 
original code ill-formed WRT the C11/C++11 memory model since the shared 
location is not marked as such.


I'm willing to consider this an independent problem.





As far as I can tell hmmer and the 27313 testcase are unaffected at -O2
(if anything, hmmer was very slightly faster afterwards). The run wasn't
super-scientific, but I wouldn't have expected anything else given the
existence of cs-elim.
Cool.  It doesn't have to be super-scientific.  Good to see it didn't 
harm anything -- thanks for going the extra mile on this one.




Ok to do the above, removing all the bits made unnecessary (including
memory_must_be_modified_in_insn_p in alias.c)?

Yup.  Zap it.
jeff

[commit] [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)

2015-11-20 Thread Jan Kratochvil

On Fri, 20 Nov 2015 18:40:46 +0100, Jonathan Wakely wrote:
> The patch is OK for trunk and gcc-5-branch.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68448
trunk:
https://gcc.gnu.org/viewcvs/gcc?view=revision=230669
5.x:
https://gcc.gnu.org/viewcvs/gcc?view=revision=230670


Jan

libgo patch committed: use correct tool dir with gccgo

This patch from Lynn Boger fixes the go tool shipped with gccgo to use
the correct tool directory.  It also fixes the 'go tool' output to
only list known tools.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline and GCC 5 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 230657)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-dfa74d975884f363c74d6a66a37b1703093fdba6
+d52835c9376985f92f35c32af5f1808239981536
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/pkg.go
===
--- libgo/go/cmd/go/pkg.go  (revision 230463)
+++ libgo/go/cmd/go/pkg.go  (working copy)
@@ -785,7 +785,11 @@ func (p *Package) load(stk *importStack,
if goTools[p.ImportPath] == toTool {
// This is for 'go tool'.
// Override all the usual logic and force it into the 
tool directory.
-   p.target = filepath.Join(gorootPkg, "tool", full)
+   if buildContext.Compiler == "gccgo" {
+   p.target = filepath.Join(runtime.GCCGOTOOLDIR, 
elem)
+   } else {
+   p.target = filepath.Join(gorootPkg, "tool", 
full)
+   }
}
if p.target != "" && buildContext.GOOS == "windows" {
p.target += ".exe"
Index: libgo/go/cmd/go/tool.go
===
--- libgo/go/cmd/go/tool.go (revision 230463)
+++ libgo/go/cmd/go/tool.go (working copy)
@@ -39,6 +39,12 @@ var (
toolN bool
 )
 
+// List of go tools found in the gccgo tool directory.
+// Other binaries could be in the same directory so don't
+// show those with the 'go tool' command.
+
+var gccgoTools = []string{"cgo", "fix", "cover", "godoc", "vet"}
+
 func init() {
cmdTool.Flag.BoolVar(, "n", false, "")
 }
@@ -146,6 +152,21 @@ func listTools() {
if toolIsWindows && strings.HasSuffix(name, 
toolWindowsExtension) {
name = name[:len(name)-len(toolWindowsExtension)]
}
-   fmt.Println(name)
+
+   // The tool directory used by gccgo will have other binaries
+   // in additions to go tools.  Only display go tools for this 
list.
+
+   if buildContext.Compiler == "gccgo" {
+   for _, tool := range gccgoTools {
+   if tool == name {
+   fmt.Println(name)
+   }
+   }
+   } else {
+
+   // Not gccgo, list all the tools found in this dir
+
+   fmt.Println(name)
+   }
}
 }

Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1

2015-11-20 Thread Evandro Menezes


On 11/20/2015 11:17 AM, James Greenhalgh wrote:

On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote:

2015-11-10  Evandro Menezes 

gcc/

* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
* config/aarch64/aarch64.md: Include "exynos-m1.md".
* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.
* config/arm/exynos-m1.md: New file.

This patch adds the scheduling model for Exynos M1.  It depends on
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html

Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.



 From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:18:52 -0600
Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1

gcc/
* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
* config/aarch64/aarch64.md: Include "exynos-m1.md".

These changes are fine.


* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.

These changes need an ack from an ARM reviewer.


* config/arm/exynos-m1.md: New file.

I have a few comments on this model.


+;; The Exynos M1 core is modeled as a triple issue pipeline that has
+;; the following functional units.
+
+(define_automaton "exynos_m1_gp")
+(define_automaton "exynos_m1_ls")
+(define_automaton "exynos_m1_fp")
+
+;; 1.  Two pipelines for simple integer operations: A, B
+;; 2.  One pipeline for simple or complex integer operations: C
+
+(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp")
+
+(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)")
+(define_reservation "em1_c" "em1_xc")

Is this extra reservation useful, can we not just use em1_xc directly?


+;; 3.  Two asymmetric pipelines for Neon and FP operations: F0, F1
+
+(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp")
+
+(define_reservation "em1_fmac" "em1_f0")
+(define_reservation "em1_fcvt" "em1_f0")
+(define_reservation "em1_nalu" "(em1_f0 | em1_f1)")
+(define_reservation "em1_nalu0" "em1_f0")
+(define_reservation "em1_nalu1" "em1_f1")
+(define_reservation "em1_nmisc" "em1_f0")
+(define_reservation "em1_ncrypt" "em1_f0")
+(define_reservation "em1_fadd" "em1_f1")
+(define_reservation "em1_fvar" "em1_f1")
+(define_reservation "em1_fst" "em1_f1")

Same comment here, does this not just obfuscate the interaction between
instruction classes in the description. I'm not against doing it this way
if you prefer, but it would seem to reduce readability to me. I think there
is also an argument that this increases readability, so it is your choice.


+
+;; 4.  One pipeline for branch operations: BX
+
+(define_cpu_unit "em1_bx" "exynos_m1_gp")
+
+(define_reservation "em1_br" "em1_bx")
+

And again?


+;; 5.  One AGU for loads: L
+;; One AGU for stores and one pipeline for stores: S, SD
+
+(define_cpu_unit "em1_lx" "exynos_m1_ls")
+(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls")
+
+(define_reservation "em1_ld" "em1_lx")
+(define_reservation "em1_st" "(em1_sx + em1_sd)")
+
+;; Common occurrences
+(define_reservation "em1_sfst" "(em1_fst + em1_st)")
+(define_reservation "em1_lfst" "(em1_fst + em1_ld)")
+
+;; Branches
+;;
+;; No latency as there is no result
+;; TODO: Unconditional branches use no units;
+;; conditional branches add the BX unit;
+;; indirect branches add the C unit.
+(define_insn_reservation "exynos_m1_branch" 0
+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "branch"))
+  "em1_br")
+
+(define_insn_reservation "exynos_m1_call" 1
+  (and (eq_attr "tune" "exynosm1")
+   (eq_attr "type" "call"))
+  "em1_alu")
+
+;; Basic ALU
+;;
+;; Simple ALU without shift, non-predicated
+(define_insn_reservation "exynos_m1_alu" 1
+  (and (eq_attr "tune" "exynosm1")
+   (and (not (eq_attr "predicated" "yes"))

(and (eq_attr "predicated" "no")) ?

Likewise throughout the file? Again this is your choice.

This is OK from the AArch64 side, let me know if you plan to change any
of the above, otherwise I'll commit it (or someone else can commit it)
after I see an OK from an ARM reviewer.


The naming choices were made more for uniformity's sake and to conform 
with the processor documentation.


The bit about "not... yes" <=> "no" is a goof up. :-s

I'm waiting for Kyrill's patch adding FCCMP to be approved to amend this 
patch.  However, I'd appreciate if it'd be approved and committed before 
then, as soon the ARM stuff is okayed.


Thank you,

--
Evandro Menezes

[SPARC] Minor cleanup

This just moves the VIS3 mulx patterns to a more appropriate place.

Tested on SPARC/Solaris, applied on the mainline.


2015-11-20  Eric Botcazou  

* config/sparc/sparc.md (umulxhi_vis): Move around.
(*umulxhi_sp64): Likewise.
(umulxhi_v8plus): Likewise.
(xmulx_vis): Likewise.
(*xmulx_sp64): Likewise.
(xmulx_v8plus): Likewise.
(xmulxhi_vis): Likewise.
(*xmulxhi_sp64): Likewise.
(xmulxhi_v8plus): Likewise.

-- 
Eric BotcazouIndex: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 230589)
+++ config/sparc/sparc.md	(working copy)
@@ -609,7 +609,7 @@ (define_insn "*cmptf_fp"
   return "fcmpq\t%1, %2";
 }
   [(set_attr "type" "fpcmp")])
-
+
 ;; Next come the scc insns.
 
 ;; Note that the boolean result (operand 0) takes on DImode
@@ -647,8 +647,6 @@ (define_expand "cstore4"
   "TARGET_FPU"
   { if (emit_scc_insn (operands)) DONE; else FAIL; })
 
-
-
 ;; Seq_special[_xxx] and sne_special[_xxx] clobber the CC reg, because they
 ;; generate addcc/subcc instructions.
 
@@ -1137,7 +1135,7 @@ (define_split
 			 (match_dup 0)))]
   "")
 
-
+
 ;; These control RTL generation for conditional jump insns
 
 (define_expand "cbranchcc4"
@@ -1318,7 +1316,6 @@ (define_insn "*cbcond_sp64"
 
 ;; There are no 32 bit brreg insns.
 
-;; XXX
 (define_insn "*normal_int_branch_sp64"
   [(set (pc)
 	(if_then_else (match_operator 0 "v9_register_compare_operator"
@@ -1335,7 +1332,6 @@ (define_insn "*normal_int_branch_sp64"
   [(set_attr "type" "branch")
(set_attr "branch_type" "reg")])
 
-;; XXX
 (define_insn "*inverted_int_branch_sp64"
   [(set (pc)
 	(if_then_else (match_operator 0 "v9_register_compare_operator"
@@ -2730,8 +2726,6 @@ (define_expand "movcc"
   DONE;
 })
 
-;; Conditional move define_insns
-
 (define_insn "*mov_cc_v9"
   [(set (match_operand:I 0 "register_operand" "=r")
 	(if_then_else:I (match_operator 1 "comparison_operator"
@@ -2896,7 +2890,7 @@ (define_insn_and_split "*movtf_cc_reg_sp
 }
   [(set_attr "length" "2")])
 
-
+
 ;; Zero-extension instructions
 
 ;; These patterns originally accepted general_operands, however, slightly
@@ -4043,7 +4037,6 @@ (define_insn "*muldi3_sp64"
   [(set_attr "type" "imul")])
 
 ;; V8plus wide multiply.
-;; XXX
 (define_insn "muldi3_v8plus"
   [(set (match_operand:DI 0 "register_operand" "=r,h")
 	(mult:DI (match_operand:DI 1 "arith_operand" "%r,0")
@@ -4094,7 +4087,6 @@ (define_expand "mulsidi3"
 
 ;; V9 puts the 64-bit product in a 64-bit register.  Only out or global
 ;; registers can hold 64-bit values in the V8plus environment.
-;; XXX
 (define_insn "mulsidi3_v8plus"
   [(set (match_operand:DI 0 "register_operand" "=h,r")
 	(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r,r"))
@@ -4107,7 +4099,6 @@ (define_insn "mulsidi3_v8plus"
   [(set_attr "type" "multi")
(set_attr "length" "2,3")])
 
-;; XXX
 (define_insn "const_mulsidi3_v8plus"
   [(set (match_operand:DI 0 "register_operand" "=h,r")
 	(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r,r"))
@@ -4120,7 +4111,6 @@ (define_insn "const_mulsidi3_v8plus"
   [(set_attr "type" "multi")
(set_attr "length" "2,3")])
 
-;; XXX
 (define_insn "*mulsidi3_sp32"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
@@ -4148,7 +4138,6 @@ (define_insn "*mulsidi3_sp64"
 
 ;; Extra pattern, because sign_extend of a constant isn't valid.
 
-;; XXX
 (define_insn "const_mulsidi3_sp32"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
@@ -4203,7 +4192,6 @@ (define_expand "smulsi3_highpart"
 }
 })
 
-;; XXX
 (define_insn "smulsi3_highpart_v8plus"
   [(set (match_operand:SI 0 "register_operand" "=h,r")
 	(truncate:SI
@@ -4219,7 +4207,6 @@ (define_insn "smulsi3_highpart_v8plus"
(set_attr "length" "2")])
 
 ;; The combiner changes TRUNCATE in the previous pattern to SUBREG.
-;; XXX
 (define_insn ""
   [(set (match_operand:SI 0 "register_operand" "=h,r")
 	(subreg:SI
@@ -4236,7 +4223,6 @@ (define_insn ""
   [(set_attr "type" "multi")
(set_attr "length" "2")])
 
-;; XXX
 (define_insn "const_smulsi3_highpart_v8plus"
   [(set (match_operand:SI 0 "register_operand" "=h,r")
 	(truncate:SI
@@ -4251,7 +4237,6 @@ (define_insn "const_smulsi3_highpart_v8p
   [(set_attr "type" "multi")
(set_attr "length" "2")])
 
-;; XXX
 (define_insn "*smulsi3_highpart_sp32"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(truncate:SI
@@ -4263,7 +4248,6 @@ (define_insn "*smulsi3_highpart_sp32"
   [(set_attr "type" "multi")
(set_attr "length" "2")])
 
-;; XXX
 (define_insn "const_smulsi3_highpart"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(truncate:SI
@@ -4301,7 +4285,6 @@ (define_expand "umulsidi3"
 }
 })
 
-;; XXX
 (define_insn "umulsidi3_v8plus"
   [(set

Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class


On 11/20/2015 01:52 PM, H.J. Lu wrote:

On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener
 wrote:

On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu  wrote:

Empty record should be returned and passed the same way in C and C++.
This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is defined
to is_really_empty_class, which returns true for C++ empty classes.  For
LTO, we stream out a bit to indicate if a record is empty and we store
it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
changed to set bitsize to 0 for empty records.  Middle-end and x86
backend are updated to ignore empty records for parameter passing and
function value return.  Other targets may need similar changes.


Please avoid a new langhook for this and instead claim a bit in tree_type_common
like for example restrict_flag (double-check it is unused for non-pointers).


There is no bit in tree_type_common I can overload.  restrict_flag is
checked for non-pointers to issue an error when it is used on non-pointers:

/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38:
error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’
typedef typename T::L __restrict__ r;// { dg-error "'__restrict__'
qualifiers cannot" "" }


The C++ front end only needs to check TYPE_RESTRICT for this purpose on 
front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals 
could handle that specifically if you change TYPE_RESTRICT to only apply 
to pointers.


Jason

Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers


On 11/20/2015 02:19 AM, Nick Clifton wrote:

Hi Jeff,


The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?


Sure - this is for the RL78 toolchain.  In theory the problem is
generic, but I have not tested other toolchains.

Compile the gcc.c-torture/execute/pr42833.c or
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free
also specified on the command line.  Without -free these tests pass.
With -free they fail.
So this looks like a pretty fundamental failing in ree.c and AFAICT has 
been there since day 1.  Thankfully, I don't think it's likely to 
trigger all that often.


It's useful to ponder two cases.  One where we're going to add a copy 
and one where we don't.  I added the copy case during the 4.9 cycle as 
an extension of existing code.  It only fires in easy to analyze 
circumstances -- when the original set and the extension are in the same 
basic block, but have different destination registers.


In the copy case.  The code checks that the destination of the 
*extension* is neither used nor set between the original set and the 
extension.  So in the case of multi-word hard reg, we'll verify that the 
entire multi-word hard reg survives from the original insn to the 
extension.  That avoids this problem in cases where a copy is needed.


However, in the case where no copy is needed, no such checks are made. 
In fact the checks could be fairly painful as the original set and the 
extension can be in different blocks with arbitrary flow between them.


I would hazard a guess that the authors simply didn't consider the 
multi-hard reg case.  Essentially if the original set reached an 
extension, then obviously the original set got there unharmed and the 
extended destination should reach as well -- except that doesn't apply 
to multi-word hard regs.


Given the expense in checking all the potential paths between the 
original set and the extension, I think just disallowing this case ought 
to be is the best way to go.  Ideally we'll filter out that case before 
we get to combine_set_extension.  But if we can't, it's easy enough to 
test there.


Jeff

Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

> I would hazard a guess that the authors simply didn't consider the
> multi-hard reg case.  Essentially if the original set reached an
> extension, then obviously the original set got there unharmed and the
> extended destination should reach as well -- except that doesn't apply
> to multi-word hard regs.

This pass started as a Zero-Extension Elimination pass for x86-64 and was only 
considering implicit SI->DI extensions initially.  The algorithm was tailored 
to this specific pattern and I agree that the pass should be reimplemented.

-- 
Eric Botcazou

Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C

2015-11-20 Thread Mike Stump

On Nov 16, 2015, at 6:02 AM, Renlin Li  wrote:
> On 14/11/15 00:33, David Edelsohn wrote:
>> No RISC architecture can store directly to MEM, so the expected RTL in
>> g++.dg/init/vbase1.C is wrong.  I am adding XFAIL for PowerPC.  This
>> probably should be disabled for ARM and other RISC architectures.
> 
> I observed the same problem in arm.
> 
> This passes for aarch64 and mips as they have zero register to do that. 
> However, other RISC might not have that feature, for example arm and RS6000 
> in this  case.

I fixed this with the below patch.  Tested on x86_64 linux, x86_64 darwin and 
my port.  If you want to list aarch64/arn and mips, please do.


* g++.dg/init/vbase1.C: Only run on x86_64-*-* as this testcase
isn't portable.

Index: g++.dg/init/vbase1.C
===
--- g++.dg/init/vbase1.C(revision 230675)
+++ g++.dg/init/vbase1.C(working copy)
@@ -42,4 +42,4 @@ int main(int, char**)
 // Verify that the SubB() mem-initializer is storing 0 directly into
 // this->D.whatever rather than into a stack temp that is then copied into the
 // base field.
-// { dg-final { scan-rtl-dump "set 
\[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { xfail { 
powerpc*-*-* } } } }
+// { dg-final { scan-rtl-dump "set 
\[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { target { 
x86_64-*-* } } } }

Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-20 Thread Bin.Cheng

On Wed, Nov 18, 2015 at 11:50 PM, Bernd Schmidt  wrote:
> On 11/10/2015 11:19 AM, Bin.Cheng wrote:
>>
>> On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt 
>> wrote:
>>>
>>>
>>> Multi-line expressions should be wrapped in parentheses so that
>>> emacs/indent
>>> can format them automatically. Two sets of parens are needed for this.
>>> Operators should then line up appropriately.
>>
>> Ah, thanks for teaching.  Here is the updated patch, hoping it's correct.
>
> It looks like you're waiting to check it in - Richard's earlier approval
> still holds.
Thanks, applied as r230647.
>
>
> Bernd
>

Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

On 20/11/15 01:41, Bernd Schmidt wrote:

I1 is def_insn, I3 is cand->insn. tmp_reg is 'ax'. What we want to do
is reject this transformation
because the destination of def_insn (aka I1), that is 'ax', is not the
operand of the extend operation
in cand->insn (aka I3). As you said, rtx_equal won't work on just
SET_SRC (PATTERN (cand->insn)) because
it's an extend operation. So reg_overlap_mentioned should be appropriate.

Yeah, so just use the src_reg variable for the comparison. I still don't see why you wouldn't want to use the stronger test. But the whole thing still feels not completely ideal somehow, so after reading through ree.c for a while and
getting a better feeling for how it works, I think the following (which you said is equivalent) would be the most understandable and direct fix.

You said that the two tests should be equivalent, and I agree. I've not found
cases where the change makes a difference, other than the testcase. Would you
mind running this version through the testsuite and committing if it passes?

I've shrunk the comment; massive explanations like this for every bug are inappropriate IMO, and the example also duplicates an earlier comment in the same function. And, as I said earlier, the way you placed the comment is confusing
because only one part of the following if statement is related to it.

Ok, thanks for the explanation.
When investigating this bug I tried a patch identical to yours and it had
worked just fine.
My patch was just an alternative approach to the same issue.
I'll retest it just to double-check and I'll incorporate the testsuite
additions.

Thanks for your help!

Kyrill

Bernd

Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers

2015-11-20 Thread Nick Clifton


Hi Jeff,


The code there would solve this problem, but the approach is is overly
cautious, since it disables the optimization for all extensions that
increase the number of hard registers used.  Some of these will be
viable candidates, provided that the extra hard registers are no used.
(This is certainly true for the RL78, where the (patched) optimization
does improve code, even though the widening does use extra registers).

Nick -- can you pass along your testcode?


Sure - this is for the RL78 toolchain.  In theory the problem is 
generic, but I have not tested other toolchains.


Compile the gcc.c-torture/execute/pr42833.c or 
gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free 
also specified on the command line.  Without -free these tests pass. 
With -free they fail.


Cheers
  Nick

[PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage

2015-11-20 Thread Alan Hayward

When vectorising a integer induction condition reduction,
is_nonwrapping_integer_induction ends up with different values for base
during the analysis and build phases. In the first it is an INTEGER_CST,
in the second the loop has been vectorised out and the base is now a
variable.

This results in the analysis and build stage detecting the
STMT_VINFO_VEC_REDUCTION_TYPE as different types.

The easiest way to fix this is to only check for integer induction
conditions on the analysis stage.


gcc/
PR tree-optimization/68413
* tree-vect-loop.c (vectorizable_reduction): Only check for
integer cond reduction on analysis stage.




Thanks,
Alan.



analysisonlycondcheck.patch
Description: Binary data

libgo patch committed: Handle DW_AT_specification in cgo