Re: update acc routines in fortran
On Thu, Nov 19, 2015 at 08:26:45AM -0800, Cesar Philippidis wrote: > (gfc_oacc_routine_name): New struct; Full stop instead of semicolon. > diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c > index 1f6311c..e321072 100644 > --- a/gcc/tree-nested.c > +++ b/gcc/tree-nested.c > @@ -1106,6 +1106,9 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct > walk_stmt_info *wi) > case OMP_CLAUSE_NUM_TASKS: > case OMP_CLAUSE_HINT: > case OMP_CLAUSE__CILK_FOR_COUNT_: > + case OMP_CLAUSE_NUM_GANGS: > + case OMP_CLAUSE_NUM_WORKERS: > + case OMP_CLAUSE_VECTOR_LENGTH: > wi->val_only = true; > wi->is_lhs = false; > convert_nonlocal_reference_op (_CLAUSE_OPERAND (clause, 0), > @@ -1173,6 +1176,10 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct > walk_stmt_info *wi) > case OMP_CLAUSE_THREADS: > case OMP_CLAUSE_SIMD: > case OMP_CLAUSE_DEFAULTMAP: > + case OMP_CLAUSE_GANG: > + case OMP_CLAUSE_WORKER: > + case OMP_CLAUSE_VECTOR: This looks wrong. OMP_CLAUSE_GANG has 2 arguments, OMP_CLAUSE_WORKER and OMP_CLAUSE_VECTOR one argument, if you use a non-local decl or local decl that is referenced by a nested routine in those operands, it won't be handled properly. > @@ -1830,6 +1840,10 @@ convert_local_omp_clauses (tree *pclauses, struct > walk_stmt_info *wi) > case OMP_CLAUSE_THREADS: > case OMP_CLAUSE_SIMD: > case OMP_CLAUSE_DEFAULTMAP: > + case OMP_CLAUSE_GANG: > + case OMP_CLAUSE_WORKER: > + case OMP_CLAUSE_VECTOR: > + case OMP_CLAUSE_SEQ: Ditto. Otherwise LGTM. Jakub
Re: [PATCH] Add clang-format config to contrib folder
On 11/20/2015 11:33 AM, Martin Liška wrote: > Hi Pedro. Hi Martin. > Fully agree with you, there's suggested patch. > Hope I can install the patch for trunk? I'd call it obvious. :-) Thanks, Pedro Alves
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
Hi Kirill, On 18/11/15 14:11, Kirill Yukhin wrote: Hello Andreas, Devid. On 18 Nov 10:45, Andreas Schwab wrote: Kirill Yukhinwrites: diff --git a/gcc/testsuite/c-c++-common/attr-simd.c b/gcc/testsuite/c-c++-common/attr-simd.c new file mode 100644 index 000..b4eda34 --- /dev/null +++ b/gcc/testsuite/c-c++-common/attr-simd.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +__attribute__((__simd__)) +extern +int simd_attr (void) +{ + return 0; +} + +/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" } } */ On ia64: FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr[ \\t]simdclone|vector" FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr2[ \\t]simdclone|vector" $ grep simd_attr attr-simd.c.194t.optimized ;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389, cgraph_uid=0, symbol_order=0) simd_attr () ;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392, cgraph_uid=1, symbol_order=1) simd_attr2 () As far as vABI is supported on x86_64/i?86 only, I am going to enable mentioned `scan-tree-dump' only for these targets. This should cure both IA64 and Power. Concerning attr-simd-3.c. It is known issue: PR68158. And I believe this test should work everywhere as far as PR is resolved. I'll put xfail into the test. Which will lead to (in g++.log): gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M output is: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M XFAIL: c-c++-common/attr-simd-3.c -std=gnu++98 PR68158 (test for errors, line 5) FAIL: c-c++-common/attr-simd-3.c -std=gnu++98 (test for excess errors) Excess errors: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes] Patch in the bottom. gcc/tessuite/ * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error. This test fails on bare-metal targets that don't support -fcilkplus or -pthread. Would you consider moving them to the cilkplus testing directory or adding an appropriate effective target check? Thanks, Kyrill * c-c++-common/attr-simd.c: Limit scan of dump to x86_64/i?86. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c index 2bbdf04..35dd4c0 100644 --- a/gcc/testsuite/c-c++-common/attr-simd-3.c +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c @@ -2,4 +2,4 @@ /* { dg-options "-fcilkplus" } */ /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */ -void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same function marked as a Cilk Plus" } */ +void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same function marked as a Cilk Plus" "PR68158" { xfail c++ } } */ diff --git a/gcc/testsuite/c-c++-common/attr-simd.c b/gcc/testsuite/c-c++-common/attr-simd.c index 61974e3..7674588 100644 --- a/gcc/testsuite/c-c++-common/attr-simd.c +++ b/gcc/testsuite/c-c++-common/attr-simd.c @@ -11,7 +11,7 @@ int simd_attr (void) return 0; } -/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" } } */ +/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr:" 1 { target { i?86-*-* x86_64-*-* } } } } */ @@ -29,7 +29,7 @@ int simd_attr2 (void) return 0; } -/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" "optimized" } } */ +/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" "optimized" { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr2:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr2:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr2:" 1 { target { i?86-*-* x86_64-*-* } } } } */
Go patch committed: add receiver type to specific type function name
This patch to the Go frontend fixes the case where two different methods on different types with the same method name both define a type internally with the same name where the type requires a specific type hash or equality function. Before this patch those functions would get the same, causing a compilation error. This patch gives them different names using the same approach as is done for the type descriptor. I sent out a test case to the master Go testsuite in https://golang.org/cl/17081. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230463) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -e3aef41ce0c5be81e2589e60d9cb0db1516e9e2d +dfa74d975884f363c74d6a66a37b1703093fdba6 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/types.cc === --- gcc/go/gofrontend/types.cc (revision 230463) +++ gcc/go/gofrontend/types.cc (working copy) @@ -1769,7 +1769,16 @@ Type::specific_type_functions(Gogo* gogo const Named_object* in_function = name->in_function(); if (in_function != NULL) { - base_name += '$' + Gogo::unpack_hidden_name(in_function->name()); + base_name.append(1, '$'); + const Typed_identifier* rcvr = + in_function->func_value()->type()->receiver(); + if (rcvr != NULL) + { + Named_type* rcvr_type = rcvr->type()->deref()->named_type(); + base_name.append(Gogo::unpack_hidden_name(rcvr_type->name())); + base_name.append(1, '$'); + } + base_name.append(Gogo::unpack_hidden_name(in_function->name())); if (index > 0) { char buf[30];
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On 20/11/15 11:37, Richard Biener wrote: I'd rather make loop_optimizer_init do nothing if requested flags are already set and no fixup is needed Thus sth like Index: gcc/loop-init.c === --- gcc/loop-init.c (revision 230649) +++ gcc/loop-init.c (working copy) @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags) calculate_dominance_info (CDI_DOMINATORS); if (!needs_fixup) - checking_verify_loop_structure (); + { + checking_verify_loop_structure (); + if (loops_state_satisfies_p (flags)) + goto out; What about flags that are present in the loops state, but not requested in flags? Should we try to clear those flags? Thanks, - Tom + } /* Clear all flags. */ if (recorded_exits) @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags) /* Apply flags to loops. */ apply_loop_flags (flags); + checking_verify_loop_structure (); + +out: /* Dump loops. */ flow_loops_dump (dump_file, NULL, 1); - checking_verify_loop_structure (); - timevar_pop (TV_LOOP_INIT); }
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
Hello Kyrill, On 20 Nov 12:15, Kyrill Tkachov wrote: > >gcc/tessuite/ > > * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error. > > This test fails on bare-metal targets that don't support -fcilkplus or > -pthread. > Would you consider moving them to the cilkplus testing directory or adding an > appropriate > effective target check? I think so. I'll commit change in the bottom. gcc/testsuite/ * c-c++-common/attr-simd-3.c: Require Cilk Plus in effective target. > > Thanks, > Kyrill -- Thanks, K diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c index 35dd4c0..c7533f0 100644 --- a/gcc/testsuite/c-c++-common/attr-simd-3.c +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c @@ -1,3 +1,4 @@ +/* { dg-require-effective-target cilkplus } */ /* { dg-do compile } */ /* { dg-options "-fcilkplus" } */ /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */
Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt
On 11/19/2015 12:49 AM, Jeff Law wrote: On 11/18/2015 12:16 PM, Bernd Schmidt wrote: I don't think so, actually. One safe option would be to rip it out and just stop transforming this case, but let's start by looking at the code just a bit further down, calling noce_can_store_speculate. This was added later than the code we're discussing, and it tries to verify that the same memory location will unconditionally be written to at a point later than the one we're trying to convert And if we dig into that thread, Ian suggests this isn't terribly important anyway. However, if we go even further upthread, we find an assertion from Michael that this is critical for 456.hmmer and references BZ 27313. Cc'd in case he has additional input. https://gcc.gnu.org/ml/gcc-patches/2007-08/msg01978.html Sadly, no testcase was included. BZ27313 is marked as fixed by the introduction of the tree cselim pass, thus the problem won't even be seen at RTL level. I'm undecided on whether cs-elim is safe wrt the store speculation vs locks concerns raised in the thread discussing Ian's noce_can_store_speculate_p, but that's not something we have to consider to solve the problem at hand. So if it weren't for the assertion that it's critical for hmmr, I'd be convinced that just ripping out was the right thing to do. Can you look at 27313 and hmmr and see if there's an impact. Just maybe the critical stuff for those is handled by the tree if converter and we can just rip out the clearly incorrect RTL bits without regressing anything performance-wise. If there is an impact, then I think we have to look at either improving the tree bits (so we can remove the rtl bits) or we have to do real dataflow analysis in the rtl bits. So I made this change: if (!set_b && MEM_P (orig_x)) -{ - /* Disallow the "if (...) x = a;" form (implicit "else x = x;") -for optimizations if writing to x may trap or fault, -i.e. it's a memory other than a static var or a stack slot, -is misaligned on strict aligned machines or is read-only. If -x is a read-only memory, then the program is valid only if we -avoid the store into it. If there are stores on both the -THEN and ELSE arms, then we can go ahead with the conversion; -either the program is broken, or the condition is always -false such that the other memory is selected. */ - if (noce_mem_write_may_trap_or_fault_p (orig_x)) - return FALSE; - - /* Avoid store speculation: given "if (...) x = a" where x is a -MEM, we only want to do the store if x is always set -somewhere in the function. This avoids cases like - if (pthread_mutex_trylock(mutex)) -++global_variable; -where we only want global_variable to be changed if the mutex -is held. FIXME: This should ideally be expressed directly in -RTL somehow. */ - if (!noce_can_store_speculate_p (test_bb, orig_x)) - return FALSE; -} +return FALSE; As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 (if anything, hmmer was very slightly faster afterwards). The run wasn't super-scientific, but I wouldn't have expected anything else given the existence of cs-elim. Ok to do the above, removing all the bits made unnecessary (including memory_must_be_modified_in_insn_p in alias.c)? Bernd
Re: [PATCH][ARM] Do not expand movmisalign pattern if not in 32-bit mode
On 11/11/15 16:10, Kyrill Tkachov wrote: > Hi all, > > The attached testcase ICEs when compiled with -march=armv6k -mthumb -Os or > any march > for which -mthumb gives Thumb1: > error: unrecognizable insn: > } > ^ > (insn 13 12 14 5 (set (reg:SI 116 [ x ]) > (unspec:SI [ > (mem:SI (reg/v/f:SI 112 [ s ]) [0 MEM[(unsigned char > *)s_1(D)]+0 S4 A8]) > ] UNSPEC_UNALIGNED_LOAD)) besttry.c:9 -1 > (nil)) > > The problem is that the expands a movmisalign pattern but the resulting > unaligned loads don't > match any define_insn because they are gated on unaligned_access && > TARGET_32BIT. > The unaligned_access expander is gated only on unaligned_access. > > This small patch fixes the issue by turning off unaligned_access if > TARGET_32BIT is not true. > We can then remove TARGET_32BIT from the unaligned load/store patterns > conditions as a cleanup. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-11-11 Kyrylo Tkachov> > * config/arm/arm.c (arm_option_override): Require TARGET_32BIT > for unaligned_access. > * config/arm/arm.md (unaligned_loadsi): Remove redundant TARGET_32BIT > from matching condition. > (unaligned_loadhis): Likewise. > (unaligned_loadhiu): Likewise. > (unaligned_storesi): Likewise. > (unaligned_storehi): Likewise. > > 2015-11-11 Kyrylo Tkachov > > * gcc.target/arm/armv6-unaligned-load-ice.c: New test. This means we don't have unaligned access for some cores in Thumb1 for armv6. I'd rather not have the ICE instead of trying to find testing coverage on such cores now. OK. regards Ramana
[Patch] sync top level configure with binutils-gdb
This patch was pushed on binutils-gdb repo, so I also commit it on gcc. Tristan. 2015-11-20 Tristan GingoldSync with binutils-gdb: 2015-11-20 Tristan Gingold * configure.ac: Add aarch64-*-darwin* and arm-*-darwin*. * configure: Regenerate. Index: configure === --- configure (revision 230657) +++ configure (working copy) @@ -3686,6 +3686,14 @@ case "${target}" in *-*-chorusos) ;; + aarch64-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; + arm-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; powerpc-*-darwin*) noconfigdirs="$noconfigdirs ld gas gdb gprof" noconfigdirs="$noconfigdirs sim target-rda" Index: configure.ac === --- configure.ac(revision 230657) +++ configure.ac(working copy) @@ -1023,6 +1023,14 @@ case "${target}" in *-*-chorusos) ;; + aarch64-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; + arm-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; powerpc-*-darwin*) noconfigdirs="$noconfigdirs ld gas gdb gprof" noconfigdirs="$noconfigdirs sim target-rda"
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On Fri, Nov 20, 2015 at 1:33 PM, Alan Haywardwrote: > > > On 20/11/2015 11:00, "Richard Biener" wrote: > >>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward >>wrote: >>> When vectorising a integer induction condition reduction, >>> is_nonwrapping_integer_induction ends up with different values for base >>> during the analysis and build phases. In the first it is an INTEGER_CST, >>> in the second the loop has been vectorised out and the base is now a >>> variable. >>> >>> This results in the analysis and build stage detecting the >>> STMT_VINFO_VEC_REDUCTION_TYPE as different types. >>> >>> The easiest way to fix this is to only check for integer induction >>> conditions on the analysis stage. >> >>I don't like this. For the evolution part we have added >>STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need >>the original initial value as well then just save it. >> >>Or if you really want to go with the hack then please do not call >>is_nonwrapping_integer_induction with vec_stmt != NULL but >>initialize cond_expr_is_nonwrapping_integer_induction from >>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) >> >>The hack also lacks a comment. >> > > Ok. I've gone for a combination of both: > > I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE. > > I've removed the vec_stmt != NULL checks. > > I've moved the call to is_nonwrapping_integer_induction until after the > vect_is_simple_reduction check. I never liked that I had > is_nonwrapping_integer_induction early in the function, and think this > looks better. It looks better but the comment for loop_phi_evolution_base is wrong. The value is _not_ "correct" after prologue peeling (unless you update it there to a non-constant expr). It is conservatively "correct" for is_nonwrapping_integer_induction though. Which is why I'd probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED). Ok with that change. Thanks, Richard. > > Alan. >
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On 20/11/2015 13:47, "Richard Biener"wrote: >On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward >wrote: >> >> >>On 20/11/2015 11:00, "Richard Biener" wrote: >> >>>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward >>>wrote: When vectorising a integer induction condition reduction, is_nonwrapping_integer_induction ends up with different values for base during the analysis and build phases. In the first it is an INTEGER_CST, in the second the loop has been vectorised out and the base is now a variable. This results in the analysis and build stage detecting the STMT_VINFO_VEC_REDUCTION_TYPE as different types. The easiest way to fix this is to only check for integer induction conditions on the analysis stage. >>> >>>I don't like this. For the evolution part we have added >>>STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need >>>the original initial value as well then just save it. >>> >>>Or if you really want to go with the hack then please do not call >>>is_nonwrapping_integer_induction with vec_stmt != NULL but >>>initialize cond_expr_is_nonwrapping_integer_induction from >>>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) >>> >>>The hack also lacks a comment. >>> >> >>Ok. I've gone for a combination of both: >> >>I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE. >> >>I've removed the vec_stmt != NULL checks. >> >>I've moved the call to is_nonwrapping_integer_induction until after the >>vect_is_simple_reduction check. I never liked that I had >>is_nonwrapping_integer_induction early in the function, and think this >>looks better. > >It looks better but the comment for loop_phi_evolution_base is wrong. >The value is _not_ "correct" after prologue peeling (unless you >update it there to a non-constant expr). It is conservatively "correct" >for is_nonwrapping_integer_induction though. Which is why I'd >probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED >to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED). > >Ok with that change. Updated as requested and submitted. Alan. analysisonlycondcheck3.patch Description: Binary data
[Patch, fortran] PRs 68237 & 66762 - submodule problems
Dear All, I have committed as 'obvious' revision 230661 to fix 2/3 submodule problems. In the case of the third, PR68243, I believe gfortran is behaving correctly and I am awaiting confirmation from the reporter. Thanks to Dominique for regtesting the part of the patch that fixes PR66762. I have bootstrapped and regtested both part on FC21/x86_64. Cheers Paul PS I have just noticed that I attributed the wrong part of the patch to Steve. I will correct the ChangeLog in just a moment. 2015-11-20 Paul ThomasPR fortran/68237 * decl.c (gfc_match_submod_proc): Test the interface symbol before accessing its attributes. 2015-11-20 Steven G. Kargl PR fortran/66762 (gfc_get_symbol_decl): Test for attr.used_in_submodule as well as attr.use_assoc (twice). (gfc_create_module_variable): Ditto. 2015-11-20 Paul Thomas PR fortran/68237 * gfortran.dg/submodule_12.f90: New test PR fortran/66762 * gfortran.dg/submodule_6.f90: Add compile option -flto.
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On Fri, 20 Nov 2015, Tom de Vries wrote: > On 20/11/15 11:37, Richard Biener wrote: > >I'd rather make loop_optimizer_init do nothing > > if requested flags are already set and no fixup is needed > > > Thus sth like > > > > Index: gcc/loop-init.c > > === > > --- gcc/loop-init.c (revision 230649) > > +++ gcc/loop-init.c (working copy) > > @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags) > > calculate_dominance_info (CDI_DOMINATORS); > > > > if (!needs_fixup) > > - checking_verify_loop_structure (); > > + { > > + checking_verify_loop_structure (); > > + if (loops_state_satisfies_p (flags)) > > + goto out; > > What about flags that are present in the loops state, but not requested in > flags? Should we try to clear those flags? No, I don't think so, that would break in-loop-pipeline LIM, dropping loop-closed SSA for example. I agree it's somewhat of an odd behavior but all passes should either be placed in a sub-pipeline with an outer loop_optimizer_init()/finalize () call or call both themselves. Richard. > Thanks, > - Tom > > > + } > > > > /* Clear all flags. */ > > if (recorded_exits) > > @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags) > > /* Apply flags to loops. */ > > apply_loop_flags (flags); > > > > + checking_verify_loop_structure (); > > + > > +out: > > /* Dump loops. */ > > flow_loops_dump (dump_file, NULL, 1); > > > > - checking_verify_loop_structure (); > > - > > timevar_pop (TV_LOOP_INIT); > > } > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument
On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovichwrote: > On 19 Nov 18:19, Richard Biener wrote: >> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt >> wrote: >> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote: >> >> Currently we fold all memcpy/memmove calls with a known data size. >> >> It causes two problems when used with Pointer Bounds Checker. >> >> The first problem is that we may copy pointers as integer data >> >> and thus loose bounds. The second problem is that if we inline >> >> memcpy, we also have to inline bounds copy and this may result >> >> in a huge amount of code and significant compilation time growth. >> >> This patch disables folding for functions we want to instrument. >> >> >> >> Does it look reasonable for trunk and GCC5 branch? Bootstrapped >> >> and regtested on x86_64-unknown-linux-gnu. >> > >> >Can't see anything wrong with it. Ok. >> >> But for small sizes this can have a huge impact on optimization. Which is >> why we have the code in the first place. I'd make the check less broad, for >> example inlining copies of size less than a pointer shouldn't be affected. > > Right. We also may inline in case we know no pointers are copied. Below is > a version with extended condition and a couple more tests. Bootstrapped and > regtested on x86_64-unknown-linux-gnu. Does it OK for trunk and gcc-5-branch? > >> >> Richard. >> >> > >> >Bernd >> >> > > Thanks, > Ilya > -- > gcc/ > > 2015-11-20 Ilya Enkovich > > * gimple-fold.c (gimple_fold_builtin_memory_op): Don't > fold call if we are going to instrument it and it may > copy pointers. > > gcc/testsuite/ > > 2015-11-20 Ilya Enkovich > > * gcc.target/i386/mpx/pr68337-1.c: New test. > * gcc.target/i386/mpx/pr68337-2.c: New test. > * gcc.target/i386/mpx/pr68337-3.c: New test. > > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index 1ab20d1..dd9f80b 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3. If not see > #include "gomp-constants.h" > #include "optabs-query.h" > #include "omp-low.h" > +#include "tree-chkp.h" > +#include "ipa-chkp.h" > > > /* Return true when DECL can be referenced from current unit. > @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, >unsigned int src_align, dest_align; >tree off0; > > + /* Inlining of memcpy/memmove may cause bounds lost (if we copy > +pointers as wide integer) and also may result in huge function > +size because of inlined bounds copy. Thus don't inline for > +functions we want to instrument in case pointers are copied. */ > + if (flag_check_pointer_bounds > + && chkp_instrumentable_p (cfun->decl) > + /* Even if data may contain pointers we can inline if copy > +less than a pointer size. */ > + && (!tree_fits_uhwi_p (len) > + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0) || tree_to_uhwi (len) >= POINTER_SIZE_UNITS > + /* Check data type for pointers. */ > + && (!TREE_TYPE (src) > + || !TREE_TYPE (TREE_TYPE (src)) > + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src))) > + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src) I don't think you can in any way rely on the pointer type of the src argument as all pointer conversions are useless and memcpy and friends take void * anyway. Note that you also disable memmove to memcpy simplification with this early check. Where is pointer transfer handled for MPX? I suppose it's not done transparently for all memory move instructions but explicitely by instrumented block copy routines in libmpx? In which case how does that identify pointers vs. non-pointers? Richard. > + return false; > + >/* Build accesses at offset zero with a ref-all character type. */ >off0 = build_int_cst (build_pointer_type_for_mode (char_type_node, > ptr_mode, true), 0); > diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c > b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c > new file mode 100644 > index 000..3f8d79d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */ > + > +#include "mpx-check.h" > + > +#define N 2 > + > +extern void abort (); > + > +static int > +mpx_test (int argc, const char **argv) > +{ > + char ** src = (char **)malloc (sizeof (char *) * N); > + char ** dst = (char **)malloc (sizeof (char *) * N); > + int i; > + > + for (i = 0; i < N; i++) > +src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1); > + > + __builtin_memcpy(dst, src, sizeof (char *) * N); > + > + for (i = 0; i < N; i++) > +{ > + char *p = dst[i]; >
Re: [PATCH][ARM] PR 68149 Fix ICE in unaligned_loaddi split
On 10/11/15 17:32, Kyrill Tkachov wrote: > Hi all, > > This ICE in this PR occurs when we're trying to split unaligned_loaddi into > two SImode unaligned loads. > The problem is in the addressing mode. When reload was picking the > addressing mode we accepted an offset of > -256 because the mode in the pattern is advertised as DImode and that was > accepted by the legitimate address > hooks because they thought it was a NEON load (DImode is in > VALID_NEON_DREG_MODE). However, the splitter wants > to generate two normal SImode unaligned loads using that address, for which > -256 is not valid, so we ICE > in gen_lowpart. > > The only way unaligned_loaddi could be generated was through the > gen_movmem_ldrd_strd expansion that implements > a memmove using LDRD and STRD sequences. If the memmove source is not aligned > we can't use LDRDs so the code > generates unaligned_loaddi patterns and expects them to be split into two > normal loads after reload. Similarly > for unaligned store destinations. > > This patch just explicitly generates the two unaligned SImode loads or stores > when appropriate inside > gen_movmem_ldrd_strd. This makes the unaligned_loaddi and unaligned_storedi > patterns unused, so we can remove them. > > This patch fixes the ICe in gcc.target/aarch64/advsimd-intrinsics/vldX.c seen > with > -mthumb -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee > so no new testcase is added. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-11-10 Kyrylo Tkachov> > PR target/68149 > * config/arm/arm.md (unaligned_loaddi): Delete. > (unaligned_storedi): Likewise. > * config/arm/arm.c (gen_movmem_ldrd_strd): Don't generate > unaligned DImode memory ops. Instead perform two back-to-back > unalgined SImode ops. s/unalgined/unaligned. Ok. Ramana
Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument
On 20 Nov 14:54, Richard Biener wrote: > On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovichwrote: > > On 19 Nov 18:19, Richard Biener wrote: > >> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt > >> wrote: > >> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote: > >> >> Currently we fold all memcpy/memmove calls with a known data size. > >> >> It causes two problems when used with Pointer Bounds Checker. > >> >> The first problem is that we may copy pointers as integer data > >> >> and thus loose bounds. The second problem is that if we inline > >> >> memcpy, we also have to inline bounds copy and this may result > >> >> in a huge amount of code and significant compilation time growth. > >> >> This patch disables folding for functions we want to instrument. > >> >> > >> >> Does it look reasonable for trunk and GCC5 branch? Bootstrapped > >> >> and regtested on x86_64-unknown-linux-gnu. > >> > > >> >Can't see anything wrong with it. Ok. > >> > >> But for small sizes this can have a huge impact on optimization. Which is > >> why we have the code in the first place. I'd make the check less broad, > >> for example inlining copies of size less than a pointer shouldn't be > >> affected. > > > > Right. We also may inline in case we know no pointers are copied. Below > > is a version with extended condition and a couple more tests. Bootstrapped > > and regtested on x86_64-unknown-linux-gnu. Does it OK for trunk and > > gcc-5-branch? > > > >> > >> Richard. > >> > >> > > >> >Bernd > >> > >> > > > > Thanks, > > Ilya > > -- > > gcc/ > > > > 2015-11-20 Ilya Enkovich > > > > * gimple-fold.c (gimple_fold_builtin_memory_op): Don't > > fold call if we are going to instrument it and it may > > copy pointers. > > > > gcc/testsuite/ > > > > 2015-11-20 Ilya Enkovich > > > > * gcc.target/i386/mpx/pr68337-1.c: New test. > > * gcc.target/i386/mpx/pr68337-2.c: New test. > > * gcc.target/i386/mpx/pr68337-3.c: New test. > > > > > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > > index 1ab20d1..dd9f80b 100644 > > --- a/gcc/gimple-fold.c > > +++ b/gcc/gimple-fold.c > > @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3. If not see > > #include "gomp-constants.h" > > #include "optabs-query.h" > > #include "omp-low.h" > > +#include "tree-chkp.h" > > +#include "ipa-chkp.h" > > > > > > /* Return true when DECL can be referenced from current unit. > > @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator > > *gsi, > >unsigned int src_align, dest_align; > >tree off0; > > > > + /* Inlining of memcpy/memmove may cause bounds lost (if we copy > > +pointers as wide integer) and also may result in huge function > > +size because of inlined bounds copy. Thus don't inline for > > +functions we want to instrument in case pointers are copied. */ > > + if (flag_check_pointer_bounds > > + && chkp_instrumentable_p (cfun->decl) > > + /* Even if data may contain pointers we can inline if copy > > +less than a pointer size. */ > > + && (!tree_fits_uhwi_p (len) > > + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0) > > || tree_to_uhwi (len) >= POINTER_SIZE_UNITS > > > + /* Check data type for pointers. */ > > + && (!TREE_TYPE (src) > > + || !TREE_TYPE (TREE_TYPE (src)) > > + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src))) > > + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src) > > I don't think you can in any way rely on the pointer type of the src argument > as all pointer conversions are useless and memcpy and friends take void * > anyway. This check is looking for cases when we have type information indicating no pointers are copied. In case of 'void *' we have to assume pointers are copied and inlining is undesired. Test pr68337-2.c checks pointer type allows to enable inlining. Looks like this check misses || !COMPLETE_TYPE_P(TREE_TYPE (TREE_TYPE (src)))? > > Note that you also disable memmove to memcpy simplification with this > early check. Doesn't matter for MPX which uses the same implementation for both cases. > > Where is pointer transfer handled for MPX? I suppose it's not done > transparently > for all memory move instructions but explicitely by instrumented block copy > routines in libmpx? In which case how does that identify pointers vs. > non-pointers? It is handled by instrumentation pass. Compiler checks type of stored data to find pointer stores. Each pointer store is instrumented with bndstx call. MPX versions of memcpy, memmove etc. don't make any assumptions about type of copied data and just copy whole chunk of bounds metadata corresponding to copied block. Thanks, Ilya > > Richard. >
[ptx] overrride anchor hook
Jim discovered that he needed to override the anchoring hook when using a PPC host-side compiler, but didn't figure out why this was needed. Digging into it, I discovered that flag_section_anchors is cleared in toplev.c by the command line option machinery, if there are no anchor target hooks. However, that's done in the host compiler and the LTO machinery simply copies the value over into the LTO compiler. Usually that's fine, as the LTO compiler's for the same target architecture. Except when it's an offload compiler. The flags are not resanitized (perhaps that should be done?) Anyway, that led to the PTX accelerator compiler trying to do section anchory things. Fixed by overriding TARGET_USE_ANCHORS_FOR_SYMBOL_P to say 'no'. I guess at the next instance of an offload compiler seeing a 'surprising' combination of optimization flags, we'll need a resanitize hook? Jakub? nathan 2015-11-20 Nathan SidwellJames Norris * config/nvptx/nvptx.c (nvptx_use_anchors_for_symbol_p): New. (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Override. Index: config/nvptx/nvptx.c === --- config/nvptx/nvptx.c (revision 230657) +++ config/nvptx/nvptx.c (working copy) @@ -3895,6 +3895,19 @@ nvptx_cannot_copy_insn_p (rtx_insn *insn return false; } } + +/* Section anchors do not work. Initialization for flag_section_anchor + probes the existence of the anchoring target hooks and prevents + anchoring if they don't exist. However, we may be being used with + a host-side compiler that does support anchoring, and hence see + the anchor flag set (as it's not recalculated). So provide an + implementation denying anchoring. */ + +static bool +nvptx_use_anchors_for_symbol_p (const_rtx ARG_UNUSED (a)) +{ + return false; +} /* Record a symbol for mkoffload to enter into the mapping table. */ @@ -4914,6 +4927,9 @@ nvptx_goacc_reduction (gcall *call) #undef TARGET_CANNOT_COPY_INSN_P #define TARGET_CANNOT_COPY_INSN_P nvptx_cannot_copy_insn_p +#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P nvptx_use_anchors_for_symbol_p + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS nvptx_init_builtins #undef TARGET_EXPAND_BUILTIN
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On 20/11/15 12:27, James Greenhalgh wrote: On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: On 11/12/2015 09:39 AM, Evandro Menezes wrote: 2015-11-12 Evandro Menezes[AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. The arm part is ok too. It's just a comment. In the ChangeLog entry for arm.md I'd say "Add description." Thanks, Kyrill Please, commit if it's alright. The AArch64 part of this is OK. From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:11:16 -0600 Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. --- gcc/config/aarch64/aarch64.md | 4 gcc/config/arm/arm.md | 3 +++ 2 files changed, 7 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1586256..d46f837 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -195,6 +195,10 @@ ;; 1 :=: yes (define_attr "far_branch" "" (const_int 0)) +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has +;; no predicated insns. +(define_attr "predicated" "yes,no" (const_string "no")) + ;; --- ;; Pipeline descriptions and scheduling ;; --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 73c3088..6bda491 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -105,6 +105,9 @@ (define_attr "fpu" "none,vfp" (const (symbol_ref "arm_fpu_attr"))) +; Predicated means that the insn form is conditionally executed based on a +; predicate. We default to 'no' because no Thumb patterns match this rule +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. Thanks, James (define_attr "predicated" "yes,no" (const_string "no")) ; LENGTH of an instruction (in bytes) -- 2.1.0.243.g30d45f7
Re: [Committed] S/390: Add bswaphi2 pattern
On 11/20/2015 01:23 PM, Richard Henderson wrote: > On 11/20/2015 12:52 PM, Andreas Krebbel wrote: >> +(define_insn "bswaphi2" >> + [(set (match_operand:HI 0 "register_operand" "=d") >> +(bswap:HI (match_operand:HI 1 "memory_operand" "RT")))] >> + "TARGET_CPU_ZARCH" >> + "lrvh\t%0,%1" >> + [(set_attr "type" "load") >> + (set_attr "op_type" "RXY") >> + (set_attr "z10prop" "z10_super")]) > > Surely it's better to arrange so that you can use STRVH as well. > And providing a fallback for the reg-reg case (e.g. LRVR+SRL). > > Although I suppose I don't see support for STRV in bswap32/64 either... Right, I totally forgot about the stores. I'll have a look. We even have a mem-mem variant (mvcin). But I found it rather difficult to use since the pattern would have to make sure that source and destination do not overlap. -Andreas-
Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF
On 20 Nov 14:31, Ilya Enkovich wrote: > 2015-11-20 14:28 GMT+03:00 Richard Biener: > > On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich > > wrote: > >> 2015-11-18 16:44 GMT+03:00 Richard Biener : > >>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich > >>> wrote: > Hi, > > When we compute vectypes we skip non-relevant phi nodes. But we process > non-relevant alive statements and thus may need vectype of non-relevant > live phi node to compute mask vectype. This patch enables vectype > computation for live phi nodes. Botostrapped and regtested on > x86_64-unknown-linux-gnu. OK for trunk? > >>> > >>> Hmm. What breaks if you instead skip all !relevant stmts and not > >>> compute vectype for life but not relevant ones? We won't ever > >>> "vectorize" !relevant ones, that is, we don't need their vector type. > >> > >> I tried it and got regression in SLP. It expected non-null vectype > >> for non-releveant but live statement. Regression was in > >> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90 > > > > Because somebody put a vector type check before > > > > if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > > return false; > > > > @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g > >tree mask_type; > >tree mask; > > > > + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > > +return false; > > + > >if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > > return false; > > > > @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g > > ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > > > >gcc_assert (ncopies >= 1); > > - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > > -return false; > > > >if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def > >&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > > > > fixes this particular fallout for me. > > I'll try it. With this fix it works fine, thanks! Bootstrapped and regtested on x86_64-unknown-linux-gnu. OK for trunk? Ilya -- gcc/ 2015-11-20 Ilya Enkovich Richard Biener * tree-vect-loop.c (vect_determine_vectorization_factor): Don't compute vectype for non-relevant mask producers. * gcc/tree-vect-stmts.c (vectorizable_comparison): Check stmt relevance earlier. gcc/testsuite/ 2015-11-20 Ilya Enkovich * gcc.dg/pr68327.c: New test. diff --git a/gcc/testsuite/gcc.dg/pr68327.c b/gcc/testsuite/gcc.dg/pr68327.c new file mode 100644 index 000..c3e6a94 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr68327.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +int a, d; +char b, c; + +void +fn1 () +{ + int i = 0; + for (; i < 1; i++) +d = 1; + for (; b; b++) +a = 1 && (d & b); +} diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 80937ec..592372d 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -439,7 +439,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) compute a factor. */ if (TREE_CODE (scalar_type) == BOOLEAN_TYPE) { - mask_producers.safe_push (stmt_info); + if (STMT_VINFO_RELEVANT_P (stmt_info)) + mask_producers.safe_push (stmt_info); bool_result = true; if (gimple_code (stmt) == GIMPLE_ASSIGN diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 0f64aaf..3723b26 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -7546,6 +7546,9 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, tree mask_type; tree mask; + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) +return false; + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) return false; @@ -7558,9 +7561,6 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; gcc_assert (ncopies >= 1); - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) -return false; - if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle && reduc_def))
Re: [PATCH] fix vectorizer performance problem on cygwin hosted cross compiler
On Fri, Nov 20, 2015 at 8:21 AM, Jim Wilsonwrote: > A cygwin hosted cross compiler to aarch64-linux, compiling a C version > of linpack with -Ofast, produces code that runs 17% slower than a > linux hosted compiler. The problem shows up in the vect dump, where > some different vectorization optimization decisions were made by the > cygwin compiler than the linux compiler. That happened because > tree-vect-data-refs.c calls qsort in vect_analyze_data_ref_accesses, > and the newlib and glibc qsort routines sort the list differently. I > can reproduce the same problem on linux by adding the newlib qsort > sources to a gcc build. For an x86_64 target, I see about a 30% > performance loss using the newlib qsort. > > The qsort trouble turns out to be a problem in the qsort comparison > function, dr_group_sort_cmp. It does this > if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)) > { > cmp = compare_tree (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb)); > if (cmp != 0) > return cmp; > } > operand_equal_p calls STRIP_NOPS, so it will consider two trees to be > the same even if they have NOP_EXPR. However, compare_tree is not > calling STRIP_NOPS, so it handles trees with NOP_EXPRs differently > than trees without. The result is that depending on which array entry > gets used as the qsort pivot point, you can get very different sorts. > The newlib qsort happens to be accidentally choosing a bad pivot for > this testcase. The glibc qsort happens to be accidentally choosing a > good pivot for this testcase. This then triggers a cascading problem > in vect_analyze_data_ref_accesses which assumes that array entries > that pass the operand_equal_p test for the base address will end up > adjacent, and will only vectorize in that case. > > For a contrived example, suppose we have four entries to sort: (plus Y > 8), (mult A 4), (pointer_plus Z 16), and (nop (mult A 4)). Suppose we > choose the mult as the pivot point. The plus sorts before because > tree_code plus is less than mult. The pointer_plus sorts after for the > same reason. The nop sorts equal. So we end up with plus, mult, nop, > pointer_plus. The mult and nop are then combined into the same > vectorization group. Now suppose we choose the pointer_plus as the > pivot point. The plus and mult sort before. The nop sorts after. The > final result is plus, mult, pointer_plus, nop. And we fail to > vectorize as the mult and nop are not adjacent as they should be. > > When I modify compare_tree to call STRIP_NOPS, this problem goes away. > I get the same sort from both the newlib and glibc qsort functions, > and I get the same linpack performance from a cygwin hosted compiler > and a linux hosted compiler. > > This patch was tested with an x86_64 bootstrap and make check. There > were no regressions. I've also done a SPEC CPU2000 run with and > without the patch on aarch64-linux, there is no performance change. > And I've verified it by building linpack for aarch64-linux with cygwin > hosted cross compiler, x86_64 hosted cross compiler, and an aarch64 > native compiler. Ok. Thanks, Richard. > Jim
Re: Add uaddv4_optab, usubv4_optab
> Eric has just submitted a documentation path that documented the > {add,sub,mul,umul}v4 and negv3 patterns, so this should be > applied on top of that. OK, I'm going to apply it, thanks. Note that the comment at the beginning of expand_addsub_overflow describing the overall strategy ought to be adjusted if new patterns for the jump on carry are added. -- Eric Botcazou
[committed, trivial] Fix typo and trailing whitespace in dump-file strings in parloops
[ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ] On 18/11/15 17:22, Bernhard Reutner-Fischer wrote: Bonus points for fixing the dump_file to parse in: >Parloops will fail because: >... >phi is n_2 = PHI>arg of phi to exit: value n_4(D) used outside loop >checking if it a part of reduction pattern: s/it a/it is/ This patch fixes a typo and trailing whitespace in dump-file strings in parloops. Build for c and fortran, tested -fdump-tree-parloops testcases. Committed to trunk as trivial. Thanks, - Tom Fix typo and trailing whitespace in dump-file strings in parloops 2015-11-19 Tom de Vries * tree-parloops.c (build_new_reduction): Fix trailing whitespace in dump-file string. (try_create_reduction_list): Same. Fix typo in dump-file string. --- gcc/tree-parloops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 8d7912d..aca2370 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2383,7 +2383,7 @@ build_new_reduction (reduction_info_table_type *reduction_list, if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, - "Detected reduction. reduction stmt is: \n"); + "Detected reduction. reduction stmt is:\n"); print_gimple_stmt (dump_file, reduc_stmt, 0, 0); fprintf (dump_file, "\n"); } @@ -2564,7 +2564,7 @@ try_create_reduction_list (loop_p loop, print_generic_expr (dump_file, val, 0); fprintf (dump_file, " used outside loop\n"); fprintf (dump_file, - " checking if it a part of reduction pattern: \n"); + " checking if it is part of reduction pattern:\n"); } if (reduction_list->elements () == 0) {
Add uaddv4_optab, usubv4_optab
Toward fixing PR68385. I'm just starting a full round of testing, but extern void underflow(void) __attribute__((noreturn)); unsigned sub1(unsigned a, unsigned b) { unsigned r = a - b; if (r > a) underflow(); return r; } unsigned sub2(unsigned a, unsigned b) { unsigned r; if (__builtin_sub_overflow(a, b, )) underflow(); return r; } sub1: movl%edi, %eax subl%esi, %eax cmpl%eax, %edi jb .L7 rep ret ... sub2: movl%edi, %eax subl%esi, %eax jb .L16 rep ret ... r~ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 4c5e22a..10a004e 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -6156,6 +6156,22 @@ (const_string "4")] (const_string "")))]) +(define_expand "uaddv4" + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC +(plus:SWI (match_dup 1) (match_dup 2)) +(match_dup 1))) + (set (match_dup 0) + (plus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (ne (reg:CCC FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + "" +{ + ix86_fixup_binary_operands_no_copy (PLUS, mode, operands); +}) + ;; The lea patterns for modes less than 32 bits need to be matched by ;; several insns converted to real lea by splitters. @@ -6461,6 +6477,20 @@ (const_string "4")] (const_string "")))]) +(define_expand "usubv4" + [(parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 1) (match_dup 2))) + (set (match_dup 0) + (minus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (ltu (reg:CC FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + "" +{ + ix86_fixup_binary_operands_no_copy (MINUS, mode, operands); +}) + (define_insn "*sub_3" [(set (reg FLAGS_REG) (compare (match_operand:SWI 1 "nonimmediate_operand" "0,0") diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 8b2deaa..9386e62 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4912,6 +4912,25 @@ address calculations. @code{add@var{m}3} is used if @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3} Similar, for other arithmetic operations. +@cindex @code{addv@var{m}4} instruction pattern +@item @samp{addv@var{m}4} +Add operand 2 and operand 1, storing the result in operand 0. If signed +overflow occurs during the addition, jump to the label in operand 3. + +@cindex @code{subv@var{m}4} instruction pattern +@cindex @code{mulv@var{m}4} instruction pattern +@item @samp{subv@var{m}4}, @samp{mulv@var{m}4} +Similar, for other signed arithmetic operations. + +@cindex @code{uaddv@var{m}4} instruction pattern +@item @samp{uaddv@var{m}4} +Like @code{addv@var{m}4}, except jump on unsigned overflow. + +@cindex @code{usubv@var{m}4} instruction pattern +@cindex @code{umulv@var{m}4} instruction pattern +@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4} +Similar, for other unsigned arithmetic operations. + @cindex @code{fma@var{m}4} instruction pattern @item @samp{fma@var{m}4} Multiply operand 2 and operand 1, then add operand 3, storing the diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index bc77bdc..b15657f 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -546,6 +546,33 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs, /* u1 +- u2 -> ur */ if (uns0_p && uns1_p && unsr_p) { + insn_code icode = optab_handler (code == PLUS_EXPR ? uaddv4_optab + : usubv4_optab, mode); + if (icode != CODE_FOR_nothing) + { + struct expand_operand ops[4]; + rtx_insn *last = get_last_insn (); + + res = gen_reg_rtx (mode); + create_output_operand ([0], res, mode); + create_input_operand ([1], op0, mode); + create_input_operand ([2], op1, mode); + create_fixed_operand ([3], do_error); + if (maybe_expand_insn (icode, 4, ops)) + { + last = get_last_insn (); + if (profile_status_for_fn (cfun) != PROFILE_ABSENT + && JUMP_P (last) + && any_condjump_p (last) + && !find_reg_note (last, REG_BR_PROB, 0)) + add_int_reg_note (last, REG_BR_PROB, PROB_VERY_UNLIKELY); + emit_jump (done_label); + goto do_error_label; + } + + delete_insns_since (last); + } + /* Compute the operation. On RTL level, the addition is always unsigned. */ res = expand_binop (mode, code == PLUS_EXPR ? add_optab : sub_optab, @@ -737,92 +764,88 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs, gcc_assert (!uns0_p && !uns1_p && !unsr_p); /* s1 +- s2 ->
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On Thu, 19 Nov 2015, Tom de Vries wrote: > On 17/11/15 15:53, Tom de Vries wrote: > > > And the above LIM example > > > is none for why you need two LIM passes... > > > > Indeed. I'm planning a separate reply to explain in more detail the need > > for the two pass_lims. > > I. > > I managed to get rid of the two pass_lims for the motivating example that I > used until now (goacc/kernels-double-reduction.c). I found that by adding a > pass_dominator instance after pass_ch, I could get rid of the second pass_lim > (and pass_copyprop as well). > > But... then I wrote a counter example (goacc/kernels-double-reduction-n.c), > and I'm back at two pass_lims (and two pass_dominators). > Also I've split the pass group into a bit before and after pass_fre. > > So, the current pass group looks like: > ... > NEXT_PASS (pass_build_ealias); > > /* Pass group that runs when the function is an offloaded function >containing oacc kernels loops. Part 1. */ > NEXT_PASS (pass_oacc_kernels); > PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) > /* We need pass_ch here, because pass_lim has no effect on >exit-first loops (PR65442). Ideally we want to remove both >this pass instantiation, and the reverse transformation >transform_to_exit_first_loop_alt, which is done in >pass_parallelize_loops_oacc_kernels. */ > NEXT_PASS (pass_ch); > POP_INSERT_PASSES () > > NEXT_PASS (pass_fre); > > /* Pass group that runs when the function is an offloaded function >containing oacc kernels loops. Part 2. */ > NEXT_PASS (pass_oacc_kernels2); > PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels2) > /* We use pass_lim to rewrite in-memory iteration and reduction >variable accesses in loops into local variables accesses. */ > NEXT_PASS (pass_lim); > NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); > NEXT_PASS (pass_lim); > NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); > NEXT_PASS (pass_dce); > NEXT_PASS (pass_parallelize_loops_oacc_kernels); > NEXT_PASS (pass_expand_omp_ssa); > POP_INSERT_PASSES () > NEXT_PASS (pass_merge_phi); > ... > > > II. > > The motivating test-case kernels-double-reduction-n.c: > ... > #include > > #define N 500 > > unsigned int a[N][N]; > > void __attribute__((noinline,noclone)) > foo (unsigned int n) > { > int i, j; > unsigned int sum = 1; > > #pragma acc kernels copyin (a[0:n]) copy (sum) > { > for (i = 0; i < n; ++i) > for (j = 0; j < n; ++j) > sum += a[i][j]; > } > > if (sum != 5001) > abort (); > } > ... > > > III. > > Before first pass_lim. Note no phis on inner or outer loop header for > iteration varables or reduction variable: > ... > : > _5 = *.omp_data_i_4(D).i; > *_5 = 0; > _44 = *.omp_data_i_4(D).n; > _45 = *_44; > if (_45 != 0) > goto ; > else > goto ; > > : outer loop header > _12 = *.omp_data_i_4(D).j; > *_12 = 0; > if (_45 != 0) > goto ; > else > goto ; > > : inner loop header, latch > _19 = *.omp_data_i_4(D).a; > _21 = *_5; > _23 = *_12; > _24 = *_19[_21][_23]; > _25 = *.omp_data_i_4(D).sum; > sum.0_26 = *_25; > sum.1_27 = _24 + sum.0_26; > *_25 = sum.1_27; > _33 = _23 + 1; > *_12 = _33; > j.2_16 = (unsigned int) _33; > if (j.2_16 < _45) > goto ; > else > goto ; > > : outer loop latch > _36 = *_5; > _38 = _36 + 1; > *_5 = _38; > i.3_9 = (unsigned int) _38; > if (i.3_9 < _45) > goto ; > else > goto ; > > : > return; > ... > > > IV. > > After first pass_lim/pass_dom pair. Note there are phis on the inner loop > header for the reduction and the iteration variable, but not on the outer loop > header: > ... > : > _5 = *.omp_data_i_4(D).i; > *_5 = 0; > _44 = *.omp_data_i_4(D).n; > _45 = *_44; > if (_45 != 0) > goto ; > else > goto ; > > : > _12 = *.omp_data_i_4(D).j; > _19 = *.omp_data_i_4(D).a; > D__lsm.10_50 = *_12; > D__lsm.11_51 = 0; > _25 = *.omp_data_i_4(D).sum; > > : outer loop header > D__lsm.10_20 = 0; > D__lsm.11_22 = 1; > _21 = *_5; > D__lsm.12_28 = *_25; > D__lsm.13_30 = 0; > goto ; > > : inner loop header, latch > # D__lsm.10_47 = PHI <0(5), _33(7)> > # D__lsm.12_49 = PHI> _23 = D__lsm.10_47; > _24 = *_19[_21][D__lsm.10_47]; > sum.0_26 = D__lsm.12_49; > sum.1_27 = _24 + D__lsm.12_49; > D__lsm.12_31 = sum.1_27; > D__lsm.13_32 = 1; > _33 = D__lsm.10_47 + 1; > D__lsm.10_14 = _33; > D__lsm.11_15 = 1; > j.2_16 = (unsigned int) _33; > if (j.2_16 < _45) > goto ; > else > goto ; > > : outer loop latch > # D__lsm.10_35 = PHI <_33(7)> > # D__lsm.11_37 = PHI <1(7)> > # D__lsm.12_7 = PHI > # D__lsm.13_8 = PHI <1(7)> > *_25 = sum.1_27; > _36 = *_5; > _38 = _36 + 1; > *_5 = _38; > i.3_9 = (unsigned int) _38; > if (i.3_9 < _45) > goto ; > else > goto ; > > : > #
Re: Add uaddv4_optab, usubv4_optab
On Fri, Nov 20, 2015 at 11:27:48AM +0100, Richard Henderson wrote: > Toward fixing PR68385. I'm just starting a full round of testing, but > > extern void underflow(void) __attribute__((noreturn)); > unsigned sub1(unsigned a, unsigned b) > { > unsigned r = a - b; > if (r > a) underflow(); > return r; > } > > unsigned sub2(unsigned a, unsigned b) > { > unsigned r; > if (__builtin_sub_overflow(a, b, )) underflow(); > return r; > } > > > sub1: > movl%edi, %eax > subl%esi, %eax > cmpl%eax, %edi > jb .L7 > rep ret > ... > sub2: > movl%edi, %eax > subl%esi, %eax > jb .L16 > rep ret > ... That looks good. > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -6156,6 +6156,22 @@ > (const_string "4")] > (const_string "")))]) > > +(define_expand "uaddv4" > + [(parallel [(set (reg:CCC FLAGS_REG) > +(compare:CCC > + (plus:SWI (match_dup 1) (match_dup 2)) > + (match_dup 1))) > + (set (match_dup 0) > +(plus:SWI (match_dup 1) (match_dup 2)))]) > + (set (pc) (if_then_else > +(ne (reg:CCC FLAGS_REG) (const_int 0)) > +(label_ref (match_operand 3)) > +(pc)))] > + "" > +{ > + ix86_fixup_binary_operands_no_copy (PLUS, mode, operands); > +}) Do we need this one on i?86? I'm not against adding it to optabs, so that other targets have a way to improve that, but doesn't combine handle this case on i?86 already well? I've been thinking of only transforming the above sub1 code (in forwprop, as richi suggested) to sub2 internal call + REALPART/IMAGPART extraction if the corresponding optab exists. > + > ;; The lea patterns for modes less than 32 bits need to be matched by > ;; several insns converted to real lea by splitters. > > @@ -6461,6 +6477,20 @@ > (const_string "4")] > (const_string "")))]) > > +(define_expand "usubv4" > + [(parallel [(set (reg:CC FLAGS_REG) > +(compare:CC (match_dup 1) (match_dup 2))) > + (set (match_dup 0) > +(minus:SWI (match_dup 1) (match_dup 2)))]) > + (set (pc) (if_then_else > +(ltu (reg:CC FLAGS_REG) (const_int 0)) > +(label_ref (match_operand 3)) > +(pc)))] If this works, it will be nice, I thought we'll need a new CC*mode. > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -4912,6 +4912,25 @@ address calculations. @code{add@var{m}3} is used if > @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3} > Similar, for other arithmetic operations. > > +@cindex @code{addv@var{m}4} instruction pattern > +@item @samp{addv@var{m}4} > +Add operand 2 and operand 1, storing the result in operand 0. If signed > +overflow occurs during the addition, jump to the label in operand 3. > + > +@cindex @code{subv@var{m}4} instruction pattern > +@cindex @code{mulv@var{m}4} instruction pattern > +@item @samp{subv@var{m}4}, @samp{mulv@var{m}4} > +Similar, for other signed arithmetic operations. > + > +@cindex @code{uaddv@var{m}4} instruction pattern > +@item @samp{uaddv@var{m}4} > +Like @code{addv@var{m}4}, except jump on unsigned overflow. > + > +@cindex @code{usubv@var{m}4} instruction pattern > +@cindex @code{umulv@var{m}4} instruction pattern > +@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4} > +Similar, for other unsigned arithmetic operations. Eric has just submitted a documentation path that documented the {add,sub,mul,umul}v4 and negv3 patterns, so this should be applied on top of that. Jakub
Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF
2015-11-20 14:28 GMT+03:00 Richard Biener: > On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich wrote: >> 2015-11-18 16:44 GMT+03:00 Richard Biener : >>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich >>> wrote: Hi, When we compute vectypes we skip non-relevant phi nodes. But we process non-relevant alive statements and thus may need vectype of non-relevant live phi node to compute mask vectype. This patch enables vectype computation for live phi nodes. Botostrapped and regtested on x86_64-unknown-linux-gnu. OK for trunk? >>> >>> Hmm. What breaks if you instead skip all !relevant stmts and not >>> compute vectype for life but not relevant ones? We won't ever >>> "vectorize" !relevant ones, that is, we don't need their vector type. >> >> I tried it and got regression in SLP. It expected non-null vectype >> for non-releveant but live statement. Regression was in >> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90 > > Because somebody put a vector type check before > > if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > return false; > > @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g >tree mask_type; >tree mask; > > + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > +return false; > + >if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > return false; > > @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g > ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > >gcc_assert (ncopies >= 1); > - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > -return false; > >if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def >&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > > fixes this particular fallout for me. I'll try it. Thanks, Ilya > > Richard. > >> Ilya >> >>> >>> Richard. >>> Thanks, Ilya
Re: [PATCH 02/15] Add selftests to bitmap.c
On Thu, Nov 19, 2015 at 6:04 PM, David Malcolmwrote: > Jeff pre-approved the plugin version of this (as a new > file unittests/test-bitmap.c): > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03284.html > with: >> OK if/when prereqs are approved. Minor twiddling if we end up moving it >> elsewhere or standardizing/reducing header files is pre-approved. > > This version moves them to bitmap.c > > One issue: how to express: > TEST (bitmap_test, gc_alloc) > in a ChangeLog entry. > > I've chosen to write it as (bitmap_test, gc_alloc) since that > has the greatest chance of being found via grep. I think overloading CHECKING_P for this is bogus. Adding a new UNIT_TEST_FILE (like GENERATOR_FILE) would be better. Also selftest.h should be only conditionally included (before the tests themselves?) Richard. > gcc/ChangeLog: > * bitmap.c: Include "selftest.h". > (bitmap_test, gc_alloc): New selftest. > (bitmap_test, set_range): New selftest. > (bitmap_test, clear_bit_in_middle): New selftest. > (bitmap_test, copying): New selftest. > (bitmap_test, bitmap_single_bit_set_p): New selftest. > --- > gcc/bitmap.c | 92 > > 1 file changed, 92 insertions(+) > > diff --git a/gcc/bitmap.c b/gcc/bitmap.c > index f04b8f9..e6f772e 100644 > --- a/gcc/bitmap.c > +++ b/gcc/bitmap.c > @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see > #include "system.h" > #include "coretypes.h" > #include "bitmap.h" > +#include "selftest.h" > > /* Memory allocation statistics purpose instance. */ > mem_alloc_description bitmap_mem_desc; > @@ -2094,5 +2095,96 @@ debug (const bitmap_head *ptr) > fprintf (stderr, "\n"); > } > > +#if CHECKING_P > +namespace { > + > +/* Freshly-created bitmaps ought to be empty. */ > + > +TEST (bitmap_test, gc_alloc) > +{ > + bitmap b = bitmap_gc_alloc (); > + EXPECT_TRUE (bitmap_empty_p (b)); > +} > + > +/* Verify bitmap_set_range. */ > + > +TEST (bitmap_test, set_range) > +{ > + bitmap b = bitmap_gc_alloc (); > + EXPECT_TRUE (bitmap_empty_p (b)); > + > + bitmap_set_range (b, 7, 5); > + EXPECT_FALSE (bitmap_empty_p (b)); > + EXPECT_EQ (5, bitmap_count_bits (b)); > + > + /* Verify bitmap_bit_p at the boundaries. */ > + EXPECT_FALSE (bitmap_bit_p (b, 6)); > + EXPECT_TRUE (bitmap_bit_p (b, 7)); > + EXPECT_TRUE (bitmap_bit_p (b, 11)); > + EXPECT_FALSE (bitmap_bit_p (b, 12)); > +} > + > +/* Verify splitting a range into two pieces using bitmap_clear_bit. */ > + > +TEST (bitmap_test, clear_bit_in_middle) > +{ > + bitmap b = bitmap_gc_alloc (); > + > + /* Set b to [100..200]. */ > + bitmap_set_range (b, 100, 100); > + EXPECT_EQ (100, bitmap_count_bits (b)); > + > + /* Clear a bit in the middle. */ > + bool changed = bitmap_clear_bit (b, 150); > + EXPECT_TRUE (changed); > + EXPECT_EQ (99, bitmap_count_bits (b)); > + EXPECT_TRUE (bitmap_bit_p (b, 149)); > + EXPECT_FALSE (bitmap_bit_p (b, 150)); > + EXPECT_TRUE (bitmap_bit_p (b, 151)); > +} > + > +/* Verify bitmap_copy. */ > + > +TEST (bitmap_test, copying) > +{ > + bitmap src = bitmap_gc_alloc (); > + bitmap_set_range (src, 40, 10); > + > + bitmap dst = bitmap_gc_alloc (); > + EXPECT_FALSE (bitmap_equal_p (src, dst)); > + bitmap_copy (dst, src); > + EXPECT_TRUE (bitmap_equal_p (src, dst)); > + > + /* Verify that we can make them unequal again... */ > + bitmap_set_range (src, 70, 5); > + EXPECT_FALSE (bitmap_equal_p (src, dst)); > + > + /* ...and that changing src after the copy didn't affect > + the other: */ > + EXPECT_FALSE (bitmap_bit_p (dst, 70)); > +} > + > +/* Verify bitmap_single_bit_set_p. */ > +TEST (bitmap_test, bitmap_single_bit_set_p) > +{ > + bitmap b = bitmap_gc_alloc (); > + > + EXPECT_FALSE (bitmap_single_bit_set_p (b)); > + > + bitmap_set_range (b, 42, 1); > + EXPECT_TRUE (bitmap_single_bit_set_p (b)); > + EXPECT_EQ (42, bitmap_first_set_bit (b)); > + > + bitmap_set_range (b, 1066, 1); > + EXPECT_FALSE (bitmap_single_bit_set_p (b)); > + EXPECT_EQ (42, bitmap_first_set_bit (b)); > + > + bitmap_clear_range (b, 0, 100); > + EXPECT_TRUE (bitmap_single_bit_set_p (b)); > + EXPECT_EQ (1066, bitmap_first_set_bit (b)); > +} > + > +} // anon namespace > +#endif /* CHECKING_P */ > > #include "gt-bitmap.h" > -- > 1.8.5.3 >
Re: [PATCH 2/4][AArch64] Increase the loop peeling limit
On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote: > On 11/05/2015 02:51 PM, Evandro Menezes wrote: > >2015-11-05 Evandro Menezes> > > > gcc/ > > > > * config/aarch64/aarch64.c (aarch64_override_options_internal): > > Increase loop peeling limit. > > > >This patch increases the limit for the number of peeled insns. > >With this change, I noticed no major regression in either > >Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP > >ones, improved significantly. > > > >I tested this tuning on Exynos M1 and on A57. ThunderX seems to > >benefit from this tuning too. However, I'd appreciate comments > >from other stakeholders. > > Ping. I'd like to leave this for a call from the port maintainers. I can see why this leads to more opportunities for vectorization, but I'm concerned about the wider impact on code size. Certainly I wouldn't expect this to be our default at -O2 and below. My gut feeling is that this doesn't really belong in the back-end (there are presumably good reasons why the default for this parameter across GCC has fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd like Marcus or Richard to make the call as to whether or not we take this patch. For now, I'd drop it from the series (it stands alone anyway). Thanks, James
Re: [PATCH, PR68373 ] Call scev_const_prop in pass_parallelize_loops::execute
On Thu, 19 Nov 2015, Tom de Vries wrote: > On 17/11/15 23:20, Tom de Vries wrote: > > [ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ] > > > > Hi, > > > > Consider test-case test.c, with a use of the final value of the > > iteration variable (return i): > > ... > > unsigned int > > foo (int *a, unsigned int n) > > { > >unsigned int i; > >for (i = 0; i < n; ++i) > > a[i] = 1; > > > >return i; > > } > > ... > > > > Compiled with: > > ... > > $ gcc -S -O2 test.c -ftree-parallelize-loops=2 -fdump-tree-all-details > > ... > > > > Before parloops, we have: > > ... > > : > ># i_12 = PHI <0(3), i_10(5)> > >_5 = (long unsigned int) i_12; > >_6 = _5 * 4; > >_8 = a_7(D) + _6; > >*_8 = 1; > >i_10 = i_12 + 1; > >if (n_4(D) > i_10) > > goto ; > >else > > goto ; > > > >: > >goto ; > > > >: > ># i_14 = PHI> > ... > > > > Parloops will fail because: > > ... > > phi is n_2 = PHI > > arg of phi to exit: value n_4(D) used outside loop > >checking if it a part of reduction pattern: > >FAILED: it is not a part of reduction > > ... > > [ note that the phi looks slightly different. In > > gather_scalar_reductions -> vect_analyze_loop_form -> > > vect_analyze_loop_form_1 -> split_loop_exit_edge we split the edge from > > bb4 to bb6. ] > > > > This patch uses scev_const_prop at the start of parloops. > > scev_const_prop first also splits the exit edge, and then replaces the > > phi with a assignment: > > ... > > final value replacement: > >n_2 = PHI > >with > >n_2 = n_4(D); > > ... > > > > This allows parloops to succeed. > > > > And there's a similar story when we compile with -fno-tree-scev-cprop in > > addition. > > > > Bootstrapped and reg-tested on x86_64. > > > > OK for stage3/stage1? > > The patch has been updated to do the final value replacement only for the loop > that parloops is processing, as suggested in review comment at > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02166.html . > > That means the patch is now also required for the kernels patch series. > > Bootstrapped and reg-tested on x86_64. > > OK for stage 3 trunk? Ok. Please mention tree-optimization/68373 in the changelog. Thanks, Richard. > Thanks, > - Tom > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [patch] Document new overflow arithmetics patterns
On Fri, Nov 20, 2015 at 11:12:06AM +0100, Eric Botcazou wrote: > Hi, > > this documents the new overflow arithmetics patterns added recently (addv4, > subv4, mulv4, umulv4, negv3) and only them, i.e. the old ones are still not. > This also fixes the description of the cbranch and jump patterns, which were > referring to a label_ref instead of a code_label. > > Tested with 'make doc' on x86-64/Linux, OK for mainline and 5 branch? > > > 2015-11-20 Eric Botcazou> > * doc/md.texi (Standard Names): Move entry for addptr3 around, > add entries for addv4, subv4, mulv4, umulv4 and negv3, fixes > glitch in entries for cbranch4 and jump. Ok, thanks. > +@cindex @code{negv@var{m}3} instruction pattern > +@item @samp{negv@var{m}3} > +Like @code{neg@var{m}2} but takes a @code{code_label} as operand 2 and > +emits code to jump to it if signed overflow occurs during the negation. Spurious space before "but". Jakub
Re: Add uaddv4_optab, usubv4_optab
> Toward fixing PR68385. I'm just starting a full round of testing, but Do you mind if I install my doc patch? It's slightly more thorough. -- Eric Botcazou
Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.
Hi Andrew, On 17/11/15 22:10, Andrew Pinski wrote: Because the imp and parts are really integer rather than strings, this patch moves the comparisons to be integer. Also allows saving around integers are easier than doing string comparisons. This allows for the next change. The way I store BIG.little is (big<<12)|little as each part num is only 12bits long. So it would be nice if someone could test -mpu=native on a big.little system to make sure it works still. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Thanks, Andrew Pinski * config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are integer constants. * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change implementer_id to unsigned char. Change part_no to unsigned int. (AARCH64_BIG_LITTLE): New define. (INVALID_IMP): New define. (INVALID_CORE): New define. (cpu_data): Change the last element's implementer_id and part_no to integers. (valid_bL_string_p): Rewrite to .. (valid_bL_core_p): this for integers instead of strings. (parse_field): New function. (contains_string_p): Rewrite to ... (contains_core_p): this for integers and only for the part_no. (host_detect_local_cpu): Rewrite handling of implementation and part num to be integers; simplifying the code. --- gcc/config/aarch64/aarch64-cores.def | 25 +- gcc/config/aarch64/driver-aarch64.c | 90 2 files changed, 62 insertions(+), 53 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 0b456f7..798f3e3 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -33,25 +33,26 @@ This need not include flags implied by the architecture. COSTS is the name of the rtx_costs routine to use. IMP is the implementer ID of the CPU vendor. On a GNU/Linux system it can - be found in /proc/cpuinfo. + be found in /proc/cpuinfo. There is a list in the ARM ARM. PART is the part number of the CPU. On a GNU/Linux system it can be found - in /proc/cpuinfo. For big.LITTLE systems this should have the form at of - ".". */ + in /proc/cpuinfo. For big.LITTLE systems this should use the macro AARCH64_BIG_LITTLE + where the big part number comes as the first arugment to the macro and little is the + second. */ /* V8 Architecture Processors. */ -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03") -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07") -AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08") -AARCH64_CORE("exynos-m1", exynosm1, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001") -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800") -AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1") -AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000") +AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03) +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07) +AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08) +AARCH64_CORE("exynos-m1", exynosm1, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, 0x53, 0x001) +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, 0x51, 0x800) +AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1) +AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000) /* V8 big.LITTLE implementations. */ -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03") -AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03") +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE(0xd07, 0xd03)) +AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE(0xd08, 0xd03)) #undef AARCH64_CORE diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF
On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovichwrote: > 2015-11-18 16:44 GMT+03:00 Richard Biener : >> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich >> wrote: >>> Hi, >>> >>> When we compute vectypes we skip non-relevant phi nodes. But we process >>> non-relevant alive statements and thus may need vectype of non-relevant >>> live phi node to compute mask vectype. This patch enables vectype >>> computation for live phi nodes. Botostrapped and regtested on >>> x86_64-unknown-linux-gnu. OK for trunk? >> >> Hmm. What breaks if you instead skip all !relevant stmts and not >> compute vectype for life but not relevant ones? We won't ever >> "vectorize" !relevant ones, that is, we don't need their vector type. > > I tried it and got regression in SLP. It expected non-null vectype > for non-releveant but live statement. Regression was in > gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90 Because somebody put a vector type check before if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) return false; @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g tree mask_type; tree mask; + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) +return false; + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) return false; @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; gcc_assert (ncopies >= 1); - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) -return false; if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle fixes this particular fallout for me. Richard. > Ilya > >> >> Richard. >> >>> Thanks, >>> Ilya
[patch] Document new overflow arithmetics patterns
Hi, this documents the new overflow arithmetics patterns added recently (addv4, subv4, mulv4, umulv4, negv3) and only them, i.e. the old ones are still not. This also fixes the description of the cbranch and jump patterns, which were referring to a label_ref instead of a code_label. Tested with 'make doc' on x86-64/Linux, OK for mainline and 5 branch? 2015-11-20 Eric Botcazou* doc/md.texi (Standard Names): Move entry for addptr3 around, add entries for addv4, subv4, mulv4, umulv4 and negv3, fixes glitch in entries for cbranch4 and jump. -- Eric BotcazouIndex: doc/md.texi === --- doc/md.texi (revision 230589) +++ doc/md.texi (working copy) @@ -4872,17 +4872,6 @@ Add operand 2 and operand 1, storing the must have mode @var{m}. This can be used even on two-address machines, by means of constraints requiring operands 1 and 0 to be the same location. -@cindex @code{addptr@var{m}3} instruction pattern -@item @samp{addptr@var{m}3} -Like @code{add@var{m}3} but is guaranteed to only be used for address -calculations. The expanded code is not allowed to clobber the -condition code. It only needs to be defined if @code{add@var{m}3} -sets the condition code. If adds used for address calculations and -normal adds are not compatible it is required to expand a distinct -pattern (e.g. using an unspec). The pattern is used by LRA to emit -address calculations. @code{add@var{m}3} is used if -@code{addptr@var{m}3} is not defined. - @cindex @code{ssadd@var{m}3} instruction pattern @cindex @code{usadd@var{m}3} instruction pattern @cindex @code{sub@var{m}3} instruction pattern @@ -4912,6 +4901,35 @@ address calculations. @code{add@var{m}3 @itemx @samp{and@var{m}3}, @samp{ior@var{m}3}, @samp{xor@var{m}3} Similar, for other arithmetic operations. +@cindex @code{addv@var{m}4} instruction pattern +@item @samp{addv@var{m}4} +Like @code{add@var{m}3} but takes a @code{code_label} as operand 3 and +emits code to jump to it if signed overflow occurs during the addition. +This pattern is used to implement the built-in functions performing +signed integer addition with overflow checking. + +@cindex @code{subv@var{m}4} instruction pattern +@cindex @code{mulv@var{m}4} instruction pattern +@item @samp{subv@var{m}4}, @samp{mulv@var{m}4} +Similar, for other signed arithmetic operations. + +@cindex @code{umulv@var{m}4} instruction pattern +@item @samp{umulv@var{m}4} +Like @code{mulv@var{m}4} but for unsigned multiplication. That is to +say, the operation is the same as signed multiplication but the jump +is taken only on unsigned overflow. + +@cindex @code{addptr@var{m}3} instruction pattern +@item @samp{addptr@var{m}3} +Like @code{add@var{m}3} but is guaranteed to only be used for address +calculations. The expanded code is not allowed to clobber the +condition code. It only needs to be defined if @code{add@var{m}3} +sets the condition code. If adds used for address calculations and +normal adds are not compatible it is required to expand a distinct +pattern (e.g. using an unspec). The pattern is used by LRA to emit +address calculations. @code{add@var{m}3} is used if +@code{addptr@var{m}3} is not defined. + @cindex @code{fma@var{m}4} instruction pattern @item @samp{fma@var{m}4} Multiply operand 2 and operand 1, then add operand 3, storing the @@ -5277,6 +5295,11 @@ Reverse the order of bytes of operand 1 @item @samp{neg@var{m}2}, @samp{ssneg@var{m}2}, @samp{usneg@var{m}2} Negate operand 1 and store the result in operand 0. +@cindex @code{negv@var{m}3} instruction pattern +@item @samp{negv@var{m}3} +Like @code{neg@var{m}2} but takes a @code{code_label} as operand 2 and +emits code to jump to it if signed overflow occurs during the negation. + @cindex @code{abs@var{m}2} instruction pattern @item @samp{abs@var{m}2} Store the absolute value of operand 1 into operand 0. @@ -5926,13 +5949,13 @@ from the machine description. Conditional branch instruction combined with a compare instruction. Operand 0 is a comparison operator. Operand 1 and operand 2 are the first and second operands of the comparison, respectively. Operand 3 -is a @code{label_ref} that refers to the label to jump to. +is the @code{code_label} to jump to. @cindex @code{jump} instruction pattern @item @samp{jump} A jump inside a function; an unconditional branch. Operand 0 is the -@code{label_ref} of the label to jump to. This pattern name is mandatory -on all machines. +@code{code_label} to jump to. This pattern name is mandatory on all +machines. @cindex @code{call} instruction pattern @item @samp{call}
Re: OpenACC declare directive updates
On Thu, Nov 19, 2015 at 10:22:16AM -0600, James Norris wrote: > 2015-XX-XX James Norris> Cesar Philippidis > > gcc/fortran/ > * dump-parse-tree.c (show_namespace): Handle declares. > * gfortran.h (struct symbol_attribute): New fields. > (enum gfc_omp_map_map): Add OMP_MAP_DEVICE_RESIDENT and OMP_MAP_LINK. > (OMP_LIST_LINK): New enum. > (struct gfc_oacc_declare): New structure. > (gfc_get_oacc_declare): New definition. > (struct gfc_namespace): Change type. > (enum gfc_exec_op): Add EXEC_OACC_DECLARE. > (struct gfc_code): New field. > * module.c (enum ab_attribute): Add AB_OACC_DECLARE_CREATE, > AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR, > AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK > (attr_bits): Add new initializers. > (mio_symbol_attribute): Handle new atributes. > * openmp.c (gfc_free_oacc_declare_clauses): New function. > (gfc_match_oacc_clause_link: Likewise. > (OMP_CLAUSE_LINK): New definition. > (gfc_match_omp_clauses): Handle OMP_CLAUSE_LINK. > (OACC_DECLARE_CLAUSES): Add OMP_CLAUSE_LINK > (gfc_match_oacc_declare): Add checking and module handling. > (gfc_resolve_oacc_declare): Reimplement. > * parse.c (case_decl): Add ST_OACC_DECLARE. > (parse_spec): Remove handling. > (parse_progunit): Remove handling. > * parse.h (struct gfc_state_data): Change type. > * resolve.c (gfc_resolve_blocks): Handle EXEC_OACC_DECLARE. > * st.c (gfc_free_statement): Handle EXEC_OACC_DECLARE. > * symbol.c (check_conflict): Add conflict checks. > (gfc_add_oacc_declare_create, gfc_add_oacc_declare_copyin, > gfc_add_oacc_declare_deviceptr, gfc_add_oacc_declare_device_resident): > New functions. > (gfc_copy_attr): Handle new symbols. > * trans-decl.c (add_clause, find_module_oacc_declare_clauses, > finish_oacc_declare): New functions. > (gfc_generate_function_code): Replace with call. > * trans-openmp.c (gfc_trans_oacc_declare): Reimplement. > (gfc_trans_oacc_directive): Handle EXEC_OACC_DECLARE. > * trans-stmt.c (gfc_trans_block_construct): Replace with call. > * trans-stmt.h (gfc_trans_oacc_declare): Remove argument. > * trans.c (trans_code): Handle EXEC_OACC_DECLARE. > > gcc/testsuite > * gfortran.dg/goacc/declare-1.f95: Update test. > * gfortran.dg/goacc/declare-2.f95: New test. > > libgomp/ > * testsuite/libgomp.oacc-fortran/declare-1.f90: New test. > * testsuite/libgomp.oacc-fortran/declare-2.f90: Likewise. > * testsuite/libgomp.oacc-fortran/declare-3.f90: Likewise. > * testsuite/libgomp.oacc-fortran/declare-4.f90: Likewise. > * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise. > * testsuite/libgomp.oacc-fortran/declare-5.f90: Likewise. Ok. Jakub
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On Fri, Nov 20, 2015 at 10:24 AM, Alan Haywardwrote: > When vectorising a integer induction condition reduction, > is_nonwrapping_integer_induction ends up with different values for base > during the analysis and build phases. In the first it is an INTEGER_CST, > in the second the loop has been vectorised out and the base is now a > variable. > > This results in the analysis and build stage detecting the > STMT_VINFO_VEC_REDUCTION_TYPE as different types. > > The easiest way to fix this is to only check for integer induction > conditions on the analysis stage. I don't like this. For the evolution part we have added STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need the original initial value as well then just save it. Or if you really want to go with the hack then please do not call is_nonwrapping_integer_induction with vec_stmt != NULL but initialize cond_expr_is_nonwrapping_integer_induction from STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) The hack also lacks a comment. Richard. > gcc/ > PR tree-optimization/68413 > * tree-vect-loop.c (vectorizable_reduction): Only check for > integer cond reduction on analysis stage. > > > > > Thanks, > Alan. >
Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument
On 19 Nov 18:19, Richard Biener wrote: > On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt >wrote: > >On 11/19/2015 05:31 PM, Ilya Enkovich wrote: > >> Currently we fold all memcpy/memmove calls with a known data size. > >> It causes two problems when used with Pointer Bounds Checker. > >> The first problem is that we may copy pointers as integer data > >> and thus loose bounds. The second problem is that if we inline > >> memcpy, we also have to inline bounds copy and this may result > >> in a huge amount of code and significant compilation time growth. > >> This patch disables folding for functions we want to instrument. > >> > >> Does it look reasonable for trunk and GCC5 branch? Bootstrapped > >> and regtested on x86_64-unknown-linux-gnu. > > > >Can't see anything wrong with it. Ok. > > But for small sizes this can have a huge impact on optimization. Which is > why we have the code in the first place. I'd make the check less broad, for > example inlining copies of size less than a pointer shouldn't be affected. Right. We also may inline in case we know no pointers are copied. Below is a version with extended condition and a couple more tests. Bootstrapped and regtested on x86_64-unknown-linux-gnu. Does it OK for trunk and gcc-5-branch? > > Richard. > > > > >Bernd > > Thanks, Ilya -- gcc/ 2015-11-20 Ilya Enkovich * gimple-fold.c (gimple_fold_builtin_memory_op): Don't fold call if we are going to instrument it and it may copy pointers. gcc/testsuite/ 2015-11-20 Ilya Enkovich * gcc.target/i386/mpx/pr68337-1.c: New test. * gcc.target/i386/mpx/pr68337-2.c: New test. * gcc.target/i386/mpx/pr68337-3.c: New test. diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 1ab20d1..dd9f80b 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3. If not see #include "gomp-constants.h" #include "optabs-query.h" #include "omp-low.h" +#include "tree-chkp.h" +#include "ipa-chkp.h" /* Return true when DECL can be referenced from current unit. @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, unsigned int src_align, dest_align; tree off0; + /* Inlining of memcpy/memmove may cause bounds lost (if we copy +pointers as wide integer) and also may result in huge function +size because of inlined bounds copy. Thus don't inline for +functions we want to instrument in case pointers are copied. */ + if (flag_check_pointer_bounds + && chkp_instrumentable_p (cfun->decl) + /* Even if data may contain pointers we can inline if copy +less than a pointer size. */ + && (!tree_fits_uhwi_p (len) + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0) + /* Check data type for pointers. */ + && (!TREE_TYPE (src) + || !TREE_TYPE (TREE_TYPE (src)) + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src))) + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src) + return false; + /* Build accesses at offset zero with a ref-all character type. */ off0 = build_int_cst (build_pointer_type_for_mode (char_type_node, ptr_mode, true), 0); diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c new file mode 100644 index 000..3f8d79d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */ + +#include "mpx-check.h" + +#define N 2 + +extern void abort (); + +static int +mpx_test (int argc, const char **argv) +{ + char ** src = (char **)malloc (sizeof (char *) * N); + char ** dst = (char **)malloc (sizeof (char *) * N); + int i; + + for (i = 0; i < N; i++) +src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1); + + __builtin_memcpy(dst, src, sizeof (char *) * N); + + for (i = 0; i < N; i++) +{ + char *p = dst[i]; + if (p != argv[0] + i + || __bnd_get_ptr_lbound (p) != p + || __bnd_get_ptr_ubound (p) != p + i) + abort (); +} + + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c new file mode 100644 index 000..16736b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */ +/* { dg-final { scan-assembler-not "memcpy" } } */ + +void +test1 (char *dst, char *src) +{ + __builtin_memcpy (dst, src, sizeof (char *) * 2); +} + +void +test2 (void *dst, void *src) +{ + __builtin_memcpy (dst, src, sizeof (char *) / 2); +} + +struct s +{ + int a; + int b; +}; + +void +test3 (struct s *dst, struct s *src) +{ +
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On Thu, 19 Nov 2015, Tom de Vries wrote: > On 16/11/15 13:45, Richard Biener wrote: > > > I've eliminated all the uses for pass_tree_loop_init/pass_tree_loop_done > > > in > > > >the pass group. Instead, I've added conditional loop optimizer setup in: > > > >- pass_lim and pass_scev_cprop (added in this patch), and > > Reposting the "Add pass_oacc_kernels pass group in passes.def" patch. > > pass_scev_cprop is no longer part of the pass group. > > And I've dropped the scev_initialize in pass_lim. > > Pass_lim is part of the pass_tree_loop pass group, where AFAIU scev info is > initialized at the start of the pass group and updated or reset by passes in > the pass group if necessary, such that it's always available, or can be > recalculated on the spot. > > First, pass_lim doesn't invalidate scev info. And second, AFAIU pass_lim > doesn't use scev info. So there doesn't seem to be a need to do anything about > scev info for using pass_lim outside pass_tree_loop. > > > > >- pass_parallelize_loops_oacc_kernels (added in patch "Add > > > > pass_parallelize_loops_oacc_kernels"). > > You miss calling scev_finalize (). > > I've added the scev_finalize () in patch "Add > pass_parallelize_loops_oacc_kernels". pass_lim::execute (function *fun) { + if (!loops_state_satisfies_p (LOOPS_NORMAL + | LOOPS_HAVE_RECORDED_EXITS)) +loop_optimizer_init (LOOPS_NORMAL +| LOOPS_HAVE_RECORDED_EXITS); + note that this will, when not in the loop pipeline, not properly fixup loops if LOOPS_NEED_FIXUP is set (that doesn't clear other loop flags). I'd rather make loop_optimizer_init do nothing if requested flags are already set and no fixup is needed and call the above unconditionally. Thus sth like Index: gcc/loop-init.c === --- gcc/loop-init.c (revision 230649) +++ gcc/loop-init.c (working copy) @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags) calculate_dominance_info (CDI_DOMINATORS); if (!needs_fixup) - checking_verify_loop_structure (); + { + checking_verify_loop_structure (); + if (loops_state_satisfies_p (flags)) + goto out; + } /* Clear all flags. */ if (recorded_exits) @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags) /* Apply flags to loops. */ apply_loop_flags (flags); + checking_verify_loop_structure (); + +out: /* Dump loops. */ flow_loops_dump (dump_file, NULL, 1); - checking_verify_loop_structure (); - timevar_pop (TV_LOOP_INIT); } if (number_of_loops (fun) <= 1) return 0; + if (!loops_state_satisfies_p (LOOP_CLOSED_SSA)) +rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); + return tree_ssa_lim (); } that looks bogus. The into-loop-closed SSA rewrite should be only done if the state _satisfies_ it. I understand LIM doesn't require loop-closed SSA. But it also doesn't destroy it obviously. So just remove that. > Thanks, > - Tom > > -- Richard BienerSUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Add uaddv4_optab, usubv4_optab
On 11/20/2015 11:56 AM, Eric Botcazou wrote: Eric has just submitted a documentation path that documented the {add,sub,mul,umul}v4 and negv3 patterns, so this should be applied on top of that. OK, I'm going to apply it, thanks. Thanks. Note that the comment at the beginning of expand_addsub_overflow describing the overall strategy ought to be adjusted if new patterns for the jump on carry are added. Ok, I'll do that. r~
Re: Add uaddv4_optab, usubv4_optab
On 11/20/2015 11:43 AM, Jakub Jelinek wrote: +(define_expand "uaddv4" + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC +(plus:SWI (match_dup 1) (match_dup 2)) +(match_dup 1))) + (set (match_dup 0) + (plus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (ne (reg:CCC FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + "" +{ + ix86_fixup_binary_operands_no_copy (PLUS, mode, operands); +}) Do we need this one on i?86? I'm not against adding it to optabs, so that other targets have a way to improve that, but doesn't combine handle this case on i?86 already well? Perhaps combine can do the job, but in my option it's better to have the optab than not. Especially when it's so easy to add in this case. +(define_expand "usubv4" + [(parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 1) (match_dup 2))) + (set (match_dup 0) + (minus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (ltu (reg:CC FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] If this works, it will be nice, I thought we'll need a new CC*mode. No, we just need to re-use the existing insn that performs the low half of a double-word subtraction operation. Eric has just submitted a documentation path that documented the {add,sub,mul,umul}v4 and negv3 patterns, so this should be applied on top of that. Ok, I'll look out for that. r~
Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m
Hi Andre, On 18/11/15 09:44, Andre Vieira wrote: On 17/11/15 10:10, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote: On 16/11/15 12:07, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote: Hi, This patch changes the target support mechanism to make it recognize any ARM 'M' profile as a non-neon supporting target. The current check only tests for armv6 architectures and earlier, and does not account for armv7-m. This is correct because there is no 'M' profile that supports neon and the current test is not sufficient to exclude armv7-m. Tested by running regressions for this testcase for various ARM targets. Is this OK to commit? Thanks, Andre Vieira gcc/testsuite/ChangeLog: 2015-11-06 Andre Vieira* gcc/testsuite/lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Added check for M profile. From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Fri, 13 Nov 2015 11:16:34 + Subject: [PATCH] Disable neon testing for armv7-m --- gcc/testsuite/lib/target-supports.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } { int dummy; /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ -#if __ARM_ARCH < 7 +#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M' #error Architecture too old for NEON. Could you fix this #error message while you're here? Why we can't change this test to look for the __ARM_NEON macro from ACLE: #if __ARM_NEON < 1 #error NEON is not enabled #endif Thanks, James There is a check for this already: 'check_effective_target_arm_neon'. I think the idea behind arm_neon_ok is to check whether the hardware would support neon, whereas arm_neon is to check whether neon was enabled, i.e. -mfpu=neon was used or a mcpu was passed that has neon enabled by default. The comments for 'check_effective_target_arm_neon_ok_nocache' highlight this, though maybe the comments for check_effective_target_arm_neon could be better. # Return 1 if this is an ARM target supporting -mfpu=neon # -mfloat-abi=softfp or equivalent options. Some multilibs may be # incompatible with these options. Also set et_arm_neon_flags to the # best options to add. proc check_effective_target_arm_neon_ok_nocache ... /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ ... and # Return 1 if this is a ARM target with NEON enabled. proc check_effective_target_arm_neon OK, got it - sorry for my mistake, I had the two procs confused. I'd still like to see the error message fixed "Architecture too old for NEON." is not an accurate description of the problem. Thanks, James This OK? This is ok, I've committed for you with the slightly tweaked ChangeLog entry: 2015-11-20 Andre Vieira * lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Add check for M profile. as r230653. Thanks, Kyrill Cheers, Andre
[RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS
Hi Richard, As per Jakub suggestion in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67326, the below patch fixes the regression in tree if conversion. Basically allowing if conversion to happen for a candidate DR, if we find similar DR with same dimensions and that DR will not trap. To find similar DRs using hash table to hashing the offset and DR pairs. Also reusing read/written information that was stored for reference tree. Also. (1) I guard these checks for -ftree-loop-if-convert-stores and -fno-common. Sometimes vectorization flags also triggers if conversion. (2) Also hashing base DRs for writes only. gcc/ChangeLog 2015-11-19 VenkataramananPR tree-optimization/67326 * tree-if-conv.c (offset_DR_map): Define. (struct ifc_dr): Add new tree base_predicate field. (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash offsets, DR pairs and hash base ref, DR pairs for write type DRs. (ifcvt_memrefs_wont_trap): Guard checks with -ftree-loop-if-convert-stores flag. Check for similar DR that are accessed unconditionally. (if_convertible_loop_p_1): Initialize and delete offset hash maps gcc/testsuite/ChangeLog 2015-11-19 Venkataramanan * gcc.dg/tree-ssa/ifc-pr67326.c: Add new. Regstrapped on x86_64, Ok for trunk? Regards, Venkat. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c new file mode 100644 index 000..5ca11cd --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-ifcvt-details -fno-common -ftree-loop-if-convert-stores" } */ + +float v0[1024]; +float v1[1024]; +float v2[1024]; +float v3[1024]; + +void condAssign1 () { + for (int i=0; i<1024; ++i) +v0[i] = (v2[i]>v1[i]) ? v2[i]*v3[i] : v0[i]; +} + +void condAssign2 () { + for (int i=0; i<1024; ++i) +v0[i] = (v2[i]>v1[i]) ? v2[i]*v1[i] : v0[i]; +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 2 "ifcvt" } } */ diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 01065cb..1b4d220 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -124,6 +124,9 @@ static hash_map *ref_DR_map; /* Hash table to store base reference, DR pairs. */ static hash_map *baseref_DR_map; +/* Hash table to store DR offset, DR pairs. */ +static hash_map *offset_DR_map; + /* Structure used to predicate basic blocks. This is attached to the ->aux field of the BBs in the loop to be if-converted. */ struct bb_predicate { @@ -589,6 +592,8 @@ struct ifc_dr { int rw_unconditionally; tree predicate; + + tree base_predicate; }; #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux) @@ -609,11 +614,12 @@ static void hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a) { - data_reference_p *master_dr, *base_master_dr; + data_reference_p *master_dr, *base_master_dr, *offset_master_dr; tree ref = DR_REF (a); tree base_ref = DR_BASE_OBJECT (a); + tree offset = DR_OFFSET (a); tree ca = bb_predicate (gimple_bb (DR_STMT (a))); - bool exist1, exist2; + bool exist1, exist2, exist3; while (TREE_CODE (ref) == COMPONENT_REF || TREE_CODE (ref) == IMAGPART_EXPR @@ -636,22 +642,34 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a) if (is_true_predicate (IFC_DR (*master_dr)->predicate)) DR_RW_UNCONDITIONALLY (*master_dr) = 1; - base_master_dr = _DR_map->get_or_insert (base_ref,); - - if (!exist2) + if (DR_IS_WRITE (a)) { - IFC_DR (a)->predicate = ca; - *base_master_dr = a; + base_master_dr = _DR_map->get_or_insert (base_ref,); + if (!exist2) + { + IFC_DR (a)->base_predicate = ca; + *base_master_dr = a; + } + else + IFC_DR (*base_master_dr)->base_predicate + = fold_or_predicates + (EXPR_LOCATION (IFC_DR (*base_master_dr)->base_predicate), +ca, IFC_DR (*base_master_dr)->base_predicate); + + if (is_true_predicate (IFC_DR (*base_master_dr)->base_predicate)) + DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1; } - else -IFC_DR (*base_master_dr)->predicate - = fold_or_predicates - (EXPR_LOCATION (IFC_DR (*base_master_dr)->predicate), -ca, IFC_DR (*base_master_dr)->predicate); - if (DR_IS_WRITE (a) - && (is_true_predicate (IFC_DR (*base_master_dr)->predicate))) -DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1; + if (offset) +{ + offset_master_dr = _DR_map->get_or_insert (offset,); + if (!exist3) + *offset_master_dr = a; + + if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1) + DR_RW_UNCONDITIONALLY (*offset_master_dr) + =
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: > On 11/12/2015 09:39 AM, Evandro Menezes wrote: >2015-11-12 Evandro Menezes> >[AArch64] Add attribute for compatibility with ARM pipeline models > >gcc/ > >* config/aarch64/aarch64.md (predicated): Copy attribute from >"arm.md". >* config/arm/arm.md (predicated): Added description. > > Please, commit if it's alright. The AArch64 part of this is OK. > From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 > From: Evandro Menezes > Date: Mon, 9 Nov 2015 17:11:16 -0600 > Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM > pipeline models > > gcc/ > * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". > * config/arm/arm.md (predicated): Added description. > --- > gcc/config/aarch64/aarch64.md | 4 > gcc/config/arm/arm.md | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 1586256..d46f837 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -195,6 +195,10 @@ > ;; 1 :=: yes > (define_attr "far_branch" "" (const_int 0)) > > +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 > has > +;; no predicated insns. > +(define_attr "predicated" "yes,no" (const_string "no")) > + > ;; --- > ;; Pipeline descriptions and scheduling > ;; --- > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index 73c3088..6bda491 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -105,6 +105,9 @@ > (define_attr "fpu" "none,vfp" >(const (symbol_ref "arm_fpu_attr"))) > > +; Predicated means that the insn form is conditionally executed based on a > +; predicate. We default to 'no' because no Thumb patterns match this rule > +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. Thanks, James > (define_attr "predicated" "yes,no" (const_string "no")) > > ; LENGTH of an instruction (in bytes) > -- > 2.1.0.243.g30d45f7 >
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On 20/11/2015 11:00, "Richard Biener"wrote: >On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward >wrote: >> When vectorising a integer induction condition reduction, >> is_nonwrapping_integer_induction ends up with different values for base >> during the analysis and build phases. In the first it is an INTEGER_CST, >> in the second the loop has been vectorised out and the base is now a >> variable. >> >> This results in the analysis and build stage detecting the >> STMT_VINFO_VEC_REDUCTION_TYPE as different types. >> >> The easiest way to fix this is to only check for integer induction >> conditions on the analysis stage. > >I don't like this. For the evolution part we have added >STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need >the original initial value as well then just save it. > >Or if you really want to go with the hack then please do not call >is_nonwrapping_integer_induction with vec_stmt != NULL but >initialize cond_expr_is_nonwrapping_integer_induction from >STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > >The hack also lacks a comment. > Ok. I've gone for a combination of both: I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE. I've removed the vec_stmt != NULL checks. I've moved the call to is_nonwrapping_integer_induction until after the vect_is_simple_reduction check. I never liked that I had is_nonwrapping_integer_induction early in the function, and think this looks better. Alan. analysisonlycondcheck2.patch Description: Binary data
Re: GCC 5.3 Status Report (2015-11-20)
On Fri, Nov 20, 2015 at 7:53 AM, Richard Bienerwrote: > > Status > == > > We plan to do a GCC 5.3 release candidate at the end of next week > followed by the actual release a week after that. > > So now is the time to look at your regression bugs in bugzilla and > do some backporting for things already fixed on trunk. I'm still waiting for approval of the libtool change to support AIX TLS symbols (PR 68192). There has been no response on Libtool patches mailing list. I again request permission to apply the patches to GCC trunk and 5-branch. Thanks, David
[PATCH][ARM] Enable fusion of AES instructions
Enable instruction fusion of AES instructions on ARM for Cortex-A53 and Cortex-A57. OK for commit? ChangeLog: 2015-11-20 Wilco Dijkstra* gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion. (arm_cortex_a57_tune): Likewise. (aarch_macro_fusion_pair_p): Add support for AES fusion. * gcc/config/arm/arm-protos.h (fuse_ops): Add FUSE_AES_AESMC. --- gcc/config/arm/arm-protos.h | 5 +++-- gcc/config/arm/arm.c| 9 +++-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index f9b1276..4801bb8 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -302,8 +302,9 @@ struct tune_params enum fuse_ops { FUSE_NOTHING = 0, -FUSE_MOVW_MOVT = 1 << 0 - } fusible_ops: 1; +FUSE_MOVW_MOVT = 1 << 0, +FUSE_AES_AESMC = 1 << 1 + } fusible_ops: 2; /* Depth of scheduling queue to check for L2 autoprefetcher. */ enum {SCHED_AUTOPREF_OFF, SCHED_AUTOPREF_RANK, SCHED_AUTOPREF_FULL} sched_autopref: 2; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 02f5dc3..7077199 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1969,7 +1969,7 @@ const struct tune_params arm_cortex_a53_tune = tune_params::DISPARAGE_FLAGS_NEITHER, tune_params::PREF_NEON_64_FALSE, tune_params::PREF_NEON_STRINGOPS_TRUE, - FUSE_OPS (tune_params::FUSE_MOVW_MOVT), + FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC), tune_params::SCHED_AUTOPREF_OFF }; @@ -1992,7 +1992,7 @@ const struct tune_params arm_cortex_a57_tune = tune_params::DISPARAGE_FLAGS_ALL, tune_params::PREF_NEON_64_FALSE, tune_params::PREF_NEON_STRINGOPS_TRUE, - FUSE_OPS (tune_params::FUSE_MOVW_MOVT), + FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC), tune_params::SCHED_AUTOPREF_FULL }; @@ -29668,6 +29668,11 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr) && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set))) return true; } + + if (current_tune->fusible_ops & tune_params::FUSE_AES_AESMC + && aarch_crypto_can_dual_issue (prev, curr)) +return true; + return false; } -- 1.9.1
Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c
On 11/20/2015 03:14 AM, Bernd Schmidt wrote: > BTW, I'm with whoever said absolutely no way to the idea of making automatic > changes like this as part of a commit hook. > > I think the whitespace change can go in if it hasn't already, but I think the > other one still has enough problems that I'll say - leave it for the next > stage 1. > >> @@ -178,8 +173,9 @@ warn_uninitialized_vars (bool >> warn_possibly_uninitialized) >> >>FOR_EACH_BB_FN (bb, cfun) >> { >> - bool always_executed = dominated_by_p (CDI_POST_DOMINATORS, >> - single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb); >> + bool always_executed >> += dominated_by_p (CDI_POST_DOMINATORS, >> + single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb); >>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ()) >> { > > Better to pull the single_succ into its own variable perhaps? > >> @@ -1057,7 +1039,8 @@ prune_uninit_phi_opnds_in_unrealizable_paths (gphi >> *phi, >> *visited_flag_phis = BITMAP_ALLOC (NULL); >> >> if (bitmap_bit_p (*visited_flag_phis, >> -SSA_NAME_VERSION (gimple_phi_result (flag_arg_def >> +SSA_NAME_VERSION ( >> + gimple_phi_result (flag_arg_def >> return false; >> >> bitmap_set_bit (*visited_flag_phis, > > Pull the gimple_phi_result into a separate variable, or just leave it > unchanged for now, the modified version is too ugly to live. > >> bitmap_clear_bit (*visited_flag_phis, >> -SSA_NAME_VERSION (gimple_phi_result (flag_arg_def))); >> +SSA_NAME_VERSION ( >> + gimple_phi_result (flag_arg_def))); >> continue; >> } > > Here too. > >> - all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths (phi, >> - uninit_opnds, >> - as_a (flag_def), >> - boundary_cst, >> - cmp_code, >> - visited_phis, >> - _flag_phis); >> + all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths >> +(phi, uninit_opnds, as_a (flag_def), boundary_cst, cmp_code, >> + visited_phis, _flag_phis); > > I'd rather shorten the name of the function, even if it goes against the > spirit of the novel writing month. > >> - if (gphi *use_phi = dyn_cast (use_stmt)) >> -use_bb = gimple_phi_arg_edge (use_phi, >> - PHI_ARG_INDEX_FROM_USE (use_p))->src; >> + if (gphi *use_phi = dyn_cast (use_stmt)) >> +use_bb >> + = gimple_phi_arg_edge (use_phi, PHI_ARG_INDEX_FROM_USE (use_p))->src; >> else >> use_bb = gimple_bb (use_stmt); > > There are some changes of this nature and I'm not sure it's an improvement. > Leave them out for now? > >> @@ -2345,8 +2309,8 @@ warn_uninitialized_phi (gphi *phi, vec >> *worklist, >> } >> >> /* Now check if we have any use of the value without proper guard. */ >> - uninit_use_stmt = find_uninit_use (phi, uninit_opnds, >> - worklist, added_to_worklist); >> + uninit_use_stmt >> += find_uninit_use (phi, uninit_opnds, worklist, added_to_worklist); >> >> /* All uses are properly guarded. */ >> if (!uninit_use_stmt) > > Here too. > >> @@ -2396,12 +2359,24 @@ public: >> {} >> >> /* opt_pass methods: */ >> - opt_pass * clone () { return new pass_late_warn_uninitialized (m_ctxt); } >> - virtual bool gate (function *) { return gate_warn_uninitialized (); } >> + opt_pass *clone (); >> + virtual bool gate (function *); > > This may technically violate our coding standards, but it's consistent with a > lot of similar cases. Since coding standards are about enforcing consistency, > I'd drop this change. > >> namespace { >> >> const pass_data pass_data_early_warn_uninitialized = >> { >> - GIMPLE_PASS, /* type */ >> + GIMPLE_PASS, /* type */ >> "*early_warn_uninitialized", /* name */ >> - OPTGROUP_NONE, /* optinfo_flags */ >> - TV_TREE_UNINIT, /* tv_id */ >> - PROP_ssa, /* properties_required */ >> - 0, /* properties_provided */ >> - 0, /* properties_destroyed */ >> - 0, /* todo_flags_start */ >> - 0, /* todo_flags_finish */ >> + OPTGROUP_NONE, /* optinfo_flags */ >> + TV_TREE_UNINIT, /* tv_id */ >> + PROP_ssa, /* properties_required */ >> + 0, /* properties_provided */ >> + 0, /* properties_destroyed */ >> + 0, /* todo_flags_start */ >> + 0, /* todo_flags_finish */ >> }; > > Likewise. Seems to be done practically nowhere. > >> class pass_early_warn_uninitialized : public gimple_opt_pass >> @@ -2519,14 +2491,23 @@ public: >> {} >> >> /* opt_pass methods: */ >> - virtual bool gate (function *) { return gate_warn_uninitialized (); } >> -
Re: [PATCH] Add clang-format config to contrib folder
On 11/19/2015 06:43 PM, Pedro Alves wrote: > On 11/19/2015 12:54 PM, Martin Liška wrote: >> ContinuationIndentWidth: 2 >> -ForEachMacros: >> ['_FOR_EACH','_FOR_EACH_1','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FOR_EACH_IMM_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR! > _EACH_OBJE > CT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','_FOR_EACH_X','_FOR_EACH_X_1','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR'] >> +ForEachMacros: >> ['FOR_ALL_BB_FN','FOR_ALL_EH_REGION','FOR_ALL_EH_REGION_AT','FOR_ALL_EH_REGION_FN','FOR_ALL_INHERITED_FIELDS','FOR_ALL_PREDICATES','FOR_BB_BETWEEN','FOR_BB_INSNS','FOR_BB_INSNS_REVERSE','FOR_BB_INSNS_REVERSE_SAFE','FOR_BB_INSNS_SAFE','FOR_BODY','FOR_COND','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FO! > R_EACH_IMM > _USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR_EACH_OBJECT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','FOR_EXPR','FOR_INIT_STMT','FOR_SCOPE'] >> IndentCaseLabels: false > > I don't know the tool's syntax here, but it's usually much better to > keep entries like these one per line (or at least a few only). Then > changes to the list are _much_ easier to review and maintain, like: > > ... > ForEachMacros: [ > 'FOR_ALL_BB_FN', > 'FOR_ALL_EH_REGION', > -'FOR_ALL_EH_REGION_AT', > -'FOR_ALL_EH_REGION', > +'FOR_ALL_EH_WHATNOT, > +'FOR_ALL_EH_WHATNOT_AT, > 'FOR_ALL_INHERITED_FIELDS', > 'FOR_ALL_PREDICATES', > 'FOR_BB_BETWEEN', > 'FOR_BB_INSNS', > ... > > vs a single-line hunk that's basically unintelligible. > > Thanks, > Pedro Alves > Hi Pedro. Fully agree with you, there's suggested patch. Hope I can install the patch for trunk? Thanks, Martin From eac730138b51db897f579a961aabaee805338bdd Mon Sep 17 00:00:00 2001 From: marxinDate: Fri, 20 Nov 2015 12:29:49 +0100 Subject: [PATCH] clang-format: split content of a list to multiple lines contrib/ChangeLog: 2015-11-20 Martin Liska * clang-format: Split content of a list to multiple lines. --- contrib/clang-format | 85 +++- 1 file changed, 84 insertions(+), 1 deletion(-) diff --git a/contrib/clang-format b/contrib/clang-format index 76a9d87..d734001 100644 --- a/contrib/clang-format +++ b/contrib/clang-format @@ -43,7 +43,90 @@ BreakBeforeTernaryOperators: true ColumnLimit: 80
[Committed] S/390: Add bswaphi2 pattern
Hi, the 16 bit bswap instruction support was missing in the back-end so far. Added with the attached patch. Bootstrapped on s390 and s390x. No regressions. Bye, -Andreas- gcc/testsuite/ChangeLog: 2015-11-20 Andreas Krebbel* gcc.target/s390/bswap-1.c: New test. gcc/ChangeLog: 2015-11-20 Andreas Krebbel * config/s390/s390.md ("bswaphi2"): New pattern. diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index ea65c74..280a772 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -10439,6 +10439,8 @@ ; Byte swap instructions ; +; FIXME: There is also mvcin but we cannot use it since src and target +; may overlap. (define_insn "bswap2" [(set (match_operand:GPR 0"register_operand" "=d, d") (bswap:GPR (match_operand:GPR 1 "nonimmediate_operand" " d,RT")))] @@ -10450,6 +10452,14 @@ (set_attr "op_type" "RRE,RXY") (set_attr "z10prop" "z10_super")]) +(define_insn "bswaphi2" + [(set (match_operand:HI 0 "register_operand" "=d") + (bswap:HI (match_operand:HI 1 "memory_operand" "RT")))] + "TARGET_CPU_ZARCH" + "lrvh\t%0,%1" + [(set_attr "type" "load") + (set_attr "op_type" "RXY") + (set_attr "z10prop" "z10_super")]) ; ; Population count instruction diff --git a/gcc/testsuite/gcc.target/s390/bswap-1.c b/gcc/testsuite/gcc.target/s390/bswap-1.c new file mode 100644 index 000..e1f113a --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/bswap-1.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z900 -mzarch" } */ + +#include + +uint64_t u64; +uint32_t u32; +uint16_t u16; + +uint64_t +foo64a (uint64_t a) +{ + return __builtin_bswap64 (a); +} +/* { dg-final { scan-assembler-times "lrvgr\t%r2,%r2" 1 { target lp64 } } } */ + +uint64_t +foo64b () +{ + return __builtin_bswap64 (u64); +} +/* { dg-final { scan-assembler-times "lrvg\t%r2,0\\(%r\[0-9\]*\\)" 1 { target lp64 } } } */ + +uint32_t +foo32 () +{ + return __builtin_bswap32 (u32); +} +/* { dg-final { scan-assembler-times "lrv\t%r2,0\\(%r\[0-9\]*\\)" 1 } } */ + +uint16_t +foo16 () +{ + return __builtin_bswap16 (u16); +} +/* { dg-final { scan-assembler-times "lrvh\t%r2,0\\(%r\[0-9\]*\\)" 1 } } */
Re: [RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS
On Fri, Nov 20, 2015 at 12:02:07PM +, Kumar, Venkataramanan wrote: > Also. > (1) I guard these checks for -ftree-loop-if-convert-stores and -fno-common. > Sometimes vectorization flags also triggers if conversion. > (2) Also hashing base DRs for writes only. Let me comment just on formatting, will leave actual review to Richard. > > gcc/ChangeLog > 2015-11-19 VenkataramananSurname missing. > > PR tree-optimization/67326 > * tree-if-conv.c (offset_DR_map): Define. Extraneous space. > (struct ifc_dr): Add new tree base_predicate field. All ChangeLog lines should be indented by a single tab. > (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash offsets, > DR pairs Too long line. > and hash base ref, DR pairs for write type DRs. Extraneous spaces. > (ifcvt_memrefs_wont_trap): Guard checks with > -ftree-loop-if-convert-stores flag. Too long line and extraneous spaces. >Check for similar DR that are accessed unconditionally. >(if_convertible_loop_p_1): Initialize and delete offset hash maps Extraneous space, missing full stop at the end. > > gcc/testsuite/ChangeLog > 2015-11-19 Venkataramanan > * gcc.dg/tree-ssa/ifc-pr67326.c: Add new. Extraneous space. > + if (DR_IS_WRITE (a)) > { > - IFC_DR (a)->predicate = ca; > - *base_master_dr = a; > + base_master_dr = _DR_map->get_or_insert (base_ref,); Missing space before ); > + offset_master_dr = _DR_map->get_or_insert (offset,); > + if (!exist3) > + *offset_master_dr = a; > + > + if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1) > + DR_RW_UNCONDITIONALLY (*offset_master_dr) > + = DR_RW_UNCONDITIONALLY (*master_dr); Wrong indentation, the = should be below the first underscore in DR_RW_UNCONDITIONALLY. > - if (DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) == 1) > + if ((base_master_dr > +&& DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) == 1)) Extraneous () pair. > + else if (DR_OFFSET (a)) > +{ > + offset_dr = offset_DR_map->get (DR_OFFSET (a)); > + if ((DR_RW_UNCONDITIONALLY (*offset_dr) == 1) Likewise. Jakub
Re: [Committed] S/390: Add bswaphi2 pattern
On 11/20/2015 12:52 PM, Andreas Krebbel wrote: +(define_insn "bswaphi2" + [(set (match_operand:HI 0 "register_operand" "=d") + (bswap:HI (match_operand:HI 1 "memory_operand" "RT")))] + "TARGET_CPU_ZARCH" + "lrvh\t%0,%1" + [(set_attr "type" "load") + (set_attr "op_type" "RXY") + (set_attr "z10prop" "z10_super")]) Surely it's better to arrange so that you can use STRVH as well. And providing a fallback for the reg-reg case (e.g. LRVR+SRL). Although I suppose I don't see support for STRV in bswap32/64 either... r~
GCC 5.3 Status Report (2015-11-20)
Status == We plan to do a GCC 5.3 release candidate at the end of next week followed by the actual release a week after that. So now is the time to look at your regression bugs in bugzilla and do some backporting for things already fixed on trunk. Quality Data Priority # Change from last report --- --- P10 P2 121+ 30 P3 20- 8 P4 87+ 2 P5 32+ 2 --- --- Total P1-P3 141+ 22 Total 260+ 24 Previous Report === https://gcc.gnu.org/ml/gcc/2015-07/msg00197.html
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On 11/20/2015 08:34 AM, Kyrill Tkachov wrote: On 20/11/15 12:27, James Greenhalgh wrote: On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: On 11/12/2015 09:39 AM, Evandro Menezes wrote: 2015-11-12 Evandro Menezes[AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. The arm part is ok too. It's just a comment. In the ChangeLog entry for arm.md I'd say "Add description." Thanks, Kyrill Please, commit if it's alright. The AArch64 part of this is OK. From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:11:16 -0600 Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. --- gcc/config/aarch64/aarch64.md | 4 gcc/config/arm/arm.md | 3 +++ 2 files changed, 7 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1586256..d46f837 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -195,6 +195,10 @@ ;; 1 :=: yes (define_attr "far_branch" "" (const_int 0)) +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has +;; no predicated insns. +(define_attr "predicated" "yes,no" (const_string "no")) + ;; --- ;; Pipeline descriptions and scheduling ;; --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 73c3088..6bda491 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -105,6 +105,9 @@ (define_attr "fpu" "none,vfp" (const (symbol_ref "arm_fpu_attr"))) +; Predicated means that the insn form is conditionally executed based on a +; predicate. We default to 'no' because no Thumb patterns match this rule +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. Thanks, James (define_attr "predicated" "yes,no" (const_string "no")) ; LENGTH of an instruction (in bytes) -- 2.1.0.243.g30d45f7 Can you please commit it with this change in the Changelog? Thank you, -- Evandro Menezes
[PATCH, PR68460] Always call free_stmt_vec_info_vec in gather_scalar_reductions
[ was: Re: [PATCH] Fix parloops gimple_uid usage ] On 09/10/15 23:09, Tom de Vries wrote: @@ -2392,6 +2397,9 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list loop_vec_info simple_inner_loop_info = NULL; bool allow_double_reduc = true; + if (!stmt_vec_info_vec.exists ()) +init_stmt_vec_info_vec (); + simple_loop_info = vect_analyze_loop_form (loop); if (simple_loop_info == NULL) return; @@ -2453,9 +2461,16 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list destroy_loop_vec_info (simple_loop_info, true); destroy_loop_vec_info (simple_inner_loop_info, true); + /* Release the claim on gimple_uid. */ + free_stmt_vec_info_vec (); + With the src/libgomp/testsuite/libgomp.c/pr46886.c testcase, compiled in addition with -ftree-vectorize, I ran into an ICE: ... src/libgomp/testsuite/libgomp.c/pr46886.c:8:5: internal compiler error: in init_stmt_vec_info_vec, at tree-vect-stmts.c:8250 int foo (void) ^~~ 0x1196082 init_stmt_vec_info_vec() src/gcc/tree-vect-stmts.c:8250 0x11c3ed4 vectorize_loops() src/gcc/tree-vectorizer.c:510 0x10a7ea5 execute src/gcc/tree-ssa-loop.c:276 ... The ICE is caused by the fact that init_stmt_vec_info_vec is called at the start of vectorize_loops, while stmt_vec_info_vec is not empty. I traced this back to gather_scalar_reduction, where we call init_stmt_vec_info_vec, but we skip free_stmt_vec_info_vec if we take the early-out for simple_loop_info == NULL. This patch fixes the ICE by making sure we always call free_stmt_vec_info_vec in gather_scalar_reduction. Passes cc1/f951 rebuild and autopar testing. OK for stage3 trunk if bootstrap and regtest succeeds? Thanks, - Tom Always call free_stmt_vec_info_vec in gather_scalar_reductions 2015-11-20 Tom de VriesPR tree-optimization/68460 * tree-parloops.c (gather_scalar_reductions): Also call free_stmt_vec_info_vec if simple_loop_info == NULL. * gcc.dg/autopar/pr68460.c: New test. --- gcc/testsuite/gcc.dg/autopar/pr68460.c | 35 ++ gcc/tree-parloops.c| 6 +- 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/autopar/pr68460.c b/gcc/testsuite/gcc.dg/autopar/pr68460.c new file mode 100644 index 000..0c00065 --- /dev/null +++ b/gcc/testsuite/gcc.dg/autopar/pr68460.c @@ -0,0 +1,35 @@ +/* { dg-do "compile" } */ +/* { dg-options "-O -ftree-parallelize-loops=2 -ftree-vectorize -fno-tree-ch -fno-tree-dominator-opts" } */ + +void abort (void); + +int d[1024], e[1024]; + +int +foo (void) +{ + int s = 0; + int i; + + for (i = 0; i < 1024; i++) +s += d[i] - e[i]; + + return s; +} + +int +main () +{ + int i; + + for (i = 0; i < 1024; i++) +{ + d[i] = i * 2; + e[i] = i; +} + + if (foo () != 1023 * 1024 / 2) +abort (); + + return 0; +} diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 8d7912d..85cc78d 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2433,7 +2433,7 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list simple_loop_info = vect_analyze_loop_form (loop); if (simple_loop_info == NULL) -return; +goto gather_done; for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next ()) { @@ -2492,9 +2492,13 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list destroy_loop_vec_info (simple_loop_info, true); destroy_loop_vec_info (simple_inner_loop_info, true); + gather_done: /* Release the claim on gimple_uid. */ free_stmt_vec_info_vec (); + if (reduction_list->elements () == 0) +return; + /* As gimple_uid is used by the vectorizer in between vect_analyze_loop_form and free_stmt_vec_info_vec, we can set gimple_uid of reduc_phi stmts only now. */
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On 20/11/15 14:29, Richard Biener wrote: I agree it's somewhat of an odd behavior but all passes should either be placed in a sub-pipeline with an outer loop_optimizer_init()/finalize () call or call both themselves. Hmm, but adding loop_optimizer_finalize at the end of pass_lim breaks the loop pipeline. We could use the style used in pass_slp_vectorize::execute: ... pass_slp_vectorize::execute (function *fun) { basic_block bb; bool in_loop_pipeline = scev_initialized_p (); if (!in_loop_pipeline) { loop_optimizer_init (LOOPS_NORMAL); scev_initialize (); } ... if (!in_loop_pipeline) { scev_finalize (); loop_optimizer_finalize (); } ... Although that doesn't strike me as particularly clean. Thanks, - Tom
Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m
Hi Kyrill On 20/11/15 11:51, Kyrill Tkachov wrote: Hi Andre, On 18/11/15 09:44, Andre Vieira wrote: On 17/11/15 10:10, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote: On 16/11/15 12:07, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote: Hi, This patch changes the target support mechanism to make it recognize any ARM 'M' profile as a non-neon supporting target. The current check only tests for armv6 architectures and earlier, and does not account for armv7-m. This is correct because there is no 'M' profile that supports neon and the current test is not sufficient to exclude armv7-m. Tested by running regressions for this testcase for various ARM targets. Is this OK to commit? Thanks, Andre Vieira gcc/testsuite/ChangeLog: 2015-11-06 Andre Vieira* gcc/testsuite/lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Added check for M profile. From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Fri, 13 Nov 2015 11:16:34 + Subject: [PATCH] Disable neon testing for armv7-m --- gcc/testsuite/lib/target-supports.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } { int dummy; /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ -#if __ARM_ARCH < 7 +#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M' #error Architecture too old for NEON. Could you fix this #error message while you're here? Why we can't change this test to look for the __ARM_NEON macro from ACLE: #if __ARM_NEON < 1 #error NEON is not enabled #endif Thanks, James There is a check for this already: 'check_effective_target_arm_neon'. I think the idea behind arm_neon_ok is to check whether the hardware would support neon, whereas arm_neon is to check whether neon was enabled, i.e. -mfpu=neon was used or a mcpu was passed that has neon enabled by default. The comments for 'check_effective_target_arm_neon_ok_nocache' highlight this, though maybe the comments for check_effective_target_arm_neon could be better. # Return 1 if this is an ARM target supporting -mfpu=neon # -mfloat-abi=softfp or equivalent options. Some multilibs may be # incompatible with these options. Also set et_arm_neon_flags to the # best options to add. proc check_effective_target_arm_neon_ok_nocache ... /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ ... and # Return 1 if this is a ARM target with NEON enabled. proc check_effective_target_arm_neon OK, got it - sorry for my mistake, I had the two procs confused. I'd still like to see the error message fixed "Architecture too old for NEON." is not an accurate description of the problem. Thanks, James This OK? This is ok, I've committed for you with the slightly tweaked ChangeLog entry: 2015-11-20 Andre Vieira * lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Add check for M profile. as r230653. Thanks, Kyrill Cheers, Andre Thank you. Would there be any objections to backporting this to gcc-5-branch? I checked, it applies cleanly and its a simple enough way of preventing a lot of FAILS for armv7-m. Best Regards, Andre
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On 11/20/2015 06:27 AM, James Greenhalgh wrote: On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: On 11/12/2015 09:39 AM, Evandro Menezes wrote: 2015-11-12 Evandro Menezes[AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. Please, commit if it's alright. The AArch64 part of this is OK. From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:11:16 -0600 Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. --- gcc/config/aarch64/aarch64.md | 4 gcc/config/arm/arm.md | 3 +++ 2 files changed, 7 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1586256..d46f837 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -195,6 +195,10 @@ ;; 1 :=: yes (define_attr "far_branch" "" (const_int 0)) +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has +;; no predicated insns. +(define_attr "predicated" "yes,no" (const_string "no")) + ;; --- ;; Pipeline descriptions and scheduling ;; --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 73c3088..6bda491 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -105,6 +105,9 @@ (define_attr "fpu" "none,vfp" (const (symbol_ref "arm_fpu_attr"))) +; Predicated means that the insn form is conditionally executed based on a +; predicate. We default to 'no' because no Thumb patterns match this rule +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. (define_attr "predicated" "yes,no" (const_string "no")) ; LENGTH of an instruction (in bytes) -- 2.1.0.243.g30d45f7 Actually, that would then be the existing description for the "predicable" attribute. I think that this proposed description is correct for the "predicated" attribute. Thank you, -- Evandro Menezes
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On Fri, Nov 20, 2015 at 09:55:29AM -0600, Evandro Menezes wrote: > On 11/20/2015 06:27 AM, James Greenhalgh wrote: > >On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: > >>On 11/12/2015 09:39 AM, Evandro Menezes wrote: > >>2015-11-12 Evandro Menezes> >> > >>[AArch64] Add attribute for compatibility with ARM pipeline models > >> > >>gcc/ > >> > >>* config/aarch64/aarch64.md (predicated): Copy attribute from > >>"arm.md". > >>* config/arm/arm.md (predicated): Added description. > >> > >>Please, commit if it's alright. > >The AArch64 part of this is OK. > > > >> From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 > >>From: Evandro Menezes > >>Date: Mon, 9 Nov 2015 17:11:16 -0600 > >>Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM > >> pipeline models > >> > >>gcc/ > >>* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". > >>* config/arm/arm.md (predicated): Added description. > >>--- > >> gcc/config/aarch64/aarch64.md | 4 > >> gcc/config/arm/arm.md | 3 +++ > >> 2 files changed, 7 insertions(+) > >> > >>diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > >>index 1586256..d46f837 100644 > >>--- a/gcc/config/aarch64/aarch64.md > >>+++ b/gcc/config/aarch64/aarch64.md > >>@@ -195,6 +195,10 @@ > >> ;; 1 :=: yes > >> (define_attr "far_branch" "" (const_int 0)) > >>+;; Strictly for compatibility with AArch32 in pipeline models, since > >>AArch64 has > >>+;; no predicated insns. > >>+(define_attr "predicated" "yes,no" (const_string "no")) > >>+ > >> ;; --- > >> ;; Pipeline descriptions and scheduling > >> ;; --- > >>diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > >>index 73c3088..6bda491 100644 > >>--- a/gcc/config/arm/arm.md > >>+++ b/gcc/config/arm/arm.md > >>@@ -105,6 +105,9 @@ > >> (define_attr "fpu" "none,vfp" > >>(const (symbol_ref "arm_fpu_attr"))) > >>+; Predicated means that the insn form is conditionally executed based on a > >>+; predicate. We default to 'no' because no Thumb patterns match this rule > >>+; and not all ARM insns do. > >s/is conditionally executed/can be conditionally executed/ in the first > >sentence. Otherwise, this looks OK to me but I can't approve the ARM part, > >so you'll need to wait for a review from someone who can. > > > >> (define_attr "predicated" "yes,no" (const_string "no")) > >> ; LENGTH of an instruction (in bytes) > >>-- > >>2.1.0.243.g30d45f7 > > Actually, that would then be the existing description for the > "predicable" attribute. I think that this proposed description is > correct for the "predicated" attribute. Right, understood, that will teach me to pattern match up to pred and stop reading the word. This is fine, I've committed it on your behalf as r230666 Thanks, James
Re: PATCH to shorten_compare -Wtype-limits handling
Hmm, it looks like using expansion_point_if_in_system_header might avoid the first issue you mention. Thus. Great, thanks! (I'll have to remember the trick for my own use!) Martin
Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode
On Thu, Nov 19, 2015 at 10:31:41AM -0700, Jeff Law wrote: > On 11/18/2015 11:20 PM, Senthil Kumar Selvaraj wrote: > >On Wed, Nov 18, 2015 at 09:36:21AM +0100, Richard Biener wrote: > >> > >>Otherwise ok. > > > >See modified patch below. If you think vrp98.c is unnecessary, feel free > >to dump it :). > > > >If ok, could you commit it for me please? I don't have commit access. > > > >Regards > >Senthil > > > >gcc/ChangeLog > >2015-11-19 Senthil Kumar Selvaraj> > > > * tree.h (desired_pro_or_demotion_p): New function. > > * tree-vrp.c (simplify_cond_using_ranges): Call it. > > > >gcc/testsuite/ChangeLog > >2015-11-19 Senthil Kumar Selvaraj > > > > * gcc.dg/tree-ssa/vrp98.c: New testcase. > > * gcc.target/avr/uint8-single-reg.c: New testcase. > I went ahead and committed this as-is. > > I do think the vrp98 testcase is useful as it verifies that VRP is doing > what we want in a target independent way. It's a good complement to the AVR > specific testcase. I see the same problem on gcc-5-branch as well. Would it be ok to backport the fix to that branch as well? Regards Senthil > > Thanks, > Jeff >
Re: PATCH to shorten_compare -Wtype-limits handling
On 11/19/2015 05:16 PM, Jason Merrill wrote: On 11/19/2015 02:44 PM, Martin Sebor wrote: On 11/18/2015 09:26 PM, Jason Merrill wrote: The rs6000 target was hitting a bootstrap failure due to -Werror=type-limits. Since warn_tautological_cmp and other warnings avoid warning if one of the operands comes from a macro, I thought it would make sense to do that here as well. The also disables the warning for functions that are shadowed by macros such as C atomic_load et al. For example, in the program below. Is that an acceptable compromise or is there a way to avoid this? I think it's an acceptable compromise, but see below. At the same time, the change doesn't suppress the warning in other cases where I would have expected it to suppress it based on your description. For instance here: unsigned short bar (unsigned short x) { #define X x if (x > 0x) Yes, this is missed because the front end doesn't remember the location of the use of x; that's one of many location tracking issues. David Malcolm is working on this stuff. I noticed there is code elsewhere in c-common.c that avoids issuing the same warning for system headers (that's the code that responsible for issuing the warning for the second test case above). Hmm, it looks like using expansion_point_if_in_system_header might avoid the first issue you mention. Thus. commit fb71bf4de520cc4bd11eefb57e50748b4679996f Author: Jason MerrillDate: Thu Nov 19 15:21:47 2015 -0500 * c-common.c (shorten_compare): But look through macros from system headers. diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 068a0bc..fe0a235 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -4651,8 +4651,10 @@ shorten_compare (location_t loc, tree *op0_ptr, tree *op1_ptr, } if (TREE_CODE (primop0) != INTEGER_CST - /* Don't warn if it's from a macro. */ - && !from_macro_expansion_at (EXPR_LOCATION (primop0))) + /* Don't warn if it's from a (non-system) macro. */ + && !(from_macro_expansion_at + (expansion_point_location_if_in_system_header + (EXPR_LOCATION (primop0) { if (val == truthvalue_false_node) warning_at (loc, OPT_Wtype_limits, diff --git a/gcc/testsuite/gcc.dg/Wtype-limits2.c b/gcc/testsuite/gcc.dg/Wtype-limits2.c new file mode 100644 index 000..92151aa --- /dev/null +++ b/gcc/testsuite/gcc.dg/Wtype-limits2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-Wtype-limits" } */ +/* { dg-require-effective-target sync_char_short } */ + +#include + +unsigned foo (unsigned char *x) +{ + if (atomic_load (x) > 1000) /* { dg-warning "comparison is always false due to limited range of data type" } */ +return 0; + return 1; +}
Re: [PATCH] Fix PR68067
On 6 November 2015 at 10:39, Richard Bienerwrote: >> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location >> references block not in block tree >> l1_279 = PHI <1(28), l1_299(33)> > > ^^^ > > this is the error to look at! It means that the GC heap will be corrupted > quite easily. > This looked very similar to PR68117 - the invalid phi arg, and block not in block-tree, even if not the invalid tree code - and as the posters there were having success with valgrind, whereas I wasn't, I watched and waited. First observation is that it triggers the asserts you suggested in comment 27 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D68117#c27). Indeed, it fails those asserts, even after the patch in comment 25 (committed as r230594) to tree-ssa.c (delete_tree_ssa), and the patch in comment#35 to function.c (set_cfun), and the patch in comment#30 (committed as r230424) to cfgexpand.c (pass_expand::execute). The patch in comment#29 (which replaces the asserts in comment#27 with empties), however, fixes the problem - although I can't rule out, that that's just by changing the memory allocation pattern. Moreover, if I take those patches and rebase onto a recent trunk (onto which the delete_tree_ssa and pass_expand::execute patches have already been committed), i.e. just adding the assertions from comment#27 and the call in function.c (set_cfun) - the assertions are still failing on my testcase, whereas the original (assertionless) failure was very erratic, and had since disappeared/been hidden on trunk. Indeed those same assertions break in a few other places (even in a --disable-bootstrap build after gcc/xgcc is built), so I feel I have a good chance of producing a reasonable assertion-breaking testcase. So I have to ask, how sure are you that those assertions are(/should be!) "correct"? :) --Alan
Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode
On 11/20/2015 10:04 AM, Senthil Kumar Selvaraj wrote: On Thu, Nov 19, 2015 at 10:31:41AM -0700, Jeff Law wrote: On 11/18/2015 11:20 PM, Senthil Kumar Selvaraj wrote: On Wed, Nov 18, 2015 at 09:36:21AM +0100, Richard Biener wrote: Otherwise ok. See modified patch below. If you think vrp98.c is unnecessary, feel free to dump it :). If ok, could you commit it for me please? I don't have commit access. Regards Senthil gcc/ChangeLog 2015-11-19 Senthil Kumar Selvaraj* tree.h (desired_pro_or_demotion_p): New function. * tree-vrp.c (simplify_cond_using_ranges): Call it. gcc/testsuite/ChangeLog 2015-11-19 Senthil Kumar Selvaraj * gcc.dg/tree-ssa/vrp98.c: New testcase. * gcc.target/avr/uint8-single-reg.c: New testcase. I went ahead and committed this as-is. I do think the vrp98 testcase is useful as it verifies that VRP is doing what we want in a target independent way. It's a good complement to the AVR specific testcase. I see the same problem on gcc-5-branch as well. Would it be ok to backport the fix to that branch as well? That's a call for the release managers. I typically don't backport anything expect ICE or incorrect code generation fixes as I tend to be very conservative on what goes onto a release branch. Jakub, Richi or Joseph would need to ack into a release branch. jeff
Re: PATCH to shorten_compare -Wtype-limits handling
On 11/19/2015 12:02 PM, Manuel López-Ibáñez wrote: On 19 November 2015 at 17:54, Jeff Lawwrote: But there were a couple of patches from you some time ago, for example: http://permalink.gmane.org/gmane.comp.gcc.patches/343476 What happened with those? On hold pending fixing the type-limits warning placement. Essentially that has to be untangled first. Could you elaborate on this? Or point me to some previous email thread? (I don't have enough free time to follow the mailing list anymore, sorry). I don't have the thread handy, but essentially the operand shortening code has warning bits inside it. As a result pulling out functionality (such as operand canonicalization) and putting into match.pd results in missing a warning -- ie a regression. So we have to detangle the operand shortening from warning detection. Kai's idea was to first make the shortening code "pure" in the sense that it would have no side effects other than to generate the warnings. Canonicalization and other transformations would still occur internally, but not be reflected in the IL. That was the overall plan and he posted a patch for that. But that patch didn't do the due diligence to verify that once the shortening code was made "pure" that we didn't regress on the quality of the code we generated. jeff
Re: PATCH to shorten_compare -Wtype-limits handling
On 20 November 2015 at 17:42, Jeff Lawwrote: > So we have to detangle the operand shortening from warning detection. Kai's > idea was to first make the shortening code "pure" in the sense that it would > have no side effects other than to generate the warnings. Canonicalization > and other transformations would still occur internally, but not be reflected > in the IL. > > That was the overall plan and he posted a patch for that. But that patch > didn't do the due diligence to verify that once the shortening code was made > "pure" that we didn't regress on the quality of the code we generated. I thought that the original plan was to make the warning code also use match.pd. That is, that all folding, including FE folding, will be match.pd-based. Has this changed? Cheers, Manuel.
Merge from trunk to gccgo branch
I merged trunk revision 230657 to the gccgo branch. Ian
Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1
On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote: >2015-11-10 Evandro Menezes> >gcc/ > >* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. >* config/aarch64/aarch64.md: Include "exynos-m1.md". >* config/arm/arm-cores.def: Use the Exynos M1 sched model. >* config/arm/arm.md: Include "exynos-m1.md". >* config/arm/arm-tune.md: Regenerated. >* config/arm/exynos-m1.md: New file. > > This patch adds the scheduling model for Exynos M1. It depends on > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html > > Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu. > > Please, commit if it's alright. > From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001 > From: Evandro Menezes > Date: Mon, 9 Nov 2015 17:18:52 -0600 > Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1 > > gcc/ > * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. > * config/aarch64/aarch64.md: Include "exynos-m1.md". These changes are fine. > * config/arm/arm-cores.def: Use the Exynos M1 sched model. > * config/arm/arm.md: Include "exynos-m1.md". > * config/arm/arm-tune.md: Regenerated. These changes need an ack from an ARM reviewer. > * config/arm/exynos-m1.md: New file. I have a few comments on this model. > +;; The Exynos M1 core is modeled as a triple issue pipeline that has > +;; the following functional units. > + > +(define_automaton "exynos_m1_gp") > +(define_automaton "exynos_m1_ls") > +(define_automaton "exynos_m1_fp") > + > +;; 1. Two pipelines for simple integer operations: A, B > +;; 2. One pipeline for simple or complex integer operations: C > + > +(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp") > + > +(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)") > +(define_reservation "em1_c" "em1_xc") Is this extra reservation useful, can we not just use em1_xc directly? > +;; 3. Two asymmetric pipelines for Neon and FP operations: F0, F1 > + > +(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp") > + > +(define_reservation "em1_fmac" "em1_f0") > +(define_reservation "em1_fcvt" "em1_f0") > +(define_reservation "em1_nalu" "(em1_f0 | em1_f1)") > +(define_reservation "em1_nalu0" "em1_f0") > +(define_reservation "em1_nalu1" "em1_f1") > +(define_reservation "em1_nmisc" "em1_f0") > +(define_reservation "em1_ncrypt" "em1_f0") > +(define_reservation "em1_fadd" "em1_f1") > +(define_reservation "em1_fvar" "em1_f1") > +(define_reservation "em1_fst" "em1_f1") Same comment here, does this not just obfuscate the interaction between instruction classes in the description. I'm not against doing it this way if you prefer, but it would seem to reduce readability to me. I think there is also an argument that this increases readability, so it is your choice. > + > +;; 4. One pipeline for branch operations: BX > + > +(define_cpu_unit "em1_bx" "exynos_m1_gp") > + > +(define_reservation "em1_br" "em1_bx") > + And again? > +;; 5. One AGU for loads: L > +;; One AGU for stores and one pipeline for stores: S, SD > + > +(define_cpu_unit "em1_lx" "exynos_m1_ls") > +(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls") > + > +(define_reservation "em1_ld" "em1_lx") > +(define_reservation "em1_st" "(em1_sx + em1_sd)") > + > +;; Common occurrences > +(define_reservation "em1_sfst" "(em1_fst + em1_st)") > +(define_reservation "em1_lfst" "(em1_fst + em1_ld)") > + > +;; Branches > +;; > +;; No latency as there is no result > +;; TODO: Unconditional branches use no units; > +;; conditional branches add the BX unit; > +;; indirect branches add the C unit. > +(define_insn_reservation "exynos_m1_branch" 0 > + (and (eq_attr "tune" "exynosm1") > + (eq_attr "type" "branch")) > + "em1_br") > + > +(define_insn_reservation "exynos_m1_call" 1 > + (and (eq_attr "tune" "exynosm1") > + (eq_attr "type" "call")) > + "em1_alu") > + > +;; Basic ALU > +;; > +;; Simple ALU without shift, non-predicated > +(define_insn_reservation "exynos_m1_alu" 1 > + (and (eq_attr "tune" "exynosm1") > + (and (not (eq_attr "predicated" "yes")) (and (eq_attr "predicated" "no")) ? Likewise throughout the file? Again this is your choice. This is OK from the AArch64 side, let me know if you plan to change any of the above, otherwise I'll commit it (or someone else can commit it) after I see an OK from an ARM reviewer. Thanks, James
Re: [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)
On 20/11/15 18:28 +0100, Jan Kratochvil wrote: On Thu, 19 Nov 2015 21:21:51 +0100, Jan Kratochvil wrote: * python/hook.in: Call register_libstdcxx_printers. * python/libstdcxx/v6/__init__.py: Wrap it to register_libstdcxx_printers. [...] -import libstdcxx.v6 +# Call a function as a plain import would not execute body of the included file +# on repeated reloads of this object file. +from libstdcxx.v6 import register_libstdcxx_printers +register_libstdcxx_printers(gdb.current_objfile()) Jonathan Wakely mentioned: [libstdc++] Refactor python/hook.in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02518.html From: Siva ChandraMessage-ID: https://gcc.gnu.org/r215726 That change introduced this regression. This patch does not undo it, goal of the patch by Siva Chandra - minimizing hook.in - stays achieved. OK, thanks for checking, Jan. The patch is OK for trunk and gcc-5-branch.
Re: PATCH to shorten_compare -Wtype-limits handling
On 20 November 2015 at 16:10, Martin Seborwrote: >>> Hmm, it looks like using expansion_point_if_in_system_header might avoid >>> the first issue you mention. >> >> >> Thus. > > > Great, thanks! (I'll have to remember the trick for my own use!) I added this to https://gcc.gnu.org/wiki/DiagnosticsGuidelines under "Locations" for future reference. I hope others would do the same in the future, so the info is kept up-to-date. Cheers, Manuel.
Re: [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)
On Thu, 19 Nov 2015 21:21:51 +0100, Jan Kratochvil wrote: > * python/hook.in: Call register_libstdcxx_printers. > * python/libstdcxx/v6/__init__.py: Wrap it to > register_libstdcxx_printers. [...] > -import libstdcxx.v6 > +# Call a function as a plain import would not execute body of the included > file > +# on repeated reloads of this object file. > +from libstdcxx.v6 import register_libstdcxx_printers > +register_libstdcxx_printers(gdb.current_objfile()) Jonathan Wakely mentioned: [libstdc++] Refactor python/hook.in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02518.html From: Siva ChandraMessage-ID: https://gcc.gnu.org/r215726 That change introduced this regression. This patch does not undo it, goal of the patch by Siva Chandra - minimizing hook.in - stays achieved. Jan
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 20/11/15 08:31, Bin.Cheng wrote: > On Thu, Nov 19, 2015 at 10:32 AM, Bin.Chengwrote: >> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh >> wrote: >>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote: Hi, GIMPLE IVO needs to call backend interface to calculate costs for addr expressions like below: FORM1: "r73 + r74 + 16380" FORM2: "r73 << 2 + r74 + 16380" They are invalid address expression on AArch64, so will be legitimized by aarch64_legitimize_address. Below are what we got from that function: For FORM1, the address expression is legitimized into below insn sequence and rtx: r84:DI=r73:DI+r74:DI r85:DI=r84:DI+0x3000 r83:DI=r85:DI "r83 + 4092" For FORM2, the address expression is legitimized into below insn sequence and rtx: r108:DI=r73:DI<<0x2 r109:DI=r108:DI+r74:DI r110:DI=r109:DI+0x3000 r107:DI=r110:DI "r107 + 4092" So the costs computed are 12/16 respectively. The high cost prevents IVO from choosing right candidates. Besides cost computation, I also think the legitmization is bad in terms of code generation. The root cause in aarch64_legitimize_address can be described by it's comment: /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST), where mask is selected by alignment and size of the offset. We try to pick as large a range for the offset as possible to maximize the chance of a CSE. However, for aligned addresses we limit the range to 4k so that structures with different sized elements are likely to use the same base. */ I think the split of CONST is intended for REG+CONST where the const offset is not in the range of AArch64's addressing modes. Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG< >> >>> Thanks for the description, it made the patch very easy to review. I only >>> have a style comment. >>> Bootstrap & test on AArch64 along with other patch. Is it OK? 2015-11-04 Bin Cheng Jiong Wang * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle address expressions like REG+REG+CONST and REG+NON_REG+CONST. >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5c8604f..47875ac 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode) { HOST_WIDE_INT offset = INTVAL (XEXP (x, 1)); HOST_WIDE_INT base_offset; + rtx op0 = XEXP (x,0); + + if (GET_CODE (op0) == PLUS) + { + rtx op0_ = XEXP (op0, 0); + rtx op1_ = XEXP (op0, 1); >>> >>> I don't see this trailing _ on a variable name in many places in the source >>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend. >>> Can we pick a different name for op0_ and op1_? >>> + + /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will + reach here, the 'CONST' may be valid in which case we should + not split. */ + if (REG_P (op0_) && REG_P (op1_)) + { + machine_mode addr_mode = GET_MODE (op0); + rtx addr = gen_reg_rtx (addr_mode); + + rtx ret = plus_constant (addr_mode, addr, offset); + if (aarch64_legitimate_address_hook_p (mode, ret, false)) + { + emit_insn (gen_adddi3 (addr, op0_, op1_)); + return ret; + } + } + /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST) + will reach here. If (PLUS REG, NON_REG) is valid addr expr, + we split it into Y=REG+CONST, Y+NON_REG. */ + else if (REG_P (op0_) || REG_P (op1_)) + { + machine_mode addr_mode = GET_MODE (op0); + rtx addr = gen_reg_rtx (addr_mode); + + /* Switch to make sure that register is in op0_. */ + if (REG_P (op1_)) + std::swap (op0_, op1_); + + rtx ret = gen_rtx_fmt_ee (PLUS,
Re: [PATCH] Add clang-format config to contrib folder
On 11/20/2015 04:33 AM, Martin Liška wrote: On 11/19/2015 06:43 PM, Pedro Alves wrote: On 11/19/2015 12:54 PM, Martin Liška wrote: ContinuationIndentWidth: 2 -ForEachMacros: ['_FOR_EACH','_FOR_EACH_1','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FOR_EACH_IMM_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','F! O! R! _EACH_OBJE CT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','_FOR_EACH_X','_FOR_EACH_X_1','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR'] +ForEachMacros: ['FOR_ALL_BB_FN','FOR_ALL_EH_REGION','FOR_ALL_EH_REGION_AT','FOR_ALL_EH_REGION_FN','FOR_ALL_INHERITED_FIELDS','FOR_ALL_PREDICATES','FOR_BB_BETWEEN','FOR_BB_INSNS','FOR_BB_INSNS_REVERSE','FOR_BB_INSNS_REVERSE_SAFE','FOR_BB_INSNS_SAFE','FOR_BODY','FOR_COND','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','! F! O! R_EACH_IMM _USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR_EACH_OBJECT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','FOR_EXPR','FOR_INIT_STMT','FOR_SCOPE'] IndentCaseLabels: false I don't know the tool's syntax here, but it's usually much better to keep entries like these one per line (or at least a few only). Then changes to the list are _much_ easier to review and maintain, like: ... ForEachMacros: [ 'FOR_ALL_BB_FN', 'FOR_ALL_EH_REGION', -'FOR_ALL_EH_REGION_AT', -'FOR_ALL_EH_REGION', +'FOR_ALL_EH_WHATNOT, +'FOR_ALL_EH_WHATNOT_AT, 'FOR_ALL_INHERITED_FIELDS', 'FOR_ALL_PREDICATES', 'FOR_BB_BETWEEN', 'FOR_BB_INSNS', ... vs a single-line hunk that's basically unintelligible. Thanks, Pedro Alves Hi Pedro. Fully agree with you, there's suggested patch. Hope I can install the patch for trunk? Yes. In fact, I think that as long as you're moving towards coercing clang-format to handle GNU style that you should have a free reign to make changes to this config file. jeff
Re: [PATCH 4/4][AArch64] Add cost model for Exynos M1
On Thu, Nov 19, 2015 at 04:06:17PM -0600, Evandro Menezes wrote: > On 11/05/2015 06:09 PM, Evandro Menezes wrote: > >2015-10-25 Evandro Menezes> > > > gcc/ > > > > * config/aarch64/aarch64-cores.def: Use the Exynos M1 cost model. > > * config/aarch64/aarch64.c (exynosm1_addrcost_table): New > >variable. > > (exynosm1_regmove_cost): Likewise. > > (exynosm1_vector_cost): Likewise. > > (exynosm1_tunings): Likewise. > > * config/arm/aarch-cost-tables.h (exynosm1_extra_costs): Likewise. > > * config/arm/arm.c (arm_exynos_m1_tune): Likewise. > > > >This patch adds the cost model for Exynos M1. This patch depends > >on a couple of previous patches though, > >https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00505.html and > >https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00538.html > > > >Please, commit if it's alright. > > Ping. This is OK from an AArch64 perspective. Please wait for an OK from an ARM reviewer. Thanks, James
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/20/2015 02:19 AM, Nick Clifton wrote: Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Right. It just really helps me to have something I can poke at -- it's just the type of learner I am. I tried to trip the test in that #if several times last year and never came up with anything. So naturally if we can trip it with the rl78 I want to dig into it. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. Perfect. Building rl78-elf cross bits as I type... jeff
Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class
On Tue, Nov 17, 2015 at 4:22 AM, Richard Bienerwrote: > On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: >> Empty record should be returned and passed the same way in C and C++. >> This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which >> defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is defined >> to is_really_empty_class, which returns true for C++ empty classes. For >> LTO, we stream out a bit to indicate if a record is empty and we store >> it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is >> changed to set bitsize to 0 for empty records. Middle-end and x86 >> backend are updated to ignore empty records for parameter passing and >> function value return. Other targets may need similar changes. > > Please avoid a new langhook for this and instead claim a bit in > tree_type_common > like for example restrict_flag (double-check it is unused for non-pointers). There is no bit in tree_type_common I can overload. restrict_flag is checked for non-pointers to issue an error when it is used on non-pointers: /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' qualifiers cannot" "" } Should I add a bit to tree_type_common? It may crease the size of tree_type_common by 4 bytes since there is unused bits in tree_type_common. > I don't like that you need to modify targets - those checks should be done > in the caller (which may just use a new wrapper with the logic and then > dispatching to the actual hook). I can do that. > Why do you need do adjust get_ref_base_and_extent? get_ref_base_and_extent is changed to set bitsize to 0 for empty records so that when ref_maybe_used_by_call_p_1 calls get_ref_base_and_extent to get 0 as the maximum size on empty record. Otherwise, find_tail_calls won't perform tail call optimization for functions with empty record parameters, as in struct dummy { }; struct true_type { struct dummy i; }; extern true_type y; extern void xxx (true_type c); void yyy (void) { xxx (y); } -- H.J.
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
On Wed, Dec 10, 2014 at 01:48:21 +0300, Ilya Verbin wrote: > On 09 Dec 14:59, Richard Biener wrote: > > On Mon, 8 Dec 2014, Ilya Verbin wrote: > > > Unfortunately, this fix was not general enough. > > > There might be cases when mixed object files get into lto-wrapper, ie > > > some of > > > them contain only LTO sections, some contain only offload sections, and > > > some > > > contain both. But when lto-wrapper will pass all these files to > > > recompilation, > > > the compiler might crash (it depends on the order of input files), since > > > in > > > read_cgraph_and_symbols it expects that *all* input files contain IR > > > section of > > > given type. > > > This patch splits input objects from argv into lto_argv and offload_argv, > > > so > > > that all files in arrays contain corresponding IR. > > > Similarly, in lto-plugin, it was bad idea to add objects, which contain > > > offload > > > IR without LTO, to claimed_files, since this may corrupt a resolution > > > file. > > > > > > Tested on various combinations of files with/without -flto and > > > with/without > > > offload, using trunk ld and gold, also tested on ld without plugin > > > support. > > > Bootstrap and make check passed on x86_64-linux and i686-linux. Ok for > > > trunk? > > > > Did you check that bootstrap-lto still works? Ok if so. > > Yes, bootstrap-lto passed. > Committed revision 218543. I don't know how I missed this a year ago, but mixing of LTO objects with offloading-without-LTO objects still doesn't work :( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463 filed about that. Any thoughts how to fix this? Thanks, -- Ilya
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
On 11/20/2015 05:15 AM, Kyrill Tkachov wrote: Hi Kirill, On 18/11/15 14:11, Kirill Yukhin wrote: Hello Andreas, Devid. On 18 Nov 10:45, Andreas Schwab wrote: Kirill Yukhinwrites: diff --git a/gcc/testsuite/c-c++-common/attr-simd.c b/gcc/testsuite/c-c++-common/attr-simd.c new file mode 100644 index 000..b4eda34 --- /dev/null +++ b/gcc/testsuite/c-c++-common/attr-simd.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +__attribute__((__simd__)) +extern +int simd_attr (void) +{ + return 0; +} + +/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" } } */ On ia64: FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr[ \\t]simdclone|vector" FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr2[ \\t]simdclone|vector" $ grep simd_attr attr-simd.c.194t.optimized ;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389, cgraph_uid=0, symbol_order=0) simd_attr () ;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392, cgraph_uid=1, symbol_order=1) simd_attr2 () As far as vABI is supported on x86_64/i?86 only, I am going to enable mentioned `scan-tree-dump' only for these targets. This should cure both IA64 and Power. Concerning attr-simd-3.c. It is known issue: PR68158. And I believe this test should work everywhere as far as PR is resolved. I'll put xfail into the test. Which will lead to (in g++.log): gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M output is: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M XFAIL: c-c++-common/attr-simd-3.c -std=gnu++98 PR68158 (test for errors, line 5) FAIL: c-c++-common/attr-simd-3.c -std=gnu++98 (test for excess errors) Excess errors: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes] Patch in the bottom. gcc/tessuite/ * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error. This test fails on bare-metal targets that don't support -fcilkplus or -pthread. Would you consider moving them to the cilkplus testing directory or adding an appropriate effective target check? I think an effective-target-check is the most appropriate. The whole point here is to make that attribute independent of Cilk support. jeff
RFA: PATCH to match.pd for c++/68385
In this bug, we hit the (A & sign-bit) != 0 -> A < 0 transformation. Because of delayed folding, the operands aren't fully folded yet, so we have NOP_EXPRs around INTEGER_CSTs, and so calling wi::only_sign_bit_p ICEs. We've been seeing several similar bugs, where code calls integer_zerop and therefore assumes that they have an INTEGER_CST, but in fact integer_zerop does STRIP_NOPS. This patch changes the pattern to only match if the operand is actually an INTEGER_CST. Alternatively we could call tree_strip_nop_conversions on the operand, but I would expect that to have issues when the conversion changes the signedness of the type. OK if testing passes? commit e7b45ed6775c88c6d48c5863738ba0db2e38fc5e Author: Jason MerrillDate: Fri Nov 20 14:40:35 2015 -0500 PR c++/68385 * match.pd: Don't assume that integer_pow2p implies INTEGER_CST. diff --git a/gcc/match.pd b/gcc/match.pd index e86cc8b..1981ae7 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2232,6 +2232,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (TYPE_PRECISION (TREE_TYPE (@0)) == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0 && element_precision (@2) >= element_precision (@0) + && TREE_CODE (@1) == INTEGER_CST && wi::only_sign_bit_p (@1, element_precision (@0))) (with { tree stype = signed_type_for (TREE_TYPE (@0)); } (ncmp (convert:stype @0) { build_zero_cst (stype); })
Re: [PATCH] (Partial) Implementation of simplificaiton of CSHIFT
On Thu, Nov 19, 2015 at 04:58:36PM -0800, Steve Kargl wrote: > > 2015-11-19 Steven G. Kargl> > * intrinsic.h: Prototype for gfc_simplify_cshift > * intrinsic.c (add_functions): Use gfc_simplify_cshift. > * simplify.c (gfc_simplify_cshift): Implement simplification of CSHIFT. > (gfc_simplify_spread): Remove a FIXME and add error condition. > > 2015-11-19 Steven G. Kargl > > * gfortran.dg/simplify_cshift_1.f90: New test. > I've attached an updated patch. The changes consists of 1) better/more comments 2) re-organize code to reduce copying of the array. 3) add optimization for a left/right shift that returns the original array. 4) Don't leak memory. -- Steve Index: gcc/fortran/intrinsic.c === --- gcc/fortran/intrinsic.c (revision 230585) +++ gcc/fortran/intrinsic.c (working copy) @@ -1659,9 +1659,11 @@ add_functions (void) make_generic ("count", GFC_ISYM_COUNT, GFC_STD_F95); - add_sym_3 ("cshift", GFC_ISYM_CSHIFT, CLASS_TRANSFORMATIONAL, ACTUAL_NO, BT_REAL, dr, GFC_STD_F95, - gfc_check_cshift, NULL, gfc_resolve_cshift, - ar, BT_REAL, dr, REQUIRED, sh, BT_INTEGER, di, REQUIRED, + add_sym_3 ("cshift", GFC_ISYM_CSHIFT, CLASS_TRANSFORMATIONAL, ACTUAL_NO, + BT_REAL, dr, GFC_STD_F95, + gfc_check_cshift, gfc_simplify_cshift, gfc_resolve_cshift, + ar, BT_REAL, dr, REQUIRED, + sh, BT_INTEGER, di, REQUIRED, dm, BT_INTEGER, ii, OPTIONAL); make_generic ("cshift", GFC_ISYM_CSHIFT, GFC_STD_F95); Index: gcc/fortran/intrinsic.h === --- gcc/fortran/intrinsic.h (revision 230585) +++ gcc/fortran/intrinsic.h (working copy) @@ -271,6 +271,7 @@ gfc_expr *gfc_simplify_conjg (gfc_expr * gfc_expr *gfc_simplify_cos (gfc_expr *); gfc_expr *gfc_simplify_cosh (gfc_expr *); gfc_expr *gfc_simplify_count (gfc_expr *, gfc_expr *, gfc_expr *); +gfc_expr *gfc_simplify_cshift (gfc_expr *, gfc_expr *, gfc_expr *); gfc_expr *gfc_simplify_dcmplx (gfc_expr *, gfc_expr *); gfc_expr *gfc_simplify_dble (gfc_expr *); gfc_expr *gfc_simplify_digits (gfc_expr *); Index: gcc/fortran/simplify.c === --- gcc/fortran/simplify.c (revision 230585) +++ gcc/fortran/simplify.c (working copy) @@ -1789,6 +1789,99 @@ gfc_simplify_count (gfc_expr *mask, gfc_ gfc_expr * +gfc_simplify_cshift (gfc_expr *array, gfc_expr *shift, gfc_expr *dim) +{ + int dm; + gfc_expr *a; + + /* DIM is only useful for rank > 1, but deal with it here as one can + set DIM = 1 for rank = 1. */ + if (dim) +{ + if (!gfc_is_constant_expr (dim)) + return NULL; + dm = mpz_get_si (dim->value.integer); +} + else +dm = 1; + + /* Copy array into 'a', simplify it, and then test for a constant array. + Unexpected expr_type cause an ICE. */ + switch (array->expr_type) +{ + case EXPR_VARIABLE: + case EXPR_ARRAY: + a = gfc_copy_expr (array); + gfc_simplify_expr (a, 0); + if (!is_constant_array_expr (a)) + { + gfc_free_expr (a); + return NULL; + } + break; + default: + gcc_unreachable (); +} + + if (a->rank == 1) +{ + gfc_constructor *ca, *cr; + gfc_expr *result; + mpz_t size; + int i, j, shft, sz; + + if (!gfc_is_constant_expr (shift)) + { + gfc_free_expr (a); + return NULL; + } + + shft = mpz_get_si (shift->value.integer); + + /* Case (i): If ARRAY has rank one, element i of the result is + ARRAY (1 + MODULO (i + SHIFT 1, SIZE (ARRAY))). */ + + mpz_init (size); + gfc_array_size (a, ); + sz = mpz_get_si (size); + mpz_clear (size); + + /* Special case: rank 1 array with no shift or a complete shift to + the original order! */ + if (shft == 0 || shft == sz || shft == 1 - sz) + return a; + + /* Adjust shft to deal with right or left shifts. */ + shft = shft < 0 ? 1 - shft : shft; + + result = gfc_copy_expr (a); + cr = gfc_constructor_first (result->value.constructor); + for (i = 0; i < sz; i++, cr = gfc_constructor_next (cr)) + { + j = (i + shft) % sz; + ca = gfc_constructor_first (a->value.constructor); + while (j-- > 0) + ca = gfc_constructor_next (ca); + cr->expr = gfc_copy_expr (ca->expr); + } + + gfc_free_expr (a); + return result; +} + else +{ + /* FIXME: Deal with rank > 1 arrays. For now, don't leak memory + and exit with an error message. */ + gfc_free_expr (a); + gfc_error ("Simplification of CSHIFT with an array with rank > 1 " + "no yet support"); +} + + return NULL; +} + + +gfc_expr * gfc_simplify_dcmplx (gfc_expr *x, gfc_expr *y) { return simplify_cmplx ("DCMPLX", x, y, gfc_default_double_kind); @@ -6089,10 +6182,11 @@ gfc_simplify_spread (gfc_expr *source, g } } else -/* FIXME: Returning here
Re: RFA: PATCH to match.pd for c++/68385
On 11/20/2015 02:58 PM, Jason Merrill wrote: OK if testing passes? Which it did. Jason
[PATCH] Fix up reduction-1{1,2} testcases (PR middle-end/68221)
Hi! If C/C++ array section reductions have non-zero (positive) bias, it is implemented by declaring a smaller private array and subtracting the bias from the start of the private array (because valid code may only dereference elements from bias onwards). But, this isn't something that is kosher in C/C++ pointer arithmetics and the alias oracle seems to get upset on that. So, the following patch fixes that by performing the subtraction on integral type instead of p+ -bias. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2015-11-20 Jakub JelinekPR middle-end/68221 * omp-low.c (lower_rec_input_clauses): If C/C++ array reduction has non-zero bias, subtract it in integer type instead of pointer plus of negated bias. * testsuite/libgomp.c/reduction-11.c: Remove xfail. * testsuite/libgomp.c/reduction-12.c: Likewise. * testsuite/libgomp.c++/reduction-11.C: Likewise. * testsuite/libgomp.c++/reduction-12.C: Likewise. --- gcc/omp-low.c.jj2015-11-20 12:56:17.0 +0100 +++ gcc/omp-low.c 2015-11-20 13:44:29.080374051 +0100 @@ -,11 +,13 @@ lower_rec_input_clauses (tree clauses, g if (!integer_zerop (bias)) { - bias = fold_convert_loc (clause_loc, sizetype, bias); - bias = fold_build1_loc (clause_loc, NEGATE_EXPR, - sizetype, bias); - x = fold_build2_loc (clause_loc, POINTER_PLUS_EXPR, - TREE_TYPE (x), x, bias); + bias = fold_convert_loc (clause_loc, pointer_sized_int_node, + bias); + yb = fold_convert_loc (clause_loc, pointer_sized_int_node, +x); + yb = fold_build2_loc (clause_loc, MINUS_EXPR, + pointer_sized_int_node, yb, bias); + x = fold_convert_loc (clause_loc, TREE_TYPE (x), yb); yb = create_tmp_var (ptype, name); gimplify_assign (yb, x, ilist); x = yb; --- libgomp/testsuite/libgomp.c/reduction-11.c.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c/reduction-11.c 2015-11-20 13:38:24.448520879 +0100 @@ -1,4 +1,4 @@ -/* { dg-do run { xfail *-*-* } } */ +/* { dg-do run } */ char z[10] = { 0 }; --- libgomp/testsuite/libgomp.c/reduction-12.c.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c/reduction-12.c 2015-11-20 13:38:34.565378078 +0100 @@ -1,4 +1,4 @@ -/* { dg-do run { xfail *-*-* } } */ +/* { dg-do run } */ struct A { int t; }; struct B { char t; }; --- libgomp/testsuite/libgomp.c++/reduction-11.C.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c++/reduction-11.C2015-11-20 13:37:53.921951766 +0100 @@ -1,4 +1,4 @@ -// { dg-do run { xfail *-*-* } } +// { dg-do run } char z[10] = { 0 }; --- libgomp/testsuite/libgomp.c++/reduction-12.C.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c++/reduction-12.C2015-11-20 13:38:03.983809741 +0100 @@ -1,4 +1,4 @@ -// { dg-do run { xfail *-*-* } } +// { dg-do run } template struct A Jakub
[PATCH] Fix debug info related ICE when inlining back fnsplit function (PR debug/66432)
Hi! This patch fixes ICE where a parameter mentioned in a debug source bind has its VLA type mistakenly remapped and that leads to inconsistent type between the PARM_DECL and SSA_NAMEs derived from it. The patch Tom posted for this can't work, because we assume that the s=> value as well as debug_decl_args decl in that case is DECL_ORIGIN of the PARM_DECL. But, in this case we are replacing the DECL_ORIGIN PARM_DECL immediately with a DEBUG_EXPR_DECL anyway, so there is no point remapping the var we don't own (it could be in a different function etc.). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5.3? 2015-11-20 Jakub JelinekPR debug/66432 * tree-inline.c (copy_debug_stmt): If gimple_debug_source_bind_get_value is DECL_ORIGIN of a PARM_DECL in decl_debug_args, don't call remap_gimple_op_r on it. * gcc.dg/debug/pr66432.c: New test. --- gcc/tree-inline.c.jj2015-11-14 19:36:03.0 +0100 +++ gcc/tree-inline.c 2015-11-20 17:17:00.632082622 +0100 @@ -2864,8 +2864,6 @@ copy_debug_stmt (gdebug *stmt, copy_body else if (gimple_debug_source_bind_p (stmt)) { gimple_debug_source_bind_set_var (stmt, t); - walk_tree (gimple_debug_source_bind_get_value_ptr (stmt), -remap_gimple_op_r, , NULL); /* When inlining and source bind refers to one of the optimized away parameters, change the source bind into normal debug bind referring to the corresponding DEBUG_EXPR_DECL that should have @@ -2889,7 +2887,10 @@ copy_debug_stmt (gdebug *stmt, copy_body break; } } - } + } + if (gimple_debug_source_bind_p (stmt)) + walk_tree (gimple_debug_source_bind_get_value_ptr (stmt), + remap_gimple_op_r, , NULL); } processing_debug_stmt = 0; --- gcc/testsuite/gcc.dg/debug/pr66432.c.jj 2015-11-20 17:40:44.589171083 +0100 +++ gcc/testsuite/gcc.dg/debug/pr66432.c2015-11-20 17:38:48.0 +0100 @@ -0,0 +1,19 @@ +/* PR debug/66432 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -g" } */ + +extern void baz (const char *, const char *) __attribute__ ((__noreturn__)); + +void +foo (int x, int y[x][x]) +{ + if (x < 2) +baz ("", ""); +} + +void +bar (void) +{ + int z[2][2] = { { 1, 2 }, { 3, 4 } }; + foo (2, z); +} Jakub
[PATCH] Fix GC ICE during simd clone creation (PR middle-end/68339)
Hi! node->get_body () can run various IPA passes and ggc_collect in them, so it is undesirable to hold pointers to GC memory in automatic vars over it. While I could store those vars (clone_info, clone and id) into special GTY vars just to avoid collecting them, it seems easier to call node->get_body () earlier. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 5 branch. 2015-11-20 Jakub JelinekPR middle-end/68339 * omp-low.c (expand_simd_clones): Call node->get_body () before allocating stuff in GC. * gcc.dg/vect/pr68339.c: New test. --- gcc/omp-low.c.jj2015-11-18 11:19:19.0 +0100 +++ gcc/omp-low.c 2015-11-20 12:56:17.075193601 +0100 @@ -18319,6 +18319,10 @@ expand_simd_clones (struct cgraph_node * && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE) return; + /* Call this before creating clone_info, as it might ggc_collect. */ + if (node->definition && node->has_gimple_body_p ()) +node->get_body (); + do { /* Start with parsing the "omp declare simd" attribute(s). */ --- gcc/testsuite/gcc.dg/vect/pr68339.c.jj 2015-11-20 13:10:47.756905395 +0100 +++ gcc/testsuite/gcc.dg/vect/pr68339.c 2015-11-20 13:08:13.0 +0100 @@ -0,0 +1,17 @@ +/* PR middle-end/68339 */ +/* { dg-do compile } */ +/* { dg-options "--param ggc-min-heapsize=0 --param ggc-min-expand=0 -fopenmp-simd" } */ + +#pragma omp declare simd notinbranch +int +f1 (int x) +{ + return x; +} + +#pragma omp declare simd notinbranch +int +f2 (int x) +{ + return x; +} Jakub
[PATCH] Fix PR objc/68438 (uninitialized source ranges)
The attached patch fixes some more places in c/c-parser.c where the src_range field of a c_expr was being left uninitialized, this time for various Objective C constructs. The source ranges are verified using the same unit-testing plugin used for C expressions. This leads to a wart, which is that it means having a .m test file lurking below gcc.dg/plugin. A workaround would be to create an objc.dg/plugin subdirectory, with a new plugin.exp for testing Objective C plugins, and having it reference the existing plugin below gcc.dg/plugin. That seemed like excessive complication, so I went for the simpler approach of simply putting the .m file below gcc.dg/plugin. I manually verified that this fixes the specific valgrind warnings reported in the PR, and visually inspected the code for objective-c-specific instances of uninitialized c_expr src_range values. Bootstrapped on x86_64-pc-linux-gnu; adds 15 PASS results to gcc.sum. OK for trunk? >From afdae8b15f71164d0d05e790078519b38bd674a4 Mon Sep 17 00:00:00 2001 From: David MalcolmDate: Fri, 20 Nov 2015 11:12:47 -0500 Subject: [PATCH] Fix PR objc/68438 (uninitialized source ranges) gcc/c/ChangeLog: PR objc/68438 * c-parser.c (c_parser_postfix_expression): Set up source ranges for various Objective-C constructs: Class.name syntax, @selector(), @protocol, @encode(), and [] message syntax. gcc/testsuite/ChangeLog: PR objc/68438 * gcc.dg/plugin/diagnostic-test-expressions-2.m: New test file. * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above file. --- gcc/c/c-parser.c | 17 +++- .../gcc.dg/plugin/diagnostic-test-expressions-2.m | 94 ++ gcc/testsuite/gcc.dg/plugin/plugin.exp | 3 +- 3 files changed, 110 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 7b10764..18e9957 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -7338,10 +7338,13 @@ c_parser_postfix_expression (c_parser *parser) expr.value = error_mark_node; break; } - component = c_parser_peek_token (parser)->value; + c_token *component_tok = c_parser_peek_token (parser); + component = component_tok->value; + location_t end_loc = component_tok->get_finish (); c_parser_consume_token (parser); expr.value = objc_build_class_component_ref (class_name, component); + set_c_expr_source_range (, loc, end_loc); break; } default: @@ -7816,9 +7819,11 @@ c_parser_postfix_expression (c_parser *parser) } { tree sel = c_parser_objc_selector_arg (parser); + location_t close_loc = c_parser_peek_token (parser)->location; c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>"); expr.value = objc_build_selector_expr (loc, sel); + set_c_expr_source_range (, loc, close_loc); } break; case RID_AT_PROTOCOL: @@ -7839,9 +7844,11 @@ c_parser_postfix_expression (c_parser *parser) { tree id = c_parser_peek_token (parser)->value; c_parser_consume_token (parser); + location_t close_loc = c_parser_peek_token (parser)->location; c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>"); expr.value = objc_build_protocol_expr (id); + set_c_expr_source_range (, loc, close_loc); } break; case RID_AT_ENCODE: @@ -7860,11 +7867,13 @@ c_parser_postfix_expression (c_parser *parser) c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL); break; } - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, - "expected %<)%>"); { + location_t close_loc = c_parser_peek_token (parser)->location; + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, + "expected %<)%>"); tree type = groktypename (t1, NULL, NULL); expr.value = objc_build_encode_expr (type); + set_c_expr_source_range (, loc, close_loc); } break; case RID_GENERIC: @@ -7907,9 +7916,11 @@ c_parser_postfix_expression (c_parser *parser) c_parser_consume_token (parser); receiver = c_parser_objc_receiver (parser); args = c_parser_objc_message_args (parser); + location_t close_loc = c_parser_peek_token (parser)->location; c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, "expected %<]%>"); expr.value = objc_build_message_expr (receiver, args); + set_c_expr_source_range (, loc, close_loc); break; } /* Else fall through to report error. */ diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m new file mode 100644 index 000..ed7aca3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m @@ -0,0 +1,94 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdiagnostics-show-caret" } */ + +/* This file is similar to
Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt
On 11/20/2015 07:08 AM, Bernd Schmidt wrote: BZ27313 is marked as fixed by the introduction of the tree cselim pass, thus the problem won't even be seen at RTL level. Cool. I'm undecided on whether cs-elim is safe wrt the store speculation vs locks concerns raised in the thread discussing Ian's noce_can_store_speculate_p, but that's not something we have to consider to solve the problem at hand. I don't think cs-elim is safe WRT locks and such in multi-threaded code. In particular it replaces this: bb0: if (cond) goto bb2; else goto bb1; bb1: *p = RHS; bb2: with bb0: if (cond) goto bb1; else goto bb2; bb1: condtmp' = *p; bb2: condtmp = PHI*p = condtmp; If *p is a shared memory location, then there may be another writer. If that writer happens to store something in that location after the load of *p, but before the store to *p, then that store will get lost in the transformed pseudo code. That seems to introduce a data race. Presumably one would call the original code ill-formed WRT the C11/C++11 memory model since the shared location is not marked as such. I'm willing to consider this an independent problem. As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 (if anything, hmmer was very slightly faster afterwards). The run wasn't super-scientific, but I wouldn't have expected anything else given the existence of cs-elim. Cool. It doesn't have to be super-scientific. Good to see it didn't harm anything -- thanks for going the extra mile on this one. Ok to do the above, removing all the bits made unnecessary (including memory_must_be_modified_in_insn_p in alias.c)? Yup. Zap it. jeff
[commit] [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)
On Fri, 20 Nov 2015 18:40:46 +0100, Jonathan Wakely wrote: > The patch is OK for trunk and gcc-5-branch. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68448 trunk: https://gcc.gnu.org/viewcvs/gcc?view=revision=230669 5.x: https://gcc.gnu.org/viewcvs/gcc?view=revision=230670 Jan
libgo patch committed: use correct tool dir with gccgo
This patch from Lynn Boger fixes the go tool shipped with gccgo to use the correct tool directory. It also fixes the 'go tool' output to only list known tools. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230657) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -dfa74d975884f363c74d6a66a37b1703093fdba6 +d52835c9376985f92f35c32af5f1808239981536 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/cmd/go/pkg.go === --- libgo/go/cmd/go/pkg.go (revision 230463) +++ libgo/go/cmd/go/pkg.go (working copy) @@ -785,7 +785,11 @@ func (p *Package) load(stk *importStack, if goTools[p.ImportPath] == toTool { // This is for 'go tool'. // Override all the usual logic and force it into the tool directory. - p.target = filepath.Join(gorootPkg, "tool", full) + if buildContext.Compiler == "gccgo" { + p.target = filepath.Join(runtime.GCCGOTOOLDIR, elem) + } else { + p.target = filepath.Join(gorootPkg, "tool", full) + } } if p.target != "" && buildContext.GOOS == "windows" { p.target += ".exe" Index: libgo/go/cmd/go/tool.go === --- libgo/go/cmd/go/tool.go (revision 230463) +++ libgo/go/cmd/go/tool.go (working copy) @@ -39,6 +39,12 @@ var ( toolN bool ) +// List of go tools found in the gccgo tool directory. +// Other binaries could be in the same directory so don't +// show those with the 'go tool' command. + +var gccgoTools = []string{"cgo", "fix", "cover", "godoc", "vet"} + func init() { cmdTool.Flag.BoolVar(, "n", false, "") } @@ -146,6 +152,21 @@ func listTools() { if toolIsWindows && strings.HasSuffix(name, toolWindowsExtension) { name = name[:len(name)-len(toolWindowsExtension)] } - fmt.Println(name) + + // The tool directory used by gccgo will have other binaries + // in additions to go tools. Only display go tools for this list. + + if buildContext.Compiler == "gccgo" { + for _, tool := range gccgoTools { + if tool == name { + fmt.Println(name) + } + } + } else { + + // Not gccgo, list all the tools found in this dir + + fmt.Println(name) + } } }
Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1
On 11/20/2015 11:17 AM, James Greenhalgh wrote: On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote: 2015-11-10 Evandro Menezesgcc/ * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. * config/aarch64/aarch64.md: Include "exynos-m1.md". * config/arm/arm-cores.def: Use the Exynos M1 sched model. * config/arm/arm.md: Include "exynos-m1.md". * config/arm/arm-tune.md: Regenerated. * config/arm/exynos-m1.md: New file. This patch adds the scheduling model for Exynos M1. It depends on https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu. Please, commit if it's alright. From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:18:52 -0600 Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1 gcc/ * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. * config/aarch64/aarch64.md: Include "exynos-m1.md". These changes are fine. * config/arm/arm-cores.def: Use the Exynos M1 sched model. * config/arm/arm.md: Include "exynos-m1.md". * config/arm/arm-tune.md: Regenerated. These changes need an ack from an ARM reviewer. * config/arm/exynos-m1.md: New file. I have a few comments on this model. +;; The Exynos M1 core is modeled as a triple issue pipeline that has +;; the following functional units. + +(define_automaton "exynos_m1_gp") +(define_automaton "exynos_m1_ls") +(define_automaton "exynos_m1_fp") + +;; 1. Two pipelines for simple integer operations: A, B +;; 2. One pipeline for simple or complex integer operations: C + +(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp") + +(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)") +(define_reservation "em1_c" "em1_xc") Is this extra reservation useful, can we not just use em1_xc directly? +;; 3. Two asymmetric pipelines for Neon and FP operations: F0, F1 + +(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp") + +(define_reservation "em1_fmac" "em1_f0") +(define_reservation "em1_fcvt" "em1_f0") +(define_reservation "em1_nalu" "(em1_f0 | em1_f1)") +(define_reservation "em1_nalu0" "em1_f0") +(define_reservation "em1_nalu1" "em1_f1") +(define_reservation "em1_nmisc" "em1_f0") +(define_reservation "em1_ncrypt" "em1_f0") +(define_reservation "em1_fadd" "em1_f1") +(define_reservation "em1_fvar" "em1_f1") +(define_reservation "em1_fst" "em1_f1") Same comment here, does this not just obfuscate the interaction between instruction classes in the description. I'm not against doing it this way if you prefer, but it would seem to reduce readability to me. I think there is also an argument that this increases readability, so it is your choice. + +;; 4. One pipeline for branch operations: BX + +(define_cpu_unit "em1_bx" "exynos_m1_gp") + +(define_reservation "em1_br" "em1_bx") + And again? +;; 5. One AGU for loads: L +;; One AGU for stores and one pipeline for stores: S, SD + +(define_cpu_unit "em1_lx" "exynos_m1_ls") +(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls") + +(define_reservation "em1_ld" "em1_lx") +(define_reservation "em1_st" "(em1_sx + em1_sd)") + +;; Common occurrences +(define_reservation "em1_sfst" "(em1_fst + em1_st)") +(define_reservation "em1_lfst" "(em1_fst + em1_ld)") + +;; Branches +;; +;; No latency as there is no result +;; TODO: Unconditional branches use no units; +;; conditional branches add the BX unit; +;; indirect branches add the C unit. +(define_insn_reservation "exynos_m1_branch" 0 + (and (eq_attr "tune" "exynosm1") + (eq_attr "type" "branch")) + "em1_br") + +(define_insn_reservation "exynos_m1_call" 1 + (and (eq_attr "tune" "exynosm1") + (eq_attr "type" "call")) + "em1_alu") + +;; Basic ALU +;; +;; Simple ALU without shift, non-predicated +(define_insn_reservation "exynos_m1_alu" 1 + (and (eq_attr "tune" "exynosm1") + (and (not (eq_attr "predicated" "yes")) (and (eq_attr "predicated" "no")) ? Likewise throughout the file? Again this is your choice. This is OK from the AArch64 side, let me know if you plan to change any of the above, otherwise I'll commit it (or someone else can commit it) after I see an OK from an ARM reviewer. The naming choices were made more for uniformity's sake and to conform with the processor documentation. The bit about "not... yes" <=> "no" is a goof up. :-s I'm waiting for Kyrill's patch adding FCCMP to be approved to amend this patch. However, I'd appreciate if it'd be approved and committed before then, as soon the ARM stuff is okayed. Thank you, -- Evandro Menezes
[SPARC] Minor cleanup
This just moves the VIS3 mulx patterns to a more appropriate place. Tested on SPARC/Solaris, applied on the mainline. 2015-11-20 Eric Botcazou* config/sparc/sparc.md (umulxhi_vis): Move around. (*umulxhi_sp64): Likewise. (umulxhi_v8plus): Likewise. (xmulx_vis): Likewise. (*xmulx_sp64): Likewise. (xmulx_v8plus): Likewise. (xmulxhi_vis): Likewise. (*xmulxhi_sp64): Likewise. (xmulxhi_v8plus): Likewise. -- Eric BotcazouIndex: config/sparc/sparc.md === --- config/sparc/sparc.md (revision 230589) +++ config/sparc/sparc.md (working copy) @@ -609,7 +609,7 @@ (define_insn "*cmptf_fp" return "fcmpq\t%1, %2"; } [(set_attr "type" "fpcmp")]) - + ;; Next come the scc insns. ;; Note that the boolean result (operand 0) takes on DImode @@ -647,8 +647,6 @@ (define_expand "cstore4" "TARGET_FPU" { if (emit_scc_insn (operands)) DONE; else FAIL; }) - - ;; Seq_special[_xxx] and sne_special[_xxx] clobber the CC reg, because they ;; generate addcc/subcc instructions. @@ -1137,7 +1135,7 @@ (define_split (match_dup 0)))] "") - + ;; These control RTL generation for conditional jump insns (define_expand "cbranchcc4" @@ -1318,7 +1316,6 @@ (define_insn "*cbcond_sp64" ;; There are no 32 bit brreg insns. -;; XXX (define_insn "*normal_int_branch_sp64" [(set (pc) (if_then_else (match_operator 0 "v9_register_compare_operator" @@ -1335,7 +1332,6 @@ (define_insn "*normal_int_branch_sp64" [(set_attr "type" "branch") (set_attr "branch_type" "reg")]) -;; XXX (define_insn "*inverted_int_branch_sp64" [(set (pc) (if_then_else (match_operator 0 "v9_register_compare_operator" @@ -2730,8 +2726,6 @@ (define_expand "movcc" DONE; }) -;; Conditional move define_insns - (define_insn "*mov_cc_v9" [(set (match_operand:I 0 "register_operand" "=r") (if_then_else:I (match_operator 1 "comparison_operator" @@ -2896,7 +2890,7 @@ (define_insn_and_split "*movtf_cc_reg_sp } [(set_attr "length" "2")]) - + ;; Zero-extension instructions ;; These patterns originally accepted general_operands, however, slightly @@ -4043,7 +4037,6 @@ (define_insn "*muldi3_sp64" [(set_attr "type" "imul")]) ;; V8plus wide multiply. -;; XXX (define_insn "muldi3_v8plus" [(set (match_operand:DI 0 "register_operand" "=r,h") (mult:DI (match_operand:DI 1 "arith_operand" "%r,0") @@ -4094,7 +4087,6 @@ (define_expand "mulsidi3" ;; V9 puts the 64-bit product in a 64-bit register. Only out or global ;; registers can hold 64-bit values in the V8plus environment. -;; XXX (define_insn "mulsidi3_v8plus" [(set (match_operand:DI 0 "register_operand" "=h,r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r,r")) @@ -4107,7 +4099,6 @@ (define_insn "mulsidi3_v8plus" [(set_attr "type" "multi") (set_attr "length" "2,3")]) -;; XXX (define_insn "const_mulsidi3_v8plus" [(set (match_operand:DI 0 "register_operand" "=h,r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r,r")) @@ -4120,7 +4111,6 @@ (define_insn "const_mulsidi3_v8plus" [(set_attr "type" "multi") (set_attr "length" "2,3")]) -;; XXX (define_insn "*mulsidi3_sp32" [(set (match_operand:DI 0 "register_operand" "=r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) @@ -4148,7 +4138,6 @@ (define_insn "*mulsidi3_sp64" ;; Extra pattern, because sign_extend of a constant isn't valid. -;; XXX (define_insn "const_mulsidi3_sp32" [(set (match_operand:DI 0 "register_operand" "=r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) @@ -4203,7 +4192,6 @@ (define_expand "smulsi3_highpart" } }) -;; XXX (define_insn "smulsi3_highpart_v8plus" [(set (match_operand:SI 0 "register_operand" "=h,r") (truncate:SI @@ -4219,7 +4207,6 @@ (define_insn "smulsi3_highpart_v8plus" (set_attr "length" "2")]) ;; The combiner changes TRUNCATE in the previous pattern to SUBREG. -;; XXX (define_insn "" [(set (match_operand:SI 0 "register_operand" "=h,r") (subreg:SI @@ -4236,7 +4223,6 @@ (define_insn "" [(set_attr "type" "multi") (set_attr "length" "2")]) -;; XXX (define_insn "const_smulsi3_highpart_v8plus" [(set (match_operand:SI 0 "register_operand" "=h,r") (truncate:SI @@ -4251,7 +4237,6 @@ (define_insn "const_smulsi3_highpart_v8p [(set_attr "type" "multi") (set_attr "length" "2")]) -;; XXX (define_insn "*smulsi3_highpart_sp32" [(set (match_operand:SI 0 "register_operand" "=r") (truncate:SI @@ -4263,7 +4248,6 @@ (define_insn "*smulsi3_highpart_sp32" [(set_attr "type" "multi") (set_attr "length" "2")]) -;; XXX (define_insn "const_smulsi3_highpart" [(set (match_operand:SI 0 "register_operand" "=r") (truncate:SI @@ -4301,7 +4285,6 @@ (define_expand "umulsidi3" } }) -;; XXX (define_insn "umulsidi3_v8plus" [(set
Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class
On 11/20/2015 01:52 PM, H.J. Lu wrote: On Tue, Nov 17, 2015 at 4:22 AM, Richard Bienerwrote: On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: Empty record should be returned and passed the same way in C and C++. This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is defined to is_really_empty_class, which returns true for C++ empty classes. For LTO, we stream out a bit to indicate if a record is empty and we store it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is changed to set bitsize to 0 for empty records. Middle-end and x86 backend are updated to ignore empty records for parameter passing and function value return. Other targets may need similar changes. Please avoid a new langhook for this and instead claim a bit in tree_type_common like for example restrict_flag (double-check it is unused for non-pointers). There is no bit in tree_type_common I can overload. restrict_flag is checked for non-pointers to issue an error when it is used on non-pointers: /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' qualifiers cannot" "" } The C++ front end only needs to check TYPE_RESTRICT for this purpose on front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals could handle that specifically if you change TYPE_RESTRICT to only apply to pointers. Jason
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/20/2015 02:19 AM, Nick Clifton wrote: Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. So this looks like a pretty fundamental failing in ree.c and AFAICT has been there since day 1. Thankfully, I don't think it's likely to trigger all that often. It's useful to ponder two cases. One where we're going to add a copy and one where we don't. I added the copy case during the 4.9 cycle as an extension of existing code. It only fires in easy to analyze circumstances -- when the original set and the extension are in the same basic block, but have different destination registers. In the copy case. The code checks that the destination of the *extension* is neither used nor set between the original set and the extension. So in the case of multi-word hard reg, we'll verify that the entire multi-word hard reg survives from the original insn to the extension. That avoids this problem in cases where a copy is needed. However, in the case where no copy is needed, no such checks are made. In fact the checks could be fairly painful as the original set and the extension can be in different blocks with arbitrary flow between them. I would hazard a guess that the authors simply didn't consider the multi-hard reg case. Essentially if the original set reached an extension, then obviously the original set got there unharmed and the extended destination should reach as well -- except that doesn't apply to multi-word hard regs. Given the expense in checking all the potential paths between the original set and the extension, I think just disallowing this case ought to be is the best way to go. Ideally we'll filter out that case before we get to combine_set_extension. But if we can't, it's easy enough to test there. Jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
> I would hazard a guess that the authors simply didn't consider the > multi-hard reg case. Essentially if the original set reached an > extension, then obviously the original set got there unharmed and the > extended destination should reach as well -- except that doesn't apply > to multi-word hard regs. This pass started as a Zero-Extension Elimination pass for x86-64 and was only considering implicit SI->DI extensions initially. The algorithm was tailored to this specific pattern and I agree that the pass should be reimplemented. -- Eric Botcazou
Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C
On Nov 16, 2015, at 6:02 AM, Renlin Liwrote: > On 14/11/15 00:33, David Edelsohn wrote: >> No RISC architecture can store directly to MEM, so the expected RTL in >> g++.dg/init/vbase1.C is wrong. I am adding XFAIL for PowerPC. This >> probably should be disabled for ARM and other RISC architectures. > > I observed the same problem in arm. > > This passes for aarch64 and mips as they have zero register to do that. > However, other RISC might not have that feature, for example arm and RS6000 > in this case. I fixed this with the below patch. Tested on x86_64 linux, x86_64 darwin and my port. If you want to list aarch64/arn and mips, please do. * g++.dg/init/vbase1.C: Only run on x86_64-*-* as this testcase isn't portable. Index: g++.dg/init/vbase1.C === --- g++.dg/init/vbase1.C(revision 230675) +++ g++.dg/init/vbase1.C(working copy) @@ -42,4 +42,4 @@ int main(int, char**) // Verify that the SubB() mem-initializer is storing 0 directly into // this->D.whatever rather than into a stack temp that is then copied into the // base field. -// { dg-final { scan-rtl-dump "set \[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { xfail { powerpc*-*-* } } } } +// { dg-final { scan-rtl-dump "set \[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { target { x86_64-*-* } } } }
Re: [PATCH PR52272]Be smart when adding iv candidates
On Wed, Nov 18, 2015 at 11:50 PM, Bernd Schmidtwrote: > On 11/10/2015 11:19 AM, Bin.Cheng wrote: >> >> On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt >> wrote: >>> >>> >>> Multi-line expressions should be wrapped in parentheses so that >>> emacs/indent >>> can format them automatically. Two sets of parens are needed for this. >>> Operators should then line up appropriately. >> >> Ah, thanks for teaching. Here is the updated patch, hoping it's correct. > > It looks like you're waiting to check it in - Richard's earlier approval > still holds. Thanks, applied as r230647. > > > Bernd >
Re: [PATCH][RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves
On 20/11/15 01:41, Bernd Schmidt wrote: I1 is def_insn, I3 is cand->insn. tmp_reg is 'ax'. What we want to do is reject this transformation because the destination of def_insn (aka I1), that is 'ax', is not the operand of the extend operation in cand->insn (aka I3). As you said, rtx_equal won't work on just SET_SRC (PATTERN (cand->insn)) because it's an extend operation. So reg_overlap_mentioned should be appropriate. Yeah, so just use the src_reg variable for the comparison. I still don't see why you wouldn't want to use the stronger test. But the whole thing still feels not completely ideal somehow, so after reading through ree.c for a while and getting a better feeling for how it works, I think the following (which you said is equivalent) would be the most understandable and direct fix. You said that the two tests should be equivalent, and I agree. I've not found cases where the change makes a difference, other than the testcase. Would you mind running this version through the testsuite and committing if it passes? I've shrunk the comment; massive explanations like this for every bug are inappropriate IMO, and the example also duplicates an earlier comment in the same function. And, as I said earlier, the way you placed the comment is confusing because only one part of the following if statement is related to it. Ok, thanks for the explanation. When investigating this bug I tried a patch identical to yours and it had worked just fine. My patch was just an alternative approach to the same issue. I'll retest it just to double-check and I'll incorporate the testsuite additions. Thanks for your help! Kyrill Bernd
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. Cheers Nick
[PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
When vectorising a integer induction condition reduction, is_nonwrapping_integer_induction ends up with different values for base during the analysis and build phases. In the first it is an INTEGER_CST, in the second the loop has been vectorised out and the base is now a variable. This results in the analysis and build stage detecting the STMT_VINFO_VEC_REDUCTION_TYPE as different types. The easiest way to fix this is to only check for integer induction conditions on the analysis stage. gcc/ PR tree-optimization/68413 * tree-vect-loop.c (vectorizable_reduction): Only check for integer cond reduction on analysis stage. Thanks, Alan. analysisonlycondcheck.patch Description: Binary data
libgo patch committed: Handle DW_AT_specification in cgo
The earlydebug work has caused https://gcc.gnu.org/PR68072 when using the cgo tool. The patch to fix this in the master sources is https://golang.org/cl/17151 . This patch fixes the problem in the gccgo sources. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230677) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -d52835c9376985f92f35c32af5f1808239981536 +128d5b14b8ab967cb61c01a9b2c596bda7d04c63 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/cmd/cgo/gcc.go === --- libgo/go/cmd/cgo/gcc.go (revision 230463) +++ libgo/go/cmd/cgo/gcc.go (working copy) @@ -490,6 +490,11 @@ func (p *Package) loadDWARF(f *File, nam name, _ := e.Val(dwarf.AttrName).(string) typOff, _ := e.Val(dwarf.AttrType).(dwarf.Offset) if name == "" || typOff == 0 { + if e.Val(dwarf.AttrSpecification) != nil { + // Since we are reading all the DWARF, + // assume we will see the variable elsewhere. + break + } fatalf("malformed DWARF TagVariable entry") } if !strings.HasPrefix(name, "__cgo__") {
Re: [PATCH] Fix PR objc/68438 (uninitialized source ranges)
On Fri, 20 Nov 2015, David Malcolm wrote: > The source ranges are verified using the same unit-testing plugin used > for C expressions. This leads to a wart, which is that it means having > a .m test file lurking below gcc.dg/plugin. A workaround would be to > create an objc.dg/plugin subdirectory, with a new plugin.exp for testing > Objective C plugins, and having it reference the existing plugin below > gcc.dg/plugin. That seemed like excessive complication, so I went for > the simpler approach of simply putting the .m file below gcc.dg/plugin. Have you made sure that this test is quietly not run if the --enable-languages configuration does not build the ObjC compiler? -- Joseph S. Myers jos...@codesourcery.com