Re: [PATCH 5/5] Add plugin to recursively dump the source-ranges in a tree (v2)
David Malcolm writes:

> This patch adds a test plugin that recurses down an expression tree,
> printing diagnostics showing the ranges of each node in the tree.
>
> It corresponds to:
>   [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree
>   https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00741.html
> from v1 of the patch kit.
>
> Changes in v2:
>   * the output no longer contains the PARAM_DECL and INTEGER_CST
>     leaves since we no longer have range data for them; updated
>     the expected output accordingly.
>   * slightly updated to eliminate use of SOURCE_RANGE
>
> Updated screenshot:
>
>   https://dmalcolm.fedorapeople.org/gcc/2015-09-22/diagnostic-test-show-trees-1.html
>
> gcc/testsuite/ChangeLog:
>   * gcc.dg/plugin/diagnostic-test-show-trees-1.c: New file.
>   * gcc.dg/plugin/diagnostic_plugin_show_trees.c: New file.
>   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add
>   diagnostic_plugin_show_trees.c and
>   diagnostic-test-show-trees-1.c.

For what it's worth, this looks good to me.

Thanks!

--
Dodji
Re: [AArch64] Fix Prefetch ICE
Hi Marcus,

Thanks for the review and comments.

>> OK and can you back port to 5 ?

Please find attached the backported patch on gcc-5-branch.
Regression tested on AArch64 without any issues.

2015-09-28  Andrew Pinski

ChangeLog

  * config/aarch64/aarch64.md (prefetch): Change the predicate of
  operand 0 to register_operand.

Thanks,
Naveen

Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md (revision 228182)
+++ config/aarch64/aarch64.md (working copy)
@@ -382,7 +382,7 @@
 )

 (define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "address_operand" "r")
+  [(prefetch (match_operand:DI 0 "register_operand" "r")
             (match_operand:QI 1 "const_int_operand" "")
             (match_operand:QI 2 "const_int_operand" ""))]
   ""
RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.
-----Original Message-----
From: Bin.Cheng [mailto:amker.ch...@gmail.com]
Sent: Monday, September 28, 2015 7:05 AM
To: Ajit Kumar Agarwal
Cc: Segher Boessenkool; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

On Sun, Sep 27, 2015 at 11:13 PM, Ajit Kumar Agarwal wrote:
>
>
> -----Original Message-----
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Sunday, September 27, 2015 7:49 PM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli
> Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,optimization]: Optimized changes in the estimate register
> pressure cost.
>
> On Sat, Sep 26, 2015 at 04:51:20AM +0000, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
> Sorry for the typo error.  Please find the corrected one.
>
> Ratio with the optimization vs ratio without optimization for FP
> benchmarks ( 4668.743 vs 4668.741).  With the optimization, FP
> performance is slightly better.

>>Did you mis-type the number again?  Otherwise this must be noise.  Now I
>>remember why I didn't get perf improvement from this.  Changing reg_new
>>to reg_new + reg_old doesn't have a big impact because it just increases
>>the starting number for each scenario.  Maybe it still makes sense for
>>cases on the verge of exceeding the target's available register number.
>>I will try to collect benchmark data on ARM, but it may take some time.

This is the correct one.

Thanks & Regards
Ajit

Thanks,
bin

>
> Thanks & Regards
> Ajit
>
> Segher
Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.
On Sun, Sep 27, 2015 at 11:13 PM, Ajit Kumar Agarwal wrote:
>
>
> -----Original Message-----
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Sunday, September 27, 2015 7:49 PM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida;
> Nagaraju Mekala
> Subject: Re: [Patch,optimization]: Optimized changes in the estimate register
> pressure cost.
>
> On Sat, Sep 26, 2015 at 04:51:20AM +0000, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
> Sorry for the typo error.  Please find the corrected one.
>
> Ratio with the optimization vs ratio without optimization for FP benchmarks
> ( 4668.743 vs 4668.741).  With the optimization, FP performance is
> slightly better.

Did you mis-type the number again?  Otherwise this must be noise.  Now I
remember why I didn't get perf improvement from this.  Changing reg_new
to reg_new + reg_old doesn't have a big impact because it just increases
the starting number for each scenario.  Maybe it still makes sense for
cases on the verge of exceeding the target's available register number.
I will try to collect benchmark data on ARM, but it may take some time.

Thanks,
bin

>
> Thanks & Regards
> Ajit
>
> Segher
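The argument about reg_new versus reg_new + reg_old can be illustrated with a toy model. This is only a sketch: it assumes a three-tier cost function (free, near the limit, spilling), and every name and constant below is hypothetical, not taken from GCC or from the patch under discussion.

```cpp
#include <cassert>

// Toy model only: field names and values are hypothetical, not GCC's.
struct PressureModel {
    unsigned available_regs;  // registers the target can allocate
    unsigned reserved_regs;   // head-room kept free before charging anything
    unsigned reg_cost;        // per-register price when close to the limit
    unsigned spill_cost;      // per-register price once the limit is exceeded
};

// Pick the per-register price from the total demand (new + old).
unsigned unit_cost(const PressureModel& m, unsigned n_new, unsigned n_old) {
    const unsigned needed = n_new + n_old;
    if (needed + m.reserved_regs <= m.available_regs) return 0;
    if (needed <= m.available_regs) return m.reg_cost;
    return m.spill_cost;
}

// Variant A: charge only the newly required registers.
unsigned cost_new_only(const PressureModel& m, unsigned n_new, unsigned n_old) {
    return unit_cost(m, n_new, n_old) * n_new;
}

// Variant B: charge the new plus the already live registers.
unsigned cost_new_plus_old(const PressureModel& m, unsigned n_new, unsigned n_old) {
    return unit_cost(m, n_new, n_old) * (n_new + n_old);
}
```

In this model, with 16 available registers, 3 reserved, prices 2 and 10, and 12 registers already live, going from 2 to 4 new registers costs 4 more under either variant, so the ranking of candidates inside one tier does not change. Only when the limit is crossed (5 new registers) do the two variants diverge noticeably: 50 versus 170.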
[Graphite] Redesign Graphite scop detection
From: hiraditya

Redesign Graphite scop detection for faster compile time and detecting more SCoPs.

The existing algorithm for SCoP detection in Graphite was based on the
dominator tree, where a tree (CFG) traversal was required for analyzing an
SESE.  The tree traversal is linear in the number of basic blocks and SCoP
detection is (probably) linear in the number of instructions.  That algorithm
used a generic SESE infrastructure which does not directly represent loops.
For the Graphite framework, we are only interested in subtrees with loops.

The new algorithm is geared towards tree traversal on the loop structure.  It
is linear in the number of loops, which is faster than the previous algorithm.
Briefly, we start the traversal at a loop nest and analyze it recursively for
validity.  Once a valid loop is found, we look for a valid adjacent loop.  If
an adjacent loop is found and is valid, we merge both loop nests; otherwise we
form a SCoP from the previous loop nest and resume the algorithm from the
adjacent loop nest.  The data structure to represent an SESE is an ordered
pair of edges (entry, exit).  The new algorithm can extend a SCoP in both
directions.

With this approach, the number of instructions to be analyzed for validity
reduces to a minimal set.  We start by analyzing the statements inside a loop,
because their validity is necessary for the validity of the loop.  Statements
outside the loop nest can simply be excluded from the SESE if they are not
valid.

This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02024.html

Passes (c,c++,fortran) regtest and bootstrap.

gcc/ChangeLog:

2015-09-27  Aditya Kumar
            Sebastian Pop

  * graphite-optimize-isl.c (optimize_isl):
  * graphite-scop-detection.c (struct sese_l): New type.
  (get_entry_bb): API for getting entry bb of SESE.
  (get_exit_bb): API for getting exit bb of SESE.
  (class debug_printer): New type.  Simple printer in debug mode.
  (trivially_empty_bb_p): New.  Return true when BB is empty or
  contains only debug instructions.
  (graphite_can_represent_expr): Call scalar_evolution_in_region
  instead of analyze_scalar_evolution.  Pass in scop instead of only
  the scop entry.
  (stmt_has_simple_data_refs_p): Pass in scop instead of only the
  scop entry.
  (stmt_simple_for_scop_p): Same.
  (harmful_stmt_in_bb): Same.
  (graphite_can_represent_loop): Deleted.
  (struct scopdet_info): Deleted.
  (scopdet_basic_block_info): Deleted.
  (build_scops_1): Deleted.
  (bb_in_sd_region): Deleted.
  (find_single_entry_edge): Deleted.
  (find_single_exit_edge): Deleted.
  (create_single_entry_edge): Deleted.
  (sd_region_without_exit): Deleted.
  (create_single_exit_edge): Deleted.
  (unmark_exit_edges): Deleted.
  (mark_exit_edges): Deleted.
  (create_sese_edges): Deleted.
  (build_graphite_scops): Deleted.
  (canonicalize_loop_closed_ssa): Recompute all dominators at the end.
  (build_scops): Use the new scop_builder to build scops.
  (dot_all_scops_1): Use the new pretty printer.  Print loop father
  as well.
  (loop_body_is_valid_scop): New.  Return true if loop body is a valid
  scop.
  (class scop_builder): New.  Builds SCoPs for polyhedral
  optimizations.
  (scop_builder): New.  Constructor.
  (static sese_l invalid_sese): sese_l with invalid edges.
  (get_sese): Get an sese (from a loop) if possible, invalid_sese
  otherwise.
  (get_nearest_dom_with_single_entry): Get nearest dominator of a
  basic_block with single entry.  Return NULL if we get to the
  beginning of a function.
  (get_nearest_pdom_with_single_exit): Get nearest post-dominator of a
  basic_block with single exit.  Return NULL if we get to the
  beginning of a function.
  (print_sese): Pretty-print SESE.
  (merge_sese): Merge two SESEs if possible and return the new SESE.
  (build_scop_depth): Start building the SCoP within a loop nest.
  (build_scop_breadth): Start building the SCoP at a single loop
  depth.  Merge adjacent SESEs if valid.
  (can_represent_loop_1): Returns true if Graphite can represent loop
  inside SCoP.  Helper for can_represent_loop.
  (can_represent_loop): Returns true if Graphite can represent LOOP
  and all its nested loops in SCoP.
  (loop_is_valid_scop): Returns true if LOOP and all its nests
  constitute a valid SCoP.
  (region_has_one_loop): Returns true if a region has only one loop.
  (add_scop): Add SCoP to the list of valid scops.  Removes an already
  existing scop if it intersects with or is subsumed by this one.
  (harmful_stmt_in_region): Returns true if SCoP has any statement
  which cannot be represented by Graphite.
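The merge step described in the cover text (grow a region across adjacent valid loop nests, close it at the first invalid one, then resume) can be sketched as follows. This is a heavily simplified illustration and not the code in graphite-scop-detection.c: LoopNest, Region and build_regions are hypothetical names, validity is assumed to be precomputed, and the recursion into nested loops as well as the entry/exit edges of a real SESE are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one sibling loop nest: only a validity flag.
struct LoopNest {
    bool valid;  // true if the whole nest could be represented (assumed given)
};

// A region is a half-open index range [first, second) over sibling nests.
using Region = std::pair<std::size_t, std::size_t>;

// Walk the sibling nests left to right.  A region starts at a valid nest,
// is extended while the adjacent nest is also valid, and is closed when an
// invalid nest (or the end of the list) is reached.
std::vector<Region> build_regions(const std::vector<LoopNest>& nests) {
    std::vector<Region> regions;
    std::size_t i = 0;
    while (i < nests.size()) {
        if (!nests[i].valid) {
            ++i;  // invalid nests are skipped and never join a region
            continue;
        }
        const std::size_t begin = i;
        while (i < nests.size() && nests[i].valid)
            ++i;  // merge the adjacent valid nest into the current region
        regions.emplace_back(begin, i);
    }
    return regions;
}
```

Each nest is visited once, which mirrors the claim that the work is linear in the number of loops rather than in the number of basic blocks.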
Re: [PATCH] Convert SPARC to LRA
On 09/27/2015 01:57 PM, Hans-Peter Nilsson wrote:
> On Wed, 9 Sep 2015, Mike Stump wrote:
>> On Sep 8, 2015, at 9:41 PM, David Miller wrote:
>>> +#define TARGET_LRA_P hook_bool_void_true
>>
>> Are we at the point where this should be the default, and old
>> ports should just define to false, if they really need to?
>
> I think no.  For one, we don't have proper target documentation
> updates for LRA.  What does it need?  What is outdated?
>
> Also, give ample time for gcc releases of odd ports with LRA to
> get into the public and cover most of the inevitable remaining
> bugs.  Not even sh has moved over due to remaining issues.  Let
> the reports come in - and be fixed.  Let's revisit in a year or
> two.

I don't think we're there yet either -- many ports still require some
guidance from Vlad to get working with LRA.

It *may* be time to decree that any new ports must use the LRA path
rather than reload.  I'm still on the fence with that.

jeff
Re: [PATCH] Convert SPARC to LRA
On Wed, 9 Sep 2015, Mike Stump wrote:
> On Sep 8, 2015, at 9:41 PM, David Miller wrote:
> > +#define TARGET_LRA_P hook_bool_void_true
>
> Are we at the point where this should be the default, and old
> ports should just define to false, if they really need to?

I think no.  For one, we don't have proper target documentation
updates for LRA.  What does it need?  What is outdated?

Also, give ample time for gcc releases of odd ports with LRA to
get into the public and cover most of the inevitable remaining
bugs.  Not even sh has moved over due to remaining issues.  Let
the reports come in - and be fixed.  Let's revisit in a year or
two.

brgds, H-P
RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.
On September 27, 2015 5:13:59 PM GMT+02:00, Ajit Kumar Agarwal wrote:
>
>
>-----Original Message-----
>From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
>Sent: Sunday, September 27, 2015 7:49 PM
>To: Ajit Kumar Agarwal
>Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli
>Hunsigida; Nagaraju Mekala
>Subject: Re: [Patch,optimization]: Optimized changes in the estimate
>register pressure cost.
>
>On Sat, Sep 26, 2015 at 04:51:20AM +0000, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
>Sorry for the typo error.  Please find the corrected one.
>
>Ratio with the optimization vs ratio without optimization for FP
>benchmarks ( 4668.743 vs 4668.741).  With the optimization
>FP is slightly better performance.

Ah, I see.  Sorry for the noise.

Thanks,
Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.
On September 26, 2015 9:10:13 AM GMT+02:00, "Bin.Cheng" wrote:
>On Sat, Sep 26, 2015 at 12:51 PM, Ajit Kumar Agarwal
> wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP
>> benchmarks
>> ( 4668.743 vs 4778.741)

Do I read this correctly as introducing a 2.4% regression for FP?

Thanks,

>>
>> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
>>
>> Number of instruction with optimization = 1094117
>> Number of instruction without optimization = 1094659
>>
>> Reduction in number of instruction with the optimization = 542
>> instruction.
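For reference, the figures quoted above can be checked with a few lines of arithmetic. This is only a worked example on the numbers as posted; percent_change is a hypothetical helper name, and the result assumes the usual reading of SPEC ratios where higher is better.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `with_opt` against the baseline `without_opt`, in percent.
// A negative value means the optimized build scored lower than the baseline.
double percent_change(double with_opt, double without_opt) {
    return (with_opt - without_opt) / without_opt * 100.0;
}

// Absolute reduction in instruction count.
long instruction_delta(long without_opt, long with_opt) {
    return without_opt - with_opt;
}
```

Taken against the baseline, the FP pair as first posted (4668.743 vs 4778.741) works out to roughly -2.3%, in the neighbourhood of the figure asked about; the INT pair is about +0.08%; and the instruction counts differ by exactly 542.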
[PATCH, i386]: Merge *vec_extract_zext patterns
Hello!

Now that PR 57195 (Mode attributes with specific mode iterator can not
be used as mode iterators in *.md files) [1] is fixed, we can merge
*vec_extract_zext patterns.

2015-09-27  Uros Bizjak

  * config/i386/predicates.md (register_sse4nonimm_operand): New
  predicate.
  * config/i386/sse.md (PEXTR_MODE12): New mode iterator.
  (*vec_extract): Use PEXTR_MODE12 instead of VI12_128 mode.
  Use register_sse4nonimm_operand as operand 0 predicate.
  (*vec_extractv8hi_sse2): Remove insn pattern.
  (*vec_extract_zext): Merge insn pattern from
  *vec_extractv8hi_zext and *vec_extractv16qi_zext patterns.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57195

Uros.

Index: config/i386/predicates.md
===================================================================
--- config/i386/predicates.md (revision 228109)
+++ config/i386/predicates.md (working copy)
@@ -127,6 +127,12 @@
     (match_operand 0 "nonimmediate_operand")
     (match_operand 0 "register_operand")))

+;; Match register operands, include memory operand for TARGET_SSE4_1.
+(define_predicate "register_sse4nonimm_operand"
+  (if_then_else (match_test "TARGET_SSE4_1")
+    (match_operand 0 "nonimmediate_operand")
+    (match_operand 0 "register_operand")))
+
 ;; Return true if VALUE is symbol reference
 (define_predicate "symbol_operand"
   (match_code "symbol_ref"))
Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md (revision 228109)
+++ config/i386/sse.md (working copy)
@@ -12864,23 +12864,21 @@
    (set_attr "prefix" "maybe_vex,maybe_vex,orig,orig,vex")
    (set_attr "mode" "TI,TI,V4SF,SF,SF")])

+;; QI and HI modes handled by pextr patterns.
+(define_mode_iterator PEXTR_MODE12
+  [(V16QI "TARGET_SSE4_1") V8HI])
+
 (define_insn "*vec_extract"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,m")
+  [(set (match_operand: 0 "register_sse4nonimm_operand" "=r,m")
         (vec_select:
-          (match_operand:VI12_128 1 "register_operand" "x,x")
+          (match_operand:PEXTR_MODE12 1 "register_operand" "x,x")
           (parallel
             [(match_operand:SI 2 "const_0_to__operand")])))]
-  "TARGET_SSE4_1"
-  "@
-   %vpextr\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog1")
-   (set (attr "prefix_data16")
-     (if_then_else
-       (and (eq_attr "alternative" "0")
-            (eq (const_string "mode") (const_string "V8HImode")))
-       (const_string "1")
-       (const_string "*")))
+  "TARGET_SSE2"
+  "%vpextr\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,sse4")
+   (set_attr "type" "sselog1")
+   (set_attr "prefix_data16" "1")
    (set (attr "prefix_extra")
      (if_then_else
        (and (eq_attr "alternative" "0")
@@ -12891,45 +12889,23 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])

-(define_insn "*vec_extractv8hi_sse2"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-        (vec_select:HI
-          (match_operand:V8HI 1 "register_operand" "x")
-          (parallel
-            [(match_operand:SI 2 "const_0_to_7_operand")])))]
-  "TARGET_SSE2 && !TARGET_SSE4_1"
-  "pextrw\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "TI")])
-
-(define_insn "*vec_extractv16qi_zext"
+(define_insn "*vec_extract_zext"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
         (zero_extend:SWI48
-          (vec_select:QI
-            (match_operand:V16QI 1 "register_operand" "x")
+          (vec_select:
+            (match_operand:PEXTR_MODE12 1 "register_operand" "x")
             (parallel
-              [(match_operand:SI 2 "const_0_to_15_operand")]))))]
-  "TARGET_SSE4_1"
-  "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex")
-   (set_attr "mode" "TI")])
-
-(define_insn "*vec_extractv8hi_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
-        (zero_extend:SWI48
-          (vec_select:HI
-            (match_operand:V8HI 1 "register_operand" "x")
-            (parallel
-              [(match_operand:SI 2 "const_0_to_7_operand")]))))]
+              [(match_operand:SI 2
+                "const_0_to__operand")]))))]
   "TARGET_SSE2"
-  "%vpextrw\t{%2, %1, %k0|%k0, %1, %2}"
+  "%vpextr\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_data16" "1")
+   (set (attr "prefix_extra")
+     (if_then_else
+       (eq (const_string "mode") (const_string "V8HImode"))
+       (const_string "*")
+       (const_string "1")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
Re: Add checkpoint to libgomp dg-shouldfail tests
Hi!

> > OK for trunk?

Ok.

  Jakub
Re: Add checkpoint to libgomp dg-shouldfail tests
> Hi!
> Ping.

OK for the Fortran part, though I suspect you need Jakub to approve it
as well.

FX
RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.
-----Original Message-----
From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
Sent: Sunday, September 27, 2015 7:49 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

On Sat, Sep 26, 2015 at 04:51:20AM +0000, Ajit Kumar Agarwal wrote:
> SPEC CPU 2000 benchmarks are run and there is following impact on the
> performance and code size.
>
> ratio with the optimization vs ratio without optimization for INT
> benchmarks
> (3807.632 vs 3804.661)
>
> ratio with the optimization vs ratio without optimization for FP
> benchmarks ( 4668.743 vs 4778.741)

>>Did you swap these?  You're saying FP got significantly worse?

Sorry for the typo.  Please find the corrected figures below.

Ratio with the optimization vs ratio without optimization for FP
benchmarks ( 4668.743 vs 4668.741).  With the optimization, FP
performance is slightly better.

Thanks & Regards
Ajit

Segher
Re: Add checkpoint to libgomp dg-shouldfail tests
Hi!

Ping.

On Fri, 14 Aug 2015 17:53:52 +0200, I wrote:
> On Fri, 14 Aug 2015 12:56:00 +0200, I wrote:
> > (Can a Fortran person please comment on this: as it's nontrivial to write
> > to stderr, let's just write to stdout followed by a flush, which does
> > have the same ordering effect -- OK?)
>
> OK, turns out it's actually not very difficult to write to stderr --
> thanks FX and Janne for your suggestions!  Here, I went with Janne's,
> which is a little simpler yet sufficient.
>
> > On Thu, 30 Apr 2015 14:47:03 +0200, I wrote:
> > > Here is a patch, prepared by Jim Norris, to fix dg-shouldfail usage in
> > > OpenACC libgomp tests.  [...]
> >
> > (These dg-shouldfail tests are expected to exit with a non-zero exit
> > status, and we're checking for a specific messages on stdout/stderr.)
> >
> > > As obvious, committed to trunk in r222620:
> > >
> > > commit cf9c09c49e63176ff8a1fba429971cb13226260b
> > > Author: tschwinge
> > > Date:   Thu Apr 30 12:44:39 2015 +0000
> > >
> > >     [PR testsuite/65205] Fix dg-shouldfail usage in OpenACC libgomp tests
> > >
> > >     PR testsuite/65205
> >
> > > --- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
> > > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
> > > @@ -64,4 +64,5 @@ main (int argc, char **argv)
> > >
> > >    return 0;
> > >  }
> > > -/* { dg-shouldfail "libgomp: \[\h+,\d+\] is not mapped" } */
> > > +/* { dg-output "Trying to map into device
> > > \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) object when
> > > \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) is already mapped" }
> > > +/* { dg-shouldfail "" } */
> >
> > It once occurred to me that it's also a good idea to verify that we're
> > actually reaching the expected checkpoint before terminating -- OK to
> > commit?
>
> OK for trunk?
>
> commit 97f963dc86199ef2237fffa6293d4dfdacbd1e59
> Author: Thomas Schwinge
> Date:   Fri Aug 14 17:51:03 2015 +0200
>
>     Add checkpoint to libgomp dg-shouldfail tests
>
>     That is, verify that we're actually reaching the expected checkpoint
>     before
>     terminating.
> ---
>  libgomp/testsuite/libgomp.oacc-c-c++-common/abort-1.c         |    5 +++--
>  libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c         |    5 +++--
>  libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c       |    3 +++
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-1.c      |    3 +++
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-2.c      |    8 +++++++-
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-3.c      |   11 ++++++++---
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-4.c      |    7 +++++--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-5.c      |    7 +++++--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-6.c      |    7 +++++--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-7.c      |    7 +++++--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-8.c      |    8 +++++++-
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c           |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-11.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-16.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-17.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-18.c          |    5 +++--
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c           |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-21.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-25.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-26.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-27.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-28.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-29.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c           |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-34.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-35.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-36.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-39.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-4.c           |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-40.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-42.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-44.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c          |    3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-com
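The checkpoint idea (emit a marker on stderr, flush it, and only then run the operation that is expected to fail, so the harness can require marker followed by error text) might look like the sketch below. The helper names and the marker string in the usage are hypothetical; the actual marker and dg-output patterns used by the patch are not shown in this message.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Write the marker and flush, so that it is ordered before any error text
// produced later on the same stream.
void emit_checkpoint(std::ostream& err, const std::string& marker) {
    err << marker << '\n';
    err.flush();
}

// What a harness would check: true only if `marker` occurs in `output`
// and `error_text` occurs somewhere after it.
bool checkpoint_precedes_error(const std::string& output,
                               const std::string& marker,
                               const std::string& error_text) {
    const std::string::size_type m = output.find(marker);
    if (m == std::string::npos)
        return false;  // the test died before reaching the checkpoint
    return output.find(error_text, m + marker.size()) != std::string::npos;
}
```

A test that aborts early for an unrelated reason would produce the error text without the marker, and the check above would then reject it instead of counting it as an expected failure.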
Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.
On Sat, Sep 26, 2015 at 04:51:20AM +0000, Ajit Kumar Agarwal wrote:
> SPEC CPU 2000 benchmarks are run and there is following impact on the
> performance and code size.
>
> ratio with the optimization vs ratio without optimization for INT benchmarks
> (3807.632 vs 3804.661)
>
> ratio with the optimization vs ratio without optimization for FP benchmarks
> ( 4668.743 vs 4778.741)

Did you swap these?  You're saying FP got significantly worse?


Segher
Re: [SH][committed] Fix PR 67391
On Wed, 2015-09-23 at 21:04 +0900, Oleg Endo wrote:
> Hi,
>
> The attached patch fixes PR 67391.  Some additional reg overlapping were
> added to the addsi3 patterns while making LRA on SH work, but not all of
> them seem to be good.  Removing them, seems to be working just fine.
> Tested on sh-elf (LRA enabled) with make -k check
> RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> and by Kaz on sh4-linux.
>
> Committed to trunk as r228046 and to the GCC 5 branch as r228047.

This has opened a small can of worms.  The follow-up patch is attached
and has been committed to trunk as r228176.  Tested by me on sh-elf and
by Kaz on sh4-linux with LRA enabled/disabled.

As a positive side effect, we get some code size reduction on the CSiBE
set:
3345527 bytes -> 3334351 bytes    -11176 bytes / -0.334058 %

However, this is only when LRA is disabled because of some problems with
LRA and its usage/handling of addsi3 insns.

Backport to GCC 5 branch will follow.

Cheers,
Oleg

gcc/ChangeLog:
  PR target/67391
  * config/sh/sh-protos.h (sh_lra_p): Declare.
  * config/sh/sh.c (sh_lra_p): Make non-static.
  * config/sh/sh.md (addsi3): Use arith_reg_dest for operands[0] and
  arith_reg_operand for operands[1].  Remove TARGET_SHMEDIA case.
  Expand into addsi3_scr if operands[2] if needed.
  (*addsi3_compact): Rename to *addsi3_compact_lra.  Use
  arith_reg_operand for operands[1].  Allow it only when LRA is enabled.
  (addsi3_scr, *addsi3): New insn_and_split patterns.

Index: gcc/config/sh/sh-protos.h
===================================================================
--- gcc/config/sh/sh-protos.h (revision 228117)
+++ gcc/config/sh/sh-protos.h (working copy)
@@ -93,6 +93,7 @@
 extern rtx sh_fsca_int2sf (void);

 /* Declare functions defined in sh.c and used in templates.  */
+extern bool sh_lra_p (void);
 extern const char *output_branch (int, rtx_insn *, rtx *);
 extern const char *output_ieee_ccmpeq (rtx_insn *, rtx *);
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c (revision 228117)
+++ gcc/config/sh/sh.c (working copy)
@@ -216,7 +216,6 @@
 static int sh_mode_entry (int);
 static int sh_mode_exit (int);
 static int sh_mode_priority (int entity, int n);
-static bool sh_lra_p (void);

 static rtx mark_constant_pool_use (rtx);
 static tree sh_handle_interrupt_handler_attribute (tree *, tree, tree,
@@ -14507,7 +14506,7 @@
 */

 /* Return true if we use LRA instead of reload pass.  */
-static bool
+bool
 sh_lra_p (void)
 {
   return sh_lra_flag;
Index: gcc/config/sh/sh.md
===================================================================
--- gcc/config/sh/sh.md (revision 228117)
+++ gcc/config/sh/sh.md (working copy)
@@ -2122,13 +2122,19 @@
 })

 (define_expand "addsi3"
-  [(set (match_operand:SI 0 "arith_reg_operand" "")
-        (plus:SI (match_operand:SI 1 "arith_operand" "")
-                 (match_operand:SI 2 "arith_or_int_operand" "")))]
+  [(set (match_operand:SI 0 "arith_reg_dest")
+        (plus:SI (match_operand:SI 1 "arith_reg_operand")
+                 (match_operand:SI 2 "arith_or_int_operand")))]
   ""
 {
-  if (TARGET_SHMEDIA)
-    operands[1] = force_reg (SImode, operands[1]);
+  if (TARGET_SH1 && !arith_operand (operands[2], SImode))
+    {
+      if (!sh_lra_p () || reg_overlap_mentioned_p (operands[0], operands[1]))
+        {
+          emit_insn (gen_addsi3_scr (operands[0], operands[1], operands[2]));
+          DONE;
+        }
+    }
 })

 (define_insn "addsi3_media"
@@ -2163,15 +2169,22 @@
 ;; copy or constant load before the actual add insn.
 ;; Use u constraint for that case to avoid the invalid value in the stack
 ;; pointer.
-(define_insn_and_split "*addsi3_compact"
+;; This also results in better code when LRA is not used.  However, we have
+;; to use different sets of patterns and the order of these patterns is
+;; important.
+;; In some cases the constant zero might end up in operands[2] of the
+;; patterns.  We have to accept that and convert it into a reg-reg move.
+(define_insn_and_split "*addsi3_compact_lra"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,&u")
-        (plus:SI (match_operand:SI 1 "arith_operand" "%0,r")
+        (plus:SI (match_operand:SI 1 "arith_reg_operand" "%0,r")
                  (match_operand:SI 2 "arith_or_int_operand" "rI08,rn")))]
-  "TARGET_SH1"
+  "TARGET_SH1 && sh_lra_p ()
+   && (! reg_overlap_mentioned_p (operands[0], operands[1])
+       || arith_operand (operands[2], SImode))"
   "@
         add     %2,%0
         #"
-  "reload_completed
+  "&& reload_completed
    && ! reg_overlap_mentioned_p (operands[0], operands[1])"
   [(set (match_dup 0) (match_dup 2))
    (set (match_dup 0) (plus:SI (match_dup 0) (match_dup 1)))]
@@ -2182,6 +2195,58 @@
 }
   [(set_attr "type" "arith")])

+(define_insn_and_split "addsi3_scr"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r,&u,&u")
+        (plus:SI (match_operand:SI 1 "arith_reg_operand" "%0,r,r")
+                 (match_operand:SI 2 "arith_or_int_operand
Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.
On 26/09/15 22:49 +0200, Florian Weimer wrote:
> On 09/26/2015 09:52 PM, Jonathan Wakely wrote:
>
>> Would changes like this be suitable for _FORTIFY_SOURCE?
>>
>> diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
>> index 5e5ced1..074bf26 100644
>> --- a/libstdc++-v3/include/std/mutex
>> +++ b/libstdc++-v3/include/std/mutex
>> @@ -70,7 +70,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>      __recursive_mutex_base& operator=(const __recursive_mutex_base&) = delete;
>>
>>  #ifdef __GTHREAD_RECURSIVE_MUTEX_INIT
>> +# if _GLIBCXX_ASSERTIONS && defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
>> +    // Use an error-checking mutex type when assertions are enabled.
>> +    __native_type _M_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
>> +# else
>>      __native_type _M_mutex = __GTHREAD_RECURSIVE_MUTEX_INIT;
>> +# endif
>
> I think this is incorrect.  If you try to lock an error-checking mutex
> recursively, the operation fails, and it does *not* increment the
> internal lock counter (the mutex may not even have one).  This means a
> subsequent unlock operation will release the mutex too early.  The
> trylock will behave differently, too.
>
> POSIX recursive mutexes are already error-checking in that sense
> (self-deadlock cannot happen, and an unlock when not locked is defined
> to return an error), so I don't think anything like that is even needed.

Doh, sorry, I meant this instead, i.e. the non-recursive mutex.

(I forgot that I'd moved the non-recursive std::mutex definition to a
new file, and just edited the first thing I saw in include/std/mutex!)

diff --git a/libstdc++-v3/include/bits/mutex.h b/libstdc++-v3/include/bits/mutex.h
index 43f5b0b..7f88821 100644
--- a/libstdc++-v3/include/bits/mutex.h
+++ b/libstdc++-v3/include/bits/mutex.h
@@ -63,7 +63,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     typedef __gthread_mutex_t __native_type;

 #ifdef __GTHREAD_MUTEX_INIT
+# if _GLIBCXX_ASSERTIONS && defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
+    // Use an error-checking mutex type when assertions are enabled.
+    __native_type _M_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
+# else
     __native_type _M_mutex = __GTHREAD_MUTEX_INIT;
+# endif

     constexpr __mutex_base() noexcept = default;
 #else
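The difference between the two mutex kinds discussed above can be observed directly with the POSIX API. The sketch below uses the standard pthread_mutexattr_settype interface rather than the _NP static initializer from the diff; the helper names are made up for illustration, and the expected return values are those given by POSIX for PTHREAD_MUTEX_ERRORCHECK and PTHREAD_MUTEX_RECURSIVE.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Lock a mutex of the given kind twice from the same thread and return the
// result of the second attempt.  Only call this with kinds that do not block
// on relock (ERRORCHECK, RECURSIVE); a NORMAL mutex would deadlock here.
int second_lock_result(int kind) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, kind);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    const int rc = pthread_mutex_lock(&m);  // relock by the owning thread

    if (rc == 0)
        pthread_mutex_unlock(&m);  // undo the second lock if it succeeded
    pthread_mutex_unlock(&m);      // undo the first lock
    pthread_mutex_destroy(&m);
    return rc;
}

// Unlock an error-checking mutex that the calling thread never locked.
int unlock_unowned_result() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    const int rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

An error-checking mutex reports EDEADLK on the second lock and leaves the lock count untouched, which is why it cannot stand in for a recursive mutex, while the same check is exactly what helps catch self-deadlock on a non-recursive std::mutex. Depending on the platform the program may need to be built with -pthread.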