Re: [PATCH 5/5] Add plugin to recursively dump the source-ranges in a tree (v2)

2015-09-27 Thread Dodji Seketeli
David Malcolm writes:

> This patch adds a test plugin that recurses down an expression tree,
> printing diagnostics showing the ranges of each node in the tree.
>
> It corresponds to:
>   [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00741.html
> from v1 of the patch kit.
>
> Changes in v2:
>   * the output no longer contains the PARAM_DECL and INTEGER_CST
> leaves since we no longer have range data for them; updated
> the expected output accordingly.
>   * slightly updated to eliminate use of SOURCE_RANGE
>
> Updated screenshot:
>   https://dmalcolm.fedorapeople.org/gcc/2015-09-22/diagnostic-test-show-trees-1.html
>
> gcc/testsuite/ChangeLog:
>   * gcc.dg/plugin/diagnostic-test-show-trees-1.c: New file.
>   * gcc.dg/plugin/diagnostic_plugin_show_trees.c: New file.
>   * gcc.dg/plugin/plugin.exp (plugin_test_list): Add
>   diagnostic_plugin_show_trees.c and
>   diagnostic-test-show-trees-1.c.

For what it's worth, this looks good to me.

Thanks!

-- 
Dodji


Re: [AArch64] Fix Prefetch ICE

2015-09-27 Thread Hurugalawadi, Naveen
Hi Marcus,

Thanks for the review and comments.

>> OK and can you back port to 5 ?

Please find attached the backported patch on gcc-5-branch.

Regression tested on AArch64 without any issues.

2015-09-28  Andrew Pinski  

ChangeLog

* config/aarch64/aarch64.md (prefetch):
Change the predicate of operand 0 to register_operand.

Thanks,
Naveen
Index: config/aarch64/aarch64.md
===
--- config/aarch64/aarch64.md	(revision 228182)
+++ config/aarch64/aarch64.md	(working copy)
@@ -382,7 +382,7 @@
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "address_operand" "r")
+  [(prefetch (match_operand:DI 0 "register_operand" "r")
 (match_operand:QI 1 "const_int_operand" "")
 (match_operand:QI 2 "const_int_operand" ""))]
   ""


RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Ajit Kumar Agarwal


-Original Message-
From: Bin.Cheng [mailto:amker.ch...@gmail.com] 
Sent: Monday, September 28, 2015 7:05 AM
To: Ajit Kumar Agarwal
Cc: Segher Boessenkool; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sun, Sep 27, 2015 at 11:13 PM, Ajit Kumar Agarwal 
 wrote:
>
>
> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Sunday, September 27, 2015 7:49 PM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
> Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
> pressure cost.
>
> On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the 
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT 
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP 
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
> Sorry for the typo error.  Please find the corrected one.
>
> Ratio  with the optimization vs ratio without optimization for FP  
> benchmarks ( 4668.743 vs 4668.741). With the optimization FP is slightly 
> better performance.
>>Did you mis-type the number again?  Or this must be noise.  Now I remember
>>why I didn't get perf improvement from this.  Changing reg_new to reg_new +
>>reg_old doesn't have big impact because it just increased the starting number
>>for each scenario.  Maybe it still makes sense for cases on the verge of
>>exceeding the target's available register count.  I will try to collect
>>benchmark data on ARM, but it may take some time.

This is the correct one.

Thanks & Regards
Ajit

Thanks,
bin
>
> Thanks & Regards
> Ajit
>
> Segher


Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Bin.Cheng
On Sun, Sep 27, 2015 at 11:13 PM, Ajit Kumar Agarwal
 wrote:
>
>
> -Original Message-
> From: Segher Boessenkool [mailto:seg...@kernel.crashing.org]
> Sent: Sunday, September 27, 2015 7:49 PM
> To: Ajit Kumar Agarwal
> Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
> Nagaraju Mekala
> Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
> pressure cost.
>
> On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>> performance and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT
>> benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
> Sorry for the typo error.  Please find the corrected one.
>
> Ratio  with the optimization vs ratio without optimization for FP  benchmarks 
> ( 4668.743 vs 4668.741). With the optimization
> FP is slightly better performance.
Did you mis-type the number again?  Or this must be noise.  Now I
remember why I didn't get perf improvement from this.  Changing
reg_new to reg_new + reg_old doesn't have big impact because it just
increased the starting number for each scenario.  Maybe it still
makes sense for cases on the verge of exceeding the target's available
register count.  I will try to collect benchmark data on ARM, but it
may take some time.

Thanks,
bin
>
> Thanks & Regards
> Ajit
>
> Segher
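
One reading of Bin's remark above (adding reg_old only raises the starting
number) can be illustrated with a toy model. This is a sketch under stated
assumptions, not GCC's cost code: ToyTarget, toy_pressure_cost, toy_cost_gap
and all cost values are hypothetical.

```cpp
#include <cassert>

// Toy model only.  The struct, the functions and the cost values are
// hypothetical and do not reproduce GCC's actual cost code.
struct ToyTarget {
  unsigned available_regs;  // registers the target can hand out
  unsigned reg_cost;        // per-register cost while everything still fits
  unsigned spill_cost;      // per-register cost once spilling is needed
};

// Cost of a candidate that needs n_new registers on top of the n_old that
// are already live.  count_old selects between charging n_new only and
// charging n_new + n_old.
unsigned toy_pressure_cost(const ToyTarget &t, unsigned n_new, unsigned n_old,
                           bool count_old) {
  assert(t.available_regs > 0);
  const unsigned needed = n_new + n_old;
  const unsigned unit = needed <= t.available_regs ? t.reg_cost : t.spill_cost;
  return unit * (count_old ? needed : n_new);
}

// Cost gap between two candidates that share the same n_old.
int toy_cost_gap(const ToyTarget &t, unsigned n_new_a, unsigned n_new_b,
                 unsigned n_old, bool count_old) {
  return static_cast<int>(toy_pressure_cost(t, n_new_b, n_old, count_old)) -
         static_cast<int>(toy_pressure_cost(t, n_new_a, n_old, count_old));
}
```

In this model, two candidates that fall in the same regime (both fit, or both
spill) have the same cost gap whether or not n_old is charged, so their
ranking does not change. The gap only differs when one candidate crosses
available_regs, which matches the "on the verge" case mentioned above.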


[Graphite] Redesign Graphite scop detection

2015-09-27 Thread Aditya Kumar
From: hiraditya 

Redesign Graphite scop detection for faster compile time and to detect more
SCoPs.

The existing algorithm for SCoP detection in Graphite was based on the
dominator tree, where a tree (CFG) traversal was required for analyzing an
SESE. The tree traversal is linear in the number of basic blocks and SCoP
detection is (probably) linear in the number of instructions. That algorithm
used a generic SESE infrastructure which does not directly represent loops.
For the Graphite framework, we are only interested in subtrees with loops. The
new algorithm is geared towards tree traversal on the loop structure. It is
linear in the number of loops, which is faster than the previous algorithm.

Briefly, we start the traversal at a loop nest and analyze it recursively for
validity. Once a valid loop is found, we look for a valid adjacent loop. If an
adjacent loop is found and is valid, we merge both loop nests; otherwise we
form a SCoP from the previous loop nest and resume the algorithm from the
adjacent loop nest. The data structure representing an SESE is an ordered pair
of edges (entry, exit). The new algorithm can extend a SCoP in both
directions. With this approach, the number of instructions to be analyzed for
validity is reduced to a minimal set. We start by analyzing the statements
inside a loop, because their validity is necessary for the validity of the
loop. Statements outside the loop nest can simply be excluded from the SESE if
they are not valid.
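
The traversal described above can be sketched roughly as follows. This is a
simplified model, not the code in the patch: the Sese and Loop structs, the
integer edge numbering, and the decision to descend into an invalid loop to
look for valid inner loops are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// An SESE region modelled as an ordered pair of edges (entry, exit).
// Edges are plain integers here; the toy model numbers them in program order.
struct Sese {
  int entry;
  int exit;
};

// Hypothetical loop-tree node: its own region, whether the statements in its
// body are representable, and its nested loops.
struct Loop {
  Sese region;
  bool body_valid;
  std::vector<Loop> inner;
};

// A loop nest is valid when its body and all nested loops are valid.
bool loop_is_valid(const Loop &loop) {
  if (!loop.body_valid)
    return false;
  for (const Loop &nested : loop.inner)
    if (!loop_is_valid(nested))
      return false;
  return true;
}

// Merge two adjacent regions: keep the entry of the first, the exit of the
// second.
Sese merge_sese(const Sese &first, const Sese &second) {
  assert(first.entry <= second.exit);
  return Sese{first.entry, second.exit};
}

// Walk sibling loops in order.  Runs of adjacent valid loops are merged into
// one SCoP; an invalid loop closes the current SCoP and is searched for valid
// inner loops.
std::vector<Sese> build_scops(const std::vector<Loop> &siblings) {
  std::vector<Sese> scops;
  bool open = false;
  Sese current{};
  for (const Loop &loop : siblings) {
    if (loop_is_valid(loop)) {
      current = open ? merge_sese(current, loop.region) : loop.region;
      open = true;
    } else {
      if (open) {
        scops.push_back(current);
        open = false;
      }
      const std::vector<Sese> nested = build_scops(loop.inner);
      scops.insert(scops.end(), nested.begin(), nested.end());
    }
  }
  if (open)
    scops.push_back(current);
  return scops;
}
```

The work done is proportional to the number of loops visited rather than the
number of basic blocks, which is the property the description relies on.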


This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02024.html

Passes (c,c++,fortran) regtest and bootstrap.

gcc/ChangeLog:

2015-09-27  Aditya Kumar  
Sebastian Pop  
* graphite-optimize-isl.c (optimize_isl):
* graphite-scop-detection.c (struct sese_l): New type.
(get_entry_bb): API for getting entry bb of SESE.
(get_exit_bb): API for getting exit bb of SESE.
(class debug_printer): New type. Simple printer in debug mode.
(trivially_empty_bb_p): New. Return true when BB is empty or
contains only debug instructions.
(graphite_can_represent_expr): Call scalar_evolution_in_region
instead of analyze_scalar_evolution. Pass in scop instead of only
the scop entry.
(stmt_has_simple_data_refs_p): Pass in scop instead of only the
scop entry.
(stmt_simple_for_scop_p): Same.
(harmful_stmt_in_bb): Same.
(graphite_can_represent_loop): Deleted.
(struct scopdet_info): Deleted.
(scopdet_basic_block_info): Deleted.
(build_scops_1): Deleted.
(bb_in_sd_region): Deleted.
(find_single_entry_edge): Deleted.
(find_single_exit_edge): Deleted.
(create_single_entry_edge): Deleted.
(sd_region_without_exit): Deleted.
(create_single_exit_edge): Deleted.
(unmark_exit_edges): Deleted.
(mark_exit_edges): Deleted.
(create_sese_edges): Deleted.
(build_graphite_scops): Deleted.
(canonicalize_loop_closed_ssa): Recompute all dominators at the
end.
(build_scops): Use the new scop_builder to build scops.
(dot_all_scops_1): Use the new pretty printer. Print loop father
as well.
(loop_body_is_valid_scop): New. Return true if loop body is a
valid scop.
(class scop_builder): New. Builds SCoPs for polyhedral
optimizations.
(scop_builder): New. Constructor.
(static sese_l invalid_sese): sese_l with invalid edges.
(get_sese): Get an sese (from a loop) if possible, invalid_sese
otherwise.
(get_nearest_dom_with_single_entry): Get nearest dominator of a
basic_block with single entry. Return NULL if we get to the
beginning of a function.
(get_nearest_pdom_with_single_exit): Get nearest post-dominator of
a basic_block with single exit. Return NULL if we get to the
beginning of a function.
(print_sese): Pretty-print SESE.
(merge_sese): Merge two SESEs if possible and return the new SESE.
(build_scop_depth): Start building the SCoP within a loop nest.
(build_scop_breadth): Start building the SCoP at a single loop
depth. Merge adjacent SESEs if valid.
(can_represent_loop_1): Returns true if Graphite can represent
loop inside SCoP. Helper for can_represent_loop.
(can_represent_loop): Returns true if Graphite can represent LOOP
and all its nested loops in SCoP.
(loop_is_valid_scop): Returns true if LOOP and all its nests
constitute a valid SCoP.
(region_has_one_loop): Returns true if a region has only one loop.
(add_scop): Add SCoP to the list of valid scops. Removes an
already existing scop if it intersects with or is subsumed by this
one.
(harmful_stmt_in_region): Returns true if SCoP has any statement
which cannot be represented by Graphite.
   

Re: [PATCH] Convert SPARC to LRA

2015-09-27 Thread Jeff Law

On 09/27/2015 01:57 PM, Hans-Peter Nilsson wrote:
> On Wed, 9 Sep 2015, Mike Stump wrote:
>
>> On Sep 8, 2015, at 9:41 PM, David Miller  wrote:
>>> +#define TARGET_LRA_P hook_bool_void_true
>>
>> Are we at the point where this should be the default, and old
>> ports should just define to false, if they really need to?
>
> I think no.  For one, we don't have proper target documentation
> updates for LRA.  What does it need?  What is outdated?
>
> Also, give ample time for gcc releases of odd ports with LRA to
> get into the public and cover most of the inevitable remaining
> bugs.  Not even sh has moved over due to remaining issues.  Let
> the reports come in - and be fixed.  Let's revisit in a year or
> two.

I don't think we're there yet either -- many ports still require some
guidance from Vlad to get working with LRA.


It *may* be time to decree that any new ports must use the LRA path 
rather than reload.  I'm still on the fence with that.


jeff


Re: [PATCH] Convert SPARC to LRA

2015-09-27 Thread Hans-Peter Nilsson
On Wed, 9 Sep 2015, Mike Stump wrote:

> On Sep 8, 2015, at 9:41 PM, David Miller  wrote:
> > +#define TARGET_LRA_P hook_bool_void_true
>
> Are we at the point where this should be the default, and old
> ports should just define to false, if they really need to?

I think no.  For one, we don't have proper target documentation
updates for LRA.  What does it need?  What is outdated?

Also, give ample time for gcc releases of odd ports with LRA to
get into the public and cover most of the inevitable remaining
bugs.  Not even sh has moved over due to remaining issues.  Let
the reports come in - and be fixed.  Let's revisit in a year or
two.

brgds, H-P


RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Bernhard Reutner-Fischer
On September 27, 2015 5:13:59 PM GMT+02:00, Ajit Kumar Agarwal 
 wrote:
>
>
>-Original Message-
>From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] 
>Sent: Sunday, September 27, 2015 7:49 PM
>To: Ajit Kumar Agarwal
>Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli
>Hunsigida; Nagaraju Mekala
>Subject: Re: [Patch,optimization]: Optimized changes in the estimate
>register pressure cost.
>
>On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>
>> performance and code size.
>> 
>> ratio with the optimization vs ratio without optimization for INT 
>> benchmarks
>> (3807.632 vs 3804.661)
>> 
>> ratio with the optimization vs ratio without optimization for FP 
>> benchmarks ( 4668.743 vs 4778.741)
>
>>>Did you swap these?  You're saying FP got significantly worse?
>
>Sorry for the typo error.  Please find the corrected one.
>
>Ratio  with the optimization vs ratio without optimization for FP 
>benchmarks ( 4668.743 vs 4668.741). With the optimization
>FP is slightly better performance.

Ah, I see. Sorry for the noise..

Thanks,



Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Bernhard Reutner-Fischer
On September 26, 2015 9:10:13 AM GMT+02:00, "Bin.Cheng"  
wrote:
>On Sat, Sep 26, 2015 at 12:51 PM, Ajit Kumar Agarwal
> wrote:

>> SPEC CPU 2000 benchmarks are run and there is following impact on the
>performance
>> and code size.
>>
>> ratio with the optimization vs ratio without optimization for INT
>benchmarks
>> (3807.632 vs 3804.661)
>>
>> ratio with the optimization vs ratio without optimization for FP
>benchmarks
>> ( 4668.743 vs 4778.741)

Do I read this correctly as introducing a 2.4% regression for FP?

Thanks,
>>
>> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
>>
>> Number of instruction with optimization = 1094117
>> Number of instruction without optimization = 1094659
>>
>> Reduction in number of instruction with the optimization = 542
>instruction.




[PATCH, i386]: Merge *vec_extract_zext patterns

2015-09-27 Thread Uros Bizjak
Hello!

Now that PR 57195 (Mode attributes with specific mode iterator can not
be used as mode iterators in *.md files) [1] is fixed, we can merge
*vec_extract_zext patterns.

2015-09-27  Uros Bizjak  

* config/i386/predicates.md (register_sse4nonimm_operand): New
predicate.
* config/i386/sse.md (PEXTR_MODE12): New mode iterator.
(*vec_extract): Use PEXTR_MODE12 instead of VI12_128 mode.
Use register_sse4nonimm_operand as operand 0 predicate.
(*vec_extractv8hi_sse2): Remove insn pattern.
(*vec_extract_zext): Merge insn pattern from
*vec_extractv8hi_zext and *vec_extractv16qi_zext patterns.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57195

Uros.
Index: config/i386/predicates.md
===
--- config/i386/predicates.md   (revision 228109)
+++ config/i386/predicates.md   (working copy)
@@ -127,6 +127,12 @@
 (match_operand 0 "nonimmediate_operand")
 (match_operand 0 "register_operand")))
 
+;; Match register operands, include memory operand for TARGET_SSE4_1.
+(define_predicate "register_sse4nonimm_operand"
+  (if_then_else (match_test "TARGET_SSE4_1")
+(match_operand 0 "nonimmediate_operand")
+(match_operand 0 "register_operand")))
+
 ;; Return true if VALUE is symbol reference
 (define_predicate "symbol_operand"
   (match_code "symbol_ref"))
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 228109)
+++ config/i386/sse.md  (working copy)
@@ -12864,23 +12864,21 @@
(set_attr "prefix" "maybe_vex,maybe_vex,orig,orig,vex")
(set_attr "mode" "TI,TI,V4SF,SF,SF")])
 
+;; QI and HI modes handled by pextr patterns.
+(define_mode_iterator PEXTR_MODE12
+  [(V16QI "TARGET_SSE4_1") V8HI])
+
 (define_insn "*vec_extract"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,m")
+  [(set (match_operand: 0 "register_sse4nonimm_operand" "=r,m")
(vec_select:
- (match_operand:VI12_128 1 "register_operand" "x,x")
+ (match_operand:PEXTR_MODE12 1 "register_operand" "x,x")
  (parallel
[(match_operand:SI 2 "const_0_to__operand")])))]
-  "TARGET_SSE4_1"
-  "@
-   %vpextr\t{%2, %1, %k0|%k0, %1, %2}
-   %vpextr\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog1")
-   (set (attr "prefix_data16")
- (if_then_else
-   (and (eq_attr "alternative" "0")
-   (eq (const_string "mode") (const_string "V8HImode")))
-   (const_string "1")
-   (const_string "*")))
+  "TARGET_SSE2"
+  "%vpextr\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,sse4")
+   (set_attr "type" "sselog1")
+   (set_attr "prefix_data16" "1")
(set (attr "prefix_extra")
  (if_then_else
(and (eq_attr "alternative" "0")
@@ -12891,45 +12889,23 @@
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "TI")])
 
-(define_insn "*vec_extractv8hi_sse2"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-   (vec_select:HI
- (match_operand:V8HI 1 "register_operand" "x")
- (parallel
-   [(match_operand:SI 2 "const_0_to_7_operand")])))]
-  "TARGET_SSE2 && !TARGET_SSE4_1"
-  "pextrw\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix_data16" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "TI")])
-
-(define_insn "*vec_extractv16qi_zext"
+(define_insn "*vec_extract_zext"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(zero_extend:SWI48
- (vec_select:QI
-   (match_operand:V16QI 1 "register_operand" "x")
+ (vec_select:
+   (match_operand:PEXTR_MODE12 1 "register_operand" "x")
(parallel
- [(match_operand:SI 2 "const_0_to_15_operand")]]
-  "TARGET_SSE4_1"
-  "%vpextrb\t{%2, %1, %k0|%k0, %1, %2}"
-  [(set_attr "type" "sselog1")
-   (set_attr "prefix_extra" "1")
-   (set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex")
-   (set_attr "mode" "TI")])
-
-(define_insn "*vec_extractv8hi_zext"
-  [(set (match_operand:SWI48 0 "register_operand" "=r")
-   (zero_extend:SWI48
- (vec_select:HI
-   (match_operand:V8HI 1 "register_operand" "x")
-   (parallel
- [(match_operand:SI 2 "const_0_to_7_operand")]]
+ [(match_operand:SI 2
+   "const_0_to__operand")]]
   "TARGET_SSE2"
-  "%vpextrw\t{%2, %1, %k0|%k0, %1, %2}"
+  "%vpextr\t{%2, %1, %k0|%k0, %1, %2}"
   [(set_attr "type" "sselog1")
(set_attr "prefix_data16" "1")
+   (set (attr "prefix_extra")
+ (if_then_else
+   (eq (const_string "mode") (const_string "V8HImode"))
+   (const_string "*")
+   (const_string "1")))
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "TI")])


Re: Add checkpoint to libgomp dg-shouldfail tests

2015-09-27 Thread Jakub Jelinek
Hi!

> > OK for trunk?

Ok.

Jakub


Re: Add checkpoint to libgomp dg-shouldfail tests

2015-09-27 Thread FX
> Hi!
> Ping.

OK for the Fortran part, though I suspect you need Jakub to approve it as well.

FX

RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Ajit Kumar Agarwal


-Original Message-
From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] 
Sent: Sunday, September 27, 2015 7:49 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance and code size.
> 
> ratio with the optimization vs ratio without optimization for INT 
> benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP 
> benchmarks ( 4668.743 vs 4778.741)

>>Did you swap these?  You're saying FP got significantly worse?

Sorry for the typo.  Here is the corrected figure.

Ratio with the optimization vs ratio without optimization for FP benchmarks
(4668.743 vs 4668.741).  With the optimization, FP performance is slightly
better.

Thanks & Regards
Ajit

Segher


Re: Add checkpoint to libgomp dg-shouldfail tests

2015-09-27 Thread Thomas Schwinge
Hi!

Ping.

On Fri, 14 Aug 2015 17:53:52 +0200, I wrote:
> On Fri, 14 Aug 2015 12:56:00 +0200, I wrote:
> > (Can a Fortran person please comment on this: as it's nontrivial to write
> > to stderr, let's just write to stdout followed by a flush, which does
> > have the same ordering effect -- OK?)
> 
> OK, turns out it's actually not very difficult to write to stderr --
> thanks FX and Janne for your suggestions!  Here, I went with Janne's,
> which is a little simpler yet sufficient.
> 
> > On Thu, 30 Apr 2015 14:47:03 +0200, I wrote:
> > > Here is a patch, prepared by Jim Norris, to fix dg-shouldfail usage in
> > > OpenACC libgomp tests.  [...]
> > 
> > > (These dg-shouldfail tests are expected to exit with a non-zero exit
> > > status, and we're checking for specific messages on stdout/stderr.)
> > 
> > > As obvious, committed to trunk in r222620:
> > > 
> > > commit cf9c09c49e63176ff8a1fba429971cb13226260b
> > > Author: tschwinge 
> > > Date:   Thu Apr 30 12:44:39 2015 +
> > > 
> > > [PR testsuite/65205] Fix dg-shouldfail usage in OpenACC libgomp tests
> > > 
> > >   PR testsuite/65205
> > 
> > > --- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
> > > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
> > > @@ -64,4 +64,5 @@ main (int argc, char **argv)
> > >  
> > >  return 0;
> > >  }
> > > -/* { dg-shouldfail "libgomp: \[\h+,\d+\] is not mapped" } */
> > > +/* { dg-output "Trying to map into device 
> > > \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) object when 
> > > \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) is already mapped" }
> > > +/* { dg-shouldfail "" } */
> > 
> > It once occurred to me that it's also a good idea to verify that we're
> > actually reaching the expected checkpoint before terminating -- OK to
> > commit?
> 
> OK for trunk?
> 
> commit 97f963dc86199ef2237fffa6293d4dfdacbd1e59
> Author: Thomas Schwinge 
> Date:   Fri Aug 14 17:51:03 2015 +0200
> 
> Add checkpoint to libgomp dg-shouldfail tests
> 
> That is, verify that we're actually reaching the expected checkpoint
> before terminating.
> ---
>  libgomp/testsuite/libgomp.oacc-c-c++-common/abort-1.c   |5 +++--
>  libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c   |5 +++--
>  libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c |3 +++
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-1.c|3 +++
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-2.c|8 +++-
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-3.c|   11 
> ---
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-4.c|7 +--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-5.c|7 +--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-6.c|7 +--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-7.c|7 +--
>  .../testsuite/libgomp.oacc-c-c++-common/data-already-8.c|8 +++-
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c |3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-11.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-16.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-17.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-18.c|5 +++--
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c |3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-21.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-25.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-26.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-27.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-28.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-29.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c |3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-34.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-35.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-36.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-39.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-4.c |3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-40.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-42.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-44.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c|3 +++
>  libgomp/testsuite/libgomp.oacc-c-c++-com

Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-27 Thread Segher Boessenkool
On Sat, Sep 26, 2015 at 04:51:20AM +, Ajit Kumar Agarwal wrote:
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance
> and code size.
> 
> ratio with the optimization vs ratio without optimization for INT benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP benchmarks
> ( 4668.743 vs 4778.741)

Did you swap these?  You're saying FP got significantly worse?


Segher
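
As a sanity check on the ratios quoted above, the relative change can be
computed directly. The helper name percent_change is hypothetical. With the
first-posted FP figures (4668.743 vs 4778.741) it gives roughly -2.3%, for INT
(3807.632 vs 3804.661) about +0.08%, and for the corrected FP figures posted
elsewhere in this thread (4668.743 vs 4668.741) a change that is effectively
zero.

```cpp
#include <cassert>

// Relative change of with_opt against the without_opt baseline, in percent.
// A positive result means the ratio went up with the optimization.
double percent_change(double with_opt, double without_opt) {
  assert(without_opt != 0.0);
  return (with_opt - without_opt) / without_opt * 100.0;
}
```

A difference in the sixth significant digit, as in the corrected FP numbers,
is far below what run-to-run variation would normally allow one to call an
improvement.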


Re: [SH][committed] Fix PR 67391

2015-09-27 Thread Oleg Endo
On Wed, 2015-09-23 at 21:04 +0900, Oleg Endo wrote:
> Hi,
> 
> The attached patch fixes PR 67391.  Some additional reg overlapping were
> added to the addsi3 patterns while making LRA on SH work, but not all of
> them seem to be good.  Removing them, seems to be working just fine.
> Tested on sh-elf (LRA enabled) with make -k check
> RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> and by Kaz on sh4-linux.
> 
> Committed to trunk as r228046 and to the GCC 5 branch as r228047.

This has opened a small can of worms.  The follow-up patch is attached
and has been committed to trunk as r228176.  Tested by me on sh-elf and
by Kaz on sh4-linux with LRA enabled/disabled.

As a positive side effect, we get some code size reduction on the CSiBE
set: 3345527 bytes -> 3334351 bytes (-11176 bytes / -0.334058 %)

However, this is only when LRA is disabled because of some problems with
LRA and its usage/handling of addsi3 insns.

Backport to GCC 5 branch will follow.

Cheers,
Oleg

gcc/ChangeLog:
PR target/67391
* config/sh/sh-protos.h (sh_lra_p): Declare.
* config/sh/sh.c (sh_lra_p): Make non-static.
* config/sh/sh.md (addsi3): Use arith_reg_dest for operands[0] and
arith_reg_operand for operands[1].  Remove TARGET_SHMEDIA case.
Expand into addsi3_scr if needed.
(*addsi3_compact): Rename to *addsi3_compact_lra.  Use
arith_reg_operand for operands[1].  Allow it only when LRA is enabled.
(addsi3_scr, *addsi3): New insn_and_split patterns.
Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 228117)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -93,6 +93,7 @@
 extern rtx sh_fsca_int2sf (void);
 
 /* Declare functions defined in sh.c and used in templates.  */
+extern bool sh_lra_p (void);
 
 extern const char *output_branch (int, rtx_insn *, rtx *);
 extern const char *output_ieee_ccmpeq (rtx_insn *, rtx *);
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 228117)
+++ gcc/config/sh/sh.c	(working copy)
@@ -216,7 +216,6 @@
 static int sh_mode_entry (int);
 static int sh_mode_exit (int);
 static int sh_mode_priority (int entity, int n);
-static bool sh_lra_p (void);
 
 static rtx mark_constant_pool_use (rtx);
 static tree sh_handle_interrupt_handler_attribute (tree *, tree, tree,
@@ -14507,7 +14506,7 @@
 */
 
 /* Return true if we use LRA instead of reload pass.  */
-static bool
+bool
 sh_lra_p (void)
 {
   return sh_lra_flag;
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 228117)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2122,13 +2122,19 @@
 })
 
 (define_expand "addsi3"
-  [(set (match_operand:SI 0 "arith_reg_operand" "")
-	(plus:SI (match_operand:SI 1 "arith_operand" "")
-		 (match_operand:SI 2 "arith_or_int_operand" "")))]
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(plus:SI (match_operand:SI 1 "arith_reg_operand")
+		 (match_operand:SI 2 "arith_or_int_operand")))]
   ""
 {
-  if (TARGET_SHMEDIA)
-operands[1] = force_reg (SImode, operands[1]);
+  if (TARGET_SH1 && !arith_operand (operands[2], SImode))
+{
+  if (!sh_lra_p () || reg_overlap_mentioned_p (operands[0], operands[1]))
+	{
+	  emit_insn (gen_addsi3_scr (operands[0], operands[1], operands[2]));
+	  DONE;
+	}
+}
 })
 
 (define_insn "addsi3_media"
@@ -2163,15 +2169,22 @@
 ;; copy or constant load before the actual add insn.
 ;; Use u constraint for that case to avoid the invalid value in the stack
 ;; pointer.
-(define_insn_and_split "*addsi3_compact"
+;; This also results in better code when LRA is not used.  However, we have
+;; to use different sets of patterns and the order of these patterns is
+;; important.
+;; In some cases the constant zero might end up in operands[2] of the
+;; patterns.  We have to accept that and convert it into a reg-reg move.
+(define_insn_and_split "*addsi3_compact_lra"
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,&u")
-	(plus:SI (match_operand:SI 1 "arith_operand" "%0,r")
+	(plus:SI (match_operand:SI 1 "arith_reg_operand" "%0,r")
 		 (match_operand:SI 2 "arith_or_int_operand" "rI08,rn")))]
-  "TARGET_SH1"
+  "TARGET_SH1 && sh_lra_p ()
+   && (! reg_overlap_mentioned_p (operands[0], operands[1])
+   || arith_operand (operands[2], SImode))"
   "@
 	add	%2,%0
 	#"
-  "reload_completed
+  "&& reload_completed
&& ! reg_overlap_mentioned_p (operands[0], operands[1])"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 1)))]
@@ -2182,6 +2195,58 @@
 }
   [(set_attr "type" "arith")])
 
+(define_insn_and_split "addsi3_scr"
+  [(set (match_operand:SI 0 "arith_reg_dest" "=r,&u,&u")
+	(plus:SI (match_operand:SI 1 "arith_reg_operand" "%0,r,r")
+		 (match_operand:SI 2 "arith_or_int_operand

Re: [patch] Enable lightweight checks with _GLIBCXX_ASSERTIONS.

2015-09-27 Thread Jonathan Wakely

On 26/09/15 22:49 +0200, Florian Weimer wrote:
> On 09/26/2015 09:52 PM, Jonathan Wakely wrote:
>
>> Would changes like this be suitable for _FORTIFY_SOURCE?
>>
>> diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
>> index 5e5ced1..074bf26 100644
>> --- a/libstdc++-v3/include/std/mutex
>> +++ b/libstdc++-v3/include/std/mutex
>> @@ -70,7 +70,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>  __recursive_mutex_base& operator=(const __recursive_mutex_base&) = delete;
>>
>>  #ifdef __GTHREAD_RECURSIVE_MUTEX_INIT
>> +# if _GLIBCXX_ASSERTIONS && defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
>> +// Use an error-checking mutex type when assertions are enabled.
>> +__native_type  _M_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
>> +# else
>>  __native_type  _M_mutex = __GTHREAD_RECURSIVE_MUTEX_INIT;
>> +# endif
>
> I think this is incorrect.
>
> If you try to lock an error-checking mutex recursively, the operation
> fails, and it does *not* increment the internal lock counter (the mutex
> may not even have one).  This means a subsequent unlock operation will
> release the mutex too early.
>
> The trylock will behave differently, too.
>
> POSIX recursive mutexes are already error-checking in that sense
> (self-deadlock cannot happen, and an unlock when not locked is defined to
> return an error), so I don't think anything like that is even needed.


Doh, sorry, I meant this instead i.e. the non-recursive mutex.

(I forgot that I'd moved the non-recursive std::mutex definition to a
new file, and just edited the first thing I saw in include/std/mutex!)

diff --git a/libstdc++-v3/include/bits/mutex.h b/libstdc++-v3/include/bits/mutex.h
index 43f5b0b..7f88821 100644
--- a/libstdc++-v3/include/bits/mutex.h
+++ b/libstdc++-v3/include/bits/mutex.h
@@ -63,7 +63,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 typedef __gthread_mutex_t			__native_type;
 
 #ifdef __GTHREAD_MUTEX_INIT
+# if _GLIBCXX_ASSERTIONS && defined(PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP)
+// Use an error-checking mutex type when assertions are enabled.
+__native_type  _M_mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP;
+# else
 __native_type  _M_mutex = __GTHREAD_MUTEX_INIT;
+# endif
 
 constexpr __mutex_base() noexcept = default;
 #else
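
Florian's point about relocking an error-checking mutex can be observed with a
small POSIX sketch. The struct and function names are hypothetical; the return
codes noted in the comments (EDEADLK on relock by the owner, EPERM on unlock by
a non-owner) are those POSIX specifies for PTHREAD_MUTEX_ERRORCHECK, and the
code assumes a platform that provides that mutex type.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Return codes of each step, so a caller can inspect them.
struct RelockResult {
  int first_lock;
  int second_lock;
  int first_unlock;
  int second_unlock;
};

// Lock an error-checking mutex twice from the same thread, then unlock twice.
RelockResult relock_errorcheck_mutex() {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  assert(rc == 0);
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  assert(rc == 0);

  pthread_mutex_t mutex;
  rc = pthread_mutex_init(&mutex, &attr);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  pthread_mutexattr_destroy(&attr);

  RelockResult result;
  result.first_lock = pthread_mutex_lock(&mutex);      // succeeds
  result.second_lock = pthread_mutex_lock(&mutex);     // fails, nothing counted
  result.first_unlock = pthread_mutex_unlock(&mutex);  // releases the mutex
  result.second_unlock = pthread_mutex_unlock(&mutex); // fails, not the owner
  pthread_mutex_destroy(&mutex);
  return result;
}
```

Because the failed second lock is not counted, the first unlock already
releases the mutex, which is why such a type cannot stand in for a recursive
mutex but is a reasonable checking aid for the non-recursive one.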