RFC patch for #pragma ivdep

2013-10-07 Thread Tobias Burnus
Attached is an early version (C only) for  #pragma ivdep, which aids 
vectorization by setting (for the following for-loop) loop->safelen to 
INT_MAX. [In the final version, I will also add parsing support for C++ 
and use it for Fortran's "do concurrent".]


As suggested by Richard and Jakub (thanks!), it is implemented as follows:
* An ANNOTATE_EXPR with ANNOTATE_EXPR_ID == annot_expr_ivdep_kind is 
attached to the condition of the loop

* In gimplify.c, it is converted into an internal function (ANNOTATE)
* When the "struct loops" is created, the internal function is removed 
and loop->safelen is set to INT_MAX


RFC:
* The replacement of the internal function is done in cfgloop.c's 
flow_loops_find. The code path I am interested in is: pass_build_cfg -> 
execute_build_cfg -> loop_optimizer_init (AVOID_CFG_MODIFICATIONS) -> 
flow_loops_find. But flow_loops_find is also called elsewhere.
Thus, is this the best spot? Is the slowdown of walking the gimple 
statements acceptable? Additionally, there are some assumptions, which 
may or may not be valid:
- There is only one latch edge (if there are more, loop->latch is set to 
NULL and my code is not reached; expand_ANNOTATE might then get 
called, leading to an ICE [gcc_unreachable()]).

- The IFN_ANNOTATE is just before the GIMPLE_COND
- The loop condition is in loop->latch->next_bb.
* Parsing: Currently, #pragma ivdep and #pragma omp for require that a 
for loop follows. Other compilers permit #pragma ivdep followed by 
#pragma omp for - and vice versa [which gets complicated when OpenMP v4's 
safelen is also used]. Is it acceptable to restrict a loop to either 
ivdep or another pragma, but not both? (See the sketch below.)
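
For illustration, the combination other compilers accept would look like
the following (hypothetical usage; with this patch, #pragma ivdep must be
followed directly by the for loop, so the extra pragma below is rejected):

/* 'bar' is an illustrative function; the omp pragma needs -fopenmp.  */
void bar(int n, int *a, int *b) {
  int i;
#pragma ivdep
#pragma omp for
  for (i = 0; i < n; ++i)
    a[i] += b[i];
}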



Example:

void foo(int n, int *a, int *b, int *c) {
  int i;
#pragma ivdep
  for (i = 0; i < n; ++i) {
    a[i] = b[i] + c[i];
  }
}

Without the pragma, "gcc -O3 -fopt-info-vec-optimized -c foo.c" gives:
foo.c:4:3: note: loop vectorized
foo.c:4:3: note: loop versioned for vectorization because of possible aliasing
foo.c:4:3: note: loop peeled for vectorization to enhance alignment

With the pragma, as expected, no loop versioning is done (i.e. there is 
no "loop versioned for vectorization because of possible aliasing").


(I successfully did an ada,c,c++,fortran,go,java,lto,objc,obj-c++ 
bootstrap on x86-64-gnu-linux; regtesting is on-going.)


Tobias
2013-08-10  Tobias Burnus  

	* c-pragma.c (init_pragma): Add #pragma ivdep handling.
	* c-pragma.h (pragma_kind): Add PRAGMA_IVDEP.

	* c-parser.c (c_parser_pragma, c_parser_for_statement):
	Handle PRAGMA_IVDEP.
	(c_parser_statement_after_labels): Update call.

	* cfgloop.c (flow_loops_find): Search for IFN_ANNOTATE
	and set safelen.
	* gimplify.c (gimple_boolify, gimplify_expr): Handle ANNOTATE_EXPR.
	* internal-fn.c (expand_ANNOTATE): New function.
	* internal-fn.def (ANNOTATE): Define as new internal function.
	* tree-core.h (tree_node_kind): Add annot_expr_ivdep_kind.
	(tree_base): Update a comment.
	* tree-pretty-print.c (dump_generic_node): Handle ANNOTATE_EXPR.
	* tree.def (ANNOTATE_EXPR): New DEFTREECODE.
	* tree.h (ANNOTATE_EXPR_ID, SET_ANNOTATE_EXPR_ID): New macros.

diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index 309859f..06dbf17 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1353,6 +1353,8 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "pch_preprocess",
   PRAGMA_GCC_PCH_PREPROCESS, false, false);
 
+  cpp_register_deferred_pragma (parse_in, 0, "ivdep", PRAGMA_IVDEP, false,
+				false);
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 41215db..c826fbd 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -46,6 +46,7 @@ typedef enum pragma_kind {
   PRAGMA_OMP_THREADPRIVATE,
 
   PRAGMA_GCC_PCH_PREPROCESS,
+  PRAGMA_IVDEP,
 
   PRAGMA_FIRST_EXTERNAL
 } pragma_kind;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index b612e29..6bf9fbf 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1150,7 +1150,7 @@ static void c_parser_if_statement (c_parser *);
 static void c_parser_switch_statement (c_parser *);
 static void c_parser_while_statement (c_parser *);
 static void c_parser_do_statement (c_parser *);
-static void c_parser_for_statement (c_parser *);
+static void c_parser_for_statement (c_parser *, bool);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -4495,7 +4495,7 @@ c_parser_statement_after_labels (c_parser *parser)
 	  c_parser_do_statement (parser);
 	  break;
 	case RID_FOR:
-	  c_parser_for_statement (parser);
+	  c_parser_for_statement (parser, false);
 	  break;
 	case RID_GOTO:
 	  c_parser_consume_token (parser);
@@ -4948,7 +4948,7 @@ c_parser_do_statement (c_parser *parser)
 */
 
 static void
-c_parser_for_statement (c_parser *parser)
+c_parser_for_statement (c_parser *p

Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-07 Thread Teresa Johnson
On Sun, Oct 6, 2013 at 6:51 AM, Jan Hubicka  wrote:
>> The second part ensures that frequencies are correctly updated after
>> inlining. The problem is that after inlining the frequencies were
>> being recomputed directly from the corresponding bb counts in
>> rebuild_frequencies. If the counts had been truncated to 0, then the
>> recomputed frequencies would become 0 as well (often leading to
>> inconsistencies in the frequencies between blocks). Then the above
>> change to probably_never_executed would not help identify these blocks
>> as non-cold from the frequencies. The fix was to use the existing
>> logic used during static profile rebuilding to also
>> recompute/propagate the frequencies from the branch probabilities in
>> the profile feedback case. I also renamed a number of estimate_*
>> routines to compute_* and updated the comments to reflect the fact
>> that these routines are not doing estimation (from a static profile),
>> but in fact recomputing/propagating frequencies from the existing
>> (either guessed or profile-feedback-based) probabilities.
>
> I see where your plan goes - you want frequencies to be computed purely from
> the probabilities that are there.  This however will lead to unnecessary loss
> of precision for functions with large counts: probabilities are truncated in
> precision, so the propagation won't give correct answers and has serious
> truncation issues.
>
> So perhaps we want to execute this only if the maximal count in the function
> is low?  (i.e. less than REG_BR_PROB_BASE or even REG_BR_PROB_BASE/10 - if
> counts are close enough, the roundoff errors of count updating should be less
> terrible than the errors introduced by the propagation).

Yes, I think that is a better approach.

>
>> -/* Estimate probabilities of loopback edges in loops at same nest level.  */
>> +/* Compute frequencies in loops at same nest level.  */
>>
>>  static void
>> -estimate_loops_at_level (struct loop *first_loop)
>> +compute_loops_at_level (struct loop *first_loop)
>
> I would keep "estimate".  The whole logic is not safe and it will lead to
> mistakes for irreducible loops and many other cases...

Ok

>
>> @@ -3027,18 +3042,16 @@ void
>>  rebuild_frequencies (void)
>>  {
>>timevar_push (TV_REBUILD_FREQUENCIES);
>> -  if (profile_status == PROFILE_GUESSED)
>> +  if (profile_status == PROFILE_GUESSED || profile_status == PROFILE_READ)
>>  {
>>loop_optimizer_init (0);
>>add_noreturn_fake_exit_edges ();
>>mark_irreducible_loops ();
>>connect_infinite_loops_to_exit ();
>> -  estimate_bb_frequencies ();
>> +  compute_bb_frequencies (true);
>
> Here please add the check about the maximal count in the function, and a
> comment (especially so we will remember to remove this hack if we switch to
> sreal-based counts/frequencies/probabilities)
>
> OK, with those changes (if it fixes the problem)

Ok thanks.
Teresa

>
> Thanks,
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-07 Thread Teresa Johnson
On Fri, Oct 4, 2013 at 8:54 AM, Xinliang David Li  wrote:
> On Thu, Oct 3, 2013 at 10:15 PM, Teresa Johnson  wrote:
>> This patch handles the fact that small profile count values sometimes
>> get truncated down to 0 during updates due to integer arithmetic. This
>> causes sub-optimal function splitting under
>> -freorder-blocks-and-partition.
>>
>> The first part fixes the logic in probably_never_executed that looks
>> at the bb frequency when the bb count is zero. It now correctly
>> compares the count computed from the frequency to the number of
>> profile runs instead of to REG_BR_PROB_BASE. The minimum ratio of
>> profile counts to training runs required for a block to not be
>> considered cold is now encoded in a parameter, with the default value
>> of 4 to match the existing code.
>>
>> The second part ensures that frequencies are correctly updated after
>> inlining. The problem is that after inlining the frequencies were
>> being recomputed directly from the corresponding bb counts in
>> rebuild_frequencies. If the counts had been truncated to 0, then the
>> recomputed frequencies would become 0 as well (often leading to
>> inconsistencies in the frequencies between blocks). Then the above
>> change to probably_never_executed would not help identify these blocks
>> as non-cold from the frequencies. The fix was to use the existing
>> logic used during static profile rebuilding to also
>> recompute/propagate the frequencies from the branch probabilities in
>> the profile feedback case. I also renamed a number of estimate_*
>> routines to compute_* and updated the comments to reflect the fact
>> that these routines are not doing estimation (from a static profile),
>> but in fact recomputing/propagating frequencies from the existing
>> (either guessed or profile-feedback-based) probabilities.
>
> For the second part, it seems to assume the branch probabilities are
> better maintained than bb counts?

The issue was that there were truncation errors leading to 0 counts
when the profile counts were small. But as Honza pointed out in a
follow-on email, this will cause different problems due to less
precision in the probabilities when the counts are large, so we should
really only do this when the counts are small.

Teresa

>
> David
>
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>> (One more round of regression testing in progress after making some
>> slight cleanup changes.)
>>
>> Thanks,
>> Teresa
>>
>> 2013-10-03  Teresa Johnson  
>>
>> * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>> * predict.c (probably_never_executed): Compare frequency-based
>> count to number of training runs.
>> (tree_estimate_probability): Update function call to new name
>> compute_bb_frequencies and pass new parameter.
>> (compute_loops_at_level): Renamed.
>> (compute_loops): Ditto.
>> (compute_bb_frequencies): Renamed, and new parameter to
>> force recomputing frequencies.
>> (rebuild_frequencies): Recompute bb frequencies from the
>> probabilities instead of from counts due to truncation issues.
>> * predict.h (compute_bb_frequencies): Update declaration.
>>
>> Index: params.def
>> ===
>> --- params.def  (revision 203152)
>> +++ params.def  (working copy)
>> @@ -373,6 +373,12 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>>  "Select fraction of the maximal frequency of executions of
>> basic block in function given basic block needs to have to be
>> considered hot",
>>  1000, 0, 0)
>>
>> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
>> +"min-hot-run-ratio",
>> + "The minimum ratio of profile runs to basic block execution count "
>> + "for the block to be considered hot",
>> +4, 0, 1)
>> +
>>  DEFPARAM (PARAM_ALIGN_THRESHOLD,
>>   "align-threshold",
>>   "Select fraction of the maximal frequency of executions of
>> basic block in function given basic block get alignment",
>> Index: predict.c
>> ===
>> --- predict.c   (revision 203152)
>> +++ predict.c   (working copy)
>> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>>gcc_checking_assert (fun);
>>if (profile_status_for_function (fun) == PROFILE_READ)
>>  {
>> -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>> +  int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
>> +  if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>> return false;
>>if (!frequency)
>> return true;
>>if (!ENTRY_BLOCK_PTR->frequency)
>> return false;
>> -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
>> REG_BR_PROB_BASE)
>> +  if (ENTRY_BLOCK_PTR->count)
>> {
>> - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
>> -   ENTRY_

Re: [Patch] Handle profile counts truncated to 0 in coldness test

2013-10-07 Thread Teresa Johnson
On Sun, Oct 6, 2013 at 6:43 AM, Jan Hubicka  wrote:
>> 2013-10-03  Teresa Johnson  
>>
>> * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
>
> I would not mention HOT in the parameter name.  Blocks are hot/normal/unlikely
> and we already have HOT_BB_FREQUENCY_FRACTION.
>
> So perhaps UNLIKELY_BB_COUNT_FRACTION, with an explanation that it is relative
> to the number of training runs?

Ok.

>> +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
>> +"min-hot-run-ratio",
>> + "The minimum ratio of profile runs to basic block execution count "
>> + "for the block to be considered hot",
> Perhaps as:
>> + "The maximum ratio of profile runs to basic block execution count "
>> + "for the block to be considered unlikely",
>
>> +4, 0, 1)
>
> lower bound should be 1 or we divide by 0.

Yep, will fix.

>> Index: predict.c
>> ===
>> --- predict.c   (revision 203152)
>> +++ predict.c   (working copy)
>> @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
>>gcc_checking_assert (fun);
>>if (profile_status_for_function (fun) == PROFILE_READ)
>>  {
>> -  if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
>> +  int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
>> +  if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
>
> This way if count is 1, runs needs to be min_run_ratio * 2.
> So perhaps if (count * min_run_ratio >= profile_info->runs) instead of the
> division.

Ok.
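
(For concreteness, with the suggested form and the default ratio of 4:
after 1000 training runs, a block stops being treated as never executed
once its count reaches 250, since 250 * 4 >= 1000.)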

>> -  if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < 
>> REG_BR_PROB_BASE)
>> +  if (ENTRY_BLOCK_PTR->count)
>> {
>> - return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
>> -   ENTRY_BLOCK_PTR->frequency)
>> - < REG_BR_PROB_BASE / 4);
>> +  gcov_type scaled_count
>> +  = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
>> +  gcov_type computed_count;
>> +  /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
>> + be large enough to do the division first without losing much
>> + precision.  */
>> +  if (scaled_count/frequency/min_run_ratio != 
>> ENTRY_BLOCK_PTR->count)
>
> I do not like the undefined behaviour triggered before the check (sure,
> unsigned arithmetic would fix that, but it seems all strange).  What about
> guarding the whole code:
>
> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE)
>
> For large counts the roundoff problems in the first test should not be a
> problem.

Sure, that sounds reasonable.

>> +{
>> +  computed_count = RDIV (ENTRY_BLOCK_PTR->count,
>> + ENTRY_BLOCK_PTR->frequency);
>> +  computed_count *= frequency * min_run_ratio;
>> +}
>> +  else
>> +computed_count = RDIV (scaled_count, 
>> ENTRY_BLOCK_PTR->frequency);
>> +  if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)
>
> So on the non-overflow path you compute
>
> ((frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency)
>   * min_run_ratio / runs > 0.5

Yes, there is an extra min_run_ratio factor in there that I need to remove!
>
> I think you want
>
> (frequency * entry_bb_count * min_run_ratio) / entry_bb_frequency >= runs

Right, will change that too.

>
> If entry_bb_frequency is 0, you can just return true iff frequency is > 
> min_run_ratio.

We already return false conservatively if entry_bb_frequency is 0. Do
we really want to change this - I'm not sure it makes sense to compare
the frequency to the run ratio directly.

>
> I think the safe way to compute this is
>
> if (ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>   scaled_count = (frequency * entry_bb_count * min_run_ratio)
>                  / entry_bb_frequency;
> else
>   scaled_count = ((frequency * entry_bb_count) / entry_bb_frequency)
>                  * min_run_ratio;
>
> In the first path, we have at most 1*1*1*1, which still fits in a 64-bit
> value.
>
> In the second path the base of the fixed-point arithmetic is large enough
> that the division should not matter.  We never get over entry_count * 1,
> which is safe.  The first part of the statement, however, just computes a
> value that should match count, so count > 1 unless the profile got
> misupdated, and the low-count roundoff errors should not matter.  So
> perhaps the whole code can execute only under
>
> if (ENTRY_BLOCK_PTR->count && count < REG_BR_PROB_BASE
> && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
>
> and we do not need any conditionals for scaled_count.
>
> Does such code work with ratio changed to 16?  I think the double 
> multiplication in your
> patch may have caused problems.
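
As a standalone illustration of the overflow-safe scaling outlined above,
here is a compilable sketch.  REG_BR_PROB_BASE and RDIV mirror GCC's
definitions; the function itself and the test values are assumed for
illustration, not taken from the patch:

#include <cstdint>
#include <cstdio>

typedef int64_t gcov_type;
static const gcov_type REG_BR_PROB_BASE = 10000;

/* Round-to-nearest division, like GCC's RDIV macro.  */
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))

static gcov_type
scale_count (gcov_type frequency, gcov_type entry_count,
             gcov_type entry_frequency, gcov_type min_run_ratio)
{
  if (entry_count < REG_BR_PROB_BASE * REG_BR_PROB_BASE)
    /* Multiply first: frequency <= 10000 and entry_count < 10^8, so the
       product stays comfortably inside a signed 64-bit value.  */
    return RDIV (frequency * entry_count * min_run_ratio, entry_frequency);
  else
    /* Divide first: entry_count is large enough that the early division
       loses only negligible precision.  */
    return RDIV (frequency * entry_count, entry_frequency) * min_run_ratio;
}

int main (void)
{
  /* A block at half the entry frequency, entry count 1000, ratio 4.  */
  std::printf ("%lld\n", (long long) scale_count (5000, 1000, 10000, 4));
  return 0;  /* prints 2000 */
}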

Let me fix this up and will get back.

Thanks,
Teresa

>
> I will check the second part of patch separately.
> Thanks!
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH]Fix computation of offset in ivopt

2013-10-07 Thread Bin.Cheng
On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener
 wrote:
> On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng  wrote:
>>

>
> I don't think you need
>
> +  /* Sign extend off if expr is in type which has lower precision
> + than HOST_WIDE_INT.  */
> +  if (TYPE_PRECISION (TREE_TYPE (expr)) <= HOST_BITS_PER_WIDE_INT)
> +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr)));
>
> at least it would be suspicious if you did ...
>
> The only case that I can think of points to a bug in strip_offset_1
> again, namely if sizetype (the type of all offsets) is smaller than
> a HOST_WIDE_INT in which case
>
> +boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field));
> +*offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT;
>
> is wrong as boffset / BITS_PER_UNIT does not do a signed division
> then (for negative boffset which AFAIK does not happen - but it would
> be technically allowed).  Thus, the predicates like
>
> +&& cst_and_fits_in_hwi (tmp)
>
> would need to be amended with a check that the MSB is not set.
I thought about it twice and have the following understandings.
1) "boffset / BITS_PER_UNIT" does not do signed division because
boffset might be negative and BITS_PER_UNIT could be defined as
unsigned for a target.
2) If sizetype is small enough compared to HOST_WIDE_INT, and if the
field offset is large enough to have its MSB set, then boffset would be
computed as negative by int_cst_value.
3) If sizetype is as large as HOST_WIDE_INT, boffset would be negative
only if the field offset were a very large number with the MSB set, but
that would be impossible for any structure.
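
A standalone illustration of point 1, assuming a target where the macro is
defined as an unsigned constant (the variable names here are illustrative):

#include <cstdio>

int main (void)
{
  long boffset = -16;               /* a negative bit offset */
  unsigned long bits_per_unit = 8;  /* unsigned, like such a target macro */

  /* boffset is converted to unsigned before dividing, so this is unsigned
     division and yields a huge positive value instead of -2: */
  long wrong = (long) (boffset / bits_per_unit);
  long right = boffset / (long) bits_per_unit;  /* signed division: -2 */

  std::printf ("unsigned division: %ld, signed division: %ld\n",
               wrong, right);
  return 0;
}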

Are these understandings right? And sorry to bother with the stupid
questions.  Thanks.

-- 
Best Regards.


Re: Optimize callers using nonnull attribute

2013-10-07 Thread Jeff Law

On 10/07/13 07:52, Marc Glisse wrote:

On Mon, 7 Oct 2013, Richard Biener wrote:


On Mon, Oct 7, 2013 at 12:33 AM, Marc Glisse 
wrote:

Hello,

this patch asserts that when we call a function with the nonnull attribute,
the corresponding argument is not zero, just like when we dereference a
pointer.  Everything is under a check for flag_delete_null_pointer_checks.

Note that this function currently gives up if the statement may throw
(because in some languages indirections may throw?), but this could
probably be relaxed a bit so my example would still work when compiled
with g++, without having to mark f1 and f2 as throw().
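
A small sketch of the kind of caller the patch improves; f1 and f2 are the
hypothetical functions mentioned above, and -fdelete-null-pointer-checks
(enabled at -O2) is assumed:

extern void f1 (int *) __attribute__ ((nonnull));

int f2 (int *p)
{
  f1 (p);            /* after this call, p is asserted to be non-null... */
  return p ? 1 : 0;  /* ...so this test can fold to "return 1;" */
}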

Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.


Can you please restructure it in a way to not require a goto?  That is,
move searching for a non-null opportunity into a helper function?


Thanks. I wasn't sure what to put in the helper and what to keep in the
original function, so I'll wait a bit for comments before the commit.

Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.

I like it.  Tweak per Richi's suggestion to avoid the goto and get it
committed!


jeff



Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 8:43 PM, Jonathan Wakely  wrote:
> OK, this latest patch looks good so please go ahead and commit it - thanks!

Committed :)


-- 
Tim Shen


Re: [Google] Fix type merging of argument packs for LIPO

2013-10-07 Thread Xinliang David Li
ok.

thanks,

David

On Mon, Oct 7, 2013 at 6:33 PM, Teresa Johnson  wrote:
> Passes regression tests. Ok for google/4_8?
>
> Thanks,
> Teresa
>
> 2013-10-07  Teresa Johnson  
>
> * cp/cp-objcp-common.c (cmp_templ_arg): Fix argument pack
> accesses.
>
> Index: cp/cp-objcp-common.c
> ===
> --- cp/cp-objcp-common.c(revision 203253)
> +++ cp/cp-objcp-common.c(working copy)
> @@ -307,13 +307,15 @@ cmp_templ_arg (tree ta1, tree ta2)
>int n, i;
>if (!ARGUMENT_PACK_P (ta2))
>  return 0;
> -  n = TREE_VEC_LENGTH (ta1);
> -  if (n != TREE_VEC_LENGTH (ta2))
> +  tree pack1 = ARGUMENT_PACK_ARGS (ta1);
> +  tree pack2 = ARGUMENT_PACK_ARGS (ta2);
> +  n = TREE_VEC_LENGTH (pack1);
> +  if (n != TREE_VEC_LENGTH (pack2))
>  return 0;
>for (i = 0; i < n ; i++)
>  {
> -  if (!cmp_templ_arg (TREE_VEC_ELT (ta1, i),
> -  TREE_VEC_ELT (ta2, i)))
> +  if (!cmp_templ_arg (TREE_VEC_ELT (pack1, i),
> +  TREE_VEC_ELT (pack2, i)))
>  return 0;
>  }
>return 1;
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[Google] Fix type merging of argument packs for LIPO

2013-10-07 Thread Teresa Johnson
Passes regression tests. Ok for google/4_8?

Thanks,
Teresa

2013-10-07  Teresa Johnson  

* cp/cp-objcp-common.c (cmp_templ_arg): Fix argument pack
accesses.

Index: cp/cp-objcp-common.c
===
--- cp/cp-objcp-common.c(revision 203253)
+++ cp/cp-objcp-common.c(working copy)
@@ -307,13 +307,15 @@ cmp_templ_arg (tree ta1, tree ta2)
   int n, i;
   if (!ARGUMENT_PACK_P (ta2))
 return 0;
-  n = TREE_VEC_LENGTH (ta1);
-  if (n != TREE_VEC_LENGTH (ta2))
+  tree pack1 = ARGUMENT_PACK_ARGS (ta1);
+  tree pack2 = ARGUMENT_PACK_ARGS (ta2);
+  n = TREE_VEC_LENGTH (pack1);
+  if (n != TREE_VEC_LENGTH (pack2))
 return 0;
   for (i = 0; i < n ; i++)
 {
-  if (!cmp_templ_arg (TREE_VEC_ELT (ta1, i),
-  TREE_VEC_ELT (ta2, i)))
+  if (!cmp_templ_arg (TREE_VEC_ELT (pack1, i),
+  TREE_VEC_ELT (pack2, i)))
 return 0;
 }
   return 1;

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[C++ Patch] PR 58568

2013-10-07 Thread Paolo Carlini

Hi,

this ICE on invalid 4.8/4.9 Regression is very easy to fix: just add a 
check for error_mark_node on the return of xref_tag (I'm also removing a 
redundant one a few lines below). This brings us to the same error 
message we get when there are no lambdas involved, eg for the existing 
template/crash27.C:


template struct A
{
template int A::i;
};

error: type ‘A’ is not derived from type ‘A< >’

which I find quite confusing, in general. Thus, besides the 
error_mark_node change, I'm also proposing to tweak the error message in 
grokdeclarator a bit, to the more general wording which we already 
use a few lines above (apparently EDG also uses something similar in 
such cases):


error: invalid use of qualified-name ‘A::i’

Examining all the testcases we have in the testsuite that are sensitive 
to this diagnostic (template/crash27.C, g++.mike/misc9.C, 
g++.other/decl5.C), I don't get the impression that explicitly talking 
about derivation ever adds much, if anything.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2013-10-08  Paolo Carlini  

PR c++/58568
* lambda.c (begin_lambda_type): Check return value of xref_tag
for error_mark_node; tidy.
* decl.c (grokdeclarator): Tweak error message.

/testsuite
2013-10-08  Paolo Carlini  

PR c++/58568
* g++.dg/cpp0x/lambda/lambda-ice10.C: New.
* g++.old-deja/g++.mike/misc9.C: Adjust.
Index: cp/decl.c
===
--- cp/decl.c   (revision 203255)
+++ cp/decl.c   (working copy)
@@ -8774,8 +8774,8 @@ grokdeclarator (const cp_declarator *declarator,
 && !uniquely_derived_from_p (ctype,
  current_class_type))
  {
-   error ("type %qT is not derived from type %qT",
-  ctype, current_class_type);
+   error ("invalid use of qualified-name %<%T::%D%>",
+  qualifying_scope, decl);
return error_mark_node;
  }
  }
Index: cp/lambda.c
===
--- cp/lambda.c (revision 203255)
+++ cp/lambda.c (working copy)
@@ -138,6 +138,8 @@ begin_lambda_type (tree lambda)
  name,
  /*scope=*/ts_lambda,
  /*template_header_p=*/false);
+if (type == error_mark_node)
+  return error_mark_node;
   }
 
   /* Designate it as a struct so that we can use aggregate initialization.  */
@@ -152,8 +154,6 @@ begin_lambda_type (tree lambda)
 
   /* Start the class.  */
   type = begin_class_definition (type);
-  if (type == error_mark_node)
-return error_mark_node;
 
   return type;
 }
Index: testsuite/g++.dg/cpp0x/lambda/lambda-ice10.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-ice10.C(revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-ice10.C(working copy)
@@ -0,0 +1,8 @@
+// PR c++/58568
+// { dg-do compile { target c++11 } }
+
+template<typename> struct A
+{
+  static const int i;
+  template<typename> const int A::i = []{ return 0; }(); // { dg-error "invalid use" }
+};
Index: testsuite/g++.old-deja/g++.mike/misc9.C
===
--- testsuite/g++.old-deja/g++.mike/misc9.C (revision 203255)
+++ testsuite/g++.old-deja/g++.mike/misc9.C (working copy)
@@ -8,6 +8,6 @@ class bee {
 
 class foo {
  public:
-  int bee::bar;// { dg-error "not derived" } you cannot do this
+  int bee::bar;// { dg-error "invalid use" } you cannot do this
 int me();
 };


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Jonathan Wakely
On 8 October 2013 01:06, Tim Shen wrote:
> On Mon, Oct 7, 2013 at 7:11 PM, Jonathan Wakely  wrote:
>> The version in your gist (which is *not* what your first patch did!)
>> is faster for any size of _M_exists and any ratio of _M_states.size()
>> / _M_exists.size():
>
> Sorry for my original patch, I made a mistake! It's:
>   while (!_BaseT::empty())
>     _BaseT::pop_back();
> but I truly meant:
>   while (!this->_M_empty())
>     this->_M_pop();
> to reset all the true values.

Yes, that's what I was trying to say in my first mail :-)

> On my machine the while loop is slower than assign() when
> _M_states.size() / _M_exists.size() is more than even 7/1000, so let's
> keep the assign(), because the ratio in regex could probably be more
> than 25%.

OK, this latest patch looks good so please go ahead and commit it - thanks!


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Kaz Kojima
Oleg Endo  wrote:
> To be honest, I had some difficulty picking the name.
> Maybe something like 'sh_tbit_combine' or 'sh_treg_combine' would be
> better, or at least less confusing?  Suggestions are highly appreciated.

'sh_treg_combine' or 'sh_combine_treg' sounds good to me.

Regards,
kaz


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 7:11 PM, Jonathan Wakely  wrote:
> The version in your gist (which is *not* what your first patch did!)
> is faster for any size of _M_exists and any ratio of _M_states.size()
> / _M_exists.size():

Sorry for my original patch, I made a mistake! It's:
  while (!_BaseT::empty())
    _BaseT::pop_back();
but I truly meant:
  while (!this->_M_empty())
    this->_M_pop();
to reset all the true values.

On my machine the while loop is slower than assign() when
_M_states.size() / _M_exists.size() is more than even 7/1000, so let's
keep the assign(), because the ratio in regex could probably be more
than 25%.


-- 
Tim Shen


a.patch
Description: Binary data


Re: [patch] fix libstdc++/57641

2013-10-07 Thread Jonathan Wakely
On 18 June 2013 23:55, Jonathan Wakely wrote:
> Instead of fixing the bug three times I refactored the try_lock_xxx
> functions into a mixin template and used that in the various timed
> mutexes.
>
> PR libstdc++/57641
> * include/std/mutex (timed_mutex, recursive_timed_mutex): Move common
> functionality to new __timed_mutex_impl mixin. Overload try_lock_until
> to handle conversion between different clocks. Replace constrained
> __try_lock_for_impl overloads with conditional increment.
> * include/std/shared_mutex (shared_mutex::_Mutex): Use the new mixin.
> * testsuite/30_threads/timed_mutex/try_lock_until/57641.cc: New.
>
> Tested x86_64-linux, committed to trunk.

I've committed a less invasive version of this patch (without the
refactoring to add a new base class) to the 4.8 branch. Tested
x86_64-linux.

PR libstdc++/57641
* include/std/mutex (timed_mutex, recursive_timed_mutex): Add
overloaded _M_try_lock_until to handle conversion between different
clocks. Replace constrained __try_lock_for_impl overloads with
conditional increment.
* testsuite/30_threads/timed_mutex/try_lock_until/57641.cc: New.
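
A quick usage sketch of the behaviour PR 57641 fixes: a try_lock_until
deadline on a non-default clock is now converted correctly instead of
being misinterpreted against the mutex's native clock (assumes a platform
with timed-mutex support; link with -pthread):

#include <mutex>
#include <chrono>
#include <cstdio>

int main ()
{
  std::timed_mutex m;
  auto deadline = std::chrono::steady_clock::now ()
                  + std::chrono::milliseconds (10);
  if (m.try_lock_until (deadline))  /* steady_clock, not the native clock */
    {
      std::puts ("locked");
      m.unlock ();
    }
  return 0;
}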
commit b28b4820d7a76e58dc764a9bfe978c657386991b
Author: Jonathan Wakely 
Date:   Mon Oct 7 10:13:00 2013 +0100

PR libstdc++/57641
* include/std/mutex (timed_mutex, recursive_timed_mutex): Add
overloaded _M_try_lock_until to handle conversion between different
clocks. Replace constrained __try_lock_for_impl overloads with
conditional increment.
* testsuite/30_threads/timed_mutex/try_lock_until/57641.cc: New.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 67f3418..3093d9a 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -237,25 +237,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template<typename _Rep, typename _Period>
   bool
   try_lock_for(const chrono::duration<_Rep, _Period>& __rtime)
-  { return __try_lock_for_impl(__rtime); }
+  { return _M_try_lock_for(__rtime); }
 
 template<typename _Clock, typename _Duration>
   bool
   try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime)
-  {
-   chrono::time_point<_Clock, chrono::seconds> __s =
- chrono::time_point_cast<chrono::seconds>(__atime);
-
-   chrono::nanoseconds __ns =
- chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
-
-   __gthread_time_t __ts = {
- static_cast<std::time_t>(__s.time_since_epoch().count()),
- static_cast<long>(__ns.count())
-   };
-
-   return !__gthread_mutex_timedlock(&_M_mutex, &__ts);
-  }
+  { return _M_try_lock_until(__atime); }
 
 void
 unlock()
@@ -270,26 +257,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   private:
 template<typename _Rep, typename _Period>
-  typename enable_if<
-   ratio_less_equal<__clock_t::period, _Period>::value, bool>::type
-  __try_lock_for_impl(const chrono::duration<_Rep, _Period>& __rtime)
+  bool
+  _M_try_lock_for(const chrono::duration<_Rep, _Period>& __rtime)
   {
-   __clock_t::time_point __atime = __clock_t::now()
- + chrono::duration_cast<__clock_t::duration>(__rtime);
+   auto __rt = chrono::duration_cast<__clock_t::duration>(__rtime);
+   if (ratio_greater<__clock_t::period, _Period>())
+ ++__rt;
 
-   return try_lock_until(__atime);
+   return _M_try_lock_until(__clock_t::now() + __rt);
   }
 
-template <typename _Rep, typename _Period>
-  typename enable_if<
-   !ratio_less_equal<__clock_t::period, _Period>::value, bool>::type
-  __try_lock_for_impl(const chrono::duration<_Rep, _Period>& __rtime)
+template<typename _Duration>
+  bool
+  _M_try_lock_until(const chrono::time_point<__clock_t,
+_Duration>& __atime)
   {
-   __clock_t::time_point __atime = __clock_t::now()
- + ++chrono::duration_cast<__clock_t::duration>(__rtime);
+   chrono::time_point<__clock_t, chrono::seconds> __s =
+ chrono::time_point_cast<chrono::seconds>(__atime);
+
+   chrono::nanoseconds __ns =
+ chrono::duration_cast<chrono::nanoseconds>(__atime - __s);
+
+   __gthread_time_t __ts = {
+ static_cast<std::time_t>(__s.time_since_epoch().count()),
+ static_cast<long>(__ns.count())
+   };
 
-   return try_lock_until(__atime);
+   return !__gthread_mutex_timedlock(native_handle(), &__ts);
   }
+
+template<typename _Clock, typename _Duration>
+  bool
+  _M_try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime)
+  { return _M_try_lock_for(__atime - _Clock::now()); }
   };
 
   /// recursive_timed_mutex
@@ -330,25 +330,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template<typename _Rep, typename _Period>
   bool
   try_lock_for(const chrono::duration<_Rep, _Period>& __rtime)
-  { return __try_lock_for_impl(__rtime); }
+  { return _M_try_lock_for(__rtime); }
 
 template<typename _Clock, typename _Duration>
   bool
   try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime)
-  {
-   chrono::time_point<_Clock, chrono::seconds>  __s =
- chrono::time_point

Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Jonathan Wakely
On 7 October 2013 22:14, Tim Shen wrote:
> On Mon, Oct 7, 2013 at 1:28 PM, Jonathan Wakely  wrote:
>> std::memset() on about 125 bytes is not a big deal, I doubt it's
>> significantly more expensive than the calculations to find the right
>> bits and do the necessary masking for three elements.
>> std::vector<bool> is a strange beast, remember.
>
> Probably there could be a bitwise operation?

Yes, most operations on vector<bool> involve bitwise operations.

> I've tested the case of 3 trues out of 1000 elements here:
> https://gist.github.com/innocentim/6874889
> On my machine, choosing "v.assign(v.size(), false);" costs 1.768s and
> choosing the other costs 1.088s. When we push 7 elements, they cost
> almost the same time. When pushing more than that, v.assign() shall win.

You're right, popping one at a time is measurably faster than filling
the whole vector with zeros, even when _M_exists.size() is
small.  I'm quite surprised. I expected updating the vector size on
every pop_back() to make it slower.

The version in your gist (which is *not* what your first patch did!)
is faster for any size of _M_exists and any ratio of _M_states.size()
/ _M_exists.size():
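
A minimal standalone version of the kind of comparison being timed (the
sizes and the 3-in-1000 ratio follow the discussion above; this is
illustrative, not the code from the gist):

#include <vector>
#include <chrono>
#include <cstdio>

int main ()
{
  std::vector<bool> exists (1000, false);
  std::vector<unsigned> states = {1, 400, 999};  // 3 set bits out of 1000
  const int iters = 5000000;
  typedef std::chrono::steady_clock clock_type;

  clock_type::time_point t0 = clock_type::now ();
  for (int i = 0; i < iters; ++i)
    {
      for (unsigned s : states) exists[s] = true;
      exists.assign (exists.size (), false);       // wholesale reset
    }
  clock_type::time_point t1 = clock_type::now ();
  for (int i = 0; i < iters; ++i)
    {
      for (unsigned s : states) exists[s] = true;
      for (unsigned s : states) exists[s] = false; // selective reset
    }
  clock_type::time_point t2 = clock_type::now ();

  std::printf ("assign: %.2fs  selective: %.2fs\n",
               std::chrono::duration<double> (t1 - t0).count (),
               std::chrono::duration<double> (t2 - t1).count ());
  return 0;
}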


> I do this because I need to use _M_clear() in this patch :)
>
> Anyway let's use v.assign().

Sorry for wasting your time regarding using assign(), it does seem to
be slower (and produces larger code) so please go for something like
in your gist (not in the original patch, which left true values in the
_M_exists vector!) i.e. something like

  for (auto __i : _M_states)
    _M_base[__i] = false;
  _M_base.clear();

But please rename _M_base; that name doesn't make sense: it used to be
the base class, but it isn't now, and that vector has nothing to do
with base classes any more! How about _M_states?

Please also make _M_empty() const.

Thanks!


Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #5

2013-10-07 Thread Michael Meissner
On this patch, I add some infrastructure to say whether a given mode can go in
a general purpose register, floating point register, or altivec register, and
if it can, what kinds of addressing are legal for the mode.  In addition, I
have switched to using memset to set up the arrays indexed by mode.

I've rewritten this infrastructure since I submitted version 1 of the patch.
At present, this is only an infrastructure change, and the compiler generates
the same code underneath before and after the patch.

I switched rs6000_legitimate_address_p so that it uses this new infrastructure
to determine if a mode can do auto-update as part of the address.  I found that
it was not obvious what the old code in rs6000_legitimate_address_p was trying
to do.  I discovered that for some 32-bit ABIs, the types DImode, DFmode, and
DDmode would allow addresses with PRE_INC or PRE_DEC, but not PRE_MODIFY.  At
present, I am generating the same code, but in a later patch we can allow
PRE_MODIFY for those types as well.  I noticed this when I was doing version 1
of the patches and was generating more auto-update addressing forms than
before.
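
The general idea, as a standalone sketch: a per-mode, per-register-class
table of address-mask bits that predicates like mode_supports_pre_incdec_p
can consult (the names are modeled on the patch's RELOAD_REG_* entries,
but the code and values below are purely illustrative):

#include <cstdio>

enum reload_reg_type { RELOAD_REG_GPR, RELOAD_REG_FPR, RELOAD_REG_AV,
                       RELOAD_REG_ANY, N_RELOAD_REG };

static const unsigned RELOAD_REG_VALID      = 1u << 0;
static const unsigned RELOAD_REG_OFFSET     = 1u << 1;
static const unsigned RELOAD_REG_INDEXED    = 1u << 2;
static const unsigned RELOAD_REG_PRE_INCDEC = 1u << 3;
static const unsigned RELOAD_REG_PRE_MODIFY = 1u << 4;

enum mode_stub { SI_MODE, DI_MODE, DF_MODE, N_MODES };

/* addr_mask[mode][class]: what addressing the mode supports in that
   register class; the ANY column is the OR of the other three.  */
static unsigned addr_mask[N_MODES][N_RELOAD_REG];

static bool
mode_supports_pre_incdec_p (enum mode_stub mode)
{
  return (addr_mask[mode][RELOAD_REG_ANY] & RELOAD_REG_PRE_INCDEC) != 0;
}

int main ()
{
  /* Pretend DFmode allows offset and pre-inc/dec addressing in FPRs.  */
  addr_mask[DF_MODE][RELOAD_REG_FPR]
    = RELOAD_REG_VALID | RELOAD_REG_OFFSET | RELOAD_REG_PRE_INCDEC;
  addr_mask[DF_MODE][RELOAD_REG_ANY] = addr_mask[DF_MODE][RELOAD_REG_FPR];

  std::printf ("DFmode pre-inc/dec: %d\n",
               (int) mode_supports_pre_incdec_p (DF_MODE));
  return 0;
}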

I have run bootstrap and make check with no regressions with these patches.  I
have also tested code for various 32/64-bit targets (power8, power7, power6x,
power6, power5, power5+, power4, cell, 476 with/without floating point, g4, g5,
e5500, e6500, and xilinx full/lite/no floating point), and I found the code
generated is identical.

Can I install these patches in GCC 4.9 on the trunk?

2013-10-07  Michael Meissner  

* config/rs6000/rs6000.c (enum rs6000_reload_reg_type): Add new
fields to the reg_addr array that describes the valid addressing
mode for any register, general purpose registers, floating point
registers, and Altivec registers.
(FIRST_RELOAD_REG_CLASS): Likewise.
(LAST_RELOAD_REG_CLASS): Likewise.
(struct reload_reg_map_type): Likewise.
(reload_reg_map_type): Likewise.
(RELOAD_REG_VALID): Likewise.
(RELOAD_REG_MULTIPLE): Likewise.
(RELOAD_REG_INDEXED): Likewise.
(RELOAD_REG_OFFSET): Likewise.
(RELOAD_REG_PRE_INCDEC): Likewise.
(RELOAD_REG_PRE_MODIFY): Likewise.
(reg_addr): Likewise.
(mode_supports_pre_incdec_p): New helper functions to say whether
a given mode supports PRE_INC, PRE_DEC, and PRE_MODIFY.
(mode_supports_pre_modify_p): Likewise.
(rs6000_debug_vector_unit): Rearrange the -mdebug=reg output to
print the valid address mode bits for each mode.
(rs6000_debug_print_mode): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): New function to set up the address
mask bits for each type.
(rs6000_init_hard_regno_mode_ok): Use memset to clear arrays.
Call rs6000_setup_reg_addr_masks to set up the address mask bits.
(rs6000_legitimate_address_p): Use mode_supports_pre_incdec_p and
mode_supports_pre_modify_p to determine if PRE_INC, PRE_DEC, and
PRE_MODIFY are supported.
(rs6000_print_options_internal): Tweak the debug output slightly.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 203166)
+++ gcc/config/rs6000/rs6000.c  (.../gcc/config/rs6000) (working copy)
@@ -313,6 +313,50 @@ static enum rs6000_reg_type reg_class_to
 
 #define IS_FP_VECT_REG_TYPE(RTYPE) IN_RANGE(RTYPE, VSX_REG_TYPE, FPR_REG_TYPE)
 
+
+/* Register classes we care about in secondary reload or go if legitimate
+   address.  We only need to worry about GPR, FPR, and Altivec registers here,
+   along an ANY field that is the OR of the 3 register classes.  */
+
+enum rs6000_reload_reg_type {
+  RELOAD_REG_GPR,  /* General purpose registers.  */
+  RELOAD_REG_FPR,  /* Traditional floating point regs.  */
+  RELOAD_REG_AV,   /* Altivec registers.  */
+  RELOAD_REG_ANY,  /* OR of GPR, FPR, Altivec masks.  */
+  N_RELOAD_REG
+};
+
+/* For setting up register classes, loop through the 3 register classes mapping
+   into real registers, and skip the ANY class, which is just an OR of the
+   bits.  */
+#define FIRST_RELOAD_REG_CLASS RELOAD_REG_GPR
+#define LAST_RELOAD_REG_CLASS  RELOAD_REG_AV
+
+/* Map reload register type to a register in the register class.  */
+struct reload_reg_map_type {
+  const char *name;/* Register class name.  */
+  int reg; /* Register in the register class.  */
+};
+
+static const struct reload_reg_map_type reload_reg_map[N_RELOAD_REG] = {
+  { "Gpr", FIRST_GPR_REGNO },  /* RELOAD_REG_GPR.

Re: PING^2 Re: [PATCH] Add -fno-instrument-function v2

2013-10-07 Thread Andi Kleen
Andi Kleen  writes:

PING^2

> Andi Kleen  writes:
>
>> From: Andi Kleen 
>>
>> [I posted this originally quite some time ago.
>> This version fixes all review problems, particularly
>> it works for C++ too and the test case really works.]
>
> Ping!
>
> Could someone please review it.
>
> Note this might be obsolete with Honza's LTO option work, but if it's
> not covered in his first iteration I would still have it earlier for the
> Linux kernel LTO build (fixed ftrace)
>
> -Andi
>
>> This adds a new C/C++ option to force
>> __attribute__((no_instrument_function)) on every function compiled.
>>
>> This is useful together with LTO. You may want to have the whole
>> program compiled with -pg and have to specify that in the LTO
>> link, but want to disable it for some specific files. As the
>> option works on the frontend level it is already passed through
>> properly by LTO.
>>
>> Without LTO it is equivalent to not specifying -pg or -mfentry.
>>
>> This fixes some missing functionality in the Linux kernel LTO port.
>>
>> Passed bootstrap and test suite on x86_64-linux. Ok?
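
A usage sketch, with hypothetical file and function names: build the whole
program with -pg, but compile this one translation unit with

  gcc -pg -fno-instrument-function -c hot_path.c

which has the same effect as marking each of its functions by hand:

__attribute__ ((no_instrument_function))
static int hot_loop (int n)
{
  int s = 0;
  int i;
  for (i = 0; i < n; ++i)
    s += i;  /* no mcount/fentry call is emitted for this function */
  return s;
}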
>>
>> gcc/:
>>
>> 2013-08-10  Andi Kleen 
>>
>>  * c.opt (fno-instrument-function): Document.
>>
>> gcc/c:
>>
>> 2013-08-10  Andi Kleen 
>>
>>  * c-decl.c (start_function): Handle force_no_instrument_function
>>
>> gcc/cp:
>>
>> 2013-08-10  Andi Kleen 
>>
>>  * decl.c (start_preparsed_function): Handle
>>  force_no_instrument_function
>>
>> gcc/testsuite:
>>
>> 2013-08-10  Andi Kleen 
>>
>>  * g++.dg/fno-instrument-function.C: Add.
>>  * gcc.dg/fno-instrument-function.c: Add.
>> ---
>>  gcc/c-family/c.opt |  4 
>>  gcc/c/c-decl.c |  3 +++
>>  gcc/cp/decl.c  |  3 +++
>>  gcc/doc/invoke.texi|  8 +++-
>>  gcc/testsuite/g++.dg/fno-instrument-function.C | 18 ++
>>  gcc/testsuite/gcc.dg/fno-instrument-function.c | 24 
>>  6 files changed, 59 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/g++.dg/fno-instrument-function.C
>>  create mode 100644 gcc/testsuite/gcc.dg/fno-instrument-function.c
>>
>> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
>> index 10ae84d..2159f89 100644
>> --- a/gcc/c-family/c.opt
>> +++ b/gcc/c-family/c.opt
>> @@ -1014,6 +1014,10 @@ fnil-receivers
>>  ObjC ObjC++ Var(flag_nil_receivers) Init(1)
>>  Assume that receivers of Objective-C messages may be nil
>>  
>> +fno-instrument-function
>> +C C++ ObjC ObjC++ RejectNegative Report Var(force_no_instrument_function)
>> +Force __attribute__((no_instrument_function)) for all functions in 
>> translation unit.
>> +
>>  fnonansi-builtins
>>  C++ ObjC++ Var(flag_no_nonansi_builtin, 0)
>>  
>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>> index d9bbf5c..15717a9 100644
>> --- a/gcc/c/c-decl.c
>> +++ b/gcc/c/c-decl.c
>> @@ -7876,6 +7876,9 @@ start_function (struct c_declspecs *declspecs, struct 
>> c_declarator *declarator,
>>if (current_scope == file_scope)
>>  maybe_apply_pragma_weak (decl1);
>>  
>> +  if (force_no_instrument_function)
>> +DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl1) = 1;  
>> +
>>/* Warn for unlikely, improbable, or stupid declarations of `main'.  */
>>if (warn_main && MAIN_NAME_P (DECL_NAME (decl1)))
>>  {
>> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
>> index 01804d2..103188b 100644
>> --- a/gcc/cp/decl.c
>> +++ b/gcc/cp/decl.c
>> @@ -13023,6 +13023,9 @@ start_preparsed_function (tree decl1, tree attrs, 
>> int flags)
>>&& lookup_attribute ("noinline", attrs))
>>  warning (0, "inline function %q+D given attribute noinline", decl1);
>>  
>> +  if (force_no_instrument_function)
>> +DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (decl1) = 1;
>> +
>>/* Handle gnu_inline attribute.  */
>>if (GNU_INLINE_P (decl1))
>>  {
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 782b569..bc20a77 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -169,7 +169,7 @@ in the following sections.
>>  -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol
>>  -fno-asm  -fno-builtin  -fno-builtin-@var{function} @gol
>>  -fhosted  -ffreestanding -fopenmp -fms-extensions -fplan9-extensions @gol
>> --trigraphs  -traditional  -traditional-cpp @gol
>> +-trigraphs  -traditional  -traditional-cpp -fno-instrument-function @gol
>>  -fallow-single-precision  -fcond-mismatch -flax-vector-conversions @gol
>>  -fsigned-bitfields  -fsigned-char @gol
>>  -funsigned-bitfields  -funsigned-char}
>> @@ -1868,6 +1868,12 @@ Allow implicit conversions between vectors with 
>> differing numbers of
>>  elements and/or incompatible element types.  This option should not be
>>  used for new code.
>>  
>> +@item -fno-instrument-function
>> +@opindex fno-instrument-function
>> +Override @option{-pg} for this translation unit. This is useful with
>> +Link Time Optimization (LTO) to override the effects of

Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Oleg Endo
On Mon, 2013-10-07 at 07:44 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > Forgot to handle a case in function can_remove_cstore, thanks for
> > catching it.  Fixed in the attached patch and also added test cases.
> > Retested as before without new failures.
> 
> Ok for trunk.
> 
> > Yeah, right.  I've changed 'ifcvt_sh' to 'sh_ifcvt'.
> 
> >+  register_pass (make_pass_sh_ifcvt (g, false, "ifcvt1_sh"),
> >+ PASS_POS_INSERT_AFTER, "ce1", 1);
> >+*/
> 
> s/ifcvt1_sh/sh_ifcvt1/ might be better even in a comment.

Sorry, I missed one.  Will fix and resend the committed patch after we
have agreed on the pass name (see Christian's message).

Cheers,
Oleg



Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Oleg Endo
On Mon, 2013-10-07 at 10:30 +0200, Christian Bruel wrote:
> Hi Oleg,
> 
> +/*
> +This pass tries to optimize for example this:
> + mov.l   @(4,r4),r1
> + tst     r1,r1
> + movt    r1
> + tst     r1,r1
> + bt/s    .L5
> +
> +into something simpler:
> + mov.l   @(4,r4),r1
> + tst     r1,r1
> + bf/s    .L5
> +
> +Such sequences can be identified by looking for conditional branches and
> +checking whether the ccreg is set before the conditional branch
> +by testing another register for != 0, which was set by a ccreg store.
> +This can be optimized by eliminating the redundant comparison and
> +inverting the branch condition.  There can be multiple comparisons in
> +different basic blocks that all end up in the redundant test insn before the
> +conditional branch.  Some example RTL ...
> +
> 
> Nice things to optimize the sequences when t-bit values are not
> recognized due to branches direction, I have 2 questions
> 
> 1) I find the name "if-conversion" for this pass a little bit forced,
> since you don't aim to remove branches.  It looks more like some kind of
> extension value numbering.

To be honest, I had some difficulty picking the name.
Maybe something like 'sh_tbit_combine' or 'sh_treg_combine' would be
better, or at least less confusing?  Suggestions are highly appreciated.

> 2) I'm wondering to what extent this case could be handled by a more
> global generic target value numbering to handle boolean operations.
> Maybe just a phasing problem as the branch directions are not yet
> computed in gimple-ssa, which would mean reworking in RTL ?

I don't know.  What the pass currently does looks like highly
specialized combine (try_eliminate_cstores) and a little bit of CSE
(try_combine_comparisons), I think.
The problems on SH are due to the way insns are expanded and the T bit
handling (SImode hardreg, etc), I guess.  Moreover, some of the SH insns
might split out cstores and inverted cstores after combine, which then
would result in those sequences.  Combine itself sometimes can catch
some of the cases, but not all, or at least not with a finite amount of
combine patterns ;)

Cheers,
Oleg



Re: [PATCH] libgccjit.so: an embeddable JIT-compiler based on GCC

2013-10-07 Thread Xinliang David Li
On Wed, Oct 2, 2013 at 6:32 PM, David Malcolm  wrote:
> This is very much a proof-of-concept/work-in-progress at this stage, but
> attached is a patch to GCC which aims to provide an embeddable
> JIT-compilation API, using GCC as the backend: libgccjit.so.
>
> This shared library can then be dynamically-linked into bytecode
> interpreters and other such programs that want to generate machine code
> "on the fly" at run-time.
>
> The idea is that GCC is configured with a special --enable-host-shared
> option, which leads to it being built as position-independent code. You
> would configure it with host==target, given that the generated machine
> code will be executed within the same process (the whole point of JIT).
>
> libgccjit.so is built against libbackend.a.  To the rest of GCC, it
> looks like a "frontend" (in the "gcc/jit" subdir), but the parsing hook
> just runs a callback provided by client code.  You can see a diagram of
> how it all fits together within the patch (see gcc/jit/notes.txt).  The
> jit "frontend" requires --enable-host-shared, so it is off by default,
> so you need to configure with:
>   --enable-host-shared --enable-languages=jit
> to get the jit (and see caveats below).
>
> The "main" function is in the client code.  It uses a pure C API to call
> into libgccjit.so, registering a code creation hook:
>
>   gcc_jit_context *ctxt;
>   gcc_jit_result *result;
>
>   ctxt = gcc_jit_context_acquire ();
>
>   gcc_jit_context_set_code_factory (ctxt,
>  some_code_making_callback, user_data);
>
>   /* This actually calls into GCC and runs the build, all
>  in a mutex for now, getting make a result object.  */
>   result = gcc_jit_context_compile (ctxt);
> /* result is actually a wrapper around a DSO */
>
>   /* Now that we have result, we're done with ctxt: */
>   gcc_jit_context_release (ctxt);
>
>   /* Look up a generated function by name, getting a void* back
>  from the result object (pointing to the machine code), and
>  cast it to the appropriate type for the function: */
>   some_fn_type some_fn  = (some_fn_type)gcc_jit_result_get_code (result,
> "some_fn");
>
>   /* We can now call the machine code: */
>   int val = some_fn (3, 4);
>
>   /* Presumably we'd call it more than once.
>  Once we're done with the code, this unloads the built DSO: */
>   gcc_jit_result_release (result);
>
> There are some major kludges in there, but it does work: it can
> successfully build code in-process 1000 times in a row [1], albeit with
> a slow memory leak, with all optimization turned off.   Upon turning on
> optimizations I run into crashes owing to not properly purging all state
> within the compiler - so this is a great motivation for doing more
> state-cleanup work.  I've also hacked timevars to run in "cumulative"
> mode, accumulating all timings across all iterations.
>
> The library API hides GCC's internals, and tries to be much more
> typesafe than GCC's, giving something rather like Andrew MacLeod's
> proposed changes - client code does not see "tree", instead dealing with
> types, rvalues, lvalues, jump labels, etc.  It is pure C, given the
> horror stories I have heard about people dealing with C++ ABIs.  FWIW I
> also have the beginnings of Python bindings for the library (doing the
> interface as pure C makes language-bindings easier), though that would
> probably live in a separate repository (so not part of this patch).
>
> The API deliberately uses C terminology, given that it's likely that the
> user will want to be plugging the JIT-generated code into a C/C++
> program (or library).
>
> I've been able to successfully use this API to add JIT-compilation to a
> toy bytecode interpreter:
>   https://github.com/davidmalcolm/jittest
> (where regvm.cc uses this API to compile a bytecode function into
> machine code).
>
> There's a DejaGnu-based test suite, which I can invoke via:
>   make check-parallel-jit RUNTESTFLAGS=""
> (potentially with some -v verbosity options in RUNTESTFLAGS), giving
># of expected passes 144
> and no failures on this box.
>
> Various caveats:
>   * Currently it only supports a small subset of C-like code.
>   * The API is still in flux: I'm not convinced by the label-placement
> approach; I suspect having an explicit "block" type may be easier for
> users to deal with.
>   * The patch is against r202664, which is a little out-of-date
> (2013-09-17), but I'm interested in feedback rather than perfection at
> this stage.
>   * I'm running into configure/Makefile issues with
> --enable-host-shared, where CFLAGS contains -fPIC, but only on
> invocations of leaf Makefiles, not on recursive "make" - so it works if
> you cd into $builddir/gcc and make (and so on for libcpp etc), but not
> from the top-level builddir.  Hence building the thing is currently
> unreliable (but again, I'm interested in feedback rather than
> perfection).  Help with configure/Makefiles would be appreciated!
>   * There are some grotesque kludges in in

Re: [PATCH] Do not append " *INTERNAL* " to the decl name

2013-10-07 Thread Dehao Chen
ping...

On Tue, Oct 1, 2013 at 1:28 PM, Dehao Chen  wrote:
> Hi,
>
> This patch stops the C++ frontend from adding the " *INTERNAL* " suffix to
> the maybe_in_charge_destructor/constructor. This is needed because these
> functions could be emitted in the debug info, and we would want to
> demangle these names.
>
> Bootstrapped and passed all regression tests.
>
> OK for trunk?
>
> Thanks,
> Dehao
>
> gcc/ChangeLog:
>
> 2013-10-01  Dehao Chen  
>
> * cp/mangle.c (write_special_name_constructor): Remove the
> INTERNAL suffix.
>
> Index: gcc/cp/mangle.c
> ===
> --- gcc/cp/mangle.c (revision 202991)
> +++ gcc/cp/mangle.c (working copy)
> @@ -687,13 +687,6 @@ write_mangled_name (const tree decl, bool top_leve
>  mangled_name:;
>write_string ("_Z");
>write_encoding (decl);
> -  if (DECL_LANG_SPECIFIC (decl)
> -  && (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl)
> -  || DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)))
> - /* We need a distinct mangled name for these entities, but
> -   we should never actually output it.  So, we append some
> -   characters the assembler won't like.  */
> - write_string (" *INTERNAL* ");
>  }
>  }
>
> @@ -1654,8 +1647,7 @@ write_identifier (const char *identifier)
> Currently, allocating constructors are never used.
>
> We also need to provide mangled names for the maybe-in-charge
> -   constructor, so we treat it here too.  mangle_decl_string will
> -   append *INTERNAL* to that, to make sure we never emit it.  */
> +   constructor, so we treat it here too.  */
>
>  static void
>  write_special_name_constructor (const tree ctor)
> @@ -1682,8 +1674,7 @@ write_special_name_constructor (const tree ctor)
>  ::= D2 # base object (not-in-charge) destructor
>
> We also need to provide mangled names for the maybe-incharge
> -   destructor, so we treat it here too.  mangle_decl_string will
> -   append *INTERNAL* to that, to make sure we never emit it.  */
> +   destructor, so we treat it here too.  */
>
>  static void
>  write_special_name_destructor (const tree dtor)


[Patch, Fortran] PR58658 - add missing pointer-assign check for CLASS(*)

2013-10-07 Thread Tobias Burnus
The patch is rather obvious. The question is just why the check was 
there in the first place.


Build and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias
2013-10-07  Tobias Burnus  

	PR fortran/58658
	* expr.c (gfc_check_vardef_context): Fix pointer diagnostic
	for CLASS(*).

2013-10-07  Tobias Burnus  

	PR fortran/58658
	* gfortran.dg/unlimited_polymorphic_10.f90: New.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index b2af328..df96e5b 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4693,7 +4693,6 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
   bool is_pointer;
   bool check_intentin;
   bool ptr_component;
-  bool unlimited;
   symbol_attribute attr;
   gfc_ref* ref;
   int i;
@@ -4709,8 +4708,6 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
   sym = e->value.function.esym ? e->value.function.esym : e->symtree->n.sym;
 }
 
-  unlimited = e->ts.type == BT_CLASS && UNLIMITED_POLY (sym);
-
   attr = gfc_expr_attr (e);
   if (!pointer && e->expr_type == EXPR_FUNCTION && attr.pointer)
 {
@@ -4750,7 +4747,7 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
   /* Find out whether the expr is a pointer; this also means following
  component references to the last one.  */
   is_pointer = (attr.pointer || attr.proc_pointer);
-  if (pointer && !is_pointer && !unlimited)
+  if (pointer && !is_pointer)
 {
   if (context)
 	gfc_error ("Non-POINTER in pointer association context (%s)"
diff --git a/gcc/testsuite/gfortran.dg/unlimited_polymorphic_10.f90 b/gcc/testsuite/gfortran.dg/unlimited_polymorphic_10.f90
new file mode 100644
index 000..c4c3879
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/unlimited_polymorphic_10.f90
@@ -0,0 +1,10 @@
+! { dg-do compile }
+!
+! PR fortran/58658
+!
+! Contributed by Vladimír Fuka
+!
+subroutine sub(a)
+  class(*),allocatable :: a
+  a => null() ! { dg-error "Non-POINTER in pointer association context \\(pointer assignment\\)" }
+end subroutine


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 1:28 PM, Jonathan Wakely  wrote:
> std::memset() on about 125 bytes is not a big deal, I doubt it's
> significantly more expensive than the calculations to find the right
> bits and do the necessary masking for three elements.
> std::vector<bool> is a strange beast, remember.

Probably there could be a bitwise operation?

I've tested the case of 3 trues out of 1000 elements here:
https://gist.github.com/innocentim/6874889
On my machine, choosing "v.assign(v.size(), false);" costs 1.768s and
choosing the other costs 1.088s. When we push 7 elements, they cost
almost the same time. When pushing more than that, v.assign() shall win.

I do this because I need to use _M_clear() in this patch :)

Anyway let's use v.assign().

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [PATCH] libgccjit.so: an embeddable JIT-compiler based on GCC

2013-10-07 Thread David Malcolm
On Fri, 2013-10-04 at 15:00 +, Joseph S. Myers wrote:
> On Thu, 3 Oct 2013, David Malcolm wrote:
> 
> > Right now all you get back from the result is a "void*" which you're
> > meant to cast to machine code for the CPU.   I guess we could add an
> 
> And right now the library is calling dlopen.  Which means that although 
> what the user gets is a void *, the dynamic linker processing has dealt 
> with registering eh_frame information and the right hooks have been passed 
> through for GDB to see that an object has been loaded and access its debug 
> information.  Is this use of dlopen intended to be part of the interface, 
> or just a temporary hack with users eventually needing to use other 
> interfaces for eh_frame and debug info handling?  (GDB has some support 
> for JITs, but I haven't looked into the details of how it works.)

The use of dlopen does indeed lead to a nice user-experience within gdb;
my plan (vague at this stage) is that by enabling the flag
   GCC_JIT_BOOL_OPTION_DEBUGINFO
on the gcc_jit_context that the client is asking the library to "do the
right thing" to integrate the generated code with a debugger.  Right now
that's implemented using the dlopen hack (and by setting -g), but
potentially we could implement it in other ways.

There are other behaviors due to using dlopen: if you JIT-compile a
function, that function can then be accessed by name from other code
within the process using dlsym.  That could be useful, perhaps, or could
be just weird - perhaps the kludge should be using RTLD_LOCAL so that
symbols are only accessible through the gcc_jit_result object, and don't
pollute the dlsym namespace within the process (though using the term
"pollute" there is perhaps a value-judgment) - not sure.

> > option on the gcc_jit_context for setting which ISA you want code for.
> 
> Even apart from completely separate ISAs, there's also the matter of other 
> command-line options such as -march=, which I'd think users should be able 
> to specify.  And the complication that default options to cc1 can be 
> determined by specs, whether in .h files or from configure options such as 
> --with=arch=, so cc1's defaults may not be the same as the defaults when 
> you run the driver (which are likely to be better defaults for the JIT 
> than cc1's).  And if the user wants to use -march=native (which seems 
> sensible for the common use case of a JIT, generating code for the same 
> system as runs the compiler), that's handled through specs.
> 
> Maybe the driver should be converted to library code so it's possible to 
> run command-line options through it and generate the command line that 
> would get passed to cc1 (but then use that to call a function in the same 
> process - with a different copy of global options data, diagnostic context 
> etc. to avoid any issues from those being initialized twice - rather than 
> running a subprocess).  That would also allow generating the right options 
> for the assembler to pass to the library version of the assembler.
I like these ideas - if nothing else, having the driver as a library
would enable me to get rid of one of the fork&exec pairs in my kludge.

> > > >   * There are some grotesque kludges in internal-api.c, especially in
> > > > how we go from .s assembler files to a DSO (grep for "gross hack" ;) )
> > > 
> > > Apart from making the assembler into a shared library itself, it would 
> > > also be nice to avoid needing those files at all (via an API for writing 
> > > assembler text that doesn't depend on a FILE * pointer, although on GNU 
> > > hosts you can always use in-memory files via open_memstream, which would 
> > > make the internal changes much smaller).  But in the absence of such a 
> > > cleanup, running the assembler via the driver should work, although 
> > > inefficient.
> > 
> > (nods)   Note that all of the kludgery (if that's a word) is hidden
> > inside the library, so we can fix it all up without affecting client
> > code: the client-facing API doesn't expose any of this.
> > 
> > FWIW I added timevars for measuring the invocation of the driver; that
> > kludge makes up about 50% of the overall time taken.  I haven't yet
> > broken that down into assembler vs linker vs fork/exec overhead, but
> > clearly this is something that could use optimization - leading to
> > (very) vague thoughts involving turning parts of binutils into libraries
> > also.
> 
> First I guess it might simply be a library that receives blocks of text 
> that currently go to asm_out_file - but eventually there might be 
> efficiency to be gained by passing binary data to the assembler in some 
> cases (e.g. for large blocks of debug info) rather than generating and 
> then parsing text.  (So some asm_out methods would gain implementations 
> passing such data instead of generating text.)

Agreed - though I think this is something of a version 2 thing, in that
this is an optimization, and a decidedly non-trivial one (converting
binutils into a 

Re: [PATCH][ubsan] Add VLA bound instrumentation

2013-10-07 Thread Marek Polacek
Ping.

On Wed, Sep 25, 2013 at 02:41:32PM +0200, Marek Polacek wrote:
> On Thu, Sep 12, 2013 at 02:26:55PM +0200, Marek Polacek wrote:
> > This patch adds the instrumentation of VLA bounds.  Basically, it just
> > checks that the size of a VLA is positive, i.e., we also issue an error
> > if the size of the VLA is 0.  It catches e.g.
> > 
> > int i = 1;
> > int a[i][i - 2];
> > 
> > It is pretty straightforward, but I had
> > issues in the C++ FE, mainly choosing the right spot to instrument...
> > Hopefully I picked the right one.  Also note that in C++1y we throw
> > an exception when the size of a VLA is negative; hence no need to perform
> > the instrumentation if -std=c++1y is in effect.
> > 
> > Regtested/ran bootstrap-ubsan on x86_64-linux, also
> > make check -C gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} ubsan.exp'
> > passes.
> > 
> > Ok for trunk?
> 
> I'd like to ping this patch; below is rebased version with the ubsan.c
> hunk omitted, since that part was already fixed by another patch.
> 
> (It still doesn't contain alloca/SIZE_MAX/... checking, since that
> very much relies on libubsan.  Still, it'd be felicitous to get at
> least the basic VLA checking in.)
> 
> Ran ubsan testsuite + bootstrap-ubsan on x86_64-linux.
> 
> 2013-09-25  Marek Polacek  
> 
>   * opts.c (common_handle_option): Handle vla-bound.
>   * sanitizer.def (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE):
>   Define.
>   * flag-types.h (enum sanitize_code): Add SANITIZE_VLA.
>   * asan.c (initialize_sanitizer_builtins): Build BT_FN_VOID_PTR_PTR.
> c-family/
>   * c-ubsan.c: Don't include hash-table.h.
>   (ubsan_instrument_vla): New function.
>   * c-ubsan.h: Declare it.
> cp/
>   * decl.c (create_array_type_for_decl): Add VLA instrumentation.
> c/
>   * c-decl.c (grokdeclarator): Add VLA instrumentation.
> testsuite/
>   * g++.dg/ubsan/cxx1y-vla.C: New test.
>   * c-c++-common/ubsan/vla-3.c: New test.
>   * c-c++-common/ubsan/vla-2.c: New test.
>   * c-c++-common/ubsan/vla-4.c: New test.
>   * c-c++-common/ubsan/vla-1.c: New test.
> 
> --- gcc/opts.c.mp 2013-09-25 14:06:58.531276511 +0200
> +++ gcc/opts.c2013-09-25 14:07:03.580294566 +0200
> @@ -1428,6 +1428,7 @@ common_handle_option (struct gcc_options
> { "undefined", SANITIZE_UNDEFINED, sizeof "undefined" - 1 },
> { "unreachable", SANITIZE_UNREACHABLE,
>   sizeof "unreachable" - 1 },
> +   { "vla-bound", SANITIZE_VLA, sizeof "vla-bound" - 1 },
> { NULL, 0, 0 }
>   };
>   const char *comma;
> --- gcc/c-family/c-ubsan.c.mp 2013-09-25 14:06:58.535276527 +0200
> +++ gcc/c-family/c-ubsan.c2013-09-25 14:07:03.580294566 +0200
> @@ -25,7 +25,6 @@ along with GCC; see the file COPYING3.
>  #include "alloc-pool.h"
>  #include "cgraph.h"
>  #include "gimple.h"
> -#include "hash-table.h"
>  #include "output.h"
>  #include "toplev.h"
>  #include "ubsan.h"
> @@ -86,8 +85,7 @@ ubsan_instrument_division (location_t lo
>return t;
>  }
>  
> -/* Instrument left and right shifts.  If not instrumenting, return
> -   NULL_TREE.  */
> +/* Instrument left and right shifts.  */
>  
>  tree
>  ubsan_instrument_shift (location_t loc, enum tree_code code,
> @@ -157,4 +155,23 @@ ubsan_instrument_shift (location_t loc,
>t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
>  
>return t;
> +}
> +
> +/* Instrument variable length array bound.  */
> +
> +tree
> +ubsan_instrument_vla (location_t loc, tree size)
> +{
> +  tree type = TREE_TYPE (size);
> +  tree t, tt;
> +
> > +  t = fold_build2 (LE_EXPR, boolean_type_node, size, build_int_cst (type, 0));
> +  tree data = ubsan_create_data ("__ubsan_vla_data",
> +  loc, ubsan_type_descriptor (type), NULL_TREE);
> +  data = build_fold_addr_expr_loc (loc, data);
> +  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_VLA_BOUND_NOT_POSITIVE);
> +  tt = build_call_expr_loc (loc, tt, 2, data, ubsan_encode_value (size));
> +  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> +
> +  return t;
>  }
> --- gcc/c-family/c-ubsan.h.mp 2013-09-25 14:06:58.538276539 +0200
> +++ gcc/c-family/c-ubsan.h2013-09-25 14:07:03.595294628 +0200
> @@ -23,5 +23,6 @@ along with GCC; see the file COPYING3.
>  
>  extern tree ubsan_instrument_division (location_t, tree, tree);
>  extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
> +extern tree ubsan_instrument_vla (location_t, tree);
>  
>  #endif  /* GCC_C_UBSAN_H  */
> --- gcc/sanitizer.def.mp  2013-09-25 14:06:58.542276558 +0200
> +++ gcc/sanitizer.def 2013-09-25 14:07:03.628294753 +0200
> @@ -297,3 +297,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN
> "__ubsan_handle_builtin_unreachable",
> BT_FN_VOID_PTR,
> ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_V

Re: [PATCH] libgccjit.so: an embeddable JIT-compiler based on GCC

2013-10-07 Thread David Malcolm
On Wed, 2013-10-02 at 21:32 -0400, David Malcolm wrote:
[...]
>   * I'm running into configure/Makefile issues with
> --enable-host-shared, where CFLAGS contains -fPIC, but only on
> invocations of leaf Makefiles, not on recursive "make" - so it works if
> you cd into $builddir/gcc and make (and so on for libcpp etc), but not
> from the top-level builddir.  Hence building the thing is currently
> unreliable (but again, I'm interested in feedback rather than
> perfection).  Help with configure/Makefiles would be appreciated!

I believe I've fixed this issue as of this commit on my "dmalcolm/jit"
branch:
http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=11f23e6248339c3bad498a7618cfde96338a1d3c
so it should now be possible to build the branch by configuring with:
   --enable-host-shared --enable-languages=jit
and then simply doing a "make" in the top-level build dir (with
appropriate -jN options, of course).




Re: [c++-concepts] friends regression

2013-10-07 Thread Jason Merrill

OK.

If we have a friend declaration inside a constrained partial 
specialization, will that still get a false positive?


Jason


Re: Attribute returns_null

2013-10-07 Thread Jeff Law

On 10/07/13 12:37, David Malcolm wrote:

On Mon, 2013-10-07 at 19:51 +0200, Marc Glisse wrote:

On Mon, 7 Oct 2013, David Malcolm wrote:


On Mon, 2013-10-07 at 16:17 +0200, Marc Glisse wrote:

Hello,

this patch adds an attribute to let the compiler know that a function
never returns NULL. I saw some ECF_* flags, but the attribute seems
sufficient. I considered using nonnull(0), but then it would have been
confusing that the version of nonnull without arguments applies only to
parameters and not the return value.


I can't comment on the patch itself, but could there also be an
attribute "returns_null", for functions that *always* return NULL?
This may sound weird, but I know of at least one API that exposes such
functions: CPython's exception-handling API: see e.g.
http://docs.python.org/2/c-api/exceptions.html#PyErr_NoMemory
and various other functions that have "Return value: Always NULL."
This allows the user to write one-liners like:

   return PyErr_NoMemory();


I didn't think about it very long, so I probably missed the best reasons,
but it doesn't sound like such a good idea. If PyErr_NoMemory always
returns NULL, why not make that clear in the code? It could be an inline
function, or even a macro that expands to
(PyErr_SetNone(PyExc_MemoryError),NULL).

To me, attributes are there for when the language is insufficient, kind of
a last resort. Could you explain why you think it would be the best option
here?

No, I can't :)   (that API predates my involvement in that project).
This seems to have dubious value to me -- i.e., I don't think there are 
many functions that always return NULL in practice, and of those, I doubt 
there's a lot of opportunity to optimize with that knowledge.


In contrast, knowing a function can not return null has a ton of value, 
both for optimization purposes and for improving warnings.


jeff



Re: [GOOGLE] Iterative AutoFDO

2013-10-07 Thread Dehao Chen
Hi, David,

Thanks for the review. Patch updated.

Dehao

On Sun, Oct 6, 2013 at 11:54 AM, Xinliang David Li  wrote:
> Adding additional early inline + value transform (calling them as
> utility functions) is an 'unconventional' way of invoking passes. It
> would be helpful to do more heavy documentation by providing more
> examples and explaining why simply augmenting the indirect target info
> for promoted icall site with inlined call stack profile data and rely
> on existing value profile transform to kick in is not enough.
>
> The patch also misses documentation for many functions at the
> definition site. There are also a couple of coding style problems such
> as function name does not start a new line (but following return
> type).
>
>
>
>> /* Return the combined relative location for STMT.  */
>
>> static unsigned
>> get_relative_location_for_stmt (gimple stmt)
>> {
>
> More explanation on 'relative' -- it is not mentioned in other places either.
>
>
>> /* Program behavior changed, original promoted (and inlined) target is not
>>  hot any more. Will avoid promote the original target.  */
>>  if (total >= info->first * 0.5)
>>return false;
>
> This needs more explanation.
>
>> /* For a given BB, return its execution count, and annotate value profile
>>   if a stmt is not in PROMOTED. Add the location of annotated stmt to
>>   ANNOTATED.  */
>
> More explanation on why skipping promoted ones.
>
>  >  || promoted_stmts->find (stmt) != promoted_stmts->end ())
>  >   continue;
>  > promoted_stmts->insert (stmt);
>
> It is not quite clear how 'promoted' stmts are identified here.
>
>> /* First do indirect call promotion and early inline to make the
>> IR match the profiled binary before actual annotation.  */
>>  autofdo::stmt_set promoted_stmts;
>>  for (int i = 0; i < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS); 
>> i++)
>>{
>>  early_inliner ();
>>  if (!flag_value_profile_transformations
>>  || !autofdo::afdo_vpt_for_early_inline (&promoted_stmts))
>>break;
>>}'
>
> This needs heavy documentation. Also, why put early inline before
> value transform?
>
> David
>
>
>
>
> On Mon, Sep 23, 2013 at 10:51 AM, Dehao Chen  wrote:
>> This patch fixes the issue of indirect call promotion when using an
>> AutoFDO-optimized binary to collect a profile and using the new profile
>> to re-optimize the binary. Before the patch, the indirect call is
>> promoted (and likely inlined) in the profiled binary, and will not be
>> promoted in the new iteration. With the patch, a new iterative
>> vpt+einline pass is added before annotation to make sure hot promoted
>> indirect calls are still promoted before annotation.
>>
>> Bootstrapped and passed regression test and benchmark test.
>>
>> OK for google-4_8 branch?
>>
>> Thanks,
>> Dehao
>>
>> http://codereview.appspot.com/13492045
Index: value-prof.c
===
--- value-prof.c(revision 203250)
+++ value-prof.c(working copy)
@@ -1327,7 +1327,7 @@ find_func_by_global_id (unsigned HOST_WIDE_INT gid
may ICE. Here we only do very minimal sanity check just to make compiler 
happy.
Returns true if TARGET is considered ok for call CALL_STMT.  */
 
-static bool
+bool
 check_ic_target (gimple call_stmt, struct cgraph_node *target)
 {
location_t locus;
Index: passes.c
===
--- passes.c(revision 203250)
+++ passes.c(working copy)
@@ -1366,8 +1366,8 @@ init_optimization_passes (void)
   NEXT_PASS (pass_rebuild_cgraph_edges);
   NEXT_PASS (pass_inline_parameters);
 }
+  NEXT_PASS (pass_ipa_auto_profile);
   NEXT_PASS (pass_ipa_free_inline_summary);
-  NEXT_PASS (pass_ipa_auto_profile);
   NEXT_PASS (pass_ipa_tree_profile);
 {
   struct opt_pass **p = &pass_ipa_tree_profile.pass.sub;
Index: coverage.h
===
--- coverage.h  (revision 203250)
+++ coverage.h  (working copy)
@@ -58,6 +58,8 @@ extern gcov_type *get_coverage_counts_no_warn (str
 extern struct cgraph_node * find_func_by_global_id (unsigned HOST_WIDE_INT gid,
bool);
 
+extern bool check_ic_target (gimple call_stmt, struct cgraph_node *target);
+
 /* All the coverage counters are supposed to be allocated by the time
coverage_end_function is called. However, direct-call counters are
allocated after coverage_end_function has been called. This function
Index: ipa-inline.c
===
--- ipa-inline.c(revision 203250)
+++ ipa-inline.c(working copy)
@@ -2203,7 +2203,7 @@ early_inline_small_functions (struct cgraph_node *
 /* Do inlining of small functions.  Doing so early helps profiling and other
passes to be somewhat more effective and avoids some code duplication in
later real inlining pass for test

Re: Attribute returns_null (was: New attribute: returns_nonnull)

2013-10-07 Thread David Malcolm
On Mon, 2013-10-07 at 19:51 +0200, Marc Glisse wrote:
> On Mon, 7 Oct 2013, David Malcolm wrote:
> 
> > On Mon, 2013-10-07 at 16:17 +0200, Marc Glisse wrote:
> >> Hello,
> >>
> >> this patch adds an attribute to let the compiler know that a function
> >> never returns NULL. I saw some ECF_* flags, but the attribute seems
> >> sufficient. I considered using nonnull(0), but then it would have been
> >> confusing that the version of nonnull without arguments applies only to
> >> parameters and not the return value.
> >
> > I can't comment on the patch itself, but could there also be an
> > attribute "returns_null", for functions that *always* return NULL?
> > This may sound weird, but I know of at least one API that exposes such
> > functions: CPython's exception-handling API: see e.g.
> > http://docs.python.org/2/c-api/exceptions.html#PyErr_NoMemory
> > and various other functions that have "Return value: Always NULL."
> > This allows the user to write one-liners like:
> >
> >   return PyErr_NoMemory();
> 
> I didn't think about it very long, so I probably missed the best reasons, 
> but it doesn't sound like such a good idea. If PyErr_NoMemory always 
> returns NULL, why not make that clear in the code? It could be an inline 
> function, or even a macro that expands to 
> (PyErr_SetNone(PyExc_MemoryError),NULL).
> 
> To me, attributes are there for when the language is insufficient, kind of 
> a last resort. Could you explain why you think it would be the best option 
> here?
No, I can't :)   (that API predates my involvement in that project).



[c++-concepts] friends regression

2013-10-07 Thread Andrew Sutton
The last patch introduced a regression. This ensures that we don't
generate false positives diagnosing errors in non-template contexts.

2013-10-07  Andrew Sutton  
* gcc/cp/cp-tree.h (check_constrained_friend): Take requirements as
an argument.
* gcc/cp/constraints.cc (check_constrained_friend): Do not diagnose
errors in unconstrained friend declarations.
* gcc/cp/parser.cc (cp_parser_member_declaration): Pass current
requirements to check_constrained_friend.

Andrew Sutton


friends-4.patch
Description: Binary data


Attribute returns_null (was: New attribute: returns_nonnull)

2013-10-07 Thread Marc Glisse

On Mon, 7 Oct 2013, David Malcolm wrote:


On Mon, 2013-10-07 at 16:17 +0200, Marc Glisse wrote:

Hello,

this patch adds an attribute to let the compiler know that a function
never returns NULL. I saw some ECF_* flags, but the attribute seems
sufficient. I considered using nonnull(0), but then it would have been
confusing that the version of nonnull without arguments applies only to
parameters and not the return value.


I can't comment on the patch itself, but could there also be an
attribute "returns_null", for functions that *always* return NULL?
This may sound weird, but I know of at least one API that exposes such
functions: CPython's exception-handling API: see e.g.
http://docs.python.org/2/c-api/exceptions.html#PyErr_NoMemory
and various other functions that have "Return value: Always NULL."
This allows the user to write one-liners like:

  return PyErr_NoMemory();


I didn't think about it very long, so I probably missed the best reasons, 
but it doesn't sound like such a good idea. If PyErr_NoMemory always 
returns NULL, why not make that clear in the code? It could be an inline 
function, or even a macro that expands to 
(PyErr_SetNone(PyExc_MemoryError),NULL).


To me, attributes are there for when the language is insufficient, kind of 
a last resort. Could you explain why you think it would be the best option 
here?
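
For illustration, such a macro could be as simple as this sketch (the
name is hypothetical, not something CPython provides):

/* Given #include <Python.h>: expands to a comma expression whose
   constant NULL result is visible to the compiler at every call site.  */
#define MY_PYERR_NOMEMORY() \
  (PyErr_SetNone(PyExc_MemoryError), (PyObject *) NULL)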


--
Marc Glisse


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 1:40 PM, Tim Shen  wrote:
> while (!this->empty())
> this->pop();

Sorry it's this->_M_empty() and this->_M_pop();


-- 
Tim Shen


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 1:28 PM, Jonathan Wakely  wrote:
> std::memset() on about 125 bytes is not a big deal, I doubt it's
> significantly more expensive than the calculations to find the right
> bits and do the necessary masking for three elements.
> std::vector<bool> is a strange beast, remember.

OK, so I may change it to use something else (bitset?) next time I
encounter this.

Is copying 125 bytes more expensive compared to 3 assignments and
subtractions? Because it *is* cheap to find all the elements that are true;
they are just in _M_base (in the latest patch). So:

while (!this->empty())
this->pop();

...will be executed only 3 times. I think it's cheaper than copying a
long-enough and sparse-enough byte chunk? Or at least we should check
the density (_M_base.size() / _M_exists.size()) of _M_exists?


-- 
Tim Shen


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Jonathan Wakely
On 7 October 2013 18:12, Tim Shen  wrote:
> On Mon, Oct 7, 2013 at 12:22 PM, Jonathan Wakely  
> wrote:
>> because that turns into the equivalent of a std::memset() on integers.
>
> Here I get your idea. But think about the following example:
> _M_exists.size() == 1000, but only 3 of the elements are true. Now
> what I intend to do is assigning these 3 elements to false, rather
> than resetting the whole vector. Is this called "pay for what you get"?

std::memset() on about 125 bytes is not a big deal, I doubt it's
significantly more expensive than the calculations to find the right
bits and do the necessary masking for three elements.
std::vector<bool> is a strange beast, remember.
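
Roughly, the two approaches cost this much work (a sketch, not the
actual libstdc++ code paths):

#include <initializer_list>
#include <vector>

int main()
{
  std::vector<bool> v(1000);   // packed: about 125 bytes of bits

  // Bulk reset: word-at-a-time stores, essentially a memset.
  v.assign(v.size(), false);

  // Per-element reset: each write goes through the proxy reference,
  // i.e. load a word, mask out one bit, store the word back.
  for (unsigned i : {3u, 500u, 997u})
    v[i] = false;
}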


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 7:12 AM, Jonathan Wakely  wrote:
> On 6 October 2013 23:43, Tim Shen wrote:
>> Here's a simple piece of code
>> https://gist.github.com/innocentim/6849759 the reveals _BFSExecutor's
>> inefficiency. Some optimizations are here to reduce the unecessary
>> time complexity from O(n^2) to O(n).
>>
>> I'll do a bootstrap and full testing before committing.
>
> In the ChangeLog, please fix the spelling of "unnecessary".
>
> The constructor of _TodoList could be "explicit"
>
> Does _TodoList really need to derive from std::vector<_StateIdT> or
> could it just have a second member? I think there's no benefit to
> using inheritance here.
>
> _TodoList::__exists should be named _M_exists to follow the library
> coding standards.
>
> Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?
>
> I think it's guaranteed that __exists[__u] is always in range, right?

Anyway, here's the patch. I intended to write something like a
Quine[1] as the testcase but gave up ;)

Thank you!

[1] http://en.wikipedia.org/wiki/Quine_(computing)

-- 
Tim Shen


a.patch
Description: Binary data


Re: New attribute: returns_nonnull

2013-10-07 Thread David Malcolm
On Mon, 2013-10-07 at 16:17 +0200, Marc Glisse wrote:
> Hello,
> 
> this patch adds an attribute to let the compiler know that a function 
> never returns NULL. I saw some ECF_* flags, but the attribute seems 
> sufficient. I considered using nonnull(0), but then it would have been 
> confusing that the version of nonnull without arguments applies only to 
> parameters and not the return value.

I can't comment on the patch itself, but could there also be an
attribute "returns_null", for functions that *always* return NULL?
This may sound weird, but I know of at least one API that exposes such
functions: CPython's exception-handling API: see e.g.
http://docs.python.org/2/c-api/exceptions.html#PyErr_NoMemory
and various other functions that have "Return value: Always NULL."
This allows the user to write one-liners like:

   return PyErr_NoMemory(); 

Hope this is helpful
Dave



Re: [PATCH] Fixing improper conversion from sin() to sinf() in optimization mode.

2013-10-07 Thread Cong Hou
You are right. I am not an expert on numerical analysis, but I tested
your case and it proves that conversion number 4 is not safe.

Now we have four conversions which are safe once the precision
requirement is satisfied. I added a condition if (type != newtype) to
remove the unsafe one, since in that case one more conversion is added,
which leads to the unsafe issue. If you think this condition does not
make sense, please let me know.
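
To make the numbers concrete (a sketch of the arithmetic only, with the
usual precisions float p = 24, double p = 53, x87 long double p = 64,
IEEE binary128 p = 113; the rule is p1 >= 2*p2 + 2):

  sqrt  (p1 = 53)  -> sqrtf (p2 = 24):  53 >= 50,  allowed
  sqrtl (p1 = 64)  -> sqrt  (p2 = 53):  64 < 108,  rejected
  sqrtl (p1 = 113) -> sqrt  (p2 = 53): 113 >= 108, passes the precision
      test, but the new type != newtype condition still rejects it when
      the result is further narrowed to float, as in Joseph's example.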

The new patch is shown below (the attached file has tabs).

Thank you very much!



thanks,
Cong



Index: gcc/convert.c
===
--- gcc/convert.c (revision 203250)
+++ gcc/convert.c (working copy)
@@ -135,16 +135,19 @@ convert_to_real (tree type, tree expr)
   CASE_MATHFN (COS)
   CASE_MATHFN (ERF)
   CASE_MATHFN (ERFC)
-  CASE_MATHFN (FABS)
   CASE_MATHFN (LOG)
   CASE_MATHFN (LOG10)
   CASE_MATHFN (LOG2)
   CASE_MATHFN (LOG1P)
-  CASE_MATHFN (LOGB)
   CASE_MATHFN (SIN)
-  CASE_MATHFN (SQRT)
   CASE_MATHFN (TAN)
   CASE_MATHFN (TANH)
+/* The above functions are not safe to do this conversion.  */
+if (!flag_unsafe_math_optimizations)
+  break;
+  CASE_MATHFN (SQRT)
+  CASE_MATHFN (FABS)
+  CASE_MATHFN (LOGB)
 #undef CASE_MATHFN
 {
   tree arg0 = strip_float_extensions (CALL_EXPR_ARG (expr, 0));
@@ -155,13 +158,43 @@ convert_to_real (tree type, tree expr)
   if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (type))
  newtype = TREE_TYPE (arg0);

+  /* We consider to convert
+
+ (T1) sqrtT2 ((T2) exprT3)
+ to
+ (T1) sqrtT4 ((T4) exprT3)
+
+  , where T1 is TYPE, T2 is ITYPE, T3 is TREE_TYPE (ARG0),
+ and T4 is NEWTYPE. All those types are of floating point types.
+ T4 (NEWTYPE) should be narrower than T2 (ITYPE). This conversion
+ is safe only if P1 >= P2*2+2, where P1 and P2 are precisions of
+ T2 and T4. See the following URL for a reference:
+ http://stackoverflow.com/questions/9235456/determining-floating-point-square-root
+ */
+  if ((fcode == BUILT_IN_SQRT || fcode == BUILT_IN_SQRTL)
+  && !flag_unsafe_math_optimizations)
+ {
+  /* The following conversion is unsafe even if the precision condition
+ below is satisfied:
+
+ (float) sqrtl ((long double) double_val) -> (float) sqrt (double_val)
+*/
+  if (type != newtype)
+break;
+
+  int p1 = REAL_MODE_FORMAT (TYPE_MODE (itype))->p;
+  int p2 = REAL_MODE_FORMAT (TYPE_MODE (newtype))->p;
+  if (p1 < p2 * 2 + 2)
+break;
+ }
+
   /* Be careful about integer to fp conversions.
  These may overflow still.  */
   if (FLOAT_TYPE_P (TREE_TYPE (arg0))
   && TYPE_PRECISION (newtype) < TYPE_PRECISION (itype)
   && (TYPE_MODE (newtype) == TYPE_MODE (double_type_node)
   || TYPE_MODE (newtype) == TYPE_MODE (float_type_node)))
-{
+ {
   tree fn = mathfn_built_in (newtype, fcode);

   if (fn)
Index: gcc/ChangeLog
===
--- gcc/ChangeLog (revision 203250)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2013-10-07  Cong Hou  
+
+ * convert.c (convert_to_real): Forbid unsafe math function
+ conversions including sin/cos/log etc. Add precision check
+ for sqrt.
+
 2013-10-07  Bill Schmidt  

  * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 203250)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2013-10-07  Cong Hou  
+
+ * gcc.c-torture/execute/20030125-1.c: Update.
+
 2013-10-07  Bill Schmidt  

  * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
Index: gcc/testsuite/gcc.c-torture/execute/20030125-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20030125-1.c (revision 203250)
+++ gcc/testsuite/gcc.c-torture/execute/20030125-1.c (working copy)
@@ -44,11 +44,11 @@ __attribute__ ((noinline))
 double
 sin(double a)
 {
- abort ();
+ return a;
 }
 __attribute__ ((noinline))
 float
 sinf(float a)
 {
- return a;
+ abort ();
 }




On Thu, Oct 3, 2013 at 5:06 PM, Joseph S. Myers  wrote:
> On Fri, 6 Sep 2013, Cong Hou wrote:
>
>> 4: (float) sqrtl ((long double) double_val)  ->  (float) sqrt (double_val)
>
> I don't believe this case is in fact safe even if precision (long double)
>>= precision (double) * 2 + 2 (when your patch would allow it).
>
> The result that precision (double) * 2 + 2 is sufficient for the result of
> rounding the long double value to double to be the same as the result of
> rounding once from infinite precision to double would I think also mean
> the same when rounding of the infinite-precision result to float happens
> once - that is, if instead of (float) sqrt (double_val) you have fsqrt
> (double_val) (fsqrt being the proposed function in draft TS 18661-1 for
> computing a square root of a double value, rounded exactly once to float).
> But here the question isn't whether rounding to lo

Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 12:22 PM, Jonathan Wakely  wrote:
> because that turns into the equivalent of a std::memset() on integers.

Here I get your idea. But think about the following example:
_M_exists.size() == 1000, but only 3 of the elements are true. Now
what I intend to do is assigning these 3 elements to false, rather
than resetting the whole vector. Is this called "pay for what you get"?

The ideal solution could be:

void _M_clear() {
  if (__exists.size() < _BaseT::size() * CPU_WORD_LENGTH * some_factor)
    clear_one_by_one();
  else
    reset_them_all();
}


-- 
Tim Shen


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
Sorry, I forgot to reply to the mailing list. Here's the discussion:

--- Tim


On Oct 7, 2013 7:12 AM, "Jonathan Wakely"  wrote:
> Does _TodoList really need to derive from std::vector<_StateIdT> or
> could it just have a second member? I think there's no benefit to
> using inheritance here.

I use private inheritance here to convey "it's just a wrapper class of
vector". I don't see a great difference between it and using a second
member?

> _TodoList::__exists should be named _M_exists to follow the library
> coding standards.

Sorry for that >.<

> Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?

We can add an assertion here, but _M_empty shouldn't set anything.
This function has no side effect.

> I think it's guaranteed that __exists[__u] is always in range, right?

Of course. Otherwise vector's range checker will help, right?

--- Jonathan


On 7 October 2013 14:31, Tim Shen wrote:
> On Oct 7, 2013 7:12 AM, "Jonathan Wakely"  wrote:
>> Does _TodoList really need to derive from std::vector<_StateIdT> or
>> could it just have a second member? I think there's no benefit to
>> using inheritance here.
>
> I use private inheritance here to convey "it's just a wrapper class of
> vector". I don't see a great difference between it and using a second member?

Aggregation is simpler,

If I see subtyping I wonder if some users of the type need to rely on
conversion to the base class, i.e. whether it is important that
_TodoList IS-A vector, but as it's private and has no friends, that
can't be true.

In general, prefer composition to inheritance:
http://www.artima.com/cppsource/codestandards3.html

>> Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?
>
> We can add an assertion here, but _M_empty shouldn't set anything. This
> function has no side effect.

Sorry, I meant _M_clear() not _M_empty().

Currently you loop over the vector popping one element at a time until
it's empty; why not just use _BaseT::clear()?  Does anything even use
that member?  If it is needed, why is it correct not to reset the
flags in __exists to false?

>> I think it's guaranteed that __exists[__u] is always in range, right?
>
> Of course. Otherwise vector's range checker will help, right?

Only in Debug Mode; there is no range checking in operator[] in the
normal case.  But as long as the vector has the right size (from
_M_nfa.size()) and can only be accessed with valid indices, it's fine.

--- Tim


On Mon, Oct 7, 2013 at 10:56 AM, Jonathan Wakely  wrote:
> Aggregation is simpler,

OK I'll change it to use a private member, since indeed I don't need
to use the protected part of vector.

> Currently you loop over the vector popping one element at a time until
> it's empty; why not just use _BaseT::clear()?  Does anything even use
> that member?  If it is needed, why is it correct not to reset the
> flags in __exists to false?

I use a loop to clear rather than a _BaseT::clear() followed by resetting
__exists (_M_exists), simply because the latter is too
expensive (O(__exists.size())). With my implementation, I can make it
O(this->size()), which is usually far smaller than the size of __exists. A
better way to do that is to traverse *this and reset only the true entries,
then call _BaseT::clear().

Ah, we can remove this member function, because it was used only in early
versions of this patch.
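
A sketch of that better version (illustrative, not the committed code):

void _M_clear()
{
  // Reset only the flags of the states actually queued: O(this->size())
  // instead of O(__exists.size()).
  for (auto __state : *this)
    __exists[__state] = false;
  _BaseT::clear();
}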

--- Jonathan


On 7 October 2013 16:34, Tim Shen wrote:
> On Mon, Oct 7, 2013 at 10:56 AM, Jonathan Wakely  
> wrote:
>> Aggregation is simpler,
>
> OK I'll change it to use a private member, since indeed I don't need
> to use the protected part of vector.
>
>> Currently you loop over the vector popping one element at a time until
>> it's empty, why not just use _BaseT::clear()?  Does anything even use
>> that member?  If it is needed, why is it correct to not reset the
>> flags in __exists to false?
>
> I use a loop to clear rather than a _BaseT::clear() followed by resetting
> __exists (_M_exists),

Where do you reset __exists? That was my point, I don't see that being
reset, but it should have been.

> simply because the latter is too
> expensive (O(__exists.size())). With my implementation, I can make it
> O(this->size()), which is usually far smaller than the size of __exists. A
> better way to do that is to traverse *this and reset only the true entries,
> then call _BaseT::clear().

Traversing a vector<bool> is not efficient because of the proxy
iterators it uses. I expect it would be much quicker to have done:

  __exists.assign(__exists.size(), false);

because that turns into the equivalent of a std::memset() on integers.

> Ah we can remove this member function, because it's used only in early
> versions of this patch.

OK, great. So the changes to _M_clear() don't matter then :-)


Re: libgo patch committed: Implement reflect.MakeFunc for 386

2013-10-07 Thread Ian Lance Taylor
On Mon, Oct 7, 2013 at 5:24 AM, Rainer Orth  
wrote:
>
> Unfortunately, Solaris 9 testing with Sun as revealed two problems: that
> assembler cannot handle either # comments or .global instead of .globl.
> The following patch fixes this and allows makefunc_386.S to assemble.
> Also compiled with a gas build to make sure nothing broke there.

Thanks.  Committed to mainline and 4.8 branch.

Ian


Re: OMP4/cilkplus: simd clone function mangling

2013-10-07 Thread Aldy Hernandez

On 09/27/13 09:23, Jakub Jelinek wrote:

On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote:



+/* Create a simd clone of OLD_NODE and return it.  */
+
+static struct cgraph_node *
+simd_clone_create (struct cgraph_node *old_node)
+{
+  struct cgraph_node *new_node;
+  new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
+NULL, NULL, "simdclone");
+


My understanding of how IPA cloning etc. works is that you first
set up various data structures describing how you change the arguments
and only then actually do cgraph_function_versioning which already during
the copying will do some of the transformations of the IL.
But perhaps those transformations are too complicated to describe for
tree-inline.c to make them for you.


For the record, the way this works (at least for tree-sra) is that you 
do the cgraph cloning first, and then call 
ipa_modify_formal_parameters() with an adjustments vector.  This 
function, in ipa-prop.c, seems to be only used for SRA.


ipa_modify_formal_parameters() doesn't handle adding new arguments out 
of the blue, only changing parameter types, removing them altogether, or 
leaving them as is.  My plan is to enhance this to handle adding new 
parameters (for the masked argument for simd-enabled functions / 
elementals).


But anyways... ipa_modify_formal_parameters() seems like the more 
elegant solution for modifying argument types after cloning.


Aldy


[C++ PATCH] PR58635

2013-10-07 Thread Marek Polacek
This patch fixes two ICEs on valid testcases and one ICE on an invalid
testcase.  In the former, we should check for EXPR_P (as in the fix for
PR58516) before using SET_EXPR_LOCATION; in the latter we should bail
out early if the return-expression is the error_mark_node, otherwise
subsequent call of fold_build_cleanup_point_expr will want to access
TREE_OPERANDs of that return-expression.

Regtested/bootstrapped on x86_64-linux, ok for trunk?  (I don't think
it merits backporting to 4.7/4.8, but let me know if you think
otherwise.)

2013-10-07  Marek Polacek  

PR c++/58635
cp/
* semantics.c (finish_return_stmt): Return error_mark_node
when error_operand_p of the expr is true.
(build_transaction_expr): Check for EXPR_P before setting the
expr location.
testsuite/
* g++.dg/tm/pr58635-1.C: New test.
* g++.dg/tm/pr58635-2.C: New test.

--- gcc/cp/semantics.c.mp   2013-10-07 15:49:10.801602065 +0200
+++ gcc/cp/semantics.c  2013-10-07 15:49:20.430638270 +0200
@@ -787,7 +787,8 @@ finish_return_stmt (tree expr)
 
   expr = check_return_expr (expr, &no_warning);
 
-  if (flag_openmp && !check_omp_return ())
+  if (error_operand_p (expr)
+  || (flag_openmp && !check_omp_return ()))
 return error_mark_node;
   if (!processing_template_decl)
 {
@@ -5221,7 +5222,8 @@ build_transaction_expr (location_t loc,
   if (noex)
 {
   expr = build_must_not_throw_expr (expr, noex);
-  SET_EXPR_LOCATION (expr, loc);
+  if (EXPR_P (expr))
+   SET_EXPR_LOCATION (expr, loc);
   TREE_SIDE_EFFECTS (expr) = 1;
 }
   ret = build1 (TRANSACTION_EXPR, TREE_TYPE (expr), expr);
--- gcc/testsuite/g++.dg/tm/pr58635-1.C.mp  2013-10-07 16:01:54.230240270 +0200
+++ gcc/testsuite/g++.dg/tm/pr58635-1.C 2013-10-07 15:58:15.431447142 +0200
@@ -0,0 +1,14 @@
+// { dg-do compile }
+// { dg-options "-std=c++11 -fgnu-tm" }
+
+int
+foo (void)
+{
+  return __transaction_atomic noexcept(false) (false);
+}
+
+int
+bar (int i)
+{
+  return __transaction_atomic noexcept(false) (i);
+}
--- gcc/testsuite/g++.dg/tm/pr58635-2.C.mp  2013-10-07 16:02:00.398262163 +0200
+++ gcc/testsuite/g++.dg/tm/pr58635-2.C 2013-10-07 16:01:44.874206993 +0200
@@ -0,0 +1,8 @@
+// { dg-do compile }
+// { dg-options "-std=c++11 -fgnu-tm" }
+
+int
+foo (void)
+{
+  return __transaction_atomic noexcept(false) (x); // { dg-error "was not declared in this scope" }
+}

Marek


New attribute: returns_nonnull

2013-10-07 Thread Marc Glisse

Hello,

this patch adds an attribute to let the compiler know that a function 
never returns NULL. I saw some ECF_* flags, but the attribute seems 
sufficient. I considered using nonnull(0), but then it would have been 
confusing that the version of nonnull without arguments applies only to 
parameters and not the return value.
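
For example, a minimal usage sketch (the allocator name is made up, not
part of the patch):

#include <stddef.h>

extern void *my_alloc (size_t) __attribute__((returns_nonnull));

int f (size_t n)
{
  void *p = my_alloc (n);
  return p == NULL;   /* VRP can now fold this test to 0 and drop any
                         dead null-handling code.  */
}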


2013-10-08  Marc Glisse  

PR tree-optimization/20318
gcc/c-family/
* c-common.c (handle_returns_nonnull_attribute): New function.
(c_common_attribute_table): Add returns_nonnull.

gcc/
* doc/extend.texi (returns_nonnull): New function attribute.
* fold-const.c (tree_expr_nonzero_warnv_p): Look for returns_nonnull
attribute.
* tree-vrp.c (gimple_stmt_nonzero_warnv_p): Likewise.
(stmt_interesting_for_vrp): Accept all GIMPLE_CALL.

gcc/testsuite/
* c-c++-common/pr20318.c: New file.
* gcc.dg/tree-ssa/pr20318.c: New file.

--
Marc Glisse

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 203241)
+++ c-family/c-common.c (working copy)
@@ -364,20 +364,21 @@ static tree handle_warn_unused_result_at
 bool *);
 static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
 static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alloc_size_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
 static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
 static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
+static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
 
 static void check_function_nonnull (tree, int, tree *);
 static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
 static bool nonnull_check_p (tree, unsigned HOST_WIDE_INT);
 static bool get_nonnull_operand (tree, unsigned HOST_WIDE_INT *);
 static int resort_field_decl_cmp (const void *, const void *);
 
 /* Reserved words.  The third field is a mask: keywords are disabled
if they match the mask.
 
@@ -740,20 +741,22 @@ const struct attribute_spec c_common_att
   { "*tm regparm",0, 0, false, true, true,
  ignore_attribute, false },
   { "no_split_stack",0, 0, true,  false, false,
  handle_no_split_stack_attribute, false },
   /* For internal use (marking of builtins and runtime functions) only.
  The name contains space to prevent its usage in source code.  */
   { "fn spec",   1, 1, false, true, true,
  handle_fnspec_attribute, false },
   { "warn_unused",0, 0, false, false, false,
  handle_warn_unused_attribute, false },
+  { "returns_nonnull",0, 0, false, true, true,
+ handle_returns_nonnull_attribute, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };
 
 /* Give the specifications for the format attributes, used by C and all
descendants.  */
 
 const struct attribute_spec c_common_format_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
affects_type_identity } */
@@ -9041,20 +9044,37 @@ handle_no_split_stack_attribute (tree *n
 }
   else if (DECL_INITIAL (decl))
 {
   error_at (DECL_SOURCE_LOCATION (decl),
"can%'t set %qE attribute after definition", name);
   *no_add_attrs = true;
 }
 
   return NULL_TREE;
 }
+
+/* Handle a "returns_nonnull" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_returns_nonnull_attribute (tree *node, tree, tree, int,
+ bool *no_add_attrs)
+{
+  // Even without a prototype we still have a return type we can check.
+  if (TREE_CODE (TREE_TYPE (*node)) != POINTER_TYPE)
+{
+  error ("returns_nonnull attribute on a function not returning a 
pointer");
+  *no_add_attrs = true;
+}
+  return NULL_TREE;
+}
+
 
 /* Check for valid arguments being passed to a function with FNTYPE.
There are NARGS arguments in the array ARGARRAY.  */
 void
 check_function_arguments (const_tree fntype, int nargs, tree *argarray)
 {
   /* Check for null being passed in a pointer argument that must be
  non-null.  We also need to do this if format checking is enabled.  */
 
   if (warn_nonnull)
Index: doc/extend.texi
===
--- doc/extend.texi (revision 203241)
+++ doc/extend.texi (working copy)
@@ -2126,21 +2126,22 @@ attributes when making a

Re: Optimize callers using nonnull attribute

2013-10-07 Thread Marc Glisse

On Mon, 7 Oct 2013, Richard Biener wrote:


On Mon, Oct 7, 2013 at 12:33 AM, Marc Glisse  wrote:

Hello,

this patch asserts that when we call a function with the nonnull attribute,
the corresponding argument is not zero, just like when we dereference a
pointer. Everything is under a check for flag_delete_null_pointer_checks.

Note that this function currently gives up if the statement may throw
(because in some languages indirections may throw?), but this could probably
be relaxed a bit so my example would still work when compiled with g++,
without having to mark f1 and f2 as throw().

Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.


Can you please restructure it in a way to not require a goto?  That is,
move searching for a non-null opportunity into a helper function?


Thanks. I wasn't sure what to put in the helper and what to keep in the 
original function, so I'll wait a bit for comments before the commit.


Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.

2013-10-08  Marc Glisse  

PR tree-optimization/58480
gcc/
* tree-vrp.c (infer_nonnull_range): New function.
(infer_value_range): Call infer_nonnull_range.

gcc/testsuite/
* gcc.dg/tree-ssa/pr58480.c: New file.

--
Marc Glisse

Index: testsuite/gcc.dg/tree-ssa/pr58480.c
===
--- testsuite/gcc.dg/tree-ssa/pr58480.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/pr58480.c (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+extern void eliminate (void);
+extern void* f1 (void *a, void *b) __attribute__((nonnull));
+extern void* f2 (void *a, void *b) __attribute__((nonnull(2)));
+void g1 (void*p, void*q){
+  f1 (q, p);
+  if (p == 0)
+eliminate ();
+}
+void g2 (void*p, void*q){
+  f2 (q, p);
+  if (p == 0)
+eliminate ();
+}
+
+/* { dg-final { scan-tree-dump-times "Folding predicate\[^\\n\]*to 0" 2 "vrp1" } } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */

Property changes on: testsuite/gcc.dg/tree-ssa/pr58480.c
___
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Index: tree-vrp.c
===
--- tree-vrp.c  (revision 203241)
+++ tree-vrp.c  (working copy)
@@ -4455,63 +4455,103 @@ build_assert_expr_for (tree cond, tree v
 
 static inline bool
 fp_predicate (gimple stmt)
 {
   GIMPLE_CHECK (stmt, GIMPLE_COND);
 
   return FLOAT_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)));
 }
 
 
+/* If OP can be inferred to be non-zero after STMT executes, return true.  */
+
+static bool
+infer_nonnull_range (gimple stmt, tree op)
+{
+  /* We can only assume that a pointer dereference will yield
+ non-NULL if -fdelete-null-pointer-checks is enabled.  */
+  if (!flag_delete_null_pointer_checks
+  || !POINTER_TYPE_P (TREE_TYPE (op))
+  || gimple_code (stmt) == GIMPLE_ASM)
+return false;
+
+  unsigned num_uses, num_loads, num_stores;
+
+  count_uses_and_derefs (op, stmt, &num_uses, &num_loads, &num_stores);
+  if (num_loads + num_stores > 0)
+return true;
+
+  if (gimple_code (stmt) == GIMPLE_CALL)
+{
+  tree fntype = gimple_call_fntype (stmt);
+  tree attrs = TYPE_ATTRIBUTES (fntype);
+  for (; attrs; attrs = TREE_CHAIN (attrs))
+   {
+ attrs = lookup_attribute ("nonnull", attrs);
+
+ /* If "nonnull" wasn't specified, we know nothing about
+the argument.  */
+ if (attrs == NULL_TREE)
+   return false;
+
+ /* If "nonnull" applies to all the arguments, then ARG
+is non-null.  */
+ if (TREE_VALUE (attrs) == NULL_TREE)
+   return true;
+
+ /* Now see if op appears in the nonnull list.  */
+ for (tree t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t))
+   {
+ int idx = TREE_INT_CST_LOW (TREE_VALUE (t)) - 1;
+ tree arg = gimple_call_arg (stmt, idx);
+ if (op == arg)
+   return true;
+   }
+   }
+}
+
+  return false;
+}
+
 /* If the range of values taken by OP can be inferred after STMT executes,
return the comparison code (COMP_CODE_P) and value (VAL_P) that
describes the inferred range.  Return true if a range could be
inferred.  */
 
 static bool
 infer_value_range (gimple stmt, tree op, enum tree_code *comp_code_p, tree 
*val_p)
 {
   *val_p = NULL_TREE;
   *comp_code_p = ERROR_MARK;
 
   /* Do not attempt to infer anything in names that flow through
  abnormal edges.  */
   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
 return false;
 
   /* Similarly, don't infer anything from statements that may throw
- exceptions.  */
+ exceptions. ??? Relax this requirement?  */
   if

[c++] Fix pr58525

2013-10-07 Thread Alexander Ivchenko
Hi,

__cxa_throw_bad_array_new_length and __cxa_throw_bad_array_length
are generated even with -fno-exceptions right now. The attached patch fixes
that problem. Bootstrapped and regtested on x86_64-unknown-linux-gnu:

2013-10-07  Alexander Ivchenko  

* gcc/cp/call.c (build_operator_new_call): Add flag_exceptions check.
* gcc/cp/decl.c (cp_finish_decl): Ditto.
* gcc/cp/init.c (build_new_1): Ditto.
(build_vec_init): Ditto.
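
For context, the kind of code that triggers the call looks roughly like
this (an illustrative sketch, not a testcase from the PR):

/* Compiled with -fno-exceptions, the length check for the new-expression
   used to emit a call to __cxa_throw_bad_array_new_length:  */
int *
f (int n)
{
  return new int[n];   /* n might be negative at run time */
}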



Is it ok?

--Alexander


fix_58525.patch
Description: Binary data


[gomp4] Fix up affinity-1.c testcase

2013-10-07 Thread Jakub Jelinek
Hi!

I obviously meant to do the forking even when more than 8 contiguous
logical CPUs are available; all the test needs is that logical CPUs 0, 1, 2,
3, 4, 5, 6, 7 are all available and not disallowed by the initial affinity
mask.

2013-10-07  Jakub Jelinek  

* testsuite/libgomp.c/affinity-1.c (main): Fork even if
contig_cpucount is > 8.

--- libgomp/testsuite/libgomp.c/affinity-1.c.jj 2013-10-07 09:31:53.0 +0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c 2013-10-07 14:09:52.475331358 +0200
@@ -178,7 +178,7 @@ main ()
   int test_places = 0;
 
 #ifdef DO_FORK
-  if (env_places == NULL && contig_cpucount == 8 && test_false
+  if (env_places == NULL && contig_cpucount >= 8 && test_false
   && getenv ("GOMP_AFFINITY") == NULL)
 {
   int i, j, status;

Jakub


Re: libgo patch committed: Implement reflect.MakeFunc for 386

2013-10-07 Thread Rainer Orth
Ian Lance Taylor  writes:

> On Wed, Oct 2, 2013 at 7:45 AM, Rainer Orth  
> wrote:
>>
>> Here's what I came up with.  As I said, it is inspired by the libffi
>> code, but a bit simplified since e.g. stuff like no .ascii support
>> aren't relevant on the Solaris versions supported on mainline and 4.8
>> branch.
>>
>> Bootstrapped on x86_64-unknown-linux-gnu and i386-pc-solaris2.10 with
>> Sun as and gas.  I've also compared the readelf --debug-dump=frames
>> output for the 32 and 64-bit makefunc.o, both PIC and non-PIC.  64-bit
>> is completely unchanged, while for 32-bit there are FDE encoding changes
>> as expected from the FDE_ENCODING/FDE_ENCODE macros.
>>
>> Rainer
>>
>>
>> 2013-10-01  Rainer Orth  
>>
>> * configure.ac (libgo_cv_ro_eh_frame): New test.
>> (libgo_cv_as_comdat_gnu): Likewise.
>> (libgo_cv_as_x86_pcrel): Likewise.
>> (libgo_cv_as_x86_64_unwind_section_type): Likewise.
>> * configure: Regenerate.
>> * config.h.in: Regenerate.
>> * go/reflect/makefunc_386.S: Replace CFI directives by hand-coded
>> .eh_frame section.
>> Restrict .note.* sections to Linux.
>> * go/reflect/makefunc_amd64.S: Likewise.
>
> Great, thanks for working on this.  Committed to trunk and 4.8 branch.

Unfortunately, Solaris 9 testing with Sun as revealed two problems: that
assembler cannot handle either # comments or .global instead of .globl.
The following patch fixes this and allows makefunc_386.S to assemble.
Also compiled with a gas build to make sure nothing broke there.

I didn't touch makefunc_amd64.S, but could to keep the two consistent.

Rainer


diff --git a/libgo/go/reflect/makefunc_386.S b/libgo/go/reflect/makefunc_386.S
--- a/libgo/go/reflect/makefunc_386.S
+++ b/libgo/go/reflect/makefunc_386.S
@@ -1,12 +1,12 @@
-# Copyright 2013 The Go Authors. All rights reserved.
-# Use of this source code is governed by a BSD-style
-# license that can be found in the LICENSE file.
+/* Copyright 2013 The Go Authors. All rights reserved.
+   Use of this source code is governed by a BSD-style
+   license that can be found in the LICENSE file.
 
-# MakeFunc 386 assembly code.
+   MakeFunc 386 assembly code.  */
 
 #include "config.h"
 
-	.global reflect.makeFuncStub
+	.globl reflect.makeFuncStub
 
 #ifdef __ELF__
 	.type reflect.makeFuncStub,@function
@@ -15,25 +15,26 @@
 reflect.makeFuncStub:
 .LFB1:
 
-	# Go does not provide any equivalent to the regparm function
-	# attribute, so on Go we do not need to worry about passing
-	# parameters in registers.  We just pass a pointer to the
-	# arguments on the stack.
-	#
-	# We do need to pick up the return values, though, so we pass
-	# a pointer to a struct that looks like this.
-	# struct {
-	#   esp uint32		// 0x0
-	#   eax uint32		// 0x4
-	#   st0 uint64		// 0x8
-	# }
+	/* Go does not provide any equivalent to the regparm function
+	   attribute, so on Go we do not need to worry about passing
+	   parameters in registers.  We just pass a pointer to the
+	   arguments on the stack.
+	
+	   We do need to pick up the return values, though, so we pass
+	   a pointer to a struct that looks like this.
+	   struct {
+	 esp uint32		// 0x0
+	 eax uint32		// 0x4
+	 st0 uint64		// 0x8
+	   }
+	*/
 
 	pushl	%ebp
 .LCFI0:
 	movl	%esp, %ebp
 .LCFI1:
-	pushl	%ebx		# In case this is PIC.
-	subl	$36, %esp	# Enough for args and to align stack.
+	pushl	%ebx		/* In case this is PIC.  */
+	subl	$36, %esp	/* Enough for args and to align stack.  */
 .LCFI2:
 
 #ifdef __PIC__
@@ -41,7 +42,7 @@ reflect.makeFuncStub:
 	addl	$_GLOBAL_OFFSET_TABLE_, %ebx
 #endif
 
-	leal	8(%ebp), %eax	# Set esp field in struct.
+	leal	8(%ebp), %eax	/* Set esp field in struct.  */
 	movl	%eax, -24(%ebp)
 
 #ifdef __PIC__
@@ -61,14 +62,14 @@ reflect.makeFuncStub:
 	call	reflect.MakeFuncStubGo
 #endif
 
-	# Set return registers.
+	/* Set return registers.  */
 
 	movl	-20(%ebp), %eax
 	fldl	-16(%ebp)
 
 #ifdef __SSE2__
-	# In case we are compiling with -msseregparm.  This won't work
-	# correctly if only SSE1 is supported, but that seems unlikely.
+	/* In case we are compiling with -msseregparm.  This won't work
+	   correctly if only SSE1 is supported, but that seems unlikely.  */
 	movsd	-16(%ebp), %xmm0
 #endif
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[gomp4] Affinity fixes (PR libgomp/58642)

2013-10-07 Thread Jakub Jelinek
Hi!

Vincent reported that on some machines affinity stopped working;
apparently sysconf (_SC_NPROCESSORS_CONF) is sometimes smaller than
the minimum number of logical CPUs the kernel allows for the
sched_getaffinity syscall :(, and I don't see how to query that magic
number.

So, if pthread_getaffinity_np fails with EINVAL, this patch retries
with a bigger mask (if the starting cpusetsize was smaller than
sizeof (cpu_set_t), it first goes on with that size (1024 CPUs), then
always doubles the size), and after a successful pthread_getaffinity_np
it looks through the affinity mask to see whether it could use a smaller
value for the places array and the pthread_setaffinity_np syscalls.
gomp_get_cpuset_size is then the larger of the sizes, used for
pthread_getaffinity_np and as the size of the gomp_cpusetp bitmask;
gomp_cpuset_size is equal to that or smaller and is what is passed to
pthread_setaffinity_np.  Say, a kernel configured with 32 present CPUs
and up to 96 hotplug CPUs could have gomp_get_cpuset_size 128 (1024
CPUs) or perhaps 16 (128 CPUs), while gomp_cpuset_size would be 8 (up
to 64 CPUs).
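
In outline, the retry loop looks like this (a simplified sketch using
plain malloc/realloc, not the exact libgomp code):

#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

static cpu_set_t *
get_affinity_mask (unsigned long *sizep)
{
  unsigned long size = *sizep;
  cpu_set_t *set = (cpu_set_t *) malloc (size);
  /* EINVAL here means the kernel's mask is wider than our buffer:
     grow the buffer and retry.  */
  while (set != NULL
         && pthread_getaffinity_np (pthread_self (), size, set) == EINVAL)
    {
      size = size < sizeof (cpu_set_t) ? sizeof (cpu_set_t) : size * 2;
      set = (cpu_set_t *) realloc (set, size);
    }
  *sizep = size;
  return set;
}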

2013-10-07  Jakub Jelinek  

PR libgomp/58642
* config/linux/proc.c: Include errno.h.
(gomp_get_cpuset_size): New variable.
(gomp_cpuset_popcount): Add cpusetsize argument, use it instead of
gomp_cpuset_size.
(gomp_init_num_threads): If CPU_ALLOC_SIZE is defined and
pthread_getaffinity_np returned EINVAL, increase gomp_cpuset_size
and retry.  After successful pthread_getaffinity_np copy
gomp_cpuset_size to gomp_get_cpuset_size and try to find out
minimum gomp_cpuset_size that covers all the CPUs set in gomp_cpusetp.
(get_num_procs): Pass gomp_get_cpuset_size rather than gomp_cpuset_size
to pthread_getaffinity_np, adjust gomp_cpuset_popcount caller.
* config/linux/proc.h (gomp_cpuset_popcount): Add cpusetsize argument.
* config/linux/affinity.c (gomp_affinity_finalize_place_list,
gomp_affinity_init_level): Adjust gomp_cpuset_popcount callers.
* testsuite/libgomp.c/affinity-1.c (pthread_getaffinity_np): Set
contig_cpucount from the first successful pthread_getaffinity_np
call, rather than just first call.

--- libgomp/config/linux/proc.c.jj  2013-10-04 09:03:01.0 +0200
+++ libgomp/config/linux/proc.c 2013-10-07 09:31:00.350979942 +0200
@@ -30,6 +30,7 @@
 #endif
 #include "libgomp.h"
 #include "proc.h"
+#include 
 #include 
 #include 
 #ifdef HAVE_GETLOADAVG
@@ -40,17 +41,18 @@
 
 #ifdef HAVE_PTHREAD_AFFINITY_NP
 unsigned long gomp_cpuset_size;
+static unsigned long gomp_get_cpuset_size;
 cpu_set_t *gomp_cpusetp;
 
 unsigned long
-gomp_cpuset_popcount (cpu_set_t *cpusetp)
+gomp_cpuset_popcount (unsigned long cpusetsize, cpu_set_t *cpusetp)
 {
 #ifdef CPU_COUNT_S
   /* glibc 2.7 and above provide a macro for this.  */
-  return CPU_COUNT_S (gomp_cpuset_size, cpusetp);
+  return CPU_COUNT_S (cpusetsize, cpusetp);
 #else
 #ifdef CPU_COUNT
-  if (gomp_cpuset_size == sizeof (cpu_set_t))
+  if (cpusetsize == sizeof (cpu_set_t))
 /* glibc 2.6 and above provide a macro for this.  */
 return CPU_COUNT (cpusetp);
 #endif
@@ -59,7 +61,7 @@ gomp_cpuset_popcount (cpu_set_t *cpusetp
   extern int check[sizeof (cpusetp->__bits[0]) == sizeof (unsigned long int)
   ? 1 : -1];
 
-  for (i = 0; i < gomp_cpuset_size / sizeof (cpusetp->__bits[0]); i++)
+  for (i = 0; i < cpusetsize / sizeof (cpusetp->__bits[0]); i++)
 {
   unsigned long int mask = cpusetp->__bits[i];
   if (mask == 0)
@@ -86,24 +88,55 @@ gomp_init_num_threads (void)
 #endif
 
   gomp_cpusetp = (cpu_set_t *) gomp_malloc (gomp_cpuset_size);
-  if (pthread_getaffinity_np (pthread_self (), gomp_cpuset_size,
- gomp_cpusetp) == 0)
+  do
 {
-  /* Count only the CPUs this process can use.  */
-  gomp_global_icv.nthreads_var = gomp_cpuset_popcount (gomp_cpusetp);
-  if (gomp_global_icv.nthreads_var == 0)
+  int ret = pthread_getaffinity_np (pthread_self (), gomp_cpuset_size,
+   gomp_cpusetp);
+  if (ret == 0)
{
- gomp_global_icv.nthreads_var = 1;
- free (gomp_cpusetp);
- gomp_cpusetp = NULL;
+ unsigned long i;
+ /* Count only the CPUs this process can use.  */
+ gomp_global_icv.nthreads_var
+   = gomp_cpuset_popcount (gomp_cpuset_size, gomp_cpusetp);
+ if (gomp_global_icv.nthreads_var == 0)
+   break;
+ gomp_get_cpuset_size = gomp_cpuset_size;
+#ifdef CPU_ALLOC_SIZE
+ for (i = gomp_cpuset_size * 8; i; i--)
+   if (CPU_ISSET_S (i - 1, gomp_cpuset_size, gomp_cpusetp))
+ break;
+ gomp_cpuset_size = CPU_ALLOC_SIZE (i);
+#endif
+ return;
}
-  return;
-}
-  else
-{
-  free (gomp_cpusetp);
-  gomp_cpusetp = NULL;
+  if (ret != EINVAL)
+   break;
+#ifdef CPU_ALLOC_SIZE
+  i

Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Jonathan Wakely
On 6 October 2013 23:43, Tim Shen wrote:
> Here's a simple piece of code
> https://gist.github.com/innocentim/6849759 that reveals _BFSExecutor's
> inefficiency. Some optimizations are here to reduce the unecessary
> time complexity from O(n^2) to O(n).
>
> I'll do a bootstrap and full testing before committing.

In the ChangeLog, please fix the spelling of "unnecessary".

The constructor of _TodoList could be "explicit".

Does _TodoList really need to derive from std::vector<_StateIdT> or
could it just have a second member? I think there's no benefit to
using inheritance here.

_TodoList::__exists should be named _M_exists to follow the library
coding standards.

Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?

I think it's guaranteed that __exists[__u] is always in range, right?
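
For concreteness, a rough sketch of the shape this review is asking for
(composition instead of inheriting from std::vector<_StateIdT>, an
explicit constructor, _M_-prefixed members, and the existence bitmap
that makes each push O(1), cleared on pop).  Illustrative code only,
not Tim's patch:

#include <cstddef>
#include <vector>

typedef std::size_t _StateIdT;

class _TodoList
{
public:
  explicit
  _TodoList(std::size_t __num_states)
  : _M_exists(__num_states, false) { }

  void
  _M_push(_StateIdT __u)
  {
    // __u is a valid state id by construction, so no range check.
    if (_M_exists[__u])
      return;                 // already queued: skipping keeps BFS O(n)
    _M_exists[__u] = true;
    _M_list.push_back(__u);
  }

  bool
  _M_pop(_StateIdT& __u)
  {
    if (_M_list.empty())
      return false;
    __u = _M_list.back();
    _M_list.pop_back();
    _M_exists[__u] = false;   // keep the flags in sync with the list
    return true;
  }

private:
  std::vector<_StateIdT> _M_list;    // a member, not a base class
  std::vector<bool>      _M_exists;  // one "queued" flag per NFA state
};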


Re: [gomp4] C++ OpenMP user defined reductions (take 2)

2013-10-07 Thread Jakub Jelinek
Hi!

On Sat, Sep 21, 2013 at 07:45:03AM -0400, Jason Merrill wrote:

So, for things that worked fine for me, I'm attaching two patches
(W050b and W051a); do they look OK to you?  The rest causes issues, all
of them on the gcc/testsuite/g++.dg/gomp/udr-3.C testcase, see below.

> On 09/20/2013 12:25 PM, Jakub Jelinek wrote:
> >In templates the UDRs are always FUNCTION_DECLs in classes or
> >at function block scope, the above one liner was I believe for the latter,
> >where without it duplicate_decls was returning incorrectly 0; the UDRs
> >from mismatching templates would actually never be seen by duplicate_decls,
> >but t1 was different from t2.  That was before the changes to
> >add the mangled names to the UDR DECL_NAMEs though, I can try to remove it
> >and see if the whole testsuite still passes.
> 
> Please.
> 
> >>>+&& ! (DECL_OMP_DECLARE_REDUCTION_P (newdecl)
> >>>+  && DECL_CONTEXT (newdecl) == NULL_TREE
> >>>+  && DECL_CONTEXT (olddecl) == current_function_decl))
> >>
> >>And this looks like you need to set DECL_CONTEXT sooner on reductions.
> >
> >I know it is ugly, but setting FUNCTION_DECL DECL_CONTEXT too early resulted
> >in all kinds of problems, when the C++ frontend doesn't support nested
> >functions.  So the patch doesn't set DECL_CONTEXT until it is pushdecled
> >into the block scope.
> 
> What is calling decls_match before pushdecl?

Ok, sorry for the delay, been busy with libgomp.
I've tried to remove the two decl.c changes, unfortunately it failed
miserably on the gcc/testsuite/g++.dg/gomp/udr-3.C testcase.
The problem isn't on valid OpenMP code where there shouldn't be any
duplicates anywhere, but during error diagnostics.  Without those two decl.c
hunks (either of them), pushdecl will sometimes return a different decl from
the original or error_mark_node, and the original fndecl passed to it has
ggc_free called on it, thus any further use of it ICEs or may ICE.
With the attached X1 patch I avoid the ICEs - if pushdecl doesn't return
the original fndecl, I treat it as an error that has been reported already,
unfortunately that still means many errors expected to be diagnosed aren't,
e.g. the first one on line 71.  Perhaps if pushdecl returns error_mark_node,
then I should expect that the error has been reported already, and if
it returns some other FUNCTION_DECL, then I should report it myself,
but a problem with that is that there are multiple locations that call
pushdecl (two in parser.c, one in pt.c) and more importantly, that for
the diagnostics the new fndecl is ggc_freed and thus I can't use it
for the diagnostics anymore.
Trying to set DECL_CONTEXT to non-NULL for block_scope UDRs leads to
immediate ICEs as I said earlier, again on udr-3.C testcase:
@@ -30179,7 +30179,10 @@ cp_parser_omp_declare_reduction (cp_pars
{
  block_scope = true;
  if (!processing_template_decl)
-   pushdecl (fndecl);
+   {
+ DECL_CONTEXT (fndecl) = current_function_decl;
+ pushdecl (fndecl);
+   }
}
   else if (current_class_type)
{
#0  internal_error (gmsgid=0x15243c8 "tree check: %s, have %s in %s, at %s:%d") 
at ../../gcc/diagnostic.c:1127
#1  0x00f6cc8e in tree_check_failed (node=0x71a2f600, 
file=0x144771e "../../gcc/cp/name-lookup.c", line=3392, 
function=0x1448a70  "namespace_binding_1") at ../../gcc/tree.c:9321
#2  0x0055e725 in tree_check (__t=, 
__f=0x144771e "../../gcc/cp/name-lookup.c", __l=3392, 
__g=0x1448a70  
"namespace_binding_1", __c=NAMESPACE_DECL) at ../../gcc/tree.h:2672
#3  0x007ae8bc in namespace_binding_1 (name=, 
scope=) at ../../gcc/cp/name-lookup.c:3392
#4  0x007ae9e8 in namespace_binding (name=, 
scope=) at ../../gcc/cp/name-lookup.c:3404
#5  0x007a38db in pushdecl_maybe_friend_1 (x=, is_friend=false)
at ../../gcc/cp/name-lookup.c:701
#6  0x007a7521 in pushdecl_maybe_friend (x=, is_friend=false)
at ../../gcc/cp/name-lookup.c:1264
#7  0x007a7558 in pushdecl (x=) at ../../gcc/cp/name-lookup.c:1274

pushdecl_maybe_friend_1 clearly doesn't expect anything to be in function scope,
and from what I remember from writing the UDR patch, it certainly wasn't the
only spot.
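
For reference, the kind of source construct at issue - a user defined
reduction declared at function block scope - looks roughly like this
(a hedged illustration assuming non-negative inputs; this is not the
udr-3.C testcase):

int
block_scope_udr (int *a, int n)
{
  /* The front end represents this UDR as a FUNCTION_DECL that gets
     pushdecled into the enclosing block scope.  */
  #pragma omp declare reduction (mymax : int : \
      omp_out = omp_in > omp_out ? omp_in : omp_out) \
    initializer (omp_priv = 0)

  int m = 0;
  #pragma omp parallel for reduction (mymax : m)
  for (int i = 0; i < n; i++)
    if (a[i] > m)
      m = a[i];
  return m;
}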

> >>>+  if (TREE_CODE (argtype) == REFERENCE_TYPE)
> >>>+error_at (DECL_SOURCE_LOCATION (t),
> >>>+  "function, array or reference type in "
> >>>+  "%<#pragma omp declare reduction%>");
> >>
> >>Let's just say "reference type", since we know that's what it is.
> >
> >That is true, but I wanted to match the same error message elsewhere,
> >otherwise the error will be different (more specific) for instantiation
> >vs. in non-template code.
> 
> It's more important to be specific during instantiation because we
> can't always tell what the types involved actually are.  So let's
> also add the type in question to the diagnostic.

Ok, see attached W050b patch.

Other changes that worked fi

Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

2013-10-07 Thread Paolo Carlini

Hi,

On 10/07/2013 11:14 AM, Adam Butcher wrote:

+  if (!cp_disable_auto_as_implicit_function_template_parm &&
+ current_binding_level->kind == sk_function_parms)


Stylistic nit: && should be on the next line.

Thanks!
Paolo.


Re: [PATCH] alternative hirate for builtin_expect

2013-10-07 Thread Richard Biener
On Mon, Oct 7, 2013 at 12:02 PM, Ramana Radhakrishnan  wrote:
> On 10/04/13 22:23, Jan Hubicka wrote:
>>>
>>> On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka  wrote:
>
> I looked at this problem. Bug updated
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619
>
> This is a bug when updating block during tree-inline. Basically, it is
> legal for *n to be NULL. E.g. When gimple_block(id->gimple_call) is
> NULL, remap_blocks_to_null will be called to set *n to NULL.


 The NULL in gimple_block (gimple_call) comes from the call introduced by
 ipa-split?
>>>
>>>
>>> That is correct.
>>>
 I remember that ipa-split used to try to put the call into block since
 we were ICEing
 in similar ways previously, too.  Perhaps this has changed with new
 BLOCK representation?
>>>
>>>
>>> The new BLOCK representation does not change this. I think it makes
>>> sense to leave the block of newly introduced call_stmt as NULL because
>>> when it's inlined back, we don't want to add additional block layers.
>>
>>
>> You are right, it may be a result of Jakub's changes in the area (to improve
>> debug info
>> after inlining back).  I guess the patch makes sense then.
>
>
> It at least fixes the issues I've been seeing on arm-none-linux-gnueabihf.
> The build has at least gone past that point and I should have some test
> results later today.
>
> I don't know enough in that area to comment further on the technical aspects
> of the patch but it doesn't look outrageous from my point of view.
>
> Can someone comment / approve it quickly so that we get AArch32 and AArch64
> linux cross-builds back up ?

Ok.

Thanks,
Richard.

>
> regards
> Ramana
>
>>
>> Honza
>>
>>>
>>> Dehao
>>>

 Honza
>>
>>
>
>


Re: [PATCH] alternative hirate for builtin_expect

2013-10-07 Thread Ramana Radhakrishnan

On 10/04/13 22:23, Jan Hubicka wrote:

On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka  wrote:

I looked at this problem. Bug updated
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619

This is a bug when updating block during tree-inline. Basically, it is
legal for *n to be NULL. E.g. When gimple_block(id->gimple_call) is
NULL, remap_blocks_to_null will be called to set *n to NULL.


The NULL in gimple_block (gimple_call) comes from the call introduced by 
ipa-split?


That is correct.


I remember that ipa-split used to try to put the call into block since we were
ICEing
in similar ways previously, too.  Perhaps this has changed with new BLOCK 
representation?


The new BLOCK representation does not change this. I think it makes
sense to leave the block of newly introduced call_stmt as NULL because
when it's inlined back, we don't want to add additional block layers.


You are right, it may be a result of Jakub's changes in the area (to improve 
debug info
after inlining back).  I guess the patch makes sense then.


It at least fixes the issues I've been seeing on 
arm-none-linux-gnueabihf. The build has at least gone past that point and 
I should have some test results later today.


I don't know enough in that area to comment further on the technical 
aspects of the patch but it doesn't look outrageous from my point of view.


Can someone comment / approve it quickly so that we get AArch32 and 
AArch64 linux cross-builds back up ?



regards
Ramana



Honza



Dehao



Honza







RFA: Add news item for ARC port contribution

2013-10-07 Thread Joern Rennecke

OK to commit?
Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.892
diff -c -r1.892 index.html
*** index.html  3 Oct 2013 14:15:36 -   1.892
--- index.html  7 Oct 2013 09:35:35 -
***
*** 57,62 
--- 57,66 
  [2013-10-03]
  Regular expression support in libstdc++-v3 is 
now available.
  
+ Synopsys Designware ARC support
+ [2013-10-01]
+ A port for Synopsys Designware ARC has been contributed by Embecosm and 
Synopsys Inc.
+ 
  TI MSP430 support
  [2013-09-12]
  A port for the TI MSP430 has been contributed by Red Hat Inc.


Re: Optimize callers using nonnull attribute

2013-10-07 Thread Richard Biener
On Mon, Oct 7, 2013 at 12:33 AM, Marc Glisse  wrote:
> Hello,
>
> this patch asserts that when we call a function with the nonnull attribute,
> the corresponding argument is not zero, just like when we dereference a
> pointer. Everything is under a check for flag_delete_null_pointer_checks.
>
> Note that this function currently gives up if the statement may throw
> (because in some languages indirections may throw?), but this could probably
> be relaxed a bit so my example would still work when compiled with g++,
> without having to mark f1 and f2 as throw().
>
> Bootstrap (default languages) + testsuite on x86_64-unknown-linux-gnu.

Can you please restructure it in a way to not require a goto?  That is,
move searching for a non-null opportunity into a helper function?

Ok with that change.  (I wonder how many bogus nonnull attributes
this catches ...)

Thanks,
Richard.

> 2013-10-08  Marc Glisse  
>
> PR tree-optimization/58480
> gcc/
> * tree-vrp.c (infer_value_range): Look for a nonnull attribute.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/pr58480.c: New file.
>
> --
> Marc Glisse
> Index: testsuite/gcc.dg/tree-ssa/pr58480.c
> ===
> --- testsuite/gcc.dg/tree-ssa/pr58480.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/pr58480.c (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target { ! keeps_null_pointer_checks } } } */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +
> +extern void eliminate (void);
> +extern void* f1 (void *a, void *b) __attribute__((nonnull));
> +extern void* f2 (void *a, void *b) __attribute__((nonnull(2)));
> +void g1 (void*p, void*q){
> +  f1 (q, p);
> +  if (p == 0)
> +eliminate ();
> +}
> +void g2 (void*p, void*q){
> +  f2 (q, p);
> +  if (p == 0)
> +eliminate ();
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Folding predicate\[^\\n\]*to 0" 2
> "vrp1" } } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
>
> Property changes on: testsuite/gcc.dg/tree-ssa/pr58480.c
> ___
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Index: tree-vrp.c
> ===
> --- tree-vrp.c  (revision 203229)
> +++ tree-vrp.c  (working copy)
> @@ -4493,28 +4493,58 @@ infer_value_range (gimple stmt, tree op,
>/* We can only assume that a pointer dereference will yield
>   non-NULL if -fdelete-null-pointer-checks is enabled.  */
>if (flag_delete_null_pointer_checks
>&& POINTER_TYPE_P (TREE_TYPE (op))
>&& gimple_code (stmt) != GIMPLE_ASM)
>  {
>unsigned num_uses, num_loads, num_stores;
>
>count_uses_and_derefs (op, stmt, &num_uses, &num_loads, &num_stores);
>if (num_loads + num_stores > 0)
> +   goto nonnull;
> +
> +  if (gimple_code (stmt) == GIMPLE_CALL)
> {
> - *val_p = build_int_cst (TREE_TYPE (op), 0);
> - *comp_code_p = NE_EXPR;
> - return true;
> + tree fntype = gimple_call_fntype (stmt);
> + tree attrs = TYPE_ATTRIBUTES (fntype);
> + for (; attrs; attrs = TREE_CHAIN (attrs))
> +   {
> + attrs = lookup_attribute ("nonnull", attrs);
> +
> + /* If "nonnull" wasn't specified, we know nothing about
> +the argument.  */
> + if (attrs == NULL_TREE)
> +   return false;
> +
> + /* If "nonnull" applies to all the arguments, then ARG
> +is non-null.  */
> + if (TREE_VALUE (attrs) == NULL_TREE)
> +   goto nonnull;
> +
> + /* Now see if op appears in the nonnull list.  */
> + for (tree t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t))
> +   {
> + int idx = TREE_INT_CST_LOW (TREE_VALUE (t)) - 1;
> + tree arg = gimple_call_arg (stmt, idx);
> + if (op == arg)
> +   goto nonnull;
> +   }
> +   }
> }
>  }
>
>return false;
> +
> +nonnull:
> +  *val_p = build_int_cst (TREE_TYPE (op), 0);
> +  *comp_code_p = NE_EXPR;
> +  return true;
>  }
>
>
>  void dump_asserts_for (FILE *, tree);
>  void debug_asserts_for (tree);
>  void dump_all_asserts (FILE *);
>  void debug_all_asserts (void);
>
>  /* Dump all the registered assertions for NAME to FILE.  */
>
>
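
For reference, the goto-free restructuring suggested above could look
roughly like this: the attribute walk moves into a predicate helper
(the name is made up) and infer_value_range merely calls it.  A sketch
based only on the calls visible in the patch, not the committed code:

/* Return true if OP is an argument of the call STMT that the callee's
   "nonnull" attribute guarantees to be non-null.  */

static bool
gimple_call_arg_nonnull_p (gimple stmt, tree op)
{
  tree attrs = TYPE_ATTRIBUTES (gimple_call_fntype (stmt));
  for (; attrs; attrs = TREE_CHAIN (attrs))
    {
      attrs = lookup_attribute ("nonnull", attrs);
      /* No (further) nonnull attribute: we know nothing about OP.  */
      if (attrs == NULL_TREE)
        return false;
      /* A bare "nonnull" covers all pointer arguments.  */
      if (TREE_VALUE (attrs) == NULL_TREE)
        return true;
      /* Otherwise check whether OP is one of the listed arguments.  */
      for (tree t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t))
        {
          int idx = TREE_INT_CST_LOW (TREE_VALUE (t)) - 1;
          if (op == gimple_call_arg (stmt, idx))
            return true;
        }
    }
  return false;
}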


Re: Enable SSE math on i386 with -Ofast

2013-10-07 Thread Jan Hubicka
> >  In the meantime I (partially,
> > since megrez stopped producing 32bit spec2k6 results) benchmarked
> > -mfpmath=sse,387 and it does not seem to be a loss anymore.  So perhaps we 
> > can
> > give it a try?
> 
> Not sure ... I would guess that it's not a win on any recent architecture
> (and LRA is probably not well-prepared here either).

I think it has a chance to win when the input/output registers are forced to
be in 387 (because of the return value ABI) and perhaps with register pressure
in cases where two independent computations are going on and LRA can home one
in SSE and the other in 387 registers.  Don't really know.

The main advantage of 387 is that it is significantly more compact than SSE.
The last hardware really favouring 387 was probably the original Pentium 4
(where additions were better pipelined on the 387 path, if I recall correctly).
I wonder how AVX changed this.  I was thus thinking about adding a mode where
we choose 387 or SSE based on whether the function is optimized for size.
The size difference is quite high - around 5% on SPECfp.
> 
> > > change the ABI ... (do we change the local functions ABI with 
> > > -mfpmath=sse?)
> > 
> > We don't.  It is probably quite easy to default to sse_regparm and change 
> > return value type.
> > I will look into it.
> 
> Thanks.  That's independent of enabling -mfpmath=sse at -Ofast of course.
Yep, my plan is to enable fpmath with -Ofast today and look into those two 
items incrementally.

Honza
> 
> Richard.


Re: Enable SSE math on i386 with -Ofast

2013-10-07 Thread Richard Biener
On Mon, 7 Oct 2013, Jan Hubicka wrote:

> > On Fri, 4 Oct 2013, Jan Hubicka wrote:
> > 
> > > Hi,
> > > this patch makes -Ofast also imply -mfpmath=sse.  It is an important win
> > > on
> > > SPECfP (2000 and 2006). Even though for example the following
> > > float a(float b)
> > > {
> > >return b+10;
> > > }
> > > 
> > > results in somewhat ridiculous
> > > a:
> > > .LFB0:  
> > > .cfi_startproc
> > > subl$4, %esp
> > > .cfi_def_cfa_offset 8
> > > movss   .LC0, %xmm0
> > > addss   8(%esp), %xmm0
> > > movss   %xmm0, (%esp)
> > > flds(%esp)
> > > addl$4, %esp
> > > .cfi_def_cfa_offset 4
> > > ret
> > > 
> > > I wonder if we can get rid at least of the redundant stack alignment on 
> > > ESP...
> > > 
> > > Bootstrapped/regtested x86_64-linux, will commit it on weekend if there 
> > > are no
> > > complains.  I wonder if -ffast-math should do the same - it is documented
> > > as enabling
> > > an explicit set of options, but that can be changed I guess.
> > 
> > I wonder if we can restrict -mfpmath=sse to local functions where we can
> 
> We can, but why? Parameters are passed in memory that is equally bad for 387
> and
> SSE.  Only return values are passed in registers, that is not that expensive 
> to
> have one extra reload per function except for functions containing almost
> nothing that should be inlined if they are local.

Ah, I forgot that detail.  Still going through the FP stack for return
values is bad.

>  In the meantime I (partially,
> since megrez stopped producing 32bit spec2k6 results) benchmarked
> -mfpmath=sse,387 and it does not seem to be a loss anymore.  So perhaps we can
> give it a try?

Not sure ... I would guess that it's not a win on any recent architecture
(and LRA is probably not well-prepared here either).

> > change the ABI ... (do we change the local functions ABI with 
> > -mfpmath=sse?)
> 
> We don't.  It is probably quite easy to default to sse_regparm and change 
> return value type.
> I will look into it.

Thanks.  That's independent of enabling -mfpmath=sse at -Ofast of course.

Richard.


Re: Enable SSE math on i386 with -Ofast

2013-10-07 Thread Jan Hubicka
> On Fri, 4 Oct 2013, Jan Hubicka wrote:
> 
> > Hi,
> > this patch makes -Ofast also imply -mfpmath=sse.  It is an important win on
> > SPECfP (2000 and 2006). Even though for example the following
> > float a(float b)
> > {
> >return b+10;
> > }
> > 
> > results in somewhat ridiculous
> > a:
> > .LFB0:  
> > .cfi_startproc
> > subl$4, %esp
> > .cfi_def_cfa_offset 8
> > movss   .LC0, %xmm0
> > addss   8(%esp), %xmm0
> > movss   %xmm0, (%esp)
> > flds(%esp)
> > addl$4, %esp
> > .cfi_def_cfa_offset 4
> > ret
> > 
> > I wonder if we can get rid at least of the redundant stack alignment on 
> > ESP...
> > 
> > Bootstrapped/regtested x86_64-linux, will commit it on weekend if there are 
> > no
> > complains.  I wonder if -ffast-math should do the same - it is documented
> > as enabling
> > an explicit set of options, but that can be changed I guess.
> 
> I wonder if we can restrict -mfpmath=sse to local functions where we can

We can, but why? Parameters are passed in memory that is equally bad for 387 and
SSE.  Only return values are passed in registers, that is not that expensive to
have one extra reload per function except for functions containing almost
nothing that should be inlined if they are local.  In the meantime I (partially,
since megrez stopped producing 32bit spec2k6 results) benchmarked
-mfpmath=sse,387 and it does not seem to be a loss anymore.  So perhaps we can
give it a try?

> change the ABI ... (do we change the local functions ABI with 
> -mfpmath=sse?)

We don't.  It is probably quite easy to default to sse_regparm and change 
return value type.
I will look into it.

Honza


[SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

2013-10-07 Thread Adam Butcher

Hi Jason,

I did intend to break this up, but as I've completed the first pass at
refactoring implicit function templates and along the way fixed the 5
bugs submitted by Volker, I thought I'd post the rolled-up patch for
you to peruse.


It (should) support arbitrarily complex use of 'auto' in a parameter
list to introduce template parameters.  Implicit pack expansion
parameters containing a single 'auto' are handled by tentatively
assuming the single 'auto' might be a pack and then reverting the flag
if it turns out not to be.  This now fully supports the current C++14
generic lambda examples.
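
To make that concrete, user code along these lines (illustrative only,
not from the patch or its testsuite) is what the refactoring is meant
to accept:

int
main ()
{
  // Each 'auto' in the parameter list synthesizes one implicit
  // template type parameter, so this instantiates add<int, double>.
  auto add = [] (auto a, auto b) { return a + b; };
  double d = add (1, 2.5);

  // A lone 'auto...' is tentatively treated as an implicit
  // template parameter pack.
  auto count = [] (auto... args) { return sizeof... (args); };
  auto n = count (1, 'x', 3.0);   // n == 3

  return d == 3.5 && n == 3 ? 0 : 1;
}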


No new regressions on trunk as of this morning.

Any feedback appreciated.  Diff attached with -w -b.

Cheers,
Adam

  TODO: Changelog post review.
  PR c++/58534
  PR c++/58536
  PR c++/58548
  PR c++/58549
  PR c++/58637

 gcc/cp/decl.c|  30 +---
 gcc/cp/parser.c  | 312 ++-
 gcc/cp/parser.h  |  15 ++
 gcc/testsuite/g++.dg/cpp1y/pr58534.C |   9 +
 gcc/testsuite/g++.dg/cpp1y/pr58536.C |  12 ++
 gcc/testsuite/g++.dg/cpp1y/pr58548.C |  10 ++
 gcc/testsuite/g++.dg/cpp1y/pr58549.C |  10 ++
 gcc/testsuite/g++.dg/cpp1y/pr58637.C |   7 +
 8 files changed, 299 insertions(+), 106 deletions(-)

commit 90c77cdd87eb63617719a9ad129803a2048761ff
Author: Adam Butcher 
Date:   Wed Sep 18 17:39:40 2013 +0100

Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

	TODO: Changelog post review.
	PR c++/58534
	PR c++/58536
	PR c++/58548
	PR c++/58549
	PR c++/58637

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 81ed409..8095eca 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10323,34 +10323,12 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (type_uses_auto (type))
 	{
-	  if (template_parm_flag)
-	{
-	  error ("template parameter declared %<auto%>");
-	  type = error_mark_node;
-	}
-	  else if (decl_context == CATCHPARM)
-	{
-	  error ("catch parameter declared %<auto%>");
+	  if (cxx_dialect >= cxx1y)
+	error ("%<auto%> parameter not permitted in this context");
+	  else
+	error ("parameter declared %<auto%>");
 	  type = error_mark_node;
 	}
-	  else if (current_class_type && LAMBDA_TYPE_P (current_class_type))
-	{
-	  if (cxx_dialect < cxx1y)
-		pedwarn (location_of (type), 0,
-			 "use of %<auto%> in lambda parameter declaration "
-			 "only available with "
-			 "-std=c++1y or -std=gnu++1y");
-	}
-	  else if (cxx_dialect < cxx1y)
-	pedwarn (location_of (type), 0,
-		 "use of %<auto%> in parameter declaration "
-		 "only available with "
-		 "-std=c++1y or -std=gnu++1y");
-	  else
-	pedwarn (location_of (type), OPT_Wpedantic,
-		 "ISO C++ forbids use of %<auto%> in parameter "
-		 "declaration");
-	}
 
   /* A parameter declared as an array of T is really a pointer to T.
 	 One declared as a function is really a pointer to a function.
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 90c1775..8c8be4c 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -245,6 +245,19 @@ static FILE *cp_lexer_debug_stream;
sizeof, typeof, or alignof.  */
 int cp_unevaluated_operand;
 
+/* Nonzero if parsing a context where 'auto' in a parameter list should not
+   trigger an implicit template parameter.  Specifically, 'auto' should not
+   introduce a new template type parameter in explicit specializations, trailing
+   return types or exception specifiers.  */
+int cp_disable_auto_as_implicit_function_template_parm;
+
+/* Track the number of implicit template parameters introduced by the
+   current function parameter and, for handling implicit parameter packs, track
+   the most recently synthesized type.  These are reset prior to parsing in
+   cp_parameter_declarator and updated in synthesize_implicit_template_parm.  */
+int cp_num_implicit_template_type_parms;
+tree cp_last_implicit_template_type_parm;
+
 /* Dump up to NUM tokens in BUFFER to FILE starting with token
START_TOKEN.  If START_TOKEN is NULL, the dump starts with the
first token in BUFFER.  If NUM is 0, dump all the tokens.  If
@@ -2064,8 +2077,8 @@ static vec *cp_parser_initializer_list
 static bool cp_parser_ctor_initializer_opt_and_function_body
   (cp_parser *, bool);
 
-static tree add_implicit_template_parms
-  (cp_parser *, size_t, tree);
+static tree synthesize_implicit_template_parm
+  (cp_parser *parser);
 static tree finish_fully_implicit_template
   (cp_parser *, tree);
 
@@ -3393,6 +3406,8 @@ cp_parser_new (void)
 
   /* Not declaring an implicit function template.  */
   parser->fully_implicit_function_template_p = false;
+  parser->implicit_template_parms = 0;
+  parser->implicit_template_scope = 0;
 
   return parser;
 }
@@ -8559,11 +8574,15 @@ cp_parser_lambda_expression (cp_parser* parser)
 unsigned char in_statement = parser->in_statement;
 bool in_switch_statement_p = parser->in_switch_statement_p;

Re: [PATCH][ARM]Use cortex tune_params for armv8-a architecture

2013-10-07 Thread Kyrill Tkachov

On 07/10/13 09:47, Ramana Radhakrishnan wrote:

gcc/ChangeLog:

2013-10-03  Renlin Li 

 * config/arm/arm-cores.def (cortex-a53): Use cortex tunning.

s/tunning/tuning.

Ok with that change.

Hi Renlin,
I've committed this for you as r203241 with Changelog entry:

2013-10-07  Renlin Li  

* config/arm/arm-cores.def (cortex-a53): Use cortex tuning.


Kyrill



Re: [patch] Relocate remaining tree-flow-inline.h functions

2013-10-07 Thread Richard Biener
On Fri, Oct 4, 2013 at 6:43 PM, Andrew MacLeod  wrote:
> This patch mostly re-factors tree-flow-inline.h out of existence.
>
> the gimple-df data structure has found a new home in gimple-ssa.h, and this
> actually seems like a very appropriate place for it as it holds a lot of the
> ssa specific stuff in it.
>
> The remaining inline functions in tree-flow-inline.h are spread to the wind
> a bit.
> - 2 were no longer used, so they are deleted.
> - The various stmt_uid functions went to gimple.h, along with some stats
> macros and misc stuff. didn't seem to be a more appropriate place right now
> based on their usage patterns.
> - tree-hasher.h was including tree-flow.h simply to get the definition of
> 'struct int_tree_map'.  man.  It's a hash table function only, so it belongs
> here.
> - contains_view_convert_expr_p was only called from tree-sra.c, so I moved
> it there and made it static.
> - ranges_overlap_p .. this was used almost exclusively by SSA aliasing code,
> so this seemed appropriate for  tree-ssa-alias.h.
> - gimple_ssa_operands goes to tree-ssa-operands.c and becomes static as this
> was the only consumer
> - is_global_var and  may_be_aliased are tree specific.. may_be_aliased is
> used by dse.c, and by putting it into tree.c, that file is tantalizing close
> to being able to NOT include tree-ssa.h or any gimple stuff..
> local_variable_can_escape() is the only routine in there that is even aware
> of ssa or gimple and something seems wrong there. thats on my list to take
> care of shortly.
>
> Bootstraps on x86_64-unknown-linux-gnu and no new regressions.  OK?

Ok.

Thanks,
Richard.

> Andrew
>


Re: Enable SSE math on i386 with -Ofast

2013-10-07 Thread Richard Biener
On Fri, 4 Oct 2013, Jan Hubicka wrote:

> Hi,
> this patch makes -Ofast also imply -mfpmath=sse.  It is an important win on
> SPECfP (2000 and 2006). Even though for example the following
> float a(float b)
> {
>return b+10;
> }
> 
> results in somewhat ridiculous
> a:
> .LFB0:  
> .cfi_startproc
> subl$4, %esp
> .cfi_def_cfa_offset 8
> movss   .LC0, %xmm0
> addss   8(%esp), %xmm0
> movss   %xmm0, (%esp)
> flds(%esp)
> addl$4, %esp
> .cfi_def_cfa_offset 4
> ret
> 
> I wonder if we can get rid at least of the redundant stack alignment on ESP...
> 
> Bootstrapped/regtested x86_64-linux, will commit it on weekend if there are no
> complains.  I wonder if -ffast-math should do the same - it is documented as 
> enabling
> an explicit set of options, but that can be changed I guess.

I wonder if we can restrict -mfpmath=sse to local functions where we can
change the ABI ... (do we change the local functions ABI with 
-mfpmath=sse?)

Richard.

>   * invoke.texi (Ofast): Update documentation.
>   * i386.h (TARGET_FPMATH_DEFAULT): Enable SSE math with -Ofast.
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi (revision 203161)
> +++ doc/invoke.texi (working copy)
> @@ -6796,6 +6796,7 @@ Disregard strict standards compliance.
>  valid for all standard-compliant programs.
>  It turns on @option{-ffast-math} and the Fortran-specific
>  @option{-fno-protect-parens} and @option{-fstack-arrays}.
> +On i386 target it also enable @option{-mfpmath=sse}.
>  
>  @item -Og
>  @opindex Og
> Index: config/i386/i386.h
> ===
> --- config/i386/i386.h(revision 203161)
> +++ config/i386/i386.h(working copy)
> @@ -209,7 +209,8 @@ extern const struct processor_costs ix86
>  
>  #ifndef TARGET_FPMATH_DEFAULT
>  #define TARGET_FPMATH_DEFAULT \
> -  (TARGET_64BIT && TARGET_SSE ? FPMATH_SSE : FPMATH_387)
> +  ((TARGET_64BIT && TARGET_SSE) \
> +   || (TARGET_SSE && optimize_fast) ? FPMATH_SSE : FPMATH_387)
>  #endif
>  
>  #define TARGET_FLOAT_RETURNS_IN_80387 TARGET_FLOAT_RETURNS
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


Re: [PATCH][ARM]Use cortex tune_params for armv8-a architecture

2013-10-07 Thread Ramana Radhakrishnan
> gcc/ChangeLog:
>
> 2013-10-03  Renlin Li 
>
> * config/arm/arm-cores.def (cortex-a53): Use cortex tunning.

s/tunning/tuning.

Ok with that change.

Ramana


Re: [patch] Add tree-ssa-loop.h and friends.

2013-10-07 Thread Richard Biener
On Thu, Oct 3, 2013 at 4:11 AM, Andrew MacLeod  wrote:
> this patch consolidates tree-ssa-loop*.c files with new .h files as required
> (8 in total)
>
> A number of the prototypes were in tree-flow.h, but there were also a few in
> cfgloop.h.  tree-ssa-loop.h was created to contain a couple of common
> structs and act as the gathering file for any generally applicable
> tree-ssa-loop includes. tree-flow.h includes this file for now.
>
> There is a bit of a criss-cross mess between the cfg-* and tree-ssa-loop*
> routines, but I'm not touching that for now.  Some of that might have to get
> resolved when I try to remove tree-flow.h as a standard include file from
> the .c files.. we'll see.
>
> In particular, tree-ssa-loop-niter.h exports a lot of more generally used
> routines. loop-iv.c, loop-unroll.c and loop-unswitch.c needed to include it.
>
> A few routines weren't referenced outside their file so I made those static,
> and there was one routine stmt_invariant_in_loop_p which was actually unused.
>
> bootstraps on x86_64-unknown-linux-gnu and passes with no new regressions.
> OK?

+   enum tree_code cmp;
+ };
+
+ #include "tree-ssa-loop-im.h"
+ #include "tree-ssa-loop-ivcanon.h"

what's the particular reason to not do includes first?  That looks really
odd (and maybe rather than doing that these includes should be done
elsewhere, like in .c files or in a header that includes tree-ssa-loop.h).

You seem to export things like movement_possibility that are not
used outside of tree-ssa-loop-im.c (in my tree at least).

Other than that the single reason we have to have all these exports
is that loop passes have their pass structures and gate / entries
defined in tree-ssa-loop.c instead of in the files the passes reside.
Consider changing that as a preparatory patch - it should cut down
the number of needed new .h files.

Richard.


> Andrew
>


Re: Enable building of libatomic on AArch64

2013-10-07 Thread Marcus Shawcroft
On 3 October 2013 23:43, Michael Hudson-Doyle  wrote:
> Hi,
>
> As libatomic builds for and the tests pass on AArch64 (built on x86_64
> but tested on a foundation model, logs and summary:
>
> http://people.linaro.org/~mwhudson/libatomic.sum.txt
> http://people.linaro.org/~mwhudson/runtest-log-v-2.txt
>
> ) this patch enables the build.
>
> Cheers,
> mwh
> (first time posting to this list, let me know if I'm doing it wrong)
>
> 2013-10-04  Michael Hudson-Doyle  
>
>   * configure.tgt: Add AArch64 support.
>

Hi,
The patch looks fine to me.

The ChangeLog entry should reflect the code that was removed rather
than the functionality added.  Perhaps:

  * configure.tgt (aarch64*): Remove.

Did you investigate whether or not the 10 UNSUPPORTED results in the
testsuite are sane?  I think that 5 look legitimate since they require
128 bit sync ops.  The other 5 look superficially like they should be
supported on aarch64.  We may just be missing aarch64 target supports
wiring in check_effective_target_sync_long_long_runtime?

/Marcus


Re: [SH] PR 51244 - Fix defects introduced in 4.8

2013-10-07 Thread Christian Bruel
Hi Oleg,

+/*
+This pass tries to optimize for example this:
+   mov.l   @(4,r4),r1
+   tst r1,r1
+   movtr1
+   tst r1,r1
+   bt/s.L5
+
+into something simpler:
+   mov.l   @(4,r4),r1
+   tst r1,r1
+   bf/s.L5
+
+Such sequences can be identified by looking for conditional branches and
+checking whether the ccreg is set before the conditional branch
+by testing another register for != 0, which was set by a ccreg store.
+This can be optimized by eliminating the redundant comparison and
+inverting the branch condition.  There can be multiple comparisons in
+different basic blocks that all end up in the redunant test insn before the
+conditional branch.  Some example RTL ...
+

Nice thing, optimizing these sequences when T-bit values are not recognized due 
to branch direction.  I have 2 questions:

1) I find the name "if-conversion" for this pass a little bit forced, since you 
don't aim to remove branches.  It looks more like some kind of extension value 
numbering. 

2) I'm wondering to what extent this case could be handled by a more global 
generic target value numbering to handle boolean operations.  Maybe it is just 
a phasing problem, as the branch directions are not yet computed in gimple-ssa, 
which would mean reworking it in RTL?

Many thanks,

Christian