Re: [PATCH] Fix for PR/62089 (enable missing Asan checks)

2014-08-14 Thread Yury Gribov

On 08/14/2014 09:46 AM, Yury Gribov wrote:

Something like this?


Forgot to mention: I fully tested the new patch on x64 (bootstrap, regtest,
Asan-bootstrap).


Re: [PATCH 2/3]Improve induction variable elimination

2014-08-14 Thread Bin.Cheng
On Fri, Jul 25, 2014 at 8:35 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Jul 17, 2014 at 11:08 AM, Bin Cheng bin.ch...@arm.com wrote:
 Hi,
 As quoted from the function difference_cannot_overflow_p,

   /* TODO: deeper inspection may be necessary to prove the equality.  */
   switch (code)
 {
 case PLUS_EXPR:
   return expr_equal_p (e1, offset) || expr_equal_p (e2, offset);
 case POINTER_PLUS_EXPR:
   return expr_equal_p (e2, offset);

 default:
   return false;
 }

 The overflow check can be improved by using deeper inspection to prove the
 equality.  This patch deals with that by making below two improvements:
   a) Handles constant cases.
   b) Uses affine expansion as deeper inspection to check the equality.

 As a result, functions strip_wrap_conserving_type_conversions and
 expr_equal_p can be removed now.  A test case is also added to illustrate iv
 elimination opportunity captured by this patch.
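 For reference, the affine-expansion equality check boils down to roughly the
 following sketch (using the tree-affine helpers together with the
 name_expansion cache introduced below; the real patch's handling of the
 offset operand differs in details):

   aff_tree aff_e1, aff_e2;
   tree type = TREE_TYPE (e1);

   /* Expand both expressions into affine form, sharing one cache.  */
   tree_to_aff_combination_expand (e1, type, &aff_e1,
                                   &data->name_expansion_cache);
   tree_to_aff_combination_expand (e2, type, &aff_e2,
                                   &data->name_expansion_cache);

   /* E1 == E2 iff their difference folds to the zero combination.  */
   aff_combination_scale (&aff_e2, -1);
   aff_combination_add (&aff_e1, &aff_e2);
   return aff_combination_zero_p (&aff_e1);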

 Thanks,
 bin

 You add special casing for constants but I don't see any testcases for that.
 Specifically

 +  /* No overflow if offset is zero.  */
 +  if (offset == integer_zero_node)
  return true;

 is a bogus check (use integer_zerop).  Apart from the special-casing of
 constants the patch looks good to me.
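 For completeness, the constant special case would use the predicate rather
 than a pointer comparison, i.e. something like:

   /* No overflow if offset is zero.  */
   if (integer_zerop (offset))
     return true;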
Hi,
I changed/rebased the patch, which passes bootstrap/test on x86_64.
Since it was reviewed/approved before, I will commit it in 24 hours if
there is no objection.

Thanks,
bin


 2014-07-17  Bin Cheng  bin.ch...@arm.com

 * tree-ssa-loop-ivopts.c (ivopts_data): New field name_expansion.
 (tree_ssa_iv_optimize_init): Initialize name_expansion.
 (tree_ssa_iv_optimize_finalize): Free name_expansion.
 (strip_wrap_conserving_type_conversions, expr_equal_p): Delete.
 (difference_cannot_overflow_p): New parameter.  Handle constant
 cases.  Use affine expansion for equality check.
 (iv_elimination_compare_lt): Pass new argument.

 gcc/testsuite/ChangeLog
 2014-07-17  Bin Cheng  bin.ch...@arm.com

 * gcc.dg/tree-ssa/ivopts-lt-2.c: New test.
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts" } */
+
+void
+f1 (int *p, unsigned int i)
+{
+  p += i;
+  do
+{
+  *p = 0;
+  p += 1;
+  i++;
+}
+  while (i < 100);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts" } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 213937)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -323,6 +323,9 @@ struct ivopts_data
   /* A bitmap of important candidates.  */
   bitmap important_candidates;
 
+  /* Cache used by tree_to_aff_combination_expand.  */
+  hash_map<tree, name_expansion *> *name_expansion_cache;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -876,6 +879,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *dat
   data->iv_candidates.create (20);
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
+  data->name_expansion_cache = NULL;
   decl_rtl_to_reset.create (20);
 }
 
@@ -4462,75 +4466,20 @@ iv_elimination_compare (struct ivopts_data *data,
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
-static tree
-strip_wrap_conserving_type_conversions (tree exp)
-{
-  while (tree_ssa_useless_type_conversion (exp)
-         && (nowrap_type_p (TREE_TYPE (exp))
-             == nowrap_type_p (TREE_TYPE (TREE_OPERAND (exp, 0)))))
-    exp = TREE_OPERAND (exp, 0);
-  return exp;
-}
-
-/* Walk the SSA form and check whether E == WHAT.  Fairly simplistic, we
-   check for an exact match.  */
-
-static bool
-expr_equal_p (tree e, tree what)
-{
-  gimple stmt;
-  enum tree_code code;
-
-  e = strip_wrap_conserving_type_conversions (e);
-  what = strip_wrap_conserving_type_conversions (what);
-
-  code = TREE_CODE (what);
-  if (TREE_TYPE (e) != TREE_TYPE (what))
-return false;
-
-  if (operand_equal_p (e, what, 0))
-return true;
-
-  if (TREE_CODE (e) != SSA_NAME)
-return false;
-
-  stmt = SSA_NAME_DEF_STMT (e);
-  if (gimple_code (stmt) != GIMPLE_ASSIGN
-  || gimple_assign_rhs_code (stmt) != code)
-return false;
-
-  switch (get_gimple_rhs_class (code))
-{
-case GIMPLE_BINARY_RHS:
-  if (!expr_equal_p (gimple_assign_rhs2 (stmt), TREE_OPERAND (what, 1)))
-   return false;
-  /* Fallthru.  */
-
-case GIMPLE_UNARY_RHS:
-case GIMPLE_SINGLE_RHS:
-  return expr_equal_p (gimple_assign_rhs1 (stmt), TREE_OPERAND (what, 0));
-default:
-  

Re: [PATCH][www] Document versioning scheme for GCC 5 and up

2014-08-14 Thread Jakub Jelinek
Hi!

Apparently using numbers without dots in the "[4.9/5 Regression]" markers has
the disadvantage that some bugs make it into the query even when they should
not, like
[4.8 Regression] ICE in set_address_disp, at rtlanal.c:5537
or
avx512f-ceil-sfix-vec-2.c and avx512f-floor-sfix-vec-2.c FAIL on Solaris9/x86
Matching the bare string "5" in the short_desc is simply too fragile.
Note that even the 4.9 search strings are fragile; some people happened to type
[4.8 Regression] Something bla bla, works in 4.9
or similar.

I wonder what extra resources it would take to use regexps instead, and I'm
not sure whether the regexp needs to be escaped further.

Thoughts?
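For reference, with %2F decoded back to '/', the proposed short_desc regexp is

  \[([ 0-9./]*[ /])*5[ /][ 0-9./]*[Rr]egression *\]

i.e. it matches subjects like "[4.9/5 Regression] ICE in ..." but no longer a
plain "[4.8 Regression] ..., works in 4.9".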

--- index.html  13 Aug 2014 13:09:00 -  1.934
+++ index.html  14 Aug 2014 07:52:01 -
@@ -176,10 +176,10 @@ Any additions?  Don't be shy, send them 
   <br />
   <span class="regress">
   <a
-  
href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=allwordssubstramp;short_desc=5amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;priority=P1amp;priority=P2amp;priority=P3amp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=noopamp;type0-0-0=noopamp;value0-0-0=;Serious
+  
href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=regexpamp;short_desc=\[([
 0-9.%2F]*[ %2F])*5[ %2F][ 0-9.%2F]*[Rr]egression 
*\]amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;priority=P1amp;priority=P2amp;priority=P3amp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=noopamp;type0-0-0=noopamp;value0-0-0=Serious
   regressions</a>.
   <a
-  
href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=allwordssubstramp;short_desc=5amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=noopamp;type0-0-0=noopamp;value0-0-0=;All
+  
href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=regexpamp;short_desc=\[([
 0-9.%2F]*[ %2F])*5[ %2F][ 0-9.%2F]*[Rr]egression 
*\]amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=noopamp;type0-0-0=noopamp;value0-0-0=All
   regressions/a.
   </span>
 </dd>

Jakub


Re: [PATCH] Fix for PR/62089 (enable missing Asan checks)

2014-08-14 Thread Jakub Jelinek
On Thu, Aug 14, 2014 at 09:46:46AM +0400, Yury Gribov wrote:
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -1690,22 +1690,21 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
int volatilep = 0, unsignedp = 0;
    tree inner = get_inner_reference (t, &bitsize, &bitpos, &offset,
                                      &mode, &unsignedp, &volatilep, false);
 -  if (((size_in_bytes & (size_in_bytes - 1)) == 0
 -       && (bitpos % (size_in_bytes * BITS_PER_UNIT)))
 -      || bitsize != size_in_bytes * BITS_PER_UNIT)
 +
 +  if (TREE_CODE (t) == COMPONENT_REF)
  {
 -  if (TREE_CODE (t) == COMPONENT_REF
 -      && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)) != NULL_TREE)
 +  if (DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)) != NULL_TREE)
   {
 tree repr = DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1));
 instrument_derefs (iter, build3 (COMPONENT_REF, TREE_TYPE (repr),
  TREE_OPERAND (t, 0), repr,
  NULL_TREE), location, is_store);
 +   return;
   }
 -  return;

 +  else if (bitpos % BITS_PER_UNIT
 +|| bitsize != size_in_bytes * BITS_PER_UNIT)
 + return;

No, this should be an if, not an else if, and it should go after the } below.
We really can't handle it otherwise.
Generally, bitfield COMPONENT_REFs should have a DECL_BIT_FIELD_REPRESENTATIVE
that is not a bit-field, so the common case will be handled.
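In other words, something like the following shape (a sketch of what is being
asked for, not the final patch):

  if (TREE_CODE (t) == COMPONENT_REF
      && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1)) != NULL_TREE)
    {
      tree repr = DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (t, 1));
      instrument_derefs (iter, build3 (COMPONENT_REF, TREE_TYPE (repr),
                                       TREE_OPERAND (t, 0), repr,
                                       NULL_TREE), location, is_store);
      return;
    }

  /* Only afterwards bail out on misaligned or partial-byte accesses.  */
  if (bitpos % BITS_PER_UNIT
      || bitsize != size_in_bytes * BITS_PER_UNIT)
    return;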

As for BIT_FIELD_REFs, the question is what access to use instead of them.

Jakub


Re: [testsuite patch] add __ARM_ARCH check for arm_v8_neon_ok

2014-08-14 Thread Ramana Radhakrishnan
On Mon, Aug 11, 2014 at 11:01 PM, Janis Johnson
janis_john...@mentor.com wrote:
 The check for effective target arm_v8_neon_ok passes even if __ARM_ARCH
 is not 8 or greater, but then some tests fail because intrinsic functions
 used in the test have not been declared.  This patch requires that
 __ARM_ARCH be 8 or greater.  Tested for arm-none-linux-gnu for mainline
 and 4.9 with a variety of multilib flags.


Out of curiosity: a number of tests are gated off by the target triplet
arm*-*-*eabi*, so I wonder how many of them actually run when you test with
arm-none-linux-gnu.  Also, arm-none-linux-gnu is the pre-EABI Linux triplet,
so I'd prefer not to conflate it with the triplet arm-none-linux-gnueabi(hf).

regards
Ramana

 OK for mainline and the 4.9 branch?

 Janis


Re: [PATCH][optabs.c] Fix PR 61713: ICE when expanding single-threaded version of atomic_test_and_set

2014-08-14 Thread Kyrill Tkachov

Hi all,

CC'ing release manager,

Is this ok to backport to 4.9? Tested there with no problems.

Kyrill


On 04/08/14 17:54, Kyrill Tkachov wrote:

On 25/07/14 23:05, Jeff Law wrote:

On 07/23/14 02:53, Kyrill Tkachov wrote:

Darn, had forgotten to attach the patch...


On 16/07/14 12:30, Kyrill Tkachov wrote:

Hi all,

This fixes the PR mentioned in the subject.  When expanding
atomic_test_and_set we try the corresponding sync optabs and, if none
exist (as for pre-SMP ARM architectures, -march=armv6 and earlier), the
code in optabs.c falls back to a single-threaded implementation that just
does a load and a store.

However, if the result of the operation is to be ignored, the code in
builtins.c follows the convention that the target RTX is set to
const0_rtx to signify that the result is ignored, and the expand
functions in optabs.c are supposed to check for that and act
appropriately.

The code in the single-threaded codepath in expand_atomic_test_and_set
in optabs.c didn't perform this check and instead tried to emit a move
to const0_rtx, which ICEs further down the line.

This patch fixes that by checking whether the result is ignored and, if
it is, omitting the load.
I wouldn't dare to remove the load in the normal atomic handling code,
due to all the memory ordering subtleties, but this code path is taken
only for a trivially single-threaded bare-metal target, where the
compiler assumes no races and no dodgy context switches anyway and
implements this with an ldrb+strb, so I think removing the ldrb is valid.
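The fix itself is tiny; conceptually the single-threaded fallback becomes
something like the sketch below (variable names are from memory and may not
match optabs.c exactly):

  /* builtins.c passes const0_rtx as the target when the result is
     ignored, so skip the load in that case.  */
  if (subtarget != const0_rtx)
    emit_move_insn (subtarget, mem);
  emit_move_insn (mem, trueval);
  return subtarget;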

Bootstrapped on arm, aarch64 and x86 and tested as well.

Ok for trunk?
This appears on 4.9 as well, I'll test it on that branch as well

2014-07-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com

PR target/61713
* gcc/optabs.c (expand_atomic_test_and_set): Do not try to emit
move to subtarget in serial version if result is ignored.

OK.
jeff

Thanks Jeff,
Is this ok for the 4.9 branch as well if testing comes back ok? (It
applies cleanly there.)

Kyrill








Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-08-14 Thread Evgeny Stupachenko
Ping.

On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson r...@redhat.com wrote:
 On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
 -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
 +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)

 insn_num might as well be bool avx2, since it's only ever set to two 
 values.

 Agree.  However:
  - after the alignment, the one-operand permutation could be just a move and
    take 2 instructions for AVX2 as well;
  - for AVX2 there could be other cases where the scheme takes 4 or 5
    instructions;
  - we can leave that for a potential AVX512 extension.


 -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
 -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16)
 +  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
 +  if (GET_MODE_SIZE (d-vmode) == 16)
 +{
 +  if (!TARGET_SSSE3)
 +   return false;
 +}
 +  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
 +  else if (GET_MODE_SIZE (d-vmode) == 32)
 +{
 +  if (!TARGET_AVX2)
 +   return false;
 +}
 +  /* Other sizes are not supported.  */
 +  else
  return false;

 And you'd better check it up here because...


 Correct. The following should resolve the issue:
   /* For AVX2 we need more than 2 instructions when the alignment
  by itself does not produce the desired permutation.  */
   if (TARGET_AVX2 && insn_num <= 2)
 return false;

 +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
 + operand permutation.  */
 +  if (insn_num == 2)
 +{
 +  ok = expand_vec_perm_1 (dcopy);
 +  gcc_assert (ok);
 +}
 +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
 + vperm plus 4 instructions for one operand permutation.  */
 +  else if (insn_num == 6)
 +{
 +  ok = expand_vec_perm_vpshufb2_vpermq (dcopy);
 +  gcc_assert (ok);
 +}
 +  else
 +ok = false;
return ok;

 ... down here you'll simply ICE from the gcc_assert.






 r~


[PATCH testcase]fix failure of g++.dg/ext/arm-fp16/fp16-mangle-1.C

2014-08-14 Thread Bin Cheng
Hi,
g++.dg/ext/arm-fp16/fp16-mangle-1.C fails because GCC now sets
DECL_COMDAT on template instantiations if flag_implicit_templates is in
effect; DECL_WEAK is then set accordingly.  As a result, checking for
".global" fails for this case.

This patch fixes that by relaxing the scanned string to ".global|weak",
just in case the symbol is still global when some options are specified
explicitly.

The change was introduced by
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213307.
Hi Jason, could you confirm that I have understood the issue correctly?
Thanks very much.

Thanks,
bin

gcc/testsuite/ChangeLog
2014-08-14  Bin Cheng  bin.ch...@arm.com

* g++.dg/ext/arm-fp16/fp16-mangle-1.C: Also check .weak.
Index: gcc/testsuite/g++.dg/ext/arm-fp16/fp16-mangle-1.C
===
--- gcc/testsuite/g++.dg/ext/arm-fp16/fp16-mangle-1.C   (revision 213809)
+++ gcc/testsuite/g++.dg/ext/arm-fp16/fp16-mangle-1.C   (working copy)
@@ -9,6 +9,6 @@ void f (__fp16 *x) { }
 /* { dg-final { scan-assembler "\t.global\t_Z1gPDhS_" } } */
 void g (__fp16 *x, __fp16 *y) { }
 
-/* { dg-final { scan-assembler "\t.global\t_ZN1SIDhDhE1iE" } } */
-template <typename T, typename U> struct S { static int i; }; 
+/* { dg-final { scan-assembler "\t.global|weak\t_ZN1SIDhDhE1iE" } } */
+template <typename T, typename U> struct S { static int i; };
 template <> int S<__fp16, __fp16>::i = 3;


Re: [PATCH] Fix PR61950

2014-08-14 Thread Paul Richard Thomas
Dear Richie and Tobias,

I OK'd this patch on 9th August but I now see that the posting bounced
because of mime content emanating from my phone mail reader :-(

I also thought that the patch is obvious.

Cheers

Paul


On 13 August 2014 23:22, Tobias Burnus bur...@net-b.de wrote:
 Hi Richard,

 sorry for the belated reply; I was lazily spending my time at the sea ...

 The patch is okay - and I would even claim that it is obvious.

 Regarding the data type, I have no idea why it uses a hard-coded 32bit
 integer instead of a size-type one. I have added it as an item to
 https://gcc.gnu.org/wiki/LibgfortranAbiCleanup to change it once we break
 the ABI.

 Tobias

 PS: I wonder where all the other patch reviewers are, given that quite a few
 patches have accumulated. In particular, I am lacking a patch reviewer
 myself.


 Richard Biener wrote:

 The following fixes PR61950 by properly initializing the _size field
 of the static constructor for the vtable init member.  Previously
 gfortran would use a 64bit integer to initialize the 32bit size field
 (not sure why larger aggregates are not considered).

 This breaks more sophisticated constant folding and previously
 inhibited constant folding (which would have worked in this case
 had there been no mismatch between initializer and field).

 Bootstrap and regtest ongoing on powerpc64-linux-gnu, ok?  I'm not
 considering a backport as it is only a missed optimization there.

 Thanks,
 Richard.

 2014-07-31  Richard Biener  rguent...@suse.de

 * trans-expr.c (gfc_conv_structure): Initialize _size with
 a value of proper type.

 Index: gcc/fortran/trans-expr.c
 ===
 --- gcc/fortran/trans-expr.c(revision 213342)
 +++ gcc/fortran/trans-expr.c(working copy)
 @@ -6260,7 +6260,9 @@ gfc_conv_structure (gfc_se * se, gfc_exp
 else if (cm->ts.u.derived && strcmp (cm->name, "_size") == 0)
 {
   val = TYPE_SIZE_UNIT (gfc_get_derived_type (cm->ts.u.derived));
- CONSTRUCTOR_APPEND_ELT (v, cm->backend_decl, val);
+ CONSTRUCTOR_APPEND_ELT (v, cm->backend_decl,
+ fold_convert (TREE_TYPE (cm->backend_decl),
+   val));
 }
 else
 {





-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy


Re: [PATCH][www] Document versioning scheme for GCC 5 and up

2014-08-14 Thread Richard Biener
On Thu, 14 Aug 2014, Jakub Jelinek wrote:

 Hi!
 
 Apparently using numbers without dots in 4.9/5 Regression has the
 disadvantage that some bugs make it to the query even when they should not,
 like
 [4.8 Regression] ICE in set_address_disp, at rtlanal.c:5537
 or
 avx512f-ceil-sfix-vec-2.c and avx512f-floor-sfix-vec-2.c FAIL on Solaris9/x86
 Matching a string 5 in the short_desc is simply too fragile.
 Note even the 4.9 search strings are fragile, some people happened to type
 [4.8 Regression] Something bla bla, works in 4.9
 or similar.
 
 Wonder what extra resources would it take to use regexps instead,
 and dunno if we don't need to escape it further.

Works for me.

Richard.

 Thoughts?
 
 --- index.html13 Aug 2014 13:09:00 -  1.934
 +++ index.html14 Aug 2014 07:52:01 -
 @@ -176,10 +176,10 @@ Any additions?  Don't be shy, send them 
br /
span class=regress
a
 - 
 href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=allwordssubstramp;short_desc=5amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;priority=P1amp;priority=P2amp;priority=P3amp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=n
 oopamp;type0-0-0=noopamp;value0-0-0=Serious 
 + 
 href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=regexpamp;short_desc=\[([
  
 0-9.%2F]*[ %2F])*5[ %2F][ 0-9.%2F]*[Rr]egression 
 *\]amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;priority=P1amp;priority=P2amp;priority=P3amp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=noopamp;type0-0-0=noopamp;value0-0-0=Serious
regressions/a.
a
 -  
 href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=allwordssubstramp;short_desc=5amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-0=noopamp;type0-0-0=noopamp;value0-0-0=;All
 +  
 href=https://gcc.gnu.org/bugzilla/buglist.cgi?query_format=advancedamp;short_desc_type=regexpamp;short_desc=\[([
  0-9.%2F]*[ %2F])*5[ %2F][ 0-9.%2F]*[Rr]egression 
 *\]amp;target_milestone=4.8.4amp;target_milestone=4.9.2amp;target_milestone=5.0amp;known_to_fail_type=allwordssubstramp;known_to_work_type=allwordssubstramp;long_desc_type=allwordssubstramp;long_desc=amp;bug_file_loc_type=allwordssubstramp;bug_file_loc=amp;gcchost_type=allwordssubstramp;gcchost=amp;gcctarget_type=allwordssubstramp;gcctarget=amp;gccbuild_type=allwordssubstramp;gccbuild=amp;keywords_type=allwordsamp;keywords=amp;bug_status=UNCONFIRMEDamp;bug_status=NEWamp;bug_status=ASSIGNEDamp;bug_status=SUSPENDEDamp;bug_status=WAITINGamp;bug_status=REOPENEDamp;emailtype1=substringamp;email1=amp;emailtype2=substringamp;email2=amp;bugidtype=includeamp;bug_id=amp;votes=amp;chfieldfrom=amp;chfieldto=Nowamp;chfieldvalue=amp;cmdtype=doitamp;order=Reuse+same+sort+as+last+timeamp;field0-0-
 0=noopamp;type0-0-0=noopamp;value0-0-0=All
regressions/a.
/span
 

Re: [PATCH][optabs.c] Fix PR 61713: ICE when expanding single-threaded version of atomic_test_and_set

2014-08-14 Thread Kyrill Tkachov


On 14/08/14 09:09, Kyrill Tkachov wrote:

Hi all,

CC'ing release manager,

Is this ok to backport to 4.9? Tested there with no problems.


Ah, I see Jeff already ok'd it; sorry for the noise, I must have missed it.

Kyrill


Kyrill


On 04/08/14 17:54, Kyrill Tkachov wrote:

On 25/07/14 23:05, Jeff Law wrote:

On 07/23/14 02:53, Kyrill Tkachov wrote:

Darn, had forgotten to attach the patch...


On 16/07/14 12:30, Kyrill Tkachov wrote:

Hi all,

This fixes the PR mentioned in the subject. When expanding
atomic_test_and_set we try the corresponding sync optabs and if none
exist, like for pre-SMP ARM architectures (-march=armv6 and earlier) the
code in optabs.c reverts to a single-threaded implementation that just
does a load and store.

However, if the result of the operation is to be ignored the code in
builtins.c follows the convention that the target RTX is set to
const0_rtx to signify that the result is ignored and the expand
functions in optabs.c are supposed to check for that and act
appropriately.

The code in the single-threaded codepath in expand_atomic_test_and_set
in optabs.c didn't perform this check and instead tried to emit a move
to a const0_rtx which ICEs further down the line.

This patch fixes that by checking if the result is ignored, and if it is
omits the load.
I wouldn't dare to remove the load in normal atomic handling code due to
all the memory ordering subtleties, but this code path occurs only when
we have a trivially single-threaded bare-metal target and the compiler
assumes no races anyway and no dodgy context switches and tries to
implement this with a ldrb+strb, so I think removing the ldrb is valid.

Bootstrapped on arm, aarch64 and x86 and tested as well.

Ok for trunk?
This appears on 4.9 as well, I'll test it on that branch as well

2014-07-16  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 PR target/61713
 * gcc/optabs.c (expand_atomic_test_and_set): Do not try to emit
 move to subtarget in serial version if result is ignored.

OK.
jeff

Thanks Jeff,
Is this ok for the 4.9 branch as well if testing comes back ok? (It
applies cleanly there.)

Kyrill











Re: [PATCH] Fix PR61950

2014-08-14 Thread Richard Biener
On Thu, 14 Aug 2014, Paul Richard Thomas wrote:

 Dear Richie and Tobias,
 
 I OK'd this patch on 9th August but I now see that the posting bounced
 because of mime content emanating from my phone mail reader :-(
 
 I also thought that the patch is obvious.

I already applied it after your approval.  I wasn't sure about the
obviousness because of the 32bit thing (the actual bug may be
different).

Richard.

 Cheers
 
 Paul
 
 
 On 13 August 2014 23:22, Tobias Burnus bur...@net-b.de wrote:
  Hi Richard,
 
  sorry for the belated reply due to spending lazily my time at the sea ...
 
  The patch is okay - and I would even claim that it is obvious.
 
  Regarding the data type, I have no idea why it uses a hard-coded 32bit
  integer instead of a size-type one. I have added it as item to
  https://gcc.gnu.org/wiki/LibgfortranAbiCleanup to change it once we break
  the ABI.
 
  Tobias
 
  PS: I wonder where all the other patch reviewers are, given that quite a few
  patches have accumulated. In particular, I am lacking a patch reviewer
  myself.
 
 
  Richard Biener wrote:
 
  The following fixes PR61950 by properly initializing the _size field
  of the static constructor for the vtable init member.  Previously
  gfortran would use a 64bit integer to initialize the 32bit size field
  (not sure why larger aggregates are not considered).
 
  This breaks more sophisticated constant-folding and previously
  inhibited constant folding (which would have worked in this case
  been there no mismatch between initializer and field).
 
  Bootstrap and regtest ongoing on powerpc64-linux-gnu, ok?  I'm not
  considering a backport as it is only a missed optimization there.
 
  Thanks,
  Richard.
 
  2014-07-31  Richard Biener  rguent...@suse.de
 
  * trans-expr.c (gfc_conv_structure): Initialize _size with
  a value of proper type.
 
  Index: gcc/fortran/trans-expr.c
  ===
  --- gcc/fortran/trans-expr.c(revision 213342)
  +++ gcc/fortran/trans-expr.c(working copy)
  @@ -6260,7 +6260,9 @@ gfc_conv_structure (gfc_se * se, gfc_exp
  else if (cm-ts.u.derived  strcmp (cm-name, _size) == 0)
  {
val = TYPE_SIZE_UNIT (gfc_get_derived_type (cm-ts.u.derived));
  - CONSTRUCTOR_APPEND_ELT (v, cm-backend_decl, val);
  + CONSTRUCTOR_APPEND_ELT (v, cm-backend_decl,
  + fold_convert (TREE_TYPE
  (cm-backend_decl),
  +   val));
  }
  else
  {
 
 
 
 
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [PATCH] Fix PR62077

2014-08-14 Thread Richard Biener
On Wed, 13 Aug 2014, Jason Merrill wrote:

 On 08/13/2014 10:28 AM, Richard Biener wrote:
 So far the patch survived building stage2 in an LTO bootstrap on the
 4.9 branch; full testing is scheduled for trunk.
 
 The patch breaks a lot of C++ testcases, such as
 g++.old-deja/g++.other/cvt1.C; I think you need to share the code that sets
 the canonical type with the template path.
 
  Jason, are you happy with that (esp. ripping out the odd
  type completion stuff that also messes with types recorded in
  said hashtable)?
 
 I'm nervous about it, since it leads to ARRAY_TYPEs with different TYPE_ALIGN
 than their elements, though I'm not sure this actually breaks anything.
 Perhaps we could copy TYPE_ALIGN and TYPE_USER_ALIGN at the same place we copy
 TYPE_NEEDS_CONSTRUCTING.

Um, ok.  I don't feel like fiddling with this C++ frontend part and
am going to try to work around it in tree.c with something like

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 213814)
+++ gcc/tree.c  (working copy)
@@ -6759,6 +6759,13 @@ type_hash_canon (unsigned int hashcode,
   t1 = type_hash_lookup (hashcode, type);
   if (t1 != 0)
 {
+  if (TYPE_MAIN_VARIANT (t1) != t1)
+   {
+ /* Oops.  C++ FE fun.  Overwrite the entry.  */
+ type_hash_add (hashcode, type);
+ return type;
+   }
+  gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
   if (GATHER_STATISTICS)
{
  tree_code_counts[(int) TREE_CODE (type)]--;

as I seem to have other GC-related LTO IL differences to chase.

So - can you take over this C++ frontend issue?

Thanks,
Richard.


Re: [PATCH] Fix PR62077

2014-08-14 Thread Richard Biener
On Thu, 14 Aug 2014, Richard Biener wrote:

 On Wed, 13 Aug 2014, Jason Merrill wrote:
 
  On 08/13/2014 10:28 AM, Richard Biener wrote:
   Sofar the patch survived building stage2 in a LTO bootstrap on the
   4.9 branch, full testing is scheduled for trunk.
  
  The patch breaks a lot of C++ testcases, such as
  g++.old-deja/g++.other/cvt1.C; I think you need to share the set the
  canonical type code with the template path.
  
   Jason, are you happy with that (esp. ripping out the odd
   type completion stuff that also messes with types recorded in
   said hashtable)?
  
  I'm nervous about it, since it leads to ARRAY_TYPEs with different 
  TYPE_ALIGN
  than their elements, though I'm not sure this actually breaks anything.
  Perhaps we could copy TYPE_ALIGN and TYPE_USER_ALIGN at the same place we 
  copy
  TYPE_NEEDS_CONSTRUCTING.
 
 Um, ok.  I don't feel like fiddling with this C++ frontend part and
 am going to try workaround it in tree.c with sth like
 
 Index: gcc/tree.c
 ===
 --- gcc/tree.c  (revision 213814)
 +++ gcc/tree.c  (working copy)
 @@ -6759,6 +6759,13 @@ type_hash_canon (unsigned int hashcode,
t1 = type_hash_lookup (hashcode, type);
if (t1 != 0)
  {
 +  if (TYPE_MAIN_VARIANT (t1) != t1)
 +   {
 + /* Oops.  C++ FE fun.  Overwrite the entry.  */
 + type_hash_add (hashcode, type);
 + return type;
 +   }
 +  gcc_assert (TYPE_MAIN_VARIANT (t1) == t1);
if (GATHER_STATISTICS)
 {
   tree_code_counts[(int) TREE_CODE (type)]--;

Awww, and that quickly runs into a bootstrap ICE like

In file included from 
/abuild/rguenther/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/rope:2975:0,
 from 
/space/rguenther/src/svn/trunk/libstdc++-v3/include/precompiled/extc++.h:52:
/abuild/rguenther/obj/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/ropeimpl.h:1185:52:
 
internal compiler error: canonical types differ for identical types const 
long unsigned int [46] and const long unsigned int [46]
 _S_min_len[int(__detail::_S_max_rope_depth) + 1] = {

Richard.


Re: PING – Re: [Patch, Fortran] -fcoarray=lib - support CRITICAL, prepare for locking support

2014-08-14 Thread Paul Richard Thomas
Dear Tobias, dear all,

This patch and the documentation patch are OK for trunk.

Many thanks

Paul


On 6 August 2014 08:46, Tobias Burnus bur...@net-b.de wrote:
 * PING * for the patch with the obvious change mentioned by Alessandro
 (i.e. using if (is_lock_type))?

 Tobias

 On 1 August 2014 21:57, Alessandro Fanfarillo wrote:

 Hello,

 I was implementing lock/unlock on the library side when I found a
 possible problem in the patch:

 if (is_lock_type == GFC_CAF_CRITICAL)
 +reg_type = sym->attr.artificial ? GFC_CAF_CRITICAL : GFC_CAF_LOCK_STATIC;
 +  else
 +reg_type = GFC_CAF_COARRAY_STATIC;

 the if statement cannot be true since is_lock_type is a boolean and
 GFC_CAF_CRITICAL is 4.

 Using if (is_lock_type) produces the right result for the lock
 registration.
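 I.e., with that change the hunk reads:

   if (is_lock_type)
     reg_type = sym->attr.artificial ? GFC_CAF_CRITICAL : GFC_CAF_LOCK_STATIC;
   else
     reg_type = GFC_CAF_COARRAY_STATIC;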


 Regards

 Alessandro

 2014-07-28 14:37 GMT-06:00 Tobias Burnus bur...@net-b.de:

 This patch implements -fcoarray=lib support for CRITICAL blocks and
 includes
 some preparatory work for locking. In particular:

 * Updated the documentation for locking/critical, minor cleanup. The
 patch
 goes on top of the unreviewed patch
 https://gcc.gnu.org/ml/fortran/2014-07/msg00155.html
 * Add libcaf_single implementation for lock/unlock
 * Add lock/unlock calls for CRITICAL
 * Register static/SAVEd locking variables and locking variables for
 critical
 sections.

 Built and currently regtesting on x86-64-gnu-linux.
 OK if it regtests successfully?

   * * *

 Still to be done as follow up:
 * Handling the registering of lock-type components in statically
 allocated
 derived types
 * Handling the registering of lock-type variables and components with
 allocate and with implicit/explicit deallocate
 * Calling lock/unlock function for those
 * Test case for locking and critical blocks

 Other coarray to-do items:
 * Type-conversion test case missing
 * Vector subscript library implementation + test cases
 * Extending the documentation
 * Issues with striding for coarray components of derived types
 * Nonallocatable polymorphic coarrays and select type/associated
 * Allocatable/pointer components of coarrays; co_reduce and co_broadcast

 Tobias





-- 
The knack of flying is learning how to throw yourself at the ground and miss.
   --Hitchhikers Guide to the Galaxy


Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)

2014-08-14 Thread Jakub Jelinek
On Thu, Aug 14, 2014 at 05:12:49AM +0200, Bernd Schmidt wrote:
 On 08/12/2014 09:35 PM, Jakub Jelinek wrote:
 As detailed in the PR, find_inc ignored any possible clobbers on
 inc_insn (typically %cc/flags/etc. register) and thus we could ignore
 all register dependencies between mem_insn and inc_insn even when
 we could only safely ignore the mem_reg0 register dependency.
 
 I've been trying to remember how I intended to prevent this, and there is
 indeed a mechanism: DEP_MULTIPLE, which ought to be set for any dependency
 between two insns that is found for more than one reason. find_inc does not
 try to break such dependencies, which should make it examine only cases
 where the only reason for a dependency is between the incremented register
 and the memory address. I'm pretty sure this worked at some point.

It can't have ever worked.
DEP_MULTIPLE is set only in update_dep.
But what happens here is that, while
add_or_update_dep_1 is indeed called twice, with
dep->pro:
(insn 4485 4484 2569 3 (parallel [
(set (reg:SI 10 %r10 [ D.1921 ])
(plus:SI (plus:SI (ltu:SI (reg:CCL1 33 %cc)
(const_int 0 [0]))
(reg:SI 10 %r10 [ D.1921 ]))
(mem:SI (plus:SI (reg:SI 3 %r3 [orig:622 ivtmp.179 ] [622])
(const_int 128 [0x80])) [4 MEM[base: _1817, offset: 
128B]+0 S4 A64])))
(clobber (reg:CC 33 %cc))
]) pr62025.c:139 407 {*addsi3_alc}
 (expr_list:REG_DEAD (reg:CCL1 33 %cc)
(expr_list:REG_UNUSED (reg:CC 33 %cc)
(nil
dep->con:
(insn 1027 2571 2573 3 (parallel [
(set (reg:SI 3 %r3 [orig:622 ivtmp.179 ] [622])
(plus:SI (reg:SI 3 %r3 [orig:622 ivtmp.179 ] [622])
(const_int 128 [0x80])))
(clobber (reg:CC 33 %cc))
]) 327 {*addsi3}
 (expr_list:REG_UNUSED (reg:CC 33 %cc)
(nil)))
dep->type:
REG_DEP_ANTI

(once because of the %cc clobber -> %cc use dependency, and then because of
the %r3 set -> %r3 use dependency), update_dep isn't called at all and thus
DEP_MULTIPLE is not set.  That is because we hit the:

  if (true_dependency_cache != NULL)
{
  switch (ask_dependency_caches (new_dep))
{
case DEP_PRESENT:
  return DEP_PRESENT;

path and return DEP_PRESENT immediately.  Thus, DEP_MULTIPLE is set
only if you have multiple kinds of dependencies in between pro and con,
not just multiple reasons to have the same kind of dependency.

So, to set DEP_MULTIPLE even in the case where ask_dependency_caches
returns DEP_PRESENT, you'd need to find the old dependency anyway (isn't
that going to be expensive and totally kill all the effects of
true_dependency_cache?) and set DEP_MULTIPLE on it if not already set.

Jakub


Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)

2014-08-14 Thread Jakub Jelinek
On Thu, Aug 14, 2014 at 11:34:04AM +0200, Jakub Jelinek wrote:
 So, to set DEP_MULTIPLE even in the case where ask_depencency_caches
 returns DEP_PRESENT, you'd need to find the old dependency anyway (isn't
 that going to be expensive and totally kill all the effects of
 true_dependency_cache?) and set DEP_MULTIPLE on it if not already set.

Something like this (untested except for the openssl s390 testcase); I haven't
checked what effect it has on the number of successful find_inc cases.

--- gcc/sched-deps.c.jj 2014-08-12 17:06:26.0 +0200
+++ gcc/sched-deps.c2014-08-14 11:46:25.509631874 +0200
@@ -1233,6 +1233,13 @@ add_or_update_dep_1 (dep_t new_dep, bool
   switch (ask_dependency_caches (new_dep))
{
case DEP_PRESENT:
+ dep_t present_dep;
+ sd_iterator_def sd_it;
+  
+ present_dep = sd_find_dep_between_no_cache (DEP_PRO (new_dep),
+ DEP_CON (new_dep),
+ resolved_p, sd_it);
+ DEP_MULTIPLE (present_dep) = 1;
  return DEP_PRESENT;
 
case DEP_CHANGED:
@@ -4752,23 +4759,6 @@ find_inc (struct mem_inc_info *mii, bool
goto next;
  }
 
- /* The inc instruction could have clobbers, make sure those
-registers are not used in mem insn.  */
- FOR_EACH_INSN_DEF (def, mii->inc_insn)
-   if (!reg_overlap_mentioned_p (DF_REF_REG (def), mii->mem_reg0))
- {
-   df_ref use;
-   FOR_EACH_INSN_USE (use, mii->mem_insn)
- if (reg_overlap_mentioned_p (DF_REF_REG (def),
-  DF_REF_REG (use)))
-   {
- if (sched_verbose >= 5)
-   fprintf (sched_dump,
-            "inc clobber used in store failure.\n");
- goto next;
-   }
- }
-
   newaddr = mii->inc_input;
   if (mii->mem_index != NULL_RTX)
newaddr = gen_rtx_PLUS (GET_MODE (newaddr), newaddr,


Jakub


Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread Richard Biener
On Wed, Aug 13, 2014 at 11:06 PM, Jeff Law l...@redhat.com wrote:
 On 08/13/14 14:55, Sebastian Pop wrote:

 Steve Ellcey wrote:

 +/* This file implements an optimization where, when a variable is set
 +   to a constant value and there is a path that leads from that definition
 +   to a switch statement that uses that variable as its controlling
 +   expression, we duplicate the blocks on this path and replace the jump to
 +   the switch statement with a direct jump to the label of the switch case
 +   that control would go to based on the value of the variable.  This can
 +   come up in loops/switch statements that implement state machines.
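A minimal example of the kind of code this targets (a hypothetical sketch, not
the actual Coremark source):

  int
  run (int c)
  {
    int state = 0;
    for (;;)
      switch (state)
        {
        case 0:
          state = c ? 1 : 2;
          break;        /* Jumps back to the switch; since state is a known
                           constant on each arm of the condition, the pass can
                           duplicate the blocks on this path and jump straight
                           to case 1 or case 2.  */
        case 1:
          state = 2;
          break;
        case 2:
          return c;
        }
  }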


 Can you explain why the jump-threading pass cannot do this?  Why do we
 need
 another pass to handle a corner case that jump-thread does not catch yet?

 I'm pretty sure jump threading *could* handle it, but after looking at the
 full testcase when it was posted, I'm not sure it's *wise* to handle this in
 jump threading.

 The core issue is we have to look even deeper into the CFG than was
 originally envisioned and do quite a bit more block duplication than was
 originally envisioned.  That's going to have a notable compile-time cost
 (and to a lesser extent issues around codesize).

 It's unfortunate that the testcase when we first looked at this was
 over-simplified and not actually representative of what happens in Coremark.
 Had I seen the Coremark testcase, I probably wouldn't have suggested we
 tackle the problem in jump threading and the extensions I made to jump
 threading would _not_ have included aggressively following backedges in the
 CFG.

 You'll note in a separate thread Steve and I discussed this during Cauldron
 and it was at my recommendation Steve resurrected his proof of concept
 plugin and started beating it into shape.

But do we really want a pass just to help coremark?

Richard.

 jeff




Re: [PATCH] Don't set dir prefix twice (PR middle-end/60484)

2014-08-14 Thread Richard Biener
On Thu, Aug 14, 2014 at 7:34 AM, Joey Ye joey.ye...@gmail.com wrote:
 PR60484 is marked as a 4.7/4.8 regression and it was recently reported
 against 4.8 by a user.

 OK backporting to 4.7/4.8?

The 4.7 branch is closed.

Richard.

 - Joey

 On Sat, Mar 15, 2014 at 1:43 AM, Joseph S. Myers
 jos...@codesourcery.com wrote:
 On Fri, 14 Mar 2014, Marek Polacek wrote:

  This patch makes sure that we set the directory prefix of
  dump_base_name only once; otherwise we'd end up with an invalid path,
  resulting in "error: could not open dump file ...".
  This happened because finish_options is called for every optimize
  attribute and once more for the command-line options, and every time it
  added the directory prefix.

 Regtested/bootstrapped on x86_64-linux, ok for trunk?

 OK, though I think it might be better to use separate fields of
 gcc_options for the originally specified name and the prefixed version.

 --
 Joseph S. Myers
 jos...@codesourcery.com


[PATCH] Speed up type_hash_canon

2014-08-14 Thread Richard Biener

This speeds up type_hash_canon by avoiding a 2nd hashtable lookup
in the case no previous same type is in the hashtable.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

* tree.c (type_hash_lookup, type_hash_add): Merge into ...
(type_hash_canon): ... this and avoid 2nd lookup for the add.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 213951)
+++ gcc/tree.c  (working copy)
@@ -6763,44 +6763,6 @@ type_hash_hash (const void *item)
   return ((const struct type_hash *) item)->hash;
 }
 
-/* Look in the type hash table for a type isomorphic to TYPE.
-   If one is found, return it.  Otherwise return 0.  */
-
-static tree
-type_hash_lookup (hashval_t hashcode, tree type)
-{
-  struct type_hash *h, in;
-
-  /* The TYPE_ALIGN field of a type is set by layout_type(), so we
- must call that routine before comparing TYPE_ALIGNs.  */
-  layout_type (type);
-
-  in.hash = hashcode;
-  in.type = type;
-
-  h = (struct type_hash *) htab_find_with_hash (type_hash_table, &in,
-                                                hashcode);
-  if (h)
-    return h->type;
-  return NULL_TREE;
-}
-
-/* Add an entry to the type-hash-table
-   for a type TYPE whose hash code is HASHCODE.  */
-
-static void
-type_hash_add (hashval_t hashcode, tree type)
-{
-  struct type_hash *h;
-  void **loc;
-
-  h = ggc_alloc<type_hash> ();
-  h->hash = hashcode;
-  h->type = type;
-  loc = htab_find_slot_with_hash (type_hash_table, h, hashcode, INSERT);
-  *loc = (void *)h;
-}
-
 /* Given TYPE, and HASHCODE its hash code, return the canonical
object for an identical type if one already exists.
Otherwise, return TYPE, and record it as the canonical object.
@@ -6813,17 +6775,28 @@ type_hash_add (hashval_t hashcode, tree
 tree
 type_hash_canon (unsigned int hashcode, tree type)
 {
-  tree t1;
+  type_hash in;
+  void **loc;
 
   /* The hash table only contains main variants, so ensure that's what we're
  being passed.  */
   gcc_assert (TYPE_MAIN_VARIANT (type) == type);
 
-  /* See if the type is in the hash table already.  If so, return it.
- Otherwise, add the type.  */
-  t1 = type_hash_lookup (hashcode, type);
-  if (t1 != 0)
+  /* The TYPE_ALIGN field of a type is set by layout_type(), so we
+ must call that routine before comparing TYPE_ALIGNs.  */
+  layout_type (type);
+
+  in.hash = hashcode;
+  in.type = type;
+
+  loc = htab_find_slot_with_hash (type_hash_table, &in, hashcode, INSERT);
+  if (*loc)
 {
+  tree t1 = ((type_hash *) *loc)->type;
+  /* ???  We'd like to assert here that the hashtable only contains
+ main variants but the C++ frontend breaks this by modifying
+types already in the hashtable in build_cplus_array_type.  */
+  /* gcc_assert (TYPE_MAIN_VARIANT (t1) == t1); */
   if (GATHER_STATISTICS)
{
  tree_code_counts[(int) TREE_CODE (type)]--;
@@ -6834,7 +6807,13 @@ type_hash_canon (unsigned int hashcode,
 }
   else
 {
-  type_hash_add (hashcode, type);
+  struct type_hash *h;
+
+  h = ggc_alloc<type_hash> ();
+  h->hash = hashcode;
+  h->type = type;
+  *loc = (void *)h;
+
   return type;
 }
 }


[C++ Patch] PR 54377

2014-08-14 Thread Paolo Carlini

Hi,

this is a diagnostic issue about the error message for a wrong number of
template arguments vs default arguments.  The fix is simple, but replacing
the second error with an inform, as recommended by Manuel in the audit trail,
means that we have to adjust a few existing testcases.  Note that I'm
replacing 'or more' with 'at least', as, again, recommended in the audit
trail, and consistently with the other similar error message recently added.
Tested x86_64-linux.
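For instance (an illustrative example; the exact note wording may differ
slightly):

  template<typename T, typename U = int> struct S { };

  S<> s;  // error: wrong number of template arguments (0, should be at least 1)
          // note: provided for 'template<class T, class U> struct S'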


Thanks,
Paolo.

///
/cp
2014-08-14  Paolo Carlini  paolo.carl...@oracle.com

PR c++/54377
* pt.c (coerce_template_parms): Improve error message vs default
arguments.

/testsuite
2014-08-14  Paolo Carlini  paolo.carl...@oracle.com

PR c++/54377
* g++.dg/template/pr54377.C: New.
* g++.dg/cpp0x/alias-decl-2.C: Adjust.
* g++.dg/cpp0x/pr51226.C: Likewise.
* g++.dg/cpp0x/variadic2.C: Likewise.
* g++.dg/parse/too-many-tmpl-args1.C: Likewise.
* g++.dg/template/dtor3.C: Likewise.
* g++.dg/template/qualttp4.C: Likewise.
* g++.dg/template/spec28.C: Likewise.
* g++.old-deja/g++.brendan/crash8.C: Likewise.
* g++.old-deja/g++.pt/ttp7.C: Likewise.
Index: cp/pt.c
===
--- cp/pt.c (revision 213952)
+++ cp/pt.c (working copy)
@@ -6861,19 +6861,24 @@ coerce_template_parms (tree parms,
   int variadic_args_p = 0;
   int post_variadic_parms = 0;
 
+  /* Likewise for parameters with default arguments.  */
+  int default_p = 0;
+
   if (args == error_mark_node)
 return error_mark_node;
 
   nparms = TREE_VEC_LENGTH (parms);
 
-  /* Determine if there are any parameter packs.  */
+  /* Determine if there are any parameter packs or default arguments.  */
   for (parm_idx = 0; parm_idx < nparms; ++parm_idx)
 {
-  tree tparm = TREE_VALUE (TREE_VEC_ELT (parms, parm_idx));
+  tree parm = TREE_VEC_ELT (parms, parm_idx);
   if (variadic_p)
++post_variadic_parms;
-  if (template_parameter_pack_p (tparm))
+  if (template_parameter_pack_p (TREE_VALUE (parm)))
++variadic_p;
+  if (TREE_PURPOSE (parm))
+   ++default_p;
 }
 
   inner_args = orig_inner_args = INNERMOST_TEMPLATE_ARGS (args);
@@ -6902,11 +6907,11 @@ coerce_template_parms (tree parms,
 {
   if (complain & tf_error)
{
-  if (variadic_p)
+  if (variadic_p || default_p)
 {
-  nparms -= variadic_p;
+  nparms -= variadic_p ? variadic_p : default_p;
   error ("wrong number of template arguments "
-         "(%d, should be %d or more)", nargs, nparms);
+         "(%d, should be at least %d)", nargs, nparms);
 }
  else
  error ("wrong number of template arguments "
@@ -6913,7 +6918,7 @@ coerce_template_parms (tree parms,
         "(%d, should be %d)", nargs, nparms);
 
  if (in_decl)
-   error ("provided for %q+D", in_decl);
+   inform (input_location, "provided for %q+D", in_decl);
}
 
   return error_mark_node;
Index: testsuite/g++.dg/cpp0x/alias-decl-2.C
===
--- testsuite/g++.dg/cpp0x/alias-decl-2.C   (revision 213952)
+++ testsuite/g++.dg/cpp0x/alias-decl-2.C   (working copy)
@@ -22,7 +22,7 @@ templateclass T using Vec = VectorT, AllocT 
 
 templateclass T void g(VectorT, AllocT );
 
-template<template<class T> class TT> void h(TT<int>); // { dg-error "provided for" }
+template<template<class T> class TT> void h(TT<int>); // { dg-message "provided for" }
 
 void
 bar()
Index: testsuite/g++.dg/cpp0x/pr51226.C
===
--- testsuite/g++.dg/cpp0x/pr51226.C(revision 213952)
+++ testsuite/g++.dg/cpp0x/pr51226.C(working copy)
@@ -1,7 +1,7 @@
 // PR c++/51226
 // { dg-do compile { target c++11 } }
 
-template<int> struct A   // { dg-error "provided" }
+template<int> struct A   // { dg-message "provided" }
 {
   enum E : int;
 };
Index: testsuite/g++.dg/cpp0x/variadic2.C
===
--- testsuite/g++.dg/cpp0x/variadic2.C  (revision 213952)
+++ testsuite/g++.dg/cpp0x/variadic2.C  (working copy)
@@ -6,9 +6,9 @@ templatetypename... = int // { dg-error default
 class tuple3;
 
 template<typename T1, typename T2, typename... Rest>
-struct two_or_more {}; // { dg-error "provided for" }
+struct two_or_more {}; // { dg-message "provided for" }
 
-typedef two_or_more<int> bad; // { dg-error "2 or more" "2 or more" }
+typedef two_or_more<int> bad; // { dg-error "at least 2" "at least 2" }
 
 void f()
 {
Index: testsuite/g++.dg/parse/too-many-tmpl-args1.C
===
--- testsuite/g++.dg/parse/too-many-tmpl-args1.C(revision 213952)
+++ testsuite/g++.dg/parse/too-many-tmpl-args1.C

Re: [patch,gomp4] make fortran loop variables implicitly private in openacc

2014-08-14 Thread Thomas Schwinge
Hi!

On Wed, 13 Aug 2014 22:41:47 +0200, Tobias Burnus bur...@net-b.de wrote:
 Cesar Philippidis wrote:
  According to section 2.6.1 in the openacc spec, fortran loop variables
  should be implicitly private like in openmp. This patch does just so.
 
 Makes sense. Looking at the patch, I wonder whether the context is 
 properly handled when mixing OpenMP and OpenACC. I have the feeling that 
 one then might end up using the OpenACC context when one should use the 
 OpenMP context. However, I have not fully followed the program flow. For 
 instance, would something like
 
 !$oacc parallel
 !$omp simd private(i) reduction(+:c)
 do i = 1, n
 ...
 end do
 
 be properly handled?

While that isn't something we're currently focussing on supporting, my
intuition tells me that there shouldn't be a separate oacc_current_ctx,
and omp_current_ctx should be used instead for OpenACC contexts, too.
(If you guys agree, don't let fixing that hold up the patch as posted.)


Grüße,
 Thomas




[PATCH i386 AVX512] [9/n] Extend iterators and attributes.

2014-08-14 Thread Kirill Yukhin
Hello,
This patch extends iterators and iterator modes
to support new patterns (future patches).

Bootstrapped.

Is it ok for trunk?

* config/i386/sse.md (define_mode_attr avx512): New.
(define_mode_attr sse2_avx_avx512f): Allow V8HI, V16HI, V32HI, V2DI,
V4DI modes.
(define_mode_attr sse2_avx2): Allow V64QI, V32HI, V4TI modes.
(define_mode_attr ssse3_avx2): Ditto.
(define_mode_attr sse4_1_avx2): Allow V64QI, V32HI, V8DI modes.
(define_mode_attr avx2_avx512bw): New.
(define_mode_attr ssedoublemodelower): New.
(define_mode_attr ssedoublemode): Allow V8SF, V8SI, V4DI, V4DF, V4SI,
V32HI, V64QI modes.
(define_mode_attr ssebytemode): Allow V8DI modes.
(define_mode_attr sseinsnmode): Allow V4TI, V32HI, V64QI modes.
(define_mode_attr sseintvecmodelower): Allow V8DF, V4TI modes.
(define_mode_attr ssePSmode2): New.
(define_mode_attr ssescalarsize): Allow V64QI, V32QI, V16QI, V8HI,
V16HI, V32HI modes.
(define_mode_attr dbpsadbwmode): New.
(define_mode_attr bcstscalarsuff): Allow V64QI, V32QI, V16QI, V32HI,
V16HI, V8HI, V8SI, V4SI, V4DI, V2DI, V8SF, V4SF, V4DF, V2DF modes.
(vi8_sse4_1_avx2_avx512): New.
(define_insn "<sse4_1_avx2>_movntdqa"): Use <vi8_sse4_1_avx2_avx512>
mode attribute.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ebe38f3..89a1842 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -331,32 +331,41 @@
(V4SI TARGET_AVX2) (V2DI TARGET_AVX2)
(V8SI TARGET_AVX2) (V4DI TARGET_AVX2)])
 
+(define_mode_attr avx512
+  [(V16QI avx512vl) (V32QI avx512vl) (V64QI avx512bw)
+   (V8HI  avx512vl) (V16HI  avx512vl) (V32HI avx512bw)
+   (V4SI  avx512vl) (V8SI  avx512vl) (V16SI avx512f)
+   (V2DI  avx512vl) (V4DI  avx512vl) (V8DI avx512f)
+   (V4SF avx512vl) (V8SF avx512vl) (V16SF avx512f)
+   (V2DF avx512vl) (V4DF avx512vl) (V8DF avx512f)])
+
 (define_mode_attr sse2_avx_avx512f
   [(V16QI sse2) (V32QI avx) (V64QI avx512f)
+   (V8HI  avx512vl) (V16HI  avx512vl) (V32HI avx512bw)
(V4SI  sse2) (V8SI  avx) (V16SI avx512f)
-   (V8DI avx512f)
+   (V2DI  avx512vl) (V4DI  avx512vl) (V8DI avx512f)
(V16SF avx512f) (V8SF avx) (V4SF avx)
(V8DF avx512f) (V4DF avx) (V2DF avx)])
 
 (define_mode_attr sse2_avx2
-  [(V16QI sse2) (V32QI avx2)
-   (V8HI sse2) (V16HI avx2)
+  [(V16QI sse2) (V32QI avx2) (V64QI avx512bw)
+   (V8HI sse2) (V16HI avx2) (V32HI avx512bw)
(V4SI sse2) (V8SI avx2) (V16SI avx512f)
(V2DI sse2) (V4DI avx2) (V8DI avx512f)
-   (V1TI sse2) (V2TI avx2)])
+   (V1TI sse2) (V2TI avx2) (V4TI avx512bw)])
 
 (define_mode_attr ssse3_avx2
-   [(V16QI ssse3) (V32QI avx2)
-(V4HI ssse3) (V8HI ssse3) (V16HI avx2)
+   [(V16QI ssse3) (V32QI avx2) (V64QI avx512bw)
+(V4HI ssse3) (V8HI ssse3) (V16HI avx2) (V32HI avx512bw)
 (V4SI ssse3) (V8SI avx2)
 (V2DI ssse3) (V4DI avx2)
-(TI ssse3) (V2TI avx2)])
+(TI ssse3) (V2TI avx2) (V4TI avx512bw)])
 
 (define_mode_attr sse4_1_avx2
-   [(V16QI sse4_1) (V32QI avx2)
-(V8HI sse4_1) (V16HI avx2)
+   [(V16QI sse4_1) (V32QI avx2) (V64QI avx512bw)
+(V8HI sse4_1) (V16HI avx2) (V32HI avx512bw)
 (V4SI sse4_1) (V8SI avx2) (V16SI avx512f)
-(V2DI sse4_1) (V4DI avx2) (V8DI avx512f)])
+(V2DI sse4_1) (V4DI avx2) (V8DI avx512dq)])
 
 (define_mode_attr avx_avx2
   [(V4SF avx) (V2DF avx)
@@ -376,6 +385,13 @@
(V8SF avx2) (V16SF avx512f)
(V4DF avx2) (V8DF avx512f)])
 
+(define_mode_attr avx2_avx512bw
+  [(V4SI avx2) (V8SI avx2) (V16SI avx512f)
+   (V2DI avx2) (V4DI avx2) (V8DI avx512f)
+   (V4SF avx2) (V8SF avx2) (V16SF avx512f)
+   (V2DF avx2) (V4DF avx2) (V8DF avx512f)
+   (V8HI avx512vl) (V16HI avx512vl) (V32HI avx512bw)])
+
 (define_mode_attr shuffletype
   [(V16SF f) (V16SI i) (V8DF f) (V8DI i)
   (V8SF f) (V8SI i) (V4DF f) (V4DI i)
@@ -386,13 +402,19 @@
 (define_mode_attr ssequartermode
   [(V16SF V4SF) (V8DF V2DF) (V16SI V4SI) (V8DI V2DI)])
 
+(define_mode_attr ssedoublemodelower
+  [(V16QI v16hi) (V32QI v32hi) (V64QI v64hi)
+   (V8HI v8si)   (V16HI v16si) (V32HI v32si)
+   (V4SI v4di)   (V8SI v8di)   (V16SI v16di)])
+
 (define_mode_attr ssedoublemode
   [(V16SF V32SF) (V16SI V32SI) (V8DI V16DI) (V8DF V16DF)
-   (V16HI V16SI) (V8HI V8SI) (V4HI V4SI)
-   (V32QI V32HI) (V16QI V16HI)])
+   (V8SF V16SF) (V8SI V16SI) (V4DI V8DI) (V4DF V8DF)
+   (V16HI V16SI) (V8HI V8SI) (V4HI V4SI) (V4SI V4DI)
+   (V32HI V32SI) (V32QI V32HI) (V16QI V16HI) (V64QI V64HI)])
 
 (define_mode_attr ssebytemode
-  [(V4DI V32QI) (V2DI V16QI)])
+  [(V8DI V64QI) (V4DI V32QI) (V2DI V16QI)])
 
 ;; All 128bit vector integer modes
 (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI])
@@ -461,7 +483,7 @@
 
 ;; SSE instruction mode
 (define_mode_attr sseinsnmode
-  [(V64QI XI) (V32HI XI) (V16SI XI) (V8DI XI)
+  [(V64QI XI) (V32HI XI) (V16SI XI) (V8DI XI) (V4TI XI)
(V32QI OI) (V16HI OI) (V8SI OI) (V4DI OI) (V2TI OI)
(V16QI TI) (V8HI TI) (V4SI TI) (V2DI 

[PATCH i386 AVX512] [10/n] Add vector move/load/store.

2014-08-14 Thread Kirill Yukhin
Hello,
This patch extends load/store insns.
No built-ins added in this patch.

Bootstrapped.
New tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c
(ix86_expand_special_args_builtin): Handle avx512vl_storev8sf_mask,
avx512vl_storev8si_mask, avx512vl_storev4df_mask, 
avx512vl_storev4di_mask,
avx512vl_storev4sf_mask, avx512vl_storev4si_mask, 
avx512vl_storev2df_mask,
avx512vl_storev2di_mask, avx512vl_loadv8sf_mask, avx512vl_loadv8si_mask,
avx512vl_loadv4df_mask, avx512vl_loadv4di_mask, avx512vl_loadv4sf_mask,
avx512vl_loadv4si_mask, avx512vl_loadv2df_mask, avx512vl_loadv2di_mask,
avx512bw_loadv64qi_mask, avx512vl_loadv32qi_mask, 
avx512vl_loadv16qi_mask,
avx512bw_loadv32hi_mask, avx512vl_loadv16hi_mask, 
avx512vl_loadv8hi_mask.
* config/i386/i386.md: Allow V32HI mode.
* config/i386/sse.md
(define_mode_iterator VMOVE): Allow V4TI mode.
(define_mode_iterator V_AVX512VL): New.
(define_mode_iterator V): New handling for AVX512VL.
(define_insn avx512f_load<mode>_mask): Delete.
(define_insn <avx512>_load<mode>_mask): New.
(define_insn avx512f_store<mode>_mask): Delete.
(define_insn <avx512>_store<mode>_mask): New.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 183b7be..da01ac6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34722,6 +34722,14 @@ ix86_expand_special_args_builtin (const struct 
builtin_description *d,
case CODE_FOR_avx512f_storev16si_mask:
case CODE_FOR_avx512f_storev8df_mask:
case CODE_FOR_avx512f_storev8di_mask:
+   case CODE_FOR_avx512vl_storev8sf_mask:
+   case CODE_FOR_avx512vl_storev8si_mask:
+   case CODE_FOR_avx512vl_storev4df_mask:
+   case CODE_FOR_avx512vl_storev4di_mask:
+   case CODE_FOR_avx512vl_storev4sf_mask:
+   case CODE_FOR_avx512vl_storev4si_mask:
+   case CODE_FOR_avx512vl_storev2df_mask:
+   case CODE_FOR_avx512vl_storev2di_mask:
  aligned_mem = true;
  break;
default:
@@ -34765,6 +34773,20 @@ ix86_expand_special_args_builtin (const struct 
builtin_description *d,
case CODE_FOR_avx512f_loadv16si_mask:
case CODE_FOR_avx512f_loadv8df_mask:
case CODE_FOR_avx512f_loadv8di_mask:
+   case CODE_FOR_avx512vl_loadv8sf_mask:
+   case CODE_FOR_avx512vl_loadv8si_mask:
+   case CODE_FOR_avx512vl_loadv4df_mask:
+   case CODE_FOR_avx512vl_loadv4di_mask:
+   case CODE_FOR_avx512vl_loadv4sf_mask:
+   case CODE_FOR_avx512vl_loadv4si_mask:
+   case CODE_FOR_avx512vl_loadv2df_mask:
+   case CODE_FOR_avx512vl_loadv2di_mask:
+   case CODE_FOR_avx512bw_loadv64qi_mask:
+   case CODE_FOR_avx512vl_loadv32qi_mask:
+   case CODE_FOR_avx512vl_loadv16qi_mask:
+   case CODE_FOR_avx512bw_loadv32hi_mask:
+   case CODE_FOR_avx512vl_loadv16hi_mask:
+   case CODE_FOR_avx512vl_loadv8hi_mask:
  aligned_mem = true;
  break;
default:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a72c206..b8ce6c0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1054,7 +1054,7 @@
(V4SF ps) (V2DF pd)
(V16QI b) (V8HI w) (V4SI d) (V2DI q)
(V32QI b) (V16HI w) (V8SI d) (V4DI q)
-   (V64QI b) (V16SI d) (V8DI q)])
+   (V64QI b) (V32HI w) (V16SI d) (V8DI q)])
 
 ;; SSE vector suffix for floating point modes
 (define_mode_attr ssevecmodesuffix [(SF ps) (DF pd)])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 89a1842..910b29b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -146,10 +146,21 @@
(V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI
(V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI
(V8DI TARGET_AVX512F)  (V4DI TARGET_AVX) V2DI
-   (V2TI TARGET_AVX) V1TI
+   (V4TI TARGET_AVX) (V2TI TARGET_AVX) V1TI
(V16SF TARGET_AVX512F) (V8SF TARGET_AVX) V4SF
(V8DF TARGET_AVX512F)  (V4DF TARGET_AVX) V2DF])
 
+;; All AVX512VL vector modes
+(define_mode_iterator V_AVX512VL
+  [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX512VL  TARGET_AVX512BW)
+   (V16QI TARGET_AVX512VL  TARGET_AVX512BW)
+   (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX512VL  TARGET_AVX512BW)
+   (V8HI TARGET_AVX512VL  TARGET_AVX512BW)
+   (V16SI TARGET_AVX512F) (V8SI TARGET_AVX512VL)  (V4SI TARGET_AVX512VL)
+   (V8DI TARGET_AVX512F)  (V4DI TARGET_AVX512VL)  (V2DI TARGET_AVX512VL)
+   (V16SF TARGET_AVX512F) (V8SF TARGET_AVX512VL)  (V4SF TARGET_AVX512VL)
+   (V8DF TARGET_AVX512F)  (V4DF TARGET_AVX512VL)  (V2DF 
TARGET_AVX512VL)])
+
 ;; All vector modes
 (define_mode_iterator V
   [(V32QI TARGET_AVX) V16QI
@@ -708,12 +719,10 @@
 case 2:
   /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
 in avx512f, so we need to use workarounds, to access sse registers
-16-31, which are evex-only.  */
-  if (TARGET_AVX512F  MODE_SIZE  64
-  

Re: [PATCH i386 AVX512] [9/n] Extend iterators and attributes.

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 1:20 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 This patch extends iterators and iterator modes
 to support new patterns (future patches).

 Bootstrapped.

 Is it ok for trunk?

 * config/i386/sse.md (define_mode_attr avx512): New.
 (define_mode_attr sse2_avx_avx512f): Allow V8HI, V16HI, V32HI, V2DI,
 V4DI modes.
 (define_mode_attr sse2_avx2): Allow V64QI, V32HI, V4TI modes.
 (define_mode_attr ssse3_avx2): Ditto.
 (define_mode_attr sse4_1_avx2): Allow V64QI, V32HI, V8DI modes.
 (define_mode_attr avx2_avx512bw): New.
 (define_mode_attr ssedoublemodelower): New.
 (define_mode_attr ssedoublemode): Allow V8SF, V8SI, V4DI, V4DF, V4SI,
 V32HI, V64QI modes.
 (define_mode_attr ssebytemode): Allow V8DI modes.
 (define_mode_attr sseinsnmode): Allow V4TI, V32HI, V64QI modes.
 (define_mode_attr sseintvecmodelower): Allow V8DF, V4TI modes.
 (define_mode_attr ssePSmode2): New.
 (define_mode_attr ssescalarsize): Allow V64QI, V32QI, V16QI, V8HI,
 V16HI, V32HI modes.
 (define_mode_attr dbpsadbwmode): New.
 (define_mode_attr bcstscalarsuff): Allow V64QI, V32QI, V16QI, V32HI,
 V16HI, V8HI, V8SI, V4SI, V4DI, V2DI, V8SF, V4SF, V4DF, V2DF modes.
 (vi8_sse4_1_avx2_avx512): New.
 (define_insn sse4_1_avx2_movntdqa): Use vi8_sse4_1_avx2_avx512
 mode attribute.

OK (although I'd put single-use attributes - blendbits and probably
new dbpsadbwmode - nearby their users).

Uros.


[PATCH i386 AVX512] [11/n] AVX-512DQ extract.

2014-08-14 Thread Kirill Yukhin
Hello,
This patch extends the vec_extract_hi_<mode> pattern
to support the AVX-512DQ insn.

Bootstrapped.
New tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.md
(define_attr isa): Add avx512dq, noavx512dq.
(define_attr enabled): Ditto.
* config/i386/sse.md
(define_insn vec_extract_hi_<mode><mask_name>): Support masking.

--
Thanks, K

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b8ce6c0..3a797c8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -741,7 +741,7 @@
 (define_attr isa base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
-   fma_avx512f,avx512bw,noavx512bw
+   fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq
   (const_string base))
 
 (define_attr enabled 
@@ -774,6 +774,8 @@
   (symbol_ref TARGET_FMA || TARGET_AVX512F)
 (eq_attr isa avx512bw) (symbol_ref TARGET_AVX512BW)
 (eq_attr isa noavx512bw) (symbol_ref !TARGET_AVX512BW)
+(eq_attr isa avx512dq) (symbol_ref TARGET_AVX512DQ)
+(eq_attr isa noavx512dq) (symbol_ref !TARGET_AVX512DQ)
]
(const_int 1)))
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 910b29b..3d3d1a0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6132,6 +6132,29 @@
(set_attr prefix evex)
(set_attr mode sseinsnmode)])
 
+(define_insn vec_extract_hi_modemask_name
+  [(set (match_operand:ssehalfvecmode 0 store_mask_predicate 
=store_mask_constraint,vm)
+   (vec_select:ssehalfvecmode
+ (match_operand:V16FI 1 register_operand v,v)
+ (parallel [(const_int 8) (const_int 9)
+(const_int 10) (const_int 11)
+   (const_int 12) (const_int 13)
+   (const_int 14) (const_int 15)])))]
+  TARGET_AVX512F  (!mask_applied || TARGET_AVX512DQ)
+  @
+   vextractshuffletype32x8\t{$0x1, %1, %0mask_operand2|%0mask_operand2, 
%1, 0x1}
+   vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}
+  [(set_attr type sselog)
+   (set_attr prefix_extra 1)
+   (set_attr isa avx512dq,noavx512dq)
+   (set_attr length_immediate 1)
+   (set (attr memory)
+  (if_then_else (match_test MEM_P (operands[0]))
+   (const_string store)
+   (const_string none)))
+   (set_attr prefix evex)
+   (set_attr mode sseinsnmode)])
+
 (define_expand avx_vextractf128mode
   [(match_operand:ssehalfvecmode 0 nonimmediate_operand)
(match_operand:V_256 1 register_operand)
@@ -6178,23 +6201,6 @@
   DONE;
 })
 
-(define_insn vec_extract_hi_mode
-  [(set (match_operand:ssehalfvecmode 0 nonimmediate_operand =v,m)
-   (vec_select:ssehalfvecmode
- (match_operand:V16FI 1 nonimmediate_operand v,v)
- (parallel [(const_int 8) (const_int 9)
-(const_int 10) (const_int 11)
-(const_int 12) (const_int 13)
-(const_int 14) (const_int 15)])))]
-  TARGET_AVX512F
-  vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}
-  [(set_attr type sselog)
-   (set_attr prefix_extra 1)
-   (set_attr length_immediate 1)
-   (set_attr memory none,store)
-   (set_attr prefix evex)
-   (set_attr mode XI)])
-
 (define_insn_and_split vec_extract_lo_mode
   [(set (match_operand:ssehalfvecmode 0 nonimmediate_operand =x,m)
(vec_select:ssehalfvecmode


[PATCH i386 AVX512] [12/n] Extend OI/TImode moves.

2014-08-14 Thread Kirill Yukhin
Hello,
This patch extends moves to OI/TI mode.

Bootstrapped.
New tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.md
(define_insn *movoi_internal_avx): Add EVEX version.
(define_insn *movti_internal): Ditto.

--
Thanks, K

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3a797c8..39fb2307 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1920,8 +1920,8 @@
(set_attr mode XI)])
 
 (define_insn *movoi_internal_avx
-  [(set (match_operand:OI 0 nonimmediate_operand =x,x ,m)
-   (match_operand:OI 1 vector_move_operand  C ,xm,x))]
+  [(set (match_operand:OI 0 nonimmediate_operand =v,v ,m)
+   (match_operand:OI 1 vector_move_operand  C ,vm,v))]
   TARGET_AVX  !(MEM_P (operands[0])  MEM_P (operands[1]))
 {
   switch (get_attr_type (insn))
@@ -1935,6 +1935,8 @@
{
  if (get_attr_mode (insn) == MODE_V8SF)
return vmovups\t{%1, %0|%0, %1};
+ else if (get_attr_mode (insn) == MODE_XI)
+   return vmovdqu32\t{%1, %0|%0, %1};
  else
return vmovdqu\t{%1, %0|%0, %1};
}
@@ -1942,6 +1944,8 @@
{
  if (get_attr_mode (insn) == MODE_V8SF)
return vmovaps\t{%1, %0|%0, %1};
+ else if (get_attr_mode (insn) == MODE_XI)
+   return vmovdqa32\t{%1, %0|%0, %1};
  else
return vmovdqa\t{%1, %0|%0, %1};
}
@@ -1953,7 +1957,10 @@
   [(set_attr type sselog1,ssemov,ssemov)
(set_attr prefix vex)
(set (attr mode)
-   (cond [(match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
+   (cond [(ior (match_operand 0 ext_sse_reg_operand)
+   (match_operand 1 ext_sse_reg_operand))
+(const_string XI)
+  (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)
 (const_string V8SF)
   (and (eq_attr alternative 2)
(match_test TARGET_SSE_TYPELESS_STORES))
@@ -1962,8 +1969,8 @@
  (const_string OI)))])
 
 (define_insn *movti_internal
-  [(set (match_operand:TI 0 nonimmediate_operand =!r ,o ,x,x ,m)
-   (match_operand:TI 1 general_operand  riFo,re,C,xm,x))]
+  [(set (match_operand:TI 0 nonimmediate_operand =!r ,o ,v,v ,m)
+   (match_operand:TI 1 general_operand  riFo,re,C,vm,v))]
   (TARGET_64BIT || TARGET_SSE)
 !(MEM_P (operands[0])  MEM_P (operands[1]))
 {
@@ -1983,6 +1990,8 @@
{
  if (get_attr_mode (insn) == MODE_V4SF)
return %vmovups\t{%1, %0|%0, %1};
+ else if (get_attr_mode (insn) == MODE_XI)
+   return vmovdqu32\t{%1, %0|%0, %1};
  else
return %vmovdqu\t{%1, %0|%0, %1};
}
@@ -1990,6 +1999,8 @@
{
  if (get_attr_mode (insn) == MODE_V4SF)
return %vmovaps\t{%1, %0|%0, %1};
+ else if (get_attr_mode (insn) == MODE_XI)
+   return vmovdqa32\t{%1, %0|%0, %1};
  else
return %vmovdqa\t{%1, %0|%0, %1};
}
@@ -2005,7 +2016,10 @@
(const_string maybe_vex)
(const_string orig)))
(set (attr mode)
-   (cond [(eq_attr alternative 0,1)
+   (cond [(ior (match_operand 0 ext_sse_reg_operand)
+   (match_operand 1 ext_sse_reg_operand))
+(const_string XI)
+  (eq_attr alternative 0,1)
 (const_string DI)
   (ior (not (match_test TARGET_SSE2))
(match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL))


Re: [PATCH i386 AVX512] [10/n] Add vector move/load/store.

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 1:30 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends load/store insns.
 No built-ins added in this patch.

 Bootstrapped.
 New tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.c
 (ix86_expand_special_args_builtin): Handle avx512vl_storev8sf_mask,
 avx512vl_storev8si_mask, avx512vl_storev4df_mask, 
 avx512vl_storev4di_mask,
 avx512vl_storev4sf_mask, avx512vl_storev4si_mask, 
 avx512vl_storev2df_mask,
 avx512vl_storev2di_mask, avx512vl_loadv8sf_mask, 
 avx512vl_loadv8si_mask,
 avx512vl_loadv4df_mask, avx512vl_loadv4di_mask, 
 avx512vl_loadv4sf_mask,
 avx512vl_loadv4si_mask, avx512vl_loadv2df_mask, 
 avx512vl_loadv2di_mask,
 avx512bw_loadv64qi_mask, avx512vl_loadv32qi_mask, 
 avx512vl_loadv16qi_mask,
 avx512bw_loadv32hi_mask, avx512vl_loadv16hi_mask, 
 avx512vl_loadv8hi_mask.
 * config/i386/i386.md: Allow V32HI mode.

Please update the above entry.

 * config/i386/sse.md
 (define_mode_iterator VMOVE): Allow V4TI mode.
 (define_mode_iterator V_AVX512VL): New.
 (define_mode_iterator V): New handling for AVX512VL.
 (define_insn avx512f_loadmode_mask): Delete.
 (define_insn avx512_loadmode_mask): New.
 (define_insn avx512f_storemode_mask): Delete.
 (define_insn avx512_storemode_mask): New.

 --
 Thanks, K

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 183b7be..da01ac6 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -34722,6 +34722,14 @@ ix86_expand_special_args_builtin (const struct 
 builtin_description *d,
 case CODE_FOR_avx512f_storev16si_mask:
 case CODE_FOR_avx512f_storev8df_mask:
 case CODE_FOR_avx512f_storev8di_mask:
 +   case CODE_FOR_avx512vl_storev8sf_mask:
 +   case CODE_FOR_avx512vl_storev8si_mask:
 +   case CODE_FOR_avx512vl_storev4df_mask:
 +   case CODE_FOR_avx512vl_storev4di_mask:
 +   case CODE_FOR_avx512vl_storev4sf_mask:
 +   case CODE_FOR_avx512vl_storev4si_mask:
 +   case CODE_FOR_avx512vl_storev2df_mask:
 +   case CODE_FOR_avx512vl_storev2di_mask:
   aligned_mem = true;
   break;
 default:
 @@ -34765,6 +34773,20 @@ ix86_expand_special_args_builtin (const struct 
 builtin_description *d,
 case CODE_FOR_avx512f_loadv16si_mask:
 case CODE_FOR_avx512f_loadv8df_mask:
 case CODE_FOR_avx512f_loadv8di_mask:
 +   case CODE_FOR_avx512vl_loadv8sf_mask:
 +   case CODE_FOR_avx512vl_loadv8si_mask:
 +   case CODE_FOR_avx512vl_loadv4df_mask:
 +   case CODE_FOR_avx512vl_loadv4di_mask:
 +   case CODE_FOR_avx512vl_loadv4sf_mask:
 +   case CODE_FOR_avx512vl_loadv4si_mask:
 +   case CODE_FOR_avx512vl_loadv2df_mask:
 +   case CODE_FOR_avx512vl_loadv2di_mask:
 +   case CODE_FOR_avx512bw_loadv64qi_mask:
 +   case CODE_FOR_avx512vl_loadv32qi_mask:
 +   case CODE_FOR_avx512vl_loadv16qi_mask:
 +   case CODE_FOR_avx512bw_loadv32hi_mask:
 +   case CODE_FOR_avx512vl_loadv16hi_mask:
 +   case CODE_FOR_avx512vl_loadv8hi_mask:
   aligned_mem = true;
   break;
 default:
 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index a72c206..b8ce6c0 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -1054,7 +1054,7 @@
 (V4SF ps) (V2DF pd)
 (V16QI b) (V8HI w) (V4SI d) (V2DI q)
 (V32QI b) (V16HI w) (V8SI d) (V4DI q)
 -   (V64QI b) (V16SI d) (V8DI q)])
 +   (V64QI b) (V32HI w) (V16SI d) (V8DI q)])

  ;; SSE vector suffix for floating point modes
  (define_mode_attr ssevecmodesuffix [(SF ps) (DF pd)])
 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 89a1842..910b29b 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -146,10 +146,21 @@
 (V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI
 (V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI
 (V8DI TARGET_AVX512F)  (V4DI TARGET_AVX) V2DI
 -   (V2TI TARGET_AVX) V1TI
 +   (V4TI TARGET_AVX) (V2TI TARGET_AVX) V1TI

Are you sure TARGET_AVX is the correct condition for V4TI?

 (V16SF TARGET_AVX512F) (V8SF TARGET_AVX) V4SF
 (V8DF TARGET_AVX512F)  (V4DF TARGET_AVX) V2DF])

 +;; All AVX512VL vector modes
 +(define_mode_iterator V_AVX512VL
 +  [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX512VL  TARGET_AVX512BW)
 +   (V16QI TARGET_AVX512VL  TARGET_AVX512BW)
 +   (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX512VL  TARGET_AVX512BW)

Is the above && OK? So, you have to pass -m...vl and -m...bw to the
compiler to enable these modes?

Uros.


[PATCH] Fix PR62067

2014-08-14 Thread Richard Biener

A static checker determined that LTO def_fn_type has erratic
va_start/end combinations on error paths.  The following fixes
that.
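
For illustration, here is a minimal standalone C sketch of the bug class
being fixed (not the GCC code itself; the function name and shape are made
up): an early exit taken before va_end leaves the va_list cleanup
unbalanced on one path.  The fix mirrors the patch: record the error and
fall through, so va_end runs exactly once per va_start.

#include <stdarg.h>
#include <stdio.h>

/* Sum N int arguments; report an error (return -1) if any is negative.  */
static int
sum_until_negative (int n, ...)
{
  va_list ap;
  int i, sum = 0, err = 0;

  va_start (ap, n);
  for (i = 0; i < n; ++i)
    {
      int v = va_arg (ap, int);
      if (v < 0)
        err = 1;        /* Record the error instead of "goto egress;".  */
      else
        sum += v;
    }
  va_end (ap);          /* Reached on both the success and the error path.  */

  return err ? -1 : sum;
}

int
main (void)
{
  printf ("%d\n", sum_until_negative (3, 1, 2, 3));   /* prints 6 */
  printf ("%d\n", sum_until_negative (3, 1, -2, 3));  /* prints -1 */
  return 0;
}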

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

* lto-lang.c (def_fn_type): Fix error handling wrt va_end.

Index: gcc/lto/lto-lang.c
===
--- gcc/lto/lto-lang.c  (revision 213958)
+++ gcc/lto/lto-lang.c  (working copy)
@@ -524,6 +524,7 @@ def_fn_type (builtin_type def, builtin_t
   tree *args = XALLOCAVEC (tree, n);
   va_list list;
   int i;
+  bool err = false;
 
   va_start (list, n);
   for (i = 0; i  n; ++i)
@@ -531,22 +532,22 @@ def_fn_type (builtin_type def, builtin_t
   builtin_type a = (builtin_type) va_arg (list, int);
   t = builtin_types[a];
   if (t == error_mark_node)
-   goto egress;
+   err = true;
   args[i] = t;
 }
   va_end (list);
 
   t = builtin_types[ret];
+  if (err)
+t = error_mark_node;
   if (t == error_mark_node)
-goto egress;
-  if (var)
+;
+  else if (var)
 t = build_varargs_function_type_array (t, n, args);
   else
 t = build_function_type_array (t, n, args);
 
- egress:
   builtin_types[def] = t;
-  va_end (list);
 }
 
 /* Used to help initialize the builtin-types.def table.  When a type of


[PATCH PR62011]

2014-08-14 Thread Yuri Rumyantsev
Hi All,

Here is a fix for PR 62011 - remove the false dependency for unary
bit-manipulation instructions on the latest Big Core chips (Sandy Bridge
and Haswell) by emitting, in the assembly output, a zeroing of the
destination register before the BMI instruction. I checked that
performance is restored for the popcnt, lzcnt and tzcnt instructions.
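
For illustration, a hedged sketch (not the patch itself) of the code
sequence the patch makes the compiler emit: clearing the destination
register first breaks the false dependency on its previous value.  The
function name is made up; the early-clobber "=&r" keeps the destination
out of the input register, and running it of course requires a CPU with
popcnt support.

#include <stdint.h>

static inline uint64_t
popcount_no_false_dep (uint64_t x)
{
  uint64_t dst;
  /* xor is the dependency-breaking zeroing idiom recognized by the CPU.  */
  __asm__ ("xorq %0, %0\n\t"
           "popcntq %1, %0"
           : "=&r" (dst)
           : "r" (x));
  return dst;
}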

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?

gcc/ChangeLog
2014-08-14  Yuri Rumyantsev  ysrum...@gmail.com

PR target/62011
* config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
 prototype.
* config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
* config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM): New macro.
* config/i386/i386.md (ctz<mode>2, clz<mode>2_lzcnt, popcount<mode>2,
 *popcount<mode>2_cmp, *popcountsi2_cmp_zext): Output zeroing of the
 destination register for unary bit-manipulation instructions
 if required.
* config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.


patch
Description: Binary data


[PATCH] Fix PR62081

2014-08-14 Thread Richard Biener

The following fixes missing dominator computation before fixing loops.
Rather than doing even more such weird stuff in a pass gate function,
this puts the fixup into a new pass scheduled before the loop pass group's gate.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

PR tree-optimization/62081
* tree-ssa-loop.c (pass_fix_loops): New pass.
(pass_tree_loop::gate):  Do not fixup loops here.
* tree-pass.h (make_pass_fix_loops): Declare.
* passes.def: Schedule pass_fix_loops before GIMPLE loop passes.

Index: gcc/tree-ssa-loop.c
===
--- gcc/tree-ssa-loop.c (revision 213956)
+++ gcc/tree-ssa-loop.c (working copy)
@@ -43,6 +43,56 @@ along with GCC; see the file COPYING3.
 #include tree-vectorizer.h
 
 
+/* A pass making sure loops are fixed up.  */
+
+namespace {
+
+const pass_data pass_data_fix_loops =
+{
+  GIMPLE_PASS, /* type */
+  fix_loops, /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_LOOP, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_fix_loops : public gimple_opt_pass
+{
+public:
+  pass_fix_loops (gcc::context *ctxt)
+: gimple_opt_pass (pass_data_fix_loops, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return flag_tree_loop_optimize; }
+
+  virtual unsigned int execute (function *fn);
+}; // class pass_fix_loops
+
+unsigned int
+pass_fix_loops::execute (function *)
+{
+  if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  fix_loop_structure (NULL);
+}
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_fix_loops (gcc::context *ctxt)
+{
+  return new pass_fix_loops (ctxt);
+}
+
+
 /* Gate for loop pass group.  The group is controlled by -ftree-loop-optimize
but we also avoid running it when the IL doesn't contain any loop.  */
 
@@ -57,9 +107,6 @@ gate_loop (function *fn)
   if (!loops_for_fn (fn))
 return true;
 
-  /* Make sure to drop / re-discover loops when necessary.  */
-  if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
-fix_loop_structure (NULL);
   return number_of_loops (fn)  1;
 }
 
Index: gcc/passes.def
===
--- gcc/passes.def  (revision 213956)
+++ gcc/passes.def  (working copy)
@@ -200,7 +200,10 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
   /* Pass group that runs when 1) enabled, 2) there are loops
-in the function.  */
+in the function.  Make sure to run pass_fix_loops before
+to discover/remove loops before running the gate function
+of pass_tree_loop.  */
+  NEXT_PASS (pass_fix_loops);
   NEXT_PASS (pass_tree_loop);
   PUSH_INSERT_PASSES_WITHIN (pass_tree_loop)
  NEXT_PASS (pass_tree_loop_init);
Index: gcc/tree-pass.h
===
--- gcc/tree-pass.h (revision 213956)
+++ gcc/tree-pass.h (working copy)
@@ -349,6 +349,7 @@ extern gimple_opt_pass *make_pass_sra_ea
 extern gimple_opt_pass *make_pass_early_ipa_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);


Re: [PATCH i386 AVX512] [9/n] Extend iterators and attributes.

2014-08-14 Thread Kirill Yukhin
Hello Uroš,
On 14 Aug 13:35, Uros Bizjak wrote:
 OK (although I'd put single-use attributes - blendbits and probably
 new dbpsadbwmode - nearby their users).
Thanks! I'll move `blendbits' to its user.
Since `dbpsadbwmode' currently has no user in main trunk, I'll
move it next to its user in the patch that introduces that user.

--
Thanks, K


Re: [PATCH i386 AVX512] [11/n] AVX-512DQ extract.

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 1:36 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 This patch extends vec_extract_hi_mode
 pattern to support AVX-512DQ insn.

 Bootstrapped.

 Bootstrapped.
 New tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.md
 (define_attr isa): Add avx512dq, noavx512dq.
 (define_attr enabled): Ditto.
 * config/i386/sse.md
 (define_insn vec_extract_hi_modemask_name): Support masking.

 --
 Thanks, K

 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index b8ce6c0..3a797c8 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -741,7 +741,7 @@
  (define_attr isa base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
 sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
 avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
 -   fma_avx512f,avx512bw,noavx512bw
 +   fma_avx512f,avx512bw,noavx512bw,avx512dq,noavx512dq
(const_string base))

  (define_attr enabled 
 @@ -774,6 +774,8 @@
(symbol_ref TARGET_FMA || TARGET_AVX512F)
  (eq_attr isa avx512bw) (symbol_ref TARGET_AVX512BW)
  (eq_attr isa noavx512bw) (symbol_ref !TARGET_AVX512BW)
 +(eq_attr isa avx512dq) (symbol_ref TARGET_AVX512DQ)
 +(eq_attr isa noavx512dq) (symbol_ref !TARGET_AVX512DQ)
 ]
 (const_int 1)))

 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 910b29b..3d3d1a0 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -6132,6 +6132,29 @@
 (set_attr prefix evex)
 (set_attr mode sseinsnmode)])

 +(define_insn vec_extract_hi_modemask_name
 +  [(set (match_operand:ssehalfvecmode 0 store_mask_predicate 
 =store_mask_constraint,vm)
 +   (vec_select:ssehalfvecmode
 + (match_operand:V16FI 1 register_operand v,v)
 + (parallel [(const_int 8) (const_int 9)
 +(const_int 10) (const_int 11)
 +   (const_int 12) (const_int 13)
 +   (const_int 14) (const_int 15)])))]
 +  TARGET_AVX512F  (!mask_applied || TARGET_AVX512DQ)
 +  @
 +   vextractshuffletype32x8\t{$0x1, %1, 
 %0mask_operand2|%0mask_operand2, %1, 0x1}
 +   vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}
 +  [(set_attr type sselog)
 +   (set_attr prefix_extra 1)
 +   (set_attr isa avx512dq,noavx512dq)
 +   (set_attr length_immediate 1)
 +   (set (attr memory)
 +  (if_then_else (match_test MEM_P (operands[0]))
 +   (const_string store)
 +   (const_string none)))
 +   (set_attr prefix evex)
 +   (set_attr mode sseinsnmode)])

Please change the type to sselog1 (the pattern has only one input
operand). The memory attribute will then be set automatically and
its explicit handling can be removed.

OK with this change.

Uros.


[PATCH i386 AVX512] [13/n] Add VF2_AVX512VL and vcvt[t]pd2 insns.

2014-08-14 Thread Kirill Yukhin
Hello,
This patch introduces vcvt[t]pd2 patterns.

Bootstrapped.
New tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.md
(define_code_attr ufix_bool): New.
* config/i386/sse.md
(define_mode_iterator VF2_AVX512VL): New.
(define_mode_attr sseintvecmode2): New.
(define_insn ufix_truncv2dfv2si2<mask_name>): Add masking.
(define_insn <fixsuffix>fix_truncv4dfv4si2<mask_name>): New.
(define_insn
<fixsuffix>fix_trunc<mode><sseintvecmodelower>2<mask_name><round_saeonly_name>):
New.
(define_insn
fix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>):
New.
(define_insn
ufix_notrunc<mode><sseintvecmodelower>2<mask_name><round_name>):
New.

--
Thanks, K

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 39fb2307..f6351cf 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -881,6 +881,7 @@
 ;; Used in signed and unsigned fix.
 (define_code_iterator any_fix [fix unsigned_fix])
 (define_code_attr fixsuffix [(fix ) (unsigned_fix u)])
+(define_code_attr ufix_bool [(fix false) (unsigned_fix true)])
 
 ;; All integer modes.
 (define_mode_iterator SWI1248x [QI HI SI DI])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 3d3d1a0..eaf97ab 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -228,6 +228,9 @@
 (define_mode_iterator VF_512
   [V16SF V8DF])
 
+(define_mode_iterator VF2_AVX512VL
+  [V8DF (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI TARGET_AVX512F) (V8DI TARGET_AVX512F)
@@ -523,6 +526,10 @@
(V32HI V32HI) (V64QI V64QI)
(V32QI V32QI) (V16QI V16QI)])
 
+(define_mode_attr sseintvecmode2
+  [(V8DF XI) (V4DF OI) (V2DF TI)
+   (V8SF OI) (V4SF TI)])
+
 (define_mode_attr sseintvecmodelower
   [(V16SF v16si) (V8DF v8di)
(V8SF v8si) (V4DF v4di)
@@ -4236,15 +4243,58 @@
(set_attr prefix evex)
(set_attr mode OI)])
 
-(define_insn fix_truncv4dfv4si2
-  [(set (match_operand:V4SI 0 register_operand =x)
-   (fix:V4SI (match_operand:V4DF 1 nonimmediate_operand xm)))]
-  TARGET_AVX
-  vcvttpd2dq{y}\t{%1, %0|%0, %1}
+(define_insn ufix_truncv2dfv2si2mask_name
+  [(set (match_operand:V4SI 0 register_operand =v)
+   (vec_concat:V4SI
+ (unsigned_fix:V2SI (match_operand:V2DF 1 nonimmediate_operand vm))
+ (const_vector:V2SI [(const_int 0) (const_int 0)])))]
+  TARGET_AVX512VL
+  vcvttpd2udq{x}\t{%1, %0mask_operand2|%0mask_operand2, %1}
   [(set_attr type ssecvt)
-   (set_attr prefix vex)
+   (set_attr prefix evex)
+   (set_attr mode TI)])
+
+(define_insn fixsuffixfix_truncv4dfv4si2mask_name
+  [(set (match_operand:V4SI 0 register_operand =v)
+   (any_fix:V4SI (match_operand:V4DF 1 nonimmediate_operand vm)))]
+  (TARGET_AVX  !ufix_bool) || (TARGET_AVX512VL  TARGET_AVX512F)
+  vcvttpd2fixsuffixdq{y}\t{%1, %0mask_operand2|%0mask_operand2, %1}
+  [(set_attr type ssecvt)
+   (set_attr prefix maybe_evex)
(set_attr mode OI)])
 
+(define_insn 
fixsuffixfix_truncmodesseintvecmodelower2mask_nameround_saeonly_name
+  [(set (match_operand:sseintvecmode 0 register_operand =v)
+   (any_fix:sseintvecmode
+ (match_operand:VF2_AVX512VL 1 round_saeonly_nimm_predicate 
round_saeonly_constraint)))]
+  TARGET_AVX512DQ  round_saeonly_mode512bit_condition
+  vcvttpd2fixsuffixqq\t{round_saeonly_mask_op2%1, 
%0mask_operand2|%0mask_operand2, %1round_saeonly_mask_op2}
+  [(set_attr type ssecvt)
+   (set_attr prefix evex)
+   (set_attr mode sseintvecmode2)])
+
+(define_insn fix_notruncmodesseintvecmodelower2mask_nameround_name
+  [(set (match_operand:sseintvecmode 0 register_operand =v)
+   (unspec:sseintvecmode
+ [(match_operand:VF2_AVX512VL 1 round_nimm_predicate 
round_constraint)]
+ UNSPEC_FIX_NOTRUNC))]
+  TARGET_AVX512DQ  round_mode512bit_condition
+  vcvtpd2qq\t{round_mask_op2%1, %0mask_operand2|%0mask_operand2, 
%1round_mask_op2}
+  [(set_attr type ssecvt)
+   (set_attr prefix evex)
+   (set_attr mode sseintvecmode2)])
+
+(define_insn ufix_notruncmodesseintvecmodelower2mask_nameround_name
+  [(set (match_operand:sseintvecmode 0 register_operand =v)
+   (unspec:sseintvecmode
+ [(match_operand:VF2_AVX512VL 1 nonimmediate_operand 
round_constraint)]
+ UNSPEC_UNSIGNED_FIX_NOTRUNC))]
+  TARGET_AVX512DQ  round_mode512bit_condition
+  vcvtpd2uqq\t{round_mask_op2%1, %0mask_operand2|%0mask_operand2, 
%1round_mask_op2}
+  [(set_attr type ssecvt)
+   (set_attr prefix evex)
+   (set_attr mode sseintvecmode2)])
+
 (define_expand avx_cvttpd2dq256_2
   [(set (match_operand:V8SI 0 register_operand)
(vec_concat:V8SI


[PATCH i386 AVX512] [15/n] Extend vcvtudq2ps to avx512vl.

2014-08-14 Thread Kirill Yukhin
Hello,
This patch extends vcvtudq2ps to support
AVX-512VL new insns.

Bootstrapped.
New tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/sse.md
(define_mode_iterator VF1_AVX512VL): New.
(define_insn ufloatv16siv16sf2<mask_name><round_name>): Delete.
(define_insn
ufloat<sseintvecmodelower><mode>2<mask_name><round_name>):
New.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d5598d4..819dae1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -237,6 +237,9 @@
 (define_mode_iterator VF2_AVX512VL
   [V8DF (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)])
 
+(define_mode_iterator VF1_AVX512VL
+  [V16SF (V8SF TARGET_AVX512VL) (V4SF TARGET_AVX512VL)])
+
 ;; All vector integer modes
 (define_mode_iterator VI
   [(V16SI TARGET_AVX512F) (V8DI TARGET_AVX512F)
@@ -3687,15 +3690,15 @@
(set_attr prefix maybe_vex)
(set_attr mode sseinsnmode)])
 
-(define_insn ufloatv16siv16sf2mask_nameround_name
-  [(set (match_operand:V16SF 0 register_operand =v)
-   (unsigned_float:V16SF
- (match_operand:V16SI 1 round_nimm_predicate 
round_constraint)))]
+(define_insn ufloatsseintvecmodelowermode2mask_nameround_name
+  [(set (match_operand:VF1_AVX512VL 0 register_operand =v)
+   (unsigned_float:VF1_AVX512VL
+ (match_operand:sseintvecmode 1 nonimmediate_operand 
round_constraint)))]
   TARGET_AVX512F
   vcvtudq2ps\t{round_mask_op2%1, %0mask_operand2|%0mask_operand2, 
%1round_mask_op2}
   [(set_attr type ssecvt)
(set_attr prefix evex)
-   (set_attr mode V16SF)])
+   (set_attr mode MODE)])
 
 (define_expand floatunssseintvecmodelowermode2
   [(match_operand:VF1 0 register_operand)


Fwd: [PATCH i386 AVX512] [14/n] Add convert to PS insn patterns.

2014-08-14 Thread Kirill Yukhin
Adding community.


-- Forwarded message --
From: Kirill Yukhin kirill.yuk...@gmail.com
Date: Thu, Aug 14, 2014 at 4:16 PM
Subject: [PATCH i386 AVX512] [14/n] Add convert to PS insn patterns.
To: Uros Bizjak ubiz...@gmail.com


Hello,
This patch introduces new patterns for conversions
to PS.

Bootstrapped.
New tests on top of patch-set all pass
under simulator.

Is it ok for trunk?

gcc/
* config/i386/i386.c: Rename ufloatv8siv8df_mask to
ufloatv8siv8df2_mask.
* config/i386/i386.md
(define_code_iterator any_float): New.
(define_code_attr floatsuffix): New.
* config/i386/sse.md
(define_mode_iterator VF1_128_256VL): New.
(define_mode_iterator VF2_512_256VL): New.
(define_insn float<si2dfmodelower><mode>2<mask_name>):
Remove unnecessary
TARGET check.
(define_insn ufloatv8siv8df<mask_name>): Delete.
(define_insn
<floatsuffix>float<sseintvecmodelower><mode>2<mask_name><round_name>):
New.
(define_mode_attr qq2pssuff): New.
(define_mode_attr sselongvecmode): New.
(define_mode_attr sselongvecmodelower): New.
(define_mode_attr sseintvecmode3): New.
(define_insn
<floatsuffix>float<sselongvecmodelower><mode>2<mask_name><round_name>):
New.
(define_insn *<floatsuffix>floatv2div2sf2): New.
(define_insn <floatsuffix>floatv2div2sf2_mask): New.
(define_insn ufloat<si2dfmodelower><mode>2<mask_name>): New.
(define_insn ufloatv2siv2df2<mask_name>): New.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index da01ac6..e2eda2a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -30036,7 +30036,7 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_compressv16sf_mask,
__builtin_ia32_compresssf512_mask, IX86_BUILTIN_COMPRESSPS512,
UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_HI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_floatv8siv8df2_mask,
__builtin_ia32_cvtdq2pd512_mask, IX86_BUILTIN_CVTDQ2PD512, UNKNOWN,
(int) V8DF_FTYPE_V8SI_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vcvtps2ph512_mask,
__builtin_ia32_vcvtps2ph512_mask, IX86_BUILTIN_CVTPS2PH512, UNKNOWN,
(int) V16HI_FTYPE_V16SF_INT_V16HI_HI },
-  { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufloatv8siv8df_mask,
__builtin_ia32_cvtudq2pd512_mask, IX86_BUILTIN_CVTUDQ2PD512,
UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_QI },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufloatv8siv8df2_mask,
__builtin_ia32_cvtudq2pd512_mask, IX86_BUILTIN_CVTUDQ2PD512,
UNKNOWN, (int) V8DF_FTYPE_V8SI_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_cvtusi2sd32,
__builtin_ia32_cvtusi2sd32, IX86_BUILTIN_CVTUSI2SD32, UNKNOWN, (int)
V2DF_FTYPE_V2DF_UINT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_expandv8df_mask,
__builtin_ia32_expanddf512_mask, IX86_BUILTIN_EXPANDPD512, UNKNOWN,
(int) V8DF_FTYPE_V8DF_V8DF_QI },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_expandv8df_maskz,
__builtin_ia32_expanddf512_maskz, IX86_BUILTIN_EXPANDPD512Z,
UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_QI },
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f6351cf..e686cf6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -883,6 +883,10 @@
 (define_code_attr fixsuffix [(fix ) (unsigned_fix u)])
 (define_code_attr ufix_bool [(fix false) (unsigned_fix true)])

+;; Used in signed and unsigned float.
+(define_code_iterator any_float [float unsigned_float])
+(define_code_attr floatsuffix [(float ) (unsigned_float u)])
+
 ;; All integer modes.
 (define_mode_iterator SWI1248x [QI HI SI DI])

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index eaf97ab..d5598d4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -205,6 +205,9 @@
 (define_mode_iterator VF1_128_256
   [(V8SF TARGET_AVX) V4SF])

+(define_mode_iterator VF1_128_256VL
+  [V8SF (V4SF TARGET_AVX512VL)])
+
 ;; All DFmode vector float modes
 (define_mode_iterator VF2
   [(V8DF TARGET_AVX512F) (V4DF TARGET_AVX) V2DF])
@@ -216,6 +219,9 @@
 (define_mode_iterator VF2_512_256
   [(V8DF TARGET_AVX512F) (V4DF TARGET_AVX)])

+(define_mode_iterator VF2_512_256VL
+  [V8DF (V4DF TARGET_AVX512VL)])
+
 ;; All 128bit vector float modes
 (define_mode_iterator VF_128
   [V4SF (V2DF TARGET_SSE2)])
@@ -4091,21 +4097,94 @@
 (define_insn floatsi2dfmodelowermode2mask_name
   [(set (match_operand:VF2_512_256 0 register_operand =v)
(float:VF2_512_256 (match_operand:si2dfmode 1
nonimmediate_operand vm)))]
-  TARGET_AVX  mask_mode512bit_condition
+  mask_mode512bit_condition
   vcvtdq2pd\t{%1, %0mask_operand2|%0mask_operand2, %1}
   [(set_attr type ssecvt)
(set_attr prefix maybe_vex)
(set_attr mode MODE)])

-(define_insn ufloatv8siv8dfmask_name
-  [(set (match_operand:V8DF 0 register_operand =v)
-   (unsigned_float:V8DF
- (match_operand:V8SI 1 nonimmediate_operand vm)))]
-  TARGET_AVX512F
+(define_insn 
floatsuffixfloatsseintvecmodelowermode2mask_nameround_name
+  [(set 

Re: [PATCH i386 AVX512] [10/n] Add vector move/load/store.

2014-08-14 Thread Kirill Yukhin
On 14 Aug 13:45, Uros Bizjak wrote:
 Please update the above entry.
Whoops. Updated ChangeLog:
gcc/
* config/i386/i386.c
(ix86_expand_special_args_builtin): Handle avx512vl_storev8sf_mask,
avx512vl_storev8si_mask, avx512vl_storev4df_mask, 
avx512vl_storev4di_mask,
avx512vl_storev4sf_mask, avx512vl_storev4si_mask, 
avx512vl_storev2df_mask,
avx512vl_storev2di_mask, avx512vl_loadv8sf_mask, avx512vl_loadv8si_mask,
avx512vl_loadv4df_mask, avx512vl_loadv4di_mask, avx512vl_loadv4sf_mask,
avx512vl_loadv4si_mask, avx512vl_loadv2df_mask, avx512vl_loadv2di_mask,
avx512bw_loadv64qi_mask, avx512vl_loadv32qi_mask, 
avx512vl_loadv16qi_mask,
avx512bw_loadv32hi_mask, avx512vl_loadv16hi_mask, 
avx512vl_loadv8hi_mask.
* config/i386/i386.md (define_mode_attr ssemodesuffix): Allow V32HI 
mode.
* config/i386/sse.md
(define_mode_iterator VMOVE): Allow V4TI mode.
(define_mode_iterator V_AVX512VL): New.
(define_mode_iterator V): New handling for AVX512VL.
(define_insn avx512f_load<mode>_mask): Delete.
(define_insn <avx512>_load<mode>_mask): New.
(define_insn avx512f_store<mode>_mask): Delete.
(define_insn <avx512>_store<mode>_mask): New.


  --- a/gcc/config/i386/sse.md
  +++ b/gcc/config/i386/sse.md
  @@ -146,10 +146,21 @@
  (V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI
  (V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI
  (V8DI TARGET_AVX512F)  (V4DI TARGET_AVX) V2DI
  -   (V2TI TARGET_AVX) V1TI
  +   (V4TI TARGET_AVX) (V2TI TARGET_AVX) V1TI
 
 Are you sure TARGET_AVX is the correct condition for V4TI?
Right! This should be TARGET_AVX512BW (because corresponding shifts
belong to AVX-512BW). 
  +;; All AVX512VL vector modes
  +(define_mode_iterator V_AVX512VL
  +  [(V64QI TARGET_AVX512BW) (V32QI TARGET_AVX512VL  TARGET_AVX512BW)
  +   (V16QI TARGET_AVX512VL  TARGET_AVX512BW)
  +   (V32HI TARGET_AVX512BW) (V16HI TARGET_AVX512VL  TARGET_AVX512BW)
 
 Is the above  OK? So, you have to pass -m...vl and -m...bw to the
 compiler to enable these modes?
Yeah. This looks strange, but it should be so.
Simplest example: vpaddb (with regno > 15). This insn
is enabled only when the AVX512VL *and* AVX512BW bits are on.

Updated patch in the bottom.

Bootstrapped.

Is it ok?

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 183b7be..da01ac6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34722,6 +34722,14 @@ ix86_expand_special_args_builtin (const struct 
builtin_description *d,
case CODE_FOR_avx512f_storev16si_mask:
case CODE_FOR_avx512f_storev8df_mask:
case CODE_FOR_avx512f_storev8di_mask:
+   case CODE_FOR_avx512vl_storev8sf_mask:
+   case CODE_FOR_avx512vl_storev8si_mask:
+   case CODE_FOR_avx512vl_storev4df_mask:
+   case CODE_FOR_avx512vl_storev4di_mask:
+   case CODE_FOR_avx512vl_storev4sf_mask:
+   case CODE_FOR_avx512vl_storev4si_mask:
+   case CODE_FOR_avx512vl_storev2df_mask:
+   case CODE_FOR_avx512vl_storev2di_mask:
  aligned_mem = true;
  break;
default:
@@ -34765,6 +34773,20 @@ ix86_expand_special_args_builtin (const struct 
builtin_description *d,
case CODE_FOR_avx512f_loadv16si_mask:
case CODE_FOR_avx512f_loadv8df_mask:
case CODE_FOR_avx512f_loadv8di_mask:
+   case CODE_FOR_avx512vl_loadv8sf_mask:
+   case CODE_FOR_avx512vl_loadv8si_mask:
+   case CODE_FOR_avx512vl_loadv4df_mask:
+   case CODE_FOR_avx512vl_loadv4di_mask:
+   case CODE_FOR_avx512vl_loadv4sf_mask:
+   case CODE_FOR_avx512vl_loadv4si_mask:
+   case CODE_FOR_avx512vl_loadv2df_mask:
+   case CODE_FOR_avx512vl_loadv2di_mask:
+   case CODE_FOR_avx512bw_loadv64qi_mask:
+   case CODE_FOR_avx512vl_loadv32qi_mask:
+   case CODE_FOR_avx512vl_loadv16qi_mask:
+   case CODE_FOR_avx512bw_loadv32hi_mask:
+   case CODE_FOR_avx512vl_loadv16hi_mask:
+   case CODE_FOR_avx512vl_loadv8hi_mask:
  aligned_mem = true;
  break;
default:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a72c206..b8ce6c0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1054,7 +1054,7 @@
(V4SF ps) (V2DF pd)
(V16QI b) (V8HI w) (V4SI d) (V2DI q)
(V32QI b) (V16HI w) (V8SI d) (V4DI q)
-   (V64QI b) (V16SI d) (V8DI q)])
+   (V64QI b) (V32HI w) (V16SI d) (V8DI q)])
 
 ;; SSE vector suffix for floating point modes
 (define_mode_attr ssevecmodesuffix [(SF ps) (DF pd)])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 89a1842..ea56bcb 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -146,10 +146,21 @@
(V32HI TARGET_AVX512F) (V16HI TARGET_AVX) V8HI
(V16SI TARGET_AVX512F) (V8SI TARGET_AVX) V4SI
(V8DI TARGET_AVX512F)  (V4DI TARGET_AVX) V2DI
-   (V2TI TARGET_AVX) V1TI
+   (V4TI TARGET_AVX512BW) (V2TI TARGET_AVX) V1TI
(V16SF TARGET_AVX512F) (V8SF 

[PATCH] Bump BASE-VER, change snapshots to gcc-5-2014xxxxx

2014-08-14 Thread Richard Biener

See $subject.

Ok?

Thanks,
Richard.

2014-08-14  Richard Biener  rguent...@suse.de

maintainer-scripts/
* crontab: Change trunk snapshots from 4.10 to 5.

gcc/
* BASE-VER: Change to 5.0.0

Index: maintainer-scripts/crontab
===
--- maintainer-scripts/crontab  (revision 213958)
+++ maintainer-scripts/crontab  (working copy)
@@ -3,4 +3,4 @@
 55  0 * * * sh /home/gccadmin/scripts/update_web_docs_libstdcxx_svn
 32 22 * * 3 sh /home/gccadmin/scripts/gcc_release -s 
4.9:branches/gcc-4_9-branch -l -d /sourceware/snapshot-tmp/gcc all
 32 22 * * 4 sh /home/gccadmin/scripts/gcc_release -s 
4.8:branches/gcc-4_8-branch -l -d /sourceware/snapshot-tmp/gcc all
-32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 4.10:trunk -l -d 
/sourceware/snapshot-tmp/gcc all
+32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 5:trunk -l -d 
/sourceware/snapshot-tmp/gcc all
Index: gcc/BASE-VER
===
--- gcc/BASE-VER(revision 213958)
+++ gcc/BASE-VER(working copy)
@@ -1 +1 @@
-4.10.0
+5.0.0


Re: [PATCH] fix hardreg_cprop to honor HARD_REGNO_MODE_OK.

2014-08-14 Thread Ilya Tocar
 I've observed SPEC2006 failure on avx512-vlbwdq branch.
 It was caused by hardreg_cprop. In maybe_mode_change it was
 assumed that all values of the same register class and same mode
 are ok. This is not the case for i386/avx512. We need to honor
 HARD_REGNO_MODE_OK.

 One could argue that having a class where some members are OK for being used
 in a particular mode, but other members are not is the core issue here.
 
 Can you describe a bit about why you've got a class of that nature?
 Background on that would be useful.
 

AVX-512 added 16 new xmm registers (xmm16-xmm31).
Those registers require EVEX encoding.
Only 512-bit wide versions of instructions have an EVEX encoding with
avx512f, but all versions have one with avx512vl.
Most instructions have the same macroized pattern for the 128/256/512-bit
vector lengths. They all use constraint 'v', which corresponds to the
class ALL_SSE_REGS (xmm0-xmm31). To disallow e.g. xmm20 in the
256-bit case (avx512f) and allow it only in the avx512vl case, we have
HARD_REGNO_MODE_OK checking for the regno being EVEX-only and
disallowing it if the mode is not 512-bit.
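
For illustration, a hedged sketch in plain C of the rule just described;
the helper name and parameters are made up and this is not the actual
i386 back-end code:

#include <stdbool.h>

/* Whether xmm register number REGNO may be used in a vector mode of
   MODE_SIZE bytes, given the available AVX-512 ISA subsets.  */
static bool
evex_reg_ok_for_mode_p (unsigned int regno, unsigned int mode_size,
                        bool avx512f, bool avx512vl)
{
  bool evex_only_reg = regno >= 16 && regno <= 31;  /* xmm16..xmm31 */

  if (!evex_only_reg)
    return true;        /* xmm0-xmm15 have legacy/VEX encodings.  */
  if (mode_size == 64)
    return avx512f;     /* 512-bit: EVEX exists with plain AVX512F.  */
  return avx512vl;      /* 128/256-bit widths need AVX512VL.  */
}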


Re: [PATCH] Bump BASE-VER, change snapshots to gcc-5-2014xxxxx

2014-08-14 Thread Jakub Jelinek
On Thu, Aug 14, 2014 at 03:09:26PM +0200, Richard Biener wrote:
 
 See $subject.
 
 Ok?
 
 Thanks,
 Richard.
 
 2014-08-14  Richard Biener  rguent...@suse.de
 
   maintainer-scripts/
   * crontab: Change trunk snapshots from 4.10 to 5.
 
   gcc/
   * BASE-VER: Change to 5.0.0

LGTM.

 --- maintainer-scripts/crontab(revision 213958)
 +++ maintainer-scripts/crontab(working copy)
 @@ -3,4 +3,4 @@
  55  0 * * * sh /home/gccadmin/scripts/update_web_docs_libstdcxx_svn
  32 22 * * 3 sh /home/gccadmin/scripts/gcc_release -s 
 4.9:branches/gcc-4_9-branch -l -d /sourceware/snapshot-tmp/gcc all
  32 22 * * 4 sh /home/gccadmin/scripts/gcc_release -s 
 4.8:branches/gcc-4_8-branch -l -d /sourceware/snapshot-tmp/gcc all
 -32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 4.10:trunk -l -d 
 /sourceware/snapshot-tmp/gcc all
 +32 22 * * 7 sh /home/gccadmin/scripts/gcc_release -s 5:trunk -l -d 
 /sourceware/snapshot-tmp/gcc all
 Index: gcc/BASE-VER
 ===
 --- gcc/BASE-VER  (revision 213958)
 +++ gcc/BASE-VER  (working copy)
 @@ -1 +1 @@
 -4.10.0
 +5.0.0

Jakub


[PINGv3][PATCHv3] Fix vector tests on ARM platforms with disabled unaligned accesses

2014-08-14 Thread Marat Zakirov


On 08/07/2014 12:52 PM, Marat Zakirov wrote:


On 07/31/2014 04:08 PM, Marat Zakirov wrote:


On 07/24/2014 07:40 PM, Marat Zakirov wrote:


On 07/24/2014 04:27 PM, Marat Zakirov wrote:


On 07/23/2014 06:23 PM, Marat Zakirov wrote:

Hi there!

I made a patch which fixes regressions on ARM platforms with 
disabled unaligned accesses. The problem is that the 
'arm_vect_no_misalign' predicate does not check the 'unaligned_access' 
global variable to determine whether unaligned accesses to vectors 
are allowed. This leads to spurious vect.exp test failures when GCC 
is configured 
--with-specs=%{!munaligned-access:-mno-unaligned-access}.


Attached patch fixes ARM predicate and several tests to correctly 
handle the issue.


The patch was reg. tested on multiple targets (ARM, 
Thumb-1, Thumb-2, x86, x86_64) with and without 
-mno-unaligned-access.  Analysis showed the patch affects only 
vect.exp tests, so only vect.exp was tested.


For x86, x86_64, ARM without -mno-unaligned-access, Thumb-2 
without -mno-unaligned-access and Thumb-1, no regressions occurred. 
For ARM/Thumb-2 with -mno-unaligned-access the patch fixed most of the 
failures but triggered some problems (see attached log) for the 
current vect.exp tests:

1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61887
2) Some XPASS'es due to unexpected loop versioning (e.g. 
gcc.dg/vect/pr33804.c).
3) After the predicate fix, some passing tests which require unaligned 
vector support become NA (this was expected).


Here is a new version of the patch and the regression log. On the current 
trunk the results are slightly different due to patches by Richard 
Biener (no UNRESOLVED fails), but some PASS-XPASS regressions still 
remain (see attachment):


PASS-XPASS: gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c 
scan-tree-dump-times vect vectorized 1 loops 1
PASS-XPASS: gcc.dg/vect/pr33804.c -flto -ffat-lto-objects 
scan-tree-dump-times vect vectorized 1 loops 1

etc.

These XPASSes are due to code versioning: current GCC creates two 
versions of the loop: aligned and misaligned. It looks like they are 
slightly out of date, at least for ARM.




On 07/24/2014 06:50 PM, Ramana Radhakrishnan wrote:

This is redundant.


-   || (defined(__ARMEL__) \
+   || (defined(__ARM_FEATURE_UNALIGNED) \
+       && defined(__ARMEL__) \
        && (!defined(__thumb__) || defined(__thumb2__)))

As is this line.

I think you can restrict the check to defined(__ARM_FEATURE_UNALIGNED)
&& defined(__ARMEL__)

  __ARM_FEATURE_UNALIGNED should tell you whether unaligned access is
allowed or not, therefore you should no longer require any specific
architectural checks.



 #error FOO
 #endif


I'm not sure about the original intent of the tests right now.

Ramana



Thank you Ramana!

--Marat






gcc/testsuite/ChangeLog:

2014-07-23  Marat Zakirov  m.zaki...@samsung.com

	* gcc.dg/vect/vect-109.c: Skip predicate added.
	* gcc.dg/vect/vect-93.c: Test check fixed.
	* gcc.dg/vect/bb-slp-10.c: Likewise.
	* lib/target-supports.exp (check_effective_target_arm_vect_no_misalign):
	Check unaligned feature.

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
index a1850ed..0090a4b 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-10.c
@@ -49,7 +49,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times unsupported alignment in basic block. 1 slp2 { xfail vect_element_align } } } */
+/* { dg-final { scan-tree-dump unsupported alignment in basic block. 1 slp2 { xfail vect_element_align } } } */
 /* { dg-final { scan-tree-dump-times basic block vectorized 1 slp2 { target vect_element_align } } } */
 /* { dg-final { cleanup-tree-dump slp2 } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/vect-109.c b/gcc/testsuite/gcc.dg/vect/vect-109.c
index 854c970..c671175 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-109.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-109.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if  { vect_no_align } } */
 /* { dg-require-effective-target vect_int } */
 
 #include stdarg.h
diff --git a/gcc/testsuite/gcc.dg/vect/vect-93.c b/gcc/testsuite/gcc.dg/vect/vect-93.c
index 65403eb..1065a6e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-93.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-93.c
@@ -79,7 +79,7 @@ int main (void)
 /* { dg-final { scan-tree-dump-times vectorized 2 loops 1 vect { target vect_no_align } } } */
 
 /* in main: */
-/* { dg-final { scan-tree-dump-times vectorized 0 loops 1 vect { target vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { target vect_no_align } } } */
 /* { dg-final { scan-tree-dump-times Vectorizing an unaligned access 1 vect { xfail { vect_no_align } } } } */
 
 /* { dg-final { cleanup-tree-dump vect } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index db65ebe..35076d2 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2293,8 +2293,8 @@ proc 

Re: [PATCH i386 AVX512] [12/n] Extend OI/TImode moves.

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 1:42 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends moves to OI/TI mode.

 Bootstrapped.
 New tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.md
 (define_insn *movoi_internal_avx): Add EVEX version.
 (define_insn *movti_internal): Ditto.

OK.

Thanks,
Uros.


Re: [PATCH PR62011]

2014-08-14 Thread Alexander Monakov


On Thu, 14 Aug 2014, Yuri Rumyantsev wrote:

 Hi All,
 
 Here is a fix for PR 62011 - remove false dependency for unary
 bit-manipulation instructions for latest BigCore chips (Sandybridge
 and Haswell) by outputting in assembly file zeroing destination
 register before bmi instruction. I checked that performance restored
 for popcnt, lzcnt and tzcnt instructions.

I am not an x86 reviewer, but one thing looks a bit superfluous to me:

 +/* Retirn true if we need to insert before bit-manipulation instruction
note typo^

 +   zeroing of its destination register.  */
 +bool
 +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[])
 +{
 +  unsigned int regno0;
 +  df_ref use;
 +  if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p (cfun))
 +return false;
 +  regno0 = true_regnum (operands[0]);
 +  /* Check if insn does not use REGNO0.  */
 +  FOR_EACH_INSN_USE (use, insn)
 +if (regno0 == DF_REF_REGNO (use))
 +  return false;
 +  return true;
 +}

The loop is to prevent adding the xor when the dest operand is also the source
operand.  Looks like a simpler reg_or_subregno (operands[0]) ==
reg_or_subregno (operands[1]) could be used here, as long as the assumption
that this is called only for two-operand instruction holds?

Alexander


Re: [PATCH PR62011]

2014-08-14 Thread Ilya Enkovich
2014-08-14 18:00 GMT+04:00 Alexander Monakov amona...@ispras.ru:


 On Thu, 14 Aug 2014, Yuri Rumyantsev wrote:

 Hi All,

 Here is a fix for PR 62011 - remove false dependency for unary
 bit-manipulation instructions for latest BigCore chips (Sandybridge
 and Haswell) by outputting in assembly file zeroing destination
 register before bmi instruction. I checked that performance restored
 for popcnt, lzcnt and tzcnt instructions.

 I am not an x86 reviewer, but one thing looks a bit superfluous to me:

 +/* Retirn true if we need to insert before bit-manipulation instruction
 note typo^

 +   zeroing of its destination register.  */
 +bool
 +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[])
 +{
 +  unsigned int regno0;
 +  df_ref use;
 +  if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p (cfun))
 +return false;
 +  regno0 = true_regnum (operands[0]);
 +  /* Check if insn does not use REGNO0.  */
 +  FOR_EACH_INSN_USE (use, insn)
 +if (regno0 == DF_REF_REGNO (use))
 +  return false;
 +  return true;
 +}

 The loop is to prevent adding the xor when the dest operand is also the source
 operand.  Looks like a simpler reg_or_subregno (operands[0]) ==
 reg_or_subregno (operands[1]) could be used here, as long as the assumption
 that this is called only for two-operand instruction holds?

This wouldn't cover the memory operand case.

Ilya


 Alexander


Re: [PATCH i386 AVX512] [13/n] Add VF2_AVX512VL and vcvt[t]pd2 insns.

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 2:12 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch introduces vcvt[t]pd2 patterns.

 Bootstrapped.
 New tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/i386.md
 (define_code_attr ufix_bool): New.
 * config/i386/sse.md
 (define_mode_iterator VF2_AVX512VL): New.
 (define_mode_attr sseintvecmode2): New.
 (define_insn ufix_truncv2dfv2si2mask_name): Add masking.
 (define_insn fixsuffixfix_truncv4dfv4si2mask_name): New.
 (define_insn
 
 fixsuffixfix_truncmodesseintvecmodelower2mask_nameround_saeonly_name):
 New.
 (define_insn 
 fix_notruncmodesseintvecmodelower2mask_nameround_name):
 New.
 (define_insn 
 ufix_notruncmodesseintvecmodelower2mask_nameround_name):
 New.

 --
 Thanks, K

 diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
 index 39fb2307..f6351cf 100644
 --- a/gcc/config/i386/i386.md
 +++ b/gcc/config/i386/i386.md
 @@ -881,6 +881,7 @@
  ;; Used in signed and unsigned fix.
  (define_code_iterator any_fix [fix unsigned_fix])
  (define_code_attr fixsuffix [(fix ) (unsigned_fix u)])
 +(define_code_attr ufix_bool [(fix false) (unsigned_fix true)])

  ;; All integer modes.
  (define_mode_iterator SWI1248x [QI HI SI DI])
 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
 index 3d3d1a0..eaf97ab 100644
 --- a/gcc/config/i386/sse.md
 +++ b/gcc/config/i386/sse.md
 @@ -228,6 +228,9 @@
  (define_mode_iterator VF_512
[V16SF V8DF])

 +(define_mode_iterator VF2_AVX512VL
 +  [V8DF (V4DF TARGET_AVX512VL) (V2DF TARGET_AVX512VL)])
 +
  ;; All vector integer modes
  (define_mode_iterator VI
[(V16SI TARGET_AVX512F) (V8DI TARGET_AVX512F)
 @@ -523,6 +526,10 @@
 (V32HI V32HI) (V64QI V64QI)
 (V32QI V32QI) (V16QI V16QI)])

 +(define_mode_attr sseintvecmode2
 +  [(V8DF XI) (V4DF OI) (V2DF TI)
 +   (V8SF OI) (V4SF TI)])
 +
  (define_mode_attr sseintvecmodelower
[(V16SF v16si) (V8DF v8di)
 (V8SF v8si) (V4DF v4di)
 @@ -4236,15 +4243,58 @@
 (set_attr prefix evex)
 (set_attr mode OI)])

 -(define_insn fix_truncv4dfv4si2
 -  [(set (match_operand:V4SI 0 register_operand =x)
 -   (fix:V4SI (match_operand:V4DF 1 nonimmediate_operand xm)))]
 -  TARGET_AVX
 -  vcvttpd2dq{y}\t{%1, %0|%0, %1}
 +(define_insn ufix_truncv2dfv2si2mask_name
 +  [(set (match_operand:V4SI 0 register_operand =v)
 +   (vec_concat:V4SI
 + (unsigned_fix:V2SI (match_operand:V2DF 1 nonimmediate_operand 
 vm))
 + (const_vector:V2SI [(const_int 0) (const_int 0)])))]
 +  TARGET_AVX512VL
 +  vcvttpd2udq{x}\t{%1, %0mask_operand2|%0mask_operand2, %1}
[(set_attr type ssecvt)
 -   (set_attr prefix vex)
 +   (set_attr prefix evex)
 +   (set_attr mode TI)])
 +
 +(define_insn fixsuffixfix_truncv4dfv4si2mask_name
 +  [(set (match_operand:V4SI 0 register_operand =v)
 +   (any_fix:V4SI (match_operand:V4DF 1 nonimmediate_operand vm)))]
 +  (TARGET_AVX  !ufix_bool) || (TARGET_AVX512VL  TARGET_AVX512F)
 +  vcvttpd2fixsuffixdq{y}\t{%1, %0mask_operand2|%0mask_operand2, %1}
 +  [(set_attr type ssecvt)
 +   (set_attr prefix maybe_evex)
 (set_attr mode OI)])

Please split this pattern to fix and unsigned_fix. The macroisation
here is confusing. You should then remove ufix_bool attribute.

OK with this change.

Thanks,
Uros.


Re: [C++ Patch] PR 54377

2014-08-14 Thread Jason Merrill

On 08/14/2014 07:14 AM, Paolo Carlini wrote:

+  nparms -= variadic_p ? variadic_p : default_p;


What if you have both default arguments and parameter packs?

Jason



Re: [PATCH i386 AVX512] [15/n] Extend vcvtudq2ps to avx512vl.

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 2:20 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 This patch extends vcvtudq2ps to support
 AVX-512VL new insns.

 Bootstrapped.
 New tests on top of patch-set all pass
 under simulator.

 Is it ok for trunk?

 gcc/
 * config/i386/sse.md
 (define_mode_iterator VF1_AVX512VL): New.
 (define_insn ufloatv16siv16sf2mask_nameround_name): Delete.
 (define_insn 
 ufloatsseintvecmodelowermode2mask_nameround_name):
 New.

OK.

Thanks,
Uros.


Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-08-14 Thread Richard Biener
On Mon, Jul 28, 2014 at 8:08 AM, Wei Mi w...@google.com wrote:
 But fact is that it is _not_ necessary to split the block because there
 are no outgoing abnormal edges from it.

 The verifier failure is an artifact from using the same predicates during
 CFG building and CFG verifying (usually ok, but for this particular
 case it leads to this issue).

 So I don't think your patch is the proper way to address this issue
 (while it certainly works).

 Instead whether a call can make abnormal gotos should be recorded
 per-call and stored on the gimple-call.  Martin - this is exactly
 one of the cases your patch would address?


 Thanks for the comment and thanks to Martin's patch. I tried the patch.
 It works well to address both pr60449 and pr61776 after some
 extension. One extension is to replace the GF_CALL_LEAF attribute with
 GF_CALL_NO_ABNORMAL_GOTO. That is because dropping the leaf
 attribute in lto symbol merge is not the only thing that could introduce
 the control flow verification problem in pr60449; dropping const/pure
 attributes could introduce the same problem too. It is unnecessary to
 introduce per-call attributes for all three of ECF_LEAF/ECF_CONST/ECF_PURE,
 so GF_CALL_NO_ABNORMAL_GOTO is introduced to indicate that a call stmt
 has no abnormal goto.

 GF_CALL_NO_ABNORMAL_GOTO will be set according to gimple_call_flags()
 once gimple call stmt is created, then updated in execute_fixup_cfg
 and cleanup_tree_cfg.

 I posted the extended patch here. I didn't add the noreturn part in
 because it has no direct impact on pr60449 and pr61776. I can help
 Martin to test and post that part as an independent patch later.

 bootstrap and regression pass on x86_64-linux-gnu. Is it ok?

+static void
+update_no_abnormal_goto_attr (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{

it should be enough to check these on last stmts of a basic block, no?

That you call update_no_abnormal_goto_attr from two places
before cleanup_tree_cfg_bb suggests you may want to perform
this change in cleanup_control_flow_bb which already looks
at the last stmt only?

Btw, I originally had this whole idea of moving flags to the gimple
stmt level because of the interesting way we handle the noreturn
attribute (calls to noreturn functions also end basic-blocks).

Thus would it be possible to turn all these into a single flag,
GF_CALL_CTRL_ALTERING?  That is, cover everything
that is_ctrl_altering_stmt covers?  I suggest we initialize it at
CFG build time and only ever clear it later.
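
For reference, a minimal GNU C illustration of a call with an outgoing
abnormal edge -- the situation such a per-call flag has to describe -- could
look like the following (an illustration only, not one of the PR testcases):

/* bar may hand control back to foo through the nonlocal goto in the
   nested function, so the call to bar can perform an abnormal goto and
   has to end its basic block.  */
void bar (void (*fn) (void));

int
foo (void)
{
  __label__ out;
  int x = 0;
  void handler (void) { x = 1; goto out; }
  bar (handler);
  return 0;
 out:
  return x;
}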

Sorry for not thinking about this earlier (and for the slow review).

Thanks,
Richard.

 Thanks,
 Wei.


Re: Fix if-conversion pass for dead type-unsafe code

2014-08-14 Thread Richard Biener
On Sat, Aug 9, 2014 at 7:14 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 08-08-14 17:17, Tom de Vries wrote:

 Maybe instead of a new mem_alias_equal_p simply compare MEM_ATTRs
 with mem_attrs_eq_p?


 I propose to fix it this way (as attached) on 4.8/4.9/trunk, and maybe do
 a more
 efficient handling on trunk as a follow-up patch.

 I'll put this through bootstrap/test again.


 Bootstrapped and reg-tested on trunk x86_64.

 Re-attached patch OK for trunk, 4.8, 4.9 ?

Ok.  (did you check the effect on code-generation?  that is, how many
opportunities compiling GCC do we lose?)

Thanks,
Richard.

 Thanks,
 - Tom


[PATCH][LTO] Hide string streaming details

2014-08-14 Thread Richard Biener

This hides a part of the internal strings processing detail by
using appropriate APIs.

LTO bootstrap ongoing on x86_64-unknown-linux-gnu.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

* data-streamer.h (streamer_string_index, string_for_index):
Remove.
* data-streamer-out.c (streamer_string_index): Make static.
* data-streamer-in.c (string_for_index): Likewise.
* lto-streamer-out.c (lto_output_location): Use bp_pack_string.
* lto-streamer-in.c (lto_input_location): Use bp_unpack_string.

Index: gcc/data-streamer.h
===
--- gcc/data-streamer.h (revision 213966)
+++ gcc/data-streamer.h (working copy)
@@ -57,8 +57,6 @@ void streamer_write_hwi (struct output_b
 void streamer_write_gcov_count (struct output_block *, gcov_type);
 void streamer_write_string (struct output_block *, struct lto_output_stream *,
const char *, bool);
-unsigned streamer_string_index (struct output_block *, const char *,
-   unsigned int, bool);
 void streamer_write_string_with_length (struct output_block *,
struct lto_output_stream *,
const char *, unsigned int, bool);
@@ -74,7 +72,6 @@ void streamer_write_data_stream (struct
 size_t);
 
 /* In data-streamer-in.c  */
-const char *string_for_index (struct data_in *, unsigned int, unsigned int *);
 const char *streamer_read_string (struct data_in *, struct lto_input_block *);
 const char *streamer_read_indexed_string (struct data_in *,
  struct lto_input_block *,
Index: gcc/data-streamer-out.c
===
--- gcc/data-streamer-out.c (revision 213966)
+++ gcc/data-streamer-out.c (working copy)
@@ -81,7 +81,7 @@ lto_append_block (struct lto_output_stre
When PERSISTENT is set, the string S is supposed to not change during
duration of the OB and thus OB can keep pointer into it.  */
 
-unsigned
+static unsigned
 streamer_string_index (struct output_block *ob, const char *s, unsigned int 
len,
   bool persistent)
 {
Index: gcc/data-streamer-in.c
===
--- gcc/data-streamer-in.c  (revision 213966)
+++ gcc/data-streamer-in.c  (working copy)
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.
 /* Read a string from the string table in DATA_IN using input block
IB.  Write the length to RLEN.  */
 
-const char *
+static const char *
 string_for_index (struct data_in *data_in, unsigned int loc, unsigned int 
*rlen)
 {
   unsigned int len;
Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 213966)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -189,10 +189,7 @@ lto_output_location (struct output_block
   bp_pack_value (bp, ob-current_col != xloc.column, 1);
 
   if (ob-current_file != xloc.file)
-bp_pack_var_len_unsigned (bp,
- streamer_string_index (ob, xloc.file,
-strlen (xloc.file) + 1,
-true));
+bp_pack_string (ob, bp, xloc.file, true);
   ob-current_file = xloc.file;
 
   if (ob-current_line != xloc.line)
Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c   (revision 213966)
+++ gcc/lto-streamer-in.c   (working copy)
@@ -154,7 +154,6 @@ lto_input_location (struct bitpack_d *bp
   static int current_line;
   static int current_col;
   bool file_change, line_change, column_change;
-  unsigned len;
   bool prev_file = current_file != NULL;
 
   if (bp_unpack_value (bp, 1))
@@ -165,10 +164,7 @@ lto_input_location (struct bitpack_d *bp
   column_change = bp_unpack_value (bp, 1);
 
   if (file_change)
-current_file = canon_file_name
-(string_for_index (data_in,
-   bp_unpack_var_len_unsigned (bp),
-   len));
+current_file = canon_file_name (bp_unpack_string (data_in, bp));
 
   if (line_change)
 current_line = bp_unpack_var_len_unsigned (bp);


[PATCH] Fix PR62031

2014-08-14 Thread Richard Biener

This fixes wrong answer from data-dependence analysis by realizing
that _all_ (even non-evolving) indirect accesses cannot be constrained
to a full object size.  This also gets rid of that ugly
DR_UNCONSTRAINED_BASE hack (but effectively make it always active).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-08-14  Richard Biener  rguent...@suse.de

PR tree-optimization/62031
* tree-data-ref.c (dr_analyze_indices): Do not set
DR_UNCONSTRAINED_BASE.
(dr_may_alias_p): All indirect accesses have to go the
formerly DR_UNCONSTRAINED_BASE path.
* tree-data-ref.h (struct indices): Remove
unconstrained_base member.
(DR_UNCONSTRAINED_BASE): Remove.

* gcc.dg/torture/pr62031.c: New testcase.

Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c (revision 213958)
+++ gcc/tree-data-ref.c (working copy)
@@ -982,7 +982,6 @@ dr_analyze_indices (struct data_referenc
  ref = fold_build2_loc (EXPR_LOCATION (ref),
 MEM_REF, TREE_TYPE (ref),
 base, memoff);
- DR_UNCONSTRAINED_BASE (dr) = true;
  access_fns.safe_push (access_fn);
}
 }
@@ -1389,14 +1388,20 @@ dr_may_alias_p (const struct data_refere
return false;
 }
 
-  /* If we had an evolution in a MEM_REF BASE_OBJECT we do not know
- the size of the base-object.  So we cannot do any offset/overlap
- based analysis but have to rely on points-to information only.  */
+  /* If we had an evolution in a pointer-based MEM_REF BASE_OBJECT we
+ do not know the size of the base-object.  So we cannot do any
+ offset/overlap based analysis but have to rely on points-to
+ information only.  */
   if (TREE_CODE (addr_a) == MEM_REF
-   DR_UNCONSTRAINED_BASE (a))
+   TREE_CODE (TREE_OPERAND (addr_a, 0)) == SSA_NAME)
 {
-  if (TREE_CODE (addr_b) == MEM_REF
-  DR_UNCONSTRAINED_BASE (b))
+  /* For true dependences we can apply TBAA.  */
+  if (flag_strict_aliasing
+  DR_IS_WRITE (a)  DR_IS_READ (b)
+  !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
+get_alias_set (DR_REF (b
+   return false;
+  if (TREE_CODE (addr_b) == MEM_REF)
return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
   TREE_OPERAND (addr_b, 0));
   else
@@ -1404,9 +1409,21 @@ dr_may_alias_p (const struct data_refere
   build_fold_addr_expr (addr_b));
 }
   else if (TREE_CODE (addr_b) == MEM_REF
-   DR_UNCONSTRAINED_BASE (b))
-return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
-  TREE_OPERAND (addr_b, 0));
+   TREE_CODE (TREE_OPERAND (addr_b, 0)) == SSA_NAME)
+{
+  /* For true dependences we can apply TBAA.  */
+  if (flag_strict_aliasing
+  DR_IS_WRITE (a)  DR_IS_READ (b)
+  !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
+get_alias_set (DR_REF (b
+   return false;
+  if (TREE_CODE (addr_a) == MEM_REF)
+   return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
+  TREE_OPERAND (addr_b, 0));
+  else
+   return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
+  TREE_OPERAND (addr_b, 0));
+}
 
   /* Otherwise DR_BASE_OBJECT is an access that covers the whole object
  that is being subsetted in the loop nest.  */

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h (revision 213958)
+++ gcc/tree-data-ref.h (working copy)
@@ -81,10 +81,6 @@ struct indices
 
   /* A list of chrecs.  Access functions of the indices.  */
   vectree access_fns;
-
-  /* Whether BASE_OBJECT is an access representing the whole object
- or whether the access could not be constrained.  */
-  bool unconstrained_base;
 };
 
 struct dr_alias
@@ -195,7 +191,6 @@ struct data_reference
 #define DR_STMT(DR)(DR)-stmt
 #define DR_REF(DR) (DR)-ref
 #define DR_BASE_OBJECT(DR) (DR)-indices.base_object
-#define DR_UNCONSTRAINED_BASE(DR)  (DR)-indices.unconstrained_base
 #define DR_ACCESS_FNS(DR) (DR)-indices.access_fns
 #define DR_ACCESS_FN(DR, I)DR_ACCESS_FNS (DR)[I]
 #define DR_NUM_DIMENSIONS(DR)  DR_ACCESS_FNS (DR).length ()
Index: gcc/testsuite/gcc.dg/torture/pr62031.c
===
--- gcc/testsuite/gcc.dg/torture/pr62031.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr62031.c  (working copy)
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+
+#include stdlib.h
+
+#define NUM_OF_STATES 4
+typedef unsigned int entry_t[2];
+typedef struct entries_item { entry_t 

Re: [PATCH PR62011]

2014-08-14 Thread Yuri Rumyantsev
For example, for the first loop of the attached test-case we do not
prepend the xor to popcnt because the instruction already uses its
destination register:
.L23:
leal 1(%rdx), %ecx
popcntq (%rbx,%rax,8), %rax
leal 2(%rdx), %r8d
popcntq (%rbx,%rcx,8), %rcx
addq %rax, %rcx
leal 3(%rdx), %esi
xorq %rax, %rax
popcntq (%rbx,%r8,8), %rax
addq %rax, %rcx
xorq %rax, %rax
popcntq (%rbx,%rsi,8), %rax
addq %rax, %rcx
leal 4(%rdx), %eax
addq %rcx, %r14
movq %rax, %rdx
cmpq %rax, %r12
ja .L23
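
A C reduction of the kind of loop that exposes the false dependence (an
illustration, not the attached test-case) is:

/* Each popcnt here writes a fresh destination register; on Sandybridge
   and Haswell the instruction also carries a false dependence on the
   previous value of that register, so back-to-back popcnts can stall
   unless the destination is zeroed first.  */
unsigned long
count_bits (const unsigned long *buf, unsigned long n)
{
  unsigned long i, sum = 0;
  for (i = 0; i < n; i++)
    sum += __builtin_popcountl (buf[i]);
  return sum;
}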

2014-08-14 18:06 GMT+04:00 Ilya Enkovich enkovich@gmail.com:
 2014-08-14 18:00 GMT+04:00 Alexander Monakov amona...@ispras.ru:


 On Thu, 14 Aug 2014, Yuri Rumyantsev wrote:

 Hi All,

 Here is a fix for PR 62011 - remove false dependency for unary
 bit-manipulation instructions for latest BigCore chips (Sandybridge
 and Haswell) by outputting in assembly file zeroing destination
 register before bmi instruction. I checked that performance restored
 for popcnt, lzcnt and tzcnt instructions.

 I am not an x86 reviewer, but one thing looks a bit superfluous to me:

 +/* Retirn true if we need to insert before bit-manipulation instruction
 note typo^

 +   zeroing of its destination register.  */
 +bool
 +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[])
 +{
 +  unsigned int regno0;
 +  df_ref use;
 +  if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p 
 (cfun))
 +return false;
 +  regno0 = true_regnum (operands[0]);
 +  /* Check if insn does not use REGNO0.  */
 +  FOR_EACH_INSN_USE (use, insn)
 +if (regno0 == DF_REF_REGNO (use))
 +  return false;
 +  return true;
 +}

 The loop is to prevent adding the xor when the dest operand is also the 
 source
 operand.  Looks like a simpler reg_or_subregno (operands[0]) ==
 reg_or_subregno (operands[1]) could be used here, as long as the assumption
 that this is called only for two-operand instruction holds?

 This wouldn't cover memory operand case.

 Ilya


 Alexander


Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-08-14 Thread H.J. Lu
On Thu, Aug 14, 2014 at 1:08 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Ping.

 On Thu, Jul 10, 2014 at 7:29 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 On Mon, Jul 7, 2014 at 6:40 PM, Richard Henderson r...@redhat.com wrote:
 On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote:
 -expand_vec_perm_palignr (struct expand_vec_perm_d *d)
 +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num)

 insn_num might as well be bool avx2, since it's only ever set to two 
 values.

 Agree. However:
  after the alignment, one operand permutation could be just move and
 take 2 instructions for AVX2 as well
  for AVX2 there could be other cases when the scheme takes 4 or 5 
 instructions
  we can leave it for potential avx512 extension


 -  /* Even with AVX, palignr only operates on 128-bit vectors.  */
 -  if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16)
 +  /* SSSE3 is required to apply PALIGNR on 16 bytes operands.  */
 +  if (GET_MODE_SIZE (d-vmode) == 16)
 +{
 +  if (!TARGET_SSSE3)
 +   return false;
 +}
 +  /* AVX2 is required to apply PALIGNR on 32 bytes operands.  */
 +  else if (GET_MODE_SIZE (d-vmode) == 32)
 +{
 +  if (!TARGET_AVX2)
 +   return false;
 +}
 +  /* Other sizes are not supported.  */
 +  else
  return false;

 And you'd better check it up here because...


 Correct. The following should resolve the issue:
   /* For AVX2 we need more than 2 instructions when the alignment
  by itself does not produce the desired permutation.  */
   if (TARGET_AVX2  insn_num = 2)
 return false;

 +  /* For SSSE3 we need 1 instruction for palignr plus 1 for one
 + operand permutaoin.  */
 +  if (insn_num == 2)
 +{
 +  ok = expand_vec_perm_1 (dcopy);
 +  gcc_assert (ok);
 +}
 +  /* For AVX2 we need 2 instructions for the shift: vpalignr and
 + vperm plus 4 instructions for one operand permutation.  */
 +  else if (insn_num == 6)
 +{
 +  ok = expand_vec_perm_vpshufb2_vpermq (dcopy);
 +  gcc_assert (ok);
 +}
 +  else
 +ok = false;
return ok;

 ... down here you'll simply ICE from the gcc_assert.


Can you modify your patch to fix

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62128

with a testcase?


-- 
H.J.


Re: [PATCH PR62011]

2014-08-14 Thread Uros Bizjak
On Thu, Aug 14, 2014 at 4:06 PM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-08-14 18:00 GMT+04:00 Alexander Monakov amona...@ispras.ru:


 On Thu, 14 Aug 2014, Yuri Rumyantsev wrote:

 Hi All,

 Here is a fix for PR 62011 - remove false dependency for unary
 bit-manipulation instructions for latest BigCore chips (Sandybridge
 and Haswell) by outputting in assembly file zeroing destination
 register before bmi instruction. I checked that performance restored
 for popcnt, lzcnt and tzcnt instructions.

 I am not an x86 reviewer, but one thing looks a bit superfluous to me:

 +/* Retirn true if we need to insert before bit-manipulation instruction
 note typo^

 +   zeroing of its destination register.  */
 +bool
 +ix86_avoid_false_dep_for_bm (rtx insn, rtx operands[])
 +{
 +  unsigned int regno0;
 +  df_ref use;
 +  if (!TARGET_AVOID_FALSE_DEP_FOR_BM || optimize_function_for_size_p 
 (cfun))
 +return false;
 +  regno0 = true_regnum (operands[0]);
 +  /* Check if insn does not use REGNO0.  */
 +  FOR_EACH_INSN_USE (use, insn)
 +if (regno0 == DF_REF_REGNO (use))
 +  return false;
 +  return true;
 +}

 The loop is to prevent adding the xor when the dest operand is also the 
 source
 operand.  Looks like a simpler reg_or_subregno (operands[0]) ==
 reg_or_subregno (operands[1]) could be used here, as long as the assumption
 that this is called only for two-operand instruction holds?

 This wouldn't cover memory operand case.

You should use reg_mentioned_p, like

if (!reg_mentioned_p (operands[0], operands[1]))
  output_asm_insn ("xor{l}\t%k0,%k0", operands);
...

Uros.


Re: [PATCH PR62011]

2014-08-14 Thread H.J. Lu
On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Hi All,

 Here is a fix for PR 62011 - remove false dependency for unary
 bit-manipulation instructions for latest BigCore chips (Sandybridge
 and Haswell) by outputting in assembly file zeroing destination
 register before bmi instruction. I checked that performance restored
 for popcnt, lzcnt and tzcnt instructions.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 gcc/ChangeLog
 2014-08-14  Yuri Rumyantsev  ysrum...@gmail.com

 PR target/62011
 * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
  prototype.
 * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
 * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros.
 * config/i386/i386.md (ctzmode2, clzmode2_lzcnt, popcountmode2,
  *popcountmode2_cmp, *popcountsi2_cmp_zext): Output zeroing
  destination register for unary bit-manipulation instructions
  if required.

Why don't you use a splitter to generate the XOR?

 * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.

Is this needed for r16 and r32?  The original report says that only
r64 is affected:

http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance

Have you tried this on Silvermont?  Does it help Silvermont?

-- 
H.J.


Re: [PATCH testcase]fix failure of g++.dg/ext/arm-fp16/fp16-mangle-1.C

2014-08-14 Thread Jason Merrill

On 08/14/2014 04:31 AM, Bin Cheng wrote:

g++.dg/ext/arm-fp16/fp16-mangle-1.C fails because GCC now sets
DECL_COMDAT on template instantiations if flag_implicit_templates is in
effect.  Then DECL_WEAK will be set accordingly.   As a result, checking for
.global  would fail in this case.


But i in this testcase is an explicit specialization, not an 
instantiation, so this is a bug.


Jason



Re: [PATCH 2/3]Improve induction variable elimination

2014-08-14 Thread Sebastian Pop
Bin.Cheng wrote:
  The overflow check can be improved by using deeper inspection to prove the
  equality.  This patch deals with that by making below two improvements:
a) Handles constant cases.
b) Uses affine expansion as deeper inspection to check the equality.

Looks good to me.

Thanks,
Sebastian



Re: [PATCH PR62011]

2014-08-14 Thread Yuri Rumyantsev
It does not help Silvermont, i.e. only Haswell and SandyBridge are affected.
I don't use a splitter since (1) it deletes the zeroing of the dest reg; (2)
the scheduler can hoist them up.  I will try the r16/r32 variants and tell you
later.

2014-08-14 19:18 GMT+04:00 H.J. Lu hjl.to...@gmail.com:
 On Thu, Aug 14, 2014 at 4:50 AM, Yuri Rumyantsev ysrum...@gmail.com wrote:
 Hi All,

 Here is a fix for PR 62011 - remove false dependency for unary
 bit-manipulation instructions for latest BigCore chips (Sandybridge
 and Haswell) by outputting in assembly file zeroing destination
 register before bmi instruction. I checked that performance restored
 for popcnt, lzcnt and tzcnt instructions.

 Bootstrap and regression testing did not show any new failures.

 Is it OK for trunk?

 gcc/ChangeLog
 2014-08-14  Yuri Rumyantsev  ysrum...@gmail.com

 PR target/62011
 * config/i386/i386-protos.h (ix86_avoid_false_dep_for_bm): New function
  prototype.
 * config/i386/i386.c (ix86_avoid_false_dep_for_bm): New function.
 * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BM) New macros.
 * config/i386/i386.md (ctzmode2, clzmode2_lzcnt, popcountmode2,
  *popcountmode2_cmp, *popcountsi2_cmp_zext): Output zeroing
  destination register for unary bit-manipulation instructions
  if required.

 Why don't you use a splitter to generate the XOR?

 * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BM): New.

 Is this needed for r16 and r32?  The original report says that only
 r64 is affected:

 http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance

 Have you tried this on Silvermont?  Does it help Silvermont?

 --
 H.J.


Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)

2014-08-14 Thread Jakub Jelinek
On Thu, Aug 14, 2014 at 11:49:37AM +0200, Jakub Jelinek wrote:
 On Thu, Aug 14, 2014 at 11:34:04AM +0200, Jakub Jelinek wrote:
  So, to set DEP_MULTIPLE even in the case where ask_dependency_caches
  returns DEP_PRESENT, you'd need to find the old dependency anyway (isn't
  that going to be expensive and totally kill all the effects of
  true_dependency_cache?) and set DEP_MULTIPLE on it if not already set.
 
 Something like (untested except for the openssl s390 testcase), haven't
 checked what effects it has on the number of successful find_inc cases.

Here is the same thing with ChangeLog entry, after bootstrap/regtest
on x86_64-linux and i686-linux.

With extra instrumentation (pretty much not removing the find_inc hunk and
adding the patch from yesterday too + statistics accumulation) I see a
comparable amount of successful parse_add_or_inc in find_inc (around 1.2M),
but to my surprise still about 13000 cases where DEP_MULTIPLE is not set,
yet there is clearly more than one dependency between mem_inc and inc_insn.

The only testcase I've looked so far has been
-m32 -O3 -fomit-frame-pointer -funroll-loops -funroll-all-loops 990517-1.c
and what I see there is we have
sp += 24, clobber flags;
...
sp += 16, clobber flags;
...
flags = ([sp+1096] & eax) != 0;
The dep in between the middle insn and last insn has DEP_MULTIPLE set,
one dep there has been for sp setter vs. sp user and the other for
clobber flags vs. flags setter.  But for the first insn and third insn pair,
only the clobber flags vs. flags setter dependency has been recorded,
presumably the one on sp not, because there is some other setter after that.
I hope the scheduler doesn't attempt to swap sp += 24 with flags setter
because of the sp += 16 vs. flags setter dependency and sp += 24 vs. sp +=
16 dependency, but I feel kind of uneasy with find_inc assuming the recorded
dependency is the one for the mem_reg0, when in this case the dependency is
there for completely different register.

2014-08-14  Jakub Jelinek  ja...@redhat.com

PR target/62025
* sched-deps.c (add_or_update_dep_1): If ask_dependency_caches
returned DEP_PRESENT, make sure to set DEP_MULTIPLE on present_dep.
(find_inc): Revert 2014-08-13 change.

--- gcc/sched-deps.c.jj 2014-08-12 17:06:26.0 +0200
+++ gcc/sched-deps.c2014-08-14 11:46:25.509631874 +0200
@@ -1233,6 +1233,13 @@ add_or_update_dep_1 (dep_t new_dep, bool
   switch (ask_dependency_caches (new_dep))
{
case DEP_PRESENT:
+ dep_t present_dep;
+ sd_iterator_def sd_it;
+  
+ present_dep = sd_find_dep_between_no_cache (DEP_PRO (new_dep),
+ DEP_CON (new_dep),
+ resolved_p, &sd_it);
+ DEP_MULTIPLE (present_dep) = 1;
  return DEP_PRESENT;
 
case DEP_CHANGED:
@@ -4752,23 +4759,6 @@ find_inc (struct mem_inc_info *mii, bool
goto next;
  }
 
- /* The inc instruction could have clobbers, make sure those
-registers are not used in mem insn.  */
- FOR_EACH_INSN_DEF (def, mii-inc_insn)
-   if (!reg_overlap_mentioned_p (DF_REF_REG (def), mii-mem_reg0))
- {
-   df_ref use;
-   FOR_EACH_INSN_USE (use, mii-mem_insn)
- if (reg_overlap_mentioned_p (DF_REF_REG (def),
-  DF_REF_REG (use)))
-   {
- if (sched_verbose = 5)
-   fprintf (sched_dump,
-inc clobber used in store failure.\n);
- goto next;
-   }
- }
-
  newaddr = mii-inc_input;
  if (mii-mem_index != NULL_RTX)
newaddr = gen_rtx_PLUS (GET_MODE (newaddr), newaddr,


Jakub


Re: [PATCH] Fix find_inc in the scheduler (PR target/62025)

2014-08-14 Thread Bernd Schmidt

On 08/14/2014 05:50 PM, Jakub Jelinek wrote:

I hope the scheduler doesn't attempt to swap sp += 24 with flags setter
because of the sp += 16 vs. flags setter dependency and sp += 24 vs. sp +=
16 dependency, but I feel kind of uneasy with find_inc assuming the recorded
dependency is the one for the mem_reg0, when in this case the dependency is
there for completely different register.


Let me think about that for a while. Thanks for debugging the cache problem.


Bernd





[linaro/gcc-4_9-branch] Merge from gcc-4_8-branch and backports

2014-08-14 Thread Yvan Roux
Hi all

we have merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to
revision 213802 as r213944.  We have also backported this set of revisions:

r204251 as r213841  PR sanitizer/58543
r206529 as r213842  PR target/59744
r206530 as r213842  PR target/59744 / fix changelog typo

This will be part of our 2014.08 4.8 release.

Thanks,
Yvan


Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread Jeff Law

On 08/14/14 04:32, Richard Biener wrote:

You'll note in a separate thread Steve and I discussed this during Cauldron
and it was at my recommendation Steve resurrected his proof of concept
plugin and started beating it into shape.


But do we really want a pass just to help coremark?
And that's the biggest argument against Steve's work.  In theory it 
should be applicable to other FSMs, but nobody's come forth with 
additional testcases from real world applications.


jeff


[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports

2014-08-14 Thread Yvan Roux
Hi all

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to
revision 213803 as r213943.  We have also backported this set of revisions:

r211140 as r213455  [AArch64] Drop ISB after FPCR write.
r211270 as r213790  [AArch64] Remove from arm_neon.h functions not
in the spec
r211271 as r213791  Fix check for __FAST_MATH in arm_neon.h
r211273 as r213792  [Patch,testsuite] Fix bind_pic_locally
r211275 as r213792  [Patch,testsuite] Fix tests that fail due to
symbol visibility when -fPIC
r211503 as r213793  [AArch64] fix and enable non-const shuffle for
bigendian using TBL instruction
r211779 as r213793  [AArch64][committed] Delete unused variable i
r212023 as r213794  [AArch64] Fix constraint vec_unpack_trunk
r212024 as r213795  [AArch32] Cortex-A5 rtx costs table
r212142 as r213796  [AArch32] Handle clz, rbit types in arm
pipeline descriptions
r212225 as r213797  [AArch64] Fix argument types for some
high_lane* intrinsics implemented in assembly
r212296 as r213798  [AArch64] Handle fcvta[su] and frint in RTX
cost function
r212358 as r213799  [AArch64] Relocate saved_varargs_size.
r212512 as r213799  [AArch64] Restructure callee save slot allocation logic.
r212752 as r213799  Unify slots x29/x30
r212753 as r213799  [AArch64] Add frame_size and hard_fp_offset to
machine.frame
r212912 as r213799  [AArch64] GNU-Stylize some un-formatted code.
r212913 as r213799  [AArch64] Consistent parameter types in
prologue/epilogue generation.
r212943 as r213799  [AArch64] Remove useless local variable.
r212945 as r213799  [AArch64] Remove useless parameter base_rtx.
r212946 as r213799  [AArch64] Use register offset in
cfun-machine-frame.reg_offset
r212947 as r213799  [AArch64] Remove useless variable 'increment'
r212949 as r213799  [AArch64] Hoist calculation of register rtx.
r212950 as r213799  [AArch64] Refactor code out into
aarch64_next_callee_save
r212951 as r213799  [AArch64] Use helper functions to handle multiple modes.
r212952 as r213799  [AArch64] Unify vector and core register
save/restore code.
r212954 as r213799  [AArch64] Split save restore path.
r212955 as r213799  [AArch64] Simplify prologue expand using new
helper functions.
r212956 as r213799  [AArch64] Simplify epilogue expansion using
new helper functions.
r212957 as r213799  [AArch64] Prologue and epilogue test cases.
r212958 as r213799  [AArch64] Optimize epilogue in the presence of
an outgoing args area.
r212959 as r213799  [AArch64] Extend frame state to track WB candidates.
r212976 as r213799  [AArch64] Infrastructure for optional use of writeback
r212996 as r213799  [AArch64] Optimize prologue when there is no
frame pointer.
r212997 as r213799  [AArch64] Optimize epilogue when there is no frame
r212999 as 213800   [PATCH, ARM] Fix PR61948 (ICE with DImode
shift by 1 bit)
r213000 as r213801  PR 61713: ICE when expanding single-threaded
version of atomic_test_and_set
r213376 as r213817  [AArch64][1/2] Remove UNSPEC_CLS and use clrsb
RTL code in its' place
r213555 as r213817  [AArch64][2/2] Add rtx cost function handling
of clz, clrsb, rbit

This will be part of our 2014.08 4.9 release.

Thanks,
Yvan


Re: [C++ Patch] PR 54377

2014-08-14 Thread Paolo Carlini

Hi,

On 08/14/2014 04:18 PM, Jason Merrill wrote:

On 08/14/2014 07:14 AM, Paolo Carlini wrote:

+  nparms -= variadic_p ? variadic_p : default_p;


What if you have both default arguments and parameter packs?

Right. Got distracted by the minor secondary issues...

It seems to me that simply adding the counters works, i.e., in the 
additional cpp0x testcase we thus say 'at least 2' instead of a wrong 
'at least 3'.


I'm finishing testing the below.

Thanks,
Paolo.

/
Index: cp/pt.c
===
--- cp/pt.c (revision 213952)
+++ cp/pt.c (working copy)
@@ -6861,19 +6861,24 @@ coerce_template_parms (tree parms,
   int variadic_args_p = 0;
   int post_variadic_parms = 0;
 
+  /* Likewise for parameters with default arguments.  */
+  int default_p = 0;
+
   if (args == error_mark_node)
 return error_mark_node;
 
   nparms = TREE_VEC_LENGTH (parms);
 
-  /* Determine if there are any parameter packs.  */
+  /* Determine if there are any parameter packs or default arguments.  */
   for (parm_idx = 0; parm_idx  nparms; ++parm_idx)
 {
-  tree tparm = TREE_VALUE (TREE_VEC_ELT (parms, parm_idx));
+  tree parm = TREE_VEC_ELT (parms, parm_idx);
   if (variadic_p)
++post_variadic_parms;
-  if (template_parameter_pack_p (tparm))
+  if (template_parameter_pack_p (TREE_VALUE (parm)))
++variadic_p;
+  if (TREE_PURPOSE (parm))
+   ++default_p;
 }
 
   inner_args = orig_inner_args = INNERMOST_TEMPLATE_ARGS (args);
@@ -6902,11 +6907,11 @@ coerce_template_parms (tree parms,
 {
   if (complain  tf_error)
{
-  if (variadic_p)
+  if (variadic_p || default_p)
 {
-  nparms -= variadic_p;
+  nparms -= variadic_p + default_p;
  error (wrong number of template arguments 
-(%d, should be %d or more), nargs, nparms);
+(%d, should be at least %d), nargs, nparms);
 }
  else
 error (wrong number of template arguments 
@@ -6913,7 +6918,7 @@ coerce_template_parms (tree parms,
(%d, should be %d), nargs, nparms);
 
  if (in_decl)
-   error (provided for %q+D, in_decl);
+   inform (input_location, provided for %q+D, in_decl);
}
 
   return error_mark_node;
Index: testsuite/g++.dg/cpp0x/alias-decl-2.C
===
--- testsuite/g++.dg/cpp0x/alias-decl-2.C   (revision 213952)
+++ testsuite/g++.dg/cpp0x/alias-decl-2.C   (working copy)
@@ -22,7 +22,7 @@ templateclass T using Vec = VectorT, AllocT 
 
 templateclass T void g(VectorT, AllocT );
 
-templatetemplateclass T class TT void h(TTint); // { dg-error provided 
for }
+templatetemplateclass T class TT void h(TTint); // { dg-message 
provided for }
 
 void
 bar()
Index: testsuite/g++.dg/cpp0x/pr51226.C
===
--- testsuite/g++.dg/cpp0x/pr51226.C(revision 213952)
+++ testsuite/g++.dg/cpp0x/pr51226.C(working copy)
@@ -1,7 +1,7 @@
 // PR c++/51226
 // { dg-do compile { target c++11 } }
 
-templateint struct A   // { dg-error provided }
+templateint struct A   // { dg-message provided }
 {
   enum E : int;
 };
Index: testsuite/g++.dg/cpp0x/pr54377.C
===
--- testsuite/g++.dg/cpp0x/pr54377.C(revision 0)
+++ testsuite/g++.dg/cpp0x/pr54377.C(working copy)
@@ -0,0 +1,7 @@
+// PR c++/54377
+// { dg-do compile { target c++11 } }
+
+template typename, typename, typename = void, typename...
+struct foo {};  // { dg-message provided for }
+
+fooint f; // { dg-error at least 2 }
Index: testsuite/g++.dg/cpp0x/variadic2.C
===
--- testsuite/g++.dg/cpp0x/variadic2.C  (revision 213952)
+++ testsuite/g++.dg/cpp0x/variadic2.C  (working copy)
@@ -6,9 +6,9 @@ templatetypename... = int // { dg-error default
 class tuple3;
 
 templatetypename T1, typename T2, typename... Rest
-struct two_or_more {}; // { dg-error provided for }
+struct two_or_more {}; // { dg-message provided for }
 
-typedef two_or_moreint bad; // { dg-error 2 or more 2 or more }
+typedef two_or_moreint bad; // { dg-error at least 2 at least 2 }
 
 void f()
 {
Index: testsuite/g++.dg/parse/too-many-tmpl-args1.C
===
--- testsuite/g++.dg/parse/too-many-tmpl-args1.C(revision 213952)
+++ testsuite/g++.dg/parse/too-many-tmpl-args1.C(working copy)
@@ -2,7 +2,7 @@
 // Origin: Wolfgang Bangerth bange...@ticam.utexas.edu
 // { dg-do compile }
 
-template typename T class A // { dg-error  
}
+template typename T class A   // { dg-message  
}
 {
 struct B;
 template typename U friend 

Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread Jeff Law

On 08/14/14 10:12, David Malcolm wrote:

On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:

On 08/14/14 04:32, Richard Biener wrote:

You'll note in a separate thread Steve and I discussed this during Cauldron
and it was at my recommendation Steve resurrected his proof of concept
plugin and started beating it into shape.


But do we really want a pass just to help coremark?

And that's the biggest argument against Steve's work.  In theory it
should be applicable to other FSMs, but nobody's come forth with
additional testcases from real world applications.


Maybe a regex library?  Perhaps:
http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
The key is that at least some states tell you at compile time what state 
you'll be in during the next loop iteration.  Thus instead of coming 
around the loop, evaluating the switch condition, then doing the 
multi-way branch, we just directly jump to the case for the next iteration.
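
For illustration, a hypothetical FSM loop of the kind being discussed (not
taken from coremark or PCRE) might look like:

/* When the current state is 1, the next state is the compile-time
   constant 2, so the back edge could branch straight to the code for
   state 2 instead of re-evaluating the switch on the next iteration.  */
int
run (const unsigned char *in, int n)
{
  int state = 0, count = 0;
  for (int i = 0; i < n; i++)
    switch (state)
      {
      case 0:
        state = (in[i] == 'a') ? 1 : 0;
        break;
      case 1:
        count++;
        state = 2;
        break;
      default:
        state = 0;
        break;
      }
  return count;
}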


I've never looked at the PCRE code to know if it's got cases like that.

jeff


Re: [PATCH] Fix PR62077

2014-08-14 Thread Jason Merrill

On 08/14/2014 05:07 AM, Richard Biener wrote:

So - can you take over this C++ frontend issue?


OK.

Jason




Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread David Malcolm
On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
 On 08/14/14 04:32, Richard Biener wrote:
  You'll note in a separate thread Steve and I discussed this during Cauldron
  and it was at my recommendation Steve resurrected his proof of concept
  plugin and started beating it into shape.
 
  But do we really want a pass just to help coremark?
 And that's the biggest argument against Steve's work.  In theory it 
 should be applicable to other FSMs, but nobody's come forth with 
 additional testcases from real world applications.

Maybe a regex library?  Perhaps:
http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?


Dave




Re: [PATCH 000/236] Introduce rtx subclasses

2014-08-14 Thread David Malcolm
On Wed, 2014-08-13 at 20:13 -0400, David Malcolm wrote:
 On Wed, 2014-08-06 at 13:19 -0400, David Malcolm wrote:
  This is the patch series I spoke about at Cauldron in the talk
  A proposal for typesafe RTL; slides here:
  http://dmalcolm.fedorapeople.org/presentations/cauldron-2014/rtl
  
  They can also be seen at:
  https://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v20/
 
 [...snip...]
 
  The patches (and my control for testing) has been on top of r211061, which
  being from 2014-05-29 is now two months old, but hopefully is still
  reviewable; naturally I'll perform bootstrap and regression testing for
  each patch in turn before committing.
 
 FWIW I've now rebased the patches against today's trunk (specifically
 r213914) and I've backed up the results to:
 https://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v21/
 
 Still TODO:
   * rename as_a_nullable to safe_as_a
   * eliminate rtx_real_insn
   * make various scaffolding functions be inline, updating gdbinit.in as
 appropriate.
 
 [...snip...]

Likewise; latest WIP:
https://dmalcolm.fedorapeople.org/gcc/patch-backups/rtx-classes/v22/
has:
  * the renaming of as_a_nullable to safe_as_a
  * the elimination of rtx_real_insn

Still TODO:
* make various scaffolding functions be inline, updating gdbinit.in as




[PATCH] Handle -fsanitize=leak more similarly to address/thread

2014-08-14 Thread Jakub Jelinek
Hi!

Right now when -fsanitize=leak adds -llsan, it adds it late on the command
line, so e.g. -lstdc++ comes after it, which seems to be bad.
The following patch puts it early on the link command line like we do for
-lasan or -ltsan.  Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2014-08-14  Jakub Jelinek  ja...@redhat.com

* config/gnu-user.h (LIBLSAN_EARLY_SPEC): Define.
* gcc.c (LIBLSAN_SPEC, LIBLSAN_EARLY_SPEC): Follow LIBTSAN*_SPEC.
(SANITIZER_EARLY_SPEC): Include LIBLSAN_EARLY_SPEC for -fsanitize=leak.

--- gcc/config/gnu-user.h.jj2014-01-03 11:41:11.0 +0100
+++ gcc/config/gnu-user.h   2014-08-11 13:42:24.187122214 +0200
@@ -114,7 +114,8 @@ see the files COPYING3 and COPYING.RUNTI
 /* Link -lasan early on the command line.  For -static-libasan, don't link
it for -shared link, the executable should be compiled with -static-libasan
in that case, and for executable link link with --{,no-}whole-archive around
-   it to force everything into the executable.  And similarly for -ltsan.  */
+   it to force everything into the executable.  And similarly for -ltsan
+   and -llsan.  */
 #if defined(HAVE_LD_STATIC_DYNAMIC)
 #undef LIBASAN_EARLY_SPEC
 #define LIBASAN_EARLY_SPEC %{!shared:libasan_preinit%O%s}  \
@@ -125,4 +126,8 @@ see the files COPYING3 and COPYING.RUNTI
 #define LIBTSAN_EARLY_SPEC %{static-libtsan:%{!shared: \
   LD_STATIC_OPTION  --whole-archive -ltsan --no-whole-archive  \
   LD_DYNAMIC_OPTION }}%{!static-libtsan:-ltsan}
+#undef LIBLSAN_EARLY_SPEC
+#define LIBLSAN_EARLY_SPEC %{static-liblsan:%{!shared: \
+  LD_STATIC_OPTION  --whole-archive -llsan --no-whole-archive  \
+  LD_DYNAMIC_OPTION }}%{!static-liblsan:-llsan}
 #endif
--- gcc/gcc.c.jj2014-08-01 09:23:35.0 +0200
+++ gcc/gcc.c   2014-08-11 13:45:27.994253297 +0200
@@ -576,13 +576,19 @@ proper position among the other output f
 #ifndef LIBLSAN_SPEC
 #define STATIC_LIBLSAN_LIBS \
%{static-liblsan:%:include(libsanitizer.spec)%(link_liblsan)}
-#ifdef HAVE_LD_STATIC_DYNAMIC
-#define LIBLSAN_SPEC %{!shared:%{static-liblsan: LD_STATIC_OPTION \
+#ifdef LIBLSAN_EARLY_SPEC
+#define LIBLSAN_SPEC STATIC_LIBLSAN_LIBS
+#elif defined(HAVE_LD_STATIC_DYNAMIC)
+#define LIBLSAN_SPEC %{static-liblsan: LD_STATIC_OPTION \
 } -llsan %{static-liblsan: LD_DYNAMIC_OPTION } \
-STATIC_LIBLSAN_LIBS }
+STATIC_LIBLSAN_LIBS
 #else
-#define LIBLSAN_SPEC %{!shared:-llsan STATIC_LIBLSAN_LIBS }
+#define LIBLSAN_SPEC -llsan STATIC_LIBLSAN_LIBS
+#endif
 #endif
+
+#ifndef LIBLSAN_EARLY_SPEC
+#define LIBLSAN_EARLY_SPEC 
 #endif
 
 #ifndef LIBUBSAN_SPEC
@@ -772,7 +778,8 @@ proper position among the other output f
 #ifndef SANITIZER_EARLY_SPEC
 #define SANITIZER_EARLY_SPEC \
 %{!nostdlib:%{!nodefaultlibs:%{%:sanitize(address): LIBASAN_EARLY_SPEC } \
-%{%:sanitize(thread): LIBTSAN_EARLY_SPEC }}}
+%{%:sanitize(thread): LIBTSAN_EARLY_SPEC } \
+%{%:sanitize(leak): LIBLSAN_EARLY_SPEC }}}
 #endif
 
 /* Linker command line options for -fsanitize= late on the command line.  */

Jakub


Re: [C++ Patch] PR 54377

2014-08-14 Thread Jason Merrill

OK.

Jason


[committed] Two !$omp declare simd fixes (PR fortran/62076)

2014-08-14 Thread Jakub Jelinek
Hi!

I've committed following fix for two issues revealed e.g. by valgrind
on some of the udr*.f90 testcases.
buffer could be uninitialized, and gfc_free_omp_udr could free symbols
in the combiner_ns (or initializer_ns) when freeing those whole namespaces,
so if some symbols were queued for commit/undo, it is bad to first free
the symbols and then try to undo them.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and 4.9 branch.

2014-08-14  Jakub Jelinek  ja...@redhat.com

PR fortran/62076
* openmp.c (gfc_match_omp_clauses): When failed to match
operator name, defined op name or name, set buffer to
empty string.  Don't call gfc_find_omp_udr if buffer is empty
string.
(gfc_match_omp_declare_reduction): Call gfc_undo_symbols ()
before calling gfc_free_omp_udr.

--- gcc/fortran/openmp.c.jj 2014-08-12 15:42:44.239327384 +0200
+++ gcc/fortran/openmp.c2014-08-14 12:17:20.554565218 +0200
@@ -464,7 +464,11 @@ gfc_match_omp_clauses (gfc_omp_clauses *
  || !gfc_add_intrinsic (sym-attr, NULL)))
rop = OMP_REDUCTION_NONE;
}
- gfc_omp_udr *udr = gfc_find_omp_udr (gfc_current_ns, buffer, NULL);
+ else
+   buffer[0] = '\0';
+ gfc_omp_udr *udr
+   = (buffer[0]
+  ? gfc_find_omp_udr (gfc_current_ns, buffer, NULL) : NULL);
  gfc_omp_namelist **head = NULL;
  if (rop == OMP_REDUCTION_NONE  udr)
rop = OMP_REDUCTION_USER;
@@ -1240,6 +1244,7 @@ gfc_match_omp_declare_reduction (void)
 syntax:
  gfc_current_locus = old_loc;
  gfc_current_ns = combiner_ns-parent;
+ gfc_undo_symbols ();
  gfc_free_omp_udr (omp_udr);
  return MATCH_ERROR;
}

Jakub


Re: [PATCH] Asan static optimization (draft)

2014-08-14 Thread Konstantin Serebryany
Indeed, thanks for working on this.
We've been wanting such optimization phase from day one, but never got
to implementing it (except for a few simple ones).
https://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations
There have been several attempts outside of our team to do such
optimizations, but none of them made it to llvm trunk.

In order for your work to be generally useful, I'd ask several things:
- Update 
https://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations
with examples that will be handled
- Create small standalone test cases in C
- Don't put the tests under GPL (otherwise we'll not be able to reuse
them in LLVM)

 make sure that sanopt performs conservative optimization
Yes. I don't know a good solution to this problem, which is why we did
not attack it before.
Increasing asan speed will save lots of CPU time, but missing a single software
bug due to an overly aggressive optimization may cost much more.

We can do a detailed analysis of *simple* peephole-like optimizers,
but *proving* that a complex optimizer does not remove needed checks
is hardly practically possible.
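
For concreteness, a minimal C case of the kind of redundancy being discussed
(an illustration, not one of the requested standalone tests) would be:

/* The two stores hit the same location p + i, so once the first access
   is instrumented the second check is redundant.  The check for *q must
   stay, because nothing proves q was validated earlier.  */
void
foo (int *p, int *q, int i)
{
  p[i] = 1;   /* check of p + i emitted here */
  p[i] = 2;   /* same address and access size: check is redundant */
  *q = 3;     /* independent pointer: check must remain */
}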


On Fri, Aug 8, 2014 at 3:26 AM, Yury Gribov y.gri...@samsung.com wrote:
 Hi all,

 I have been working on Asan global optimization pass lately. The goal is to
 remove redundant Asan checks from sanitized code. This should hopefully
 reduce Asan's speed/size overhead (which is currently ~2x). The patch is not
 yet ready for trunk (e.g. I haven't done bootstrap, etc. but Asan testsuite
 passed wo errors) but I thought I'd send it for preliminary review of
 algorithm and data structures (*).

 Current implementation (based on existing sanopt pass) uses a simple
 iterative intra-procedural dataflow to compute information about living
 checks. For each pointer we track the size of memory that was already
 checked for it. During dataflow iterations, living checks are merged at
 blocks start in a natural way.

 I decided to keep current (local) Asan optimizations because they reduce
 compilation time by dropping many obviously redundant checks much earlier in
 the compilation pipeline.

 Current results seem to be encouraging: the pass was able to remove 112
 checks (out of 1768) in gcc/asan.c without significant increase in sanopt
 pass runtime.

 Before upstreaming this code, I plan to
 1) develop extensive set of tests to make sure that sanopt performs
 conservative optimization i.e. does not remove checks too aggressively (I
 guess this is a critically important prerequisite so any suggestions are
 welcomed)
 2) disable optimizations for very large functions to avoid unbearable
 compile times
 3) do detailed performance and efficiency measurements for Asan-bootstrap

 I also have some ideas for improving this code (and I'm certainly open to
 suggestions from community):
 1) propagating checking information through assignments and PHI-nodes (and
 maybe even pointer arithmetic) should improve efficiency
 2) ditto for optimization of symbolic ranges; actually this looks like a
 relatively easy addition to current code (I just need to keep a list of
 checked symbolic ranges to check_info structure)
 3) in addition to redundant check removal, we could also move duplicate
 checks from e.g. branches of if-statement to their dominators.

 -Y

 (*) The patch relies on some additional changes in hash-table and CFG which
 have not yet been upstreamed so it won't go with current trunk.


Re: [Patch] PR55189 enable -Wreturn-type by default

2014-08-14 Thread Sylvestre Ledru
On 12/08/2014 19:48, Joseph S. Myers wrote:
 On Mon, 11 Aug 2014, Sylvestre Ledru wrote:

 The test Wmissing-return2.c only has one of the two warnings.  But as per 
 -Wreturn-type = Run both, and for backwards compatibility with the 
 existing definition of -Wreturn-type, both warnings should appear for this 
 test.  
 Make sense. Thanks for the feedback and the help.
 Here it is:
 https://github.com/sylvestre/gcc/commit/089ffc9fb85034111b892ee10190dc12b5dbe551
 Yes, those tests seem as I expect.

So, here is the full commit now.
Since the patch is 600k, I uploaded it here:
http://sylvestre.ledru.info/0001-Enable-warning-Wreturn-type-by-default.patch
Let me know when I can commit that.

At the end of this mail, I also added the changelog for
Enable -Wimplicit-int by default
Patch:
http://sylvestre.ledru.info/0002-Enable-Wimplicit-int-by-default.patch
Let me know if you prefer a separate mail.

Everything is on github if it helps:
https://github.com/sylvestre/gcc
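
For illustration, here is a minimal function that the new -Wmissing-return
(as described in the ChangeLog entries below) is meant to flag; this is a
hypothetical example, not one of the added tests:

/* Control may reach the end of this non-void function when x <= 0,
   which is what -Wmissing-return warns about.  */
int
f (int x)
{
  if (x > 0)
    return x;
}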

gcc/c-family/ChangeLog:

2014-08-13  Sylvestre Ledru  sylves...@debian.org

* c.opt: Enable -Wreturn-type by default
Add -Wmissing-return:
Warn whenever control may reach end of non-void function

gcc/ChangeLog:

2014-08-13  Sylvestre Ledru  sylves...@debian.org

* doc/invoke.texi: Document new flag -Wmissing-return
Update -Wreturn-type
* tree-cfg.c (pass_warn_function_return::execute):
Introduce -Wreturn-type management

gcc/fortran/ChangeLog:

2014-08-13  Sylvestre Ledru  sylves...@debian.org

* options.c (gfc_handle_option): Manage the new flag -Wmissing-return

libgomp/ChangeLog:

2014-08-13  Sylvestre Ledru  sylves...@debian.org

* testsuite/libgomp.c++/loop-2.C: Update the test with -Wreturn-type by
default and -Wmissing-return
* testsuite/libgomp.c++/loop-4.C: likewise
* testsuite/libgomp.c++/parallel-1.C: likewise
* testsuite/libgomp.c++/shared-1.C: likewise
* testsuite/libgomp.c++/single-1.C: likewise
* testsuite/libgomp.c++/single-2.C: likewise
* testsuite/libgomp.c/omp-loop02.c: likewise
* testsuite/libgomp.c/omp-parallel-for.c: likewise
* testsuite/libgomp.c/omp-parallel-if.c: likewise
* testsuite/libgomp.c/omp-single-1.c: likewise
* testsuite/libgomp.c/omp-single-2.c: likewise
* testsuite/libgomp.c/omp_matvec.c: likewise
* testsuite/libgomp.c/omp_workshare3.c: likewise
* testsuite/libgomp.c/omp_workshare4.c: likewise
* testsuite/libgomp.c/pr30494.c (check): likewise
* testsuite/libgomp.c/shared-1.c: likewise

gcc/testsuite/ChangeLog:

2014-08-13  Sylvestre Ledru  sylves...@debian.org

* gcc.dg/Wmissing-return1.c: New test which tests the new behavior
* gcc.dg/Wmissing-return2.c: New test which tests the new behavior
* gcc.dg/Wmissing-return3.c: New test which tests the new behavior
* gcc.dg/Wmissing-return4.c: New test which tests the new behavior
* gcc.dg/Wmissing-return5.c: New test which tests the new behavior
* c-c++-common/asan/no-redundant-instrumentation-2.c (main):
Update the test with -Wreturn-type by default and -Wmissing-return
* c-c++-common/cilk-plus/AN/decl-ptr-colon.c (int main): likewise
* c-c++-common/cilk-plus/AN/parser_errors.c: likewise
* c-c++-common/cilk-plus/AN/parser_errors2.c: likewise
* c-c++-common/cilk-plus/AN/parser_errors3.c: likewise
* c-c++-common/cilk-plus/AN/pr57457-2.c: likewise
* c-c++-common/cilk-plus/AN/pr57541-2.c (void foo1): likewise
(void foo2): likewise
* c-c++-common/cilk-plus/AN/pr57541.c (int foo): likewise
(int foo1): likewise
* c-c++-common/cilk-plus/CK/pr60197.c: likewise
* c-c++-common/cilk-plus/CK/spawn_in_return.c: likewise
* c-c++-common/convert-vec-1.c: likewise
* c-c++-common/dfp/call-by-value.c (int foo32): likewise
(int foo64): likewise
(int foo128): likewise
* c-c++-common/pr36513-2.c (int main2): likewise
* c-c++-common/pr36513.c (int main1): likewise
* c-c++-common/pr43772.c: likewise
* c-c++-common/pr49706-2.c (same): likewise
* c-c++-common/raw-string-3.c: likewise
* c-c++-common/tm/pr54893.c: likewise
* c-c++-common/tm/trxn-expr-2.c: likewise
* c-c++-common/tsan/fd_pipe_race.c: likewise
* c-c++-common/tsan/tls_race.c: likewise
* c-c++-common/vector-1.c (int f): likewise
* c-c++-common/vector-2.c (void f): likewise
* g++.dg/abi/covariant2.C (struct c3): likewise
(struct c7): likewise
* g++.dg/abi/covariant3.C: likewise
* g++.dg/abi/key2.C (int sub): likewise
* g++.dg/abi/mangle7.C: likewise
* g++.dg/bprob/g++-bprob-1.C (call_for): likewise
* g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc: likewise
* g++.dg/conversion/op1.C (class C): likewise
(int fn): likewise
* g++.dg/conversion/op6.C: likewise
* g++.dg/cpp0x/access01.C: likewise
* g++.dg/cpp0x/alias-decl-19.C: likewise
* g++.dg/cpp0x/auto2.C (struct A): likewise
* g++.dg/cpp0x/constexpr-defarg2.C: likewise
* 

Re: [PATCH] Fix PR62081

2014-08-14 Thread Sebastian Pop
Richard Biener wrote:
 
 The following fixes missing dominator computation before fixing loops.
 Rather than doing even more such weird stuff in a pass gate function
 this puts this into a new pass scheduled before the loop passes gate.
 

Ok.

 +unsigned int
 +pass_fix_loops::execute (function *)
 +{

I would add an early exit if there are no loops in the function
(like in the original code below...)

if (!loops_for_fn (fn))
  return 0;

 +  if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
 +{
 +  calculate_dominance_info (CDI_DOMINATORS);
 +  fix_loop_structure (NULL);
 +}
 +  return 0;
 +}

[...]

  /* Gate for loop pass group.  The group is controlled by -ftree-loop-optimize
 but we also avoid running it when the IL doesn't contain any loop.  */
  
 @@ -57,9 +107,6 @@ gate_loop (function *fn)
if (!loops_for_fn (fn))
  return true;

... here.

  
 -  /* Make sure to drop / re-discover loops when necessary.  */
 -  if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
 -fix_loop_structure (NULL);
return number_of_loops (fn)  1;
  }


Re: [PATCH] Fix UB in diagnostic.c

2014-08-14 Thread Marek Polacek
On Wed, Aug 13, 2014 at 09:03:37PM +0200, Manuel López-Ibáñez wrote:
 I don't think this is the right fix. The problem is that we are trying
 to print the caret in a column that is larger than the line_width. We
 do this because the file given by the line directive has nothing to do
 with the actual code we are parsing. I think in that case it is ok to
 just give up and not print a caret. So something like:
 
 @@ -300,11 +299,11 @@ diagnostic_show_locus (diagnostic_contex
  return;
 
   context->last_location = diagnostic->location;
   s = expand_location_to_spelling_point (diagnostic->location);
   line = location_get_source_line (s, &line_width);
 -  if (line == NULL)
 +  if (line == NULL || s.column > line_width)
  return;
 
   max_width = context->caret_max_width;
   line = adjust_line (line, line_width, max_width, &(s.column));
 
 Nonetheless, perhaps in addition adjust_line should have
 gcc_checking_assert(line_width = column).

Yea, I think this would be much better, thanks.  Done in the
following.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-08-14  Marek Polacek  pola...@redhat.com
Manuel López-Ibáñez  m...@gcc.gnu.org

PR c/62059
* diagnostic.c (adjust_line): Add gcc_checking_assert.
(diagnostic_show_locus): Don't print caret diagnostic
if a column is larger than the line_width.

diff --git gcc/diagnostic.c gcc/diagnostic.c
index 0cc7593..2109f31 100644
--- gcc/diagnostic.c
+++ gcc/diagnostic.c
@@ -270,6 +270,7 @@ adjust_line (const char *line, int line_width,
   int right_margin = 10;
   int column = *column_p;
 
+  gcc_checking_assert (line_width >= column);
   right_margin = MIN (line_width - column, right_margin);
   right_margin = max_width - right_margin;
   if (line_width >= max_width && column > right_margin)
@@ -302,7 +303,7 @@ diagnostic_show_locus (diagnostic_context * context,
   context->last_location = diagnostic->location;
   s = expand_location_to_spelling_point (diagnostic->location);
   line = location_get_source_line (s, &line_width);
-  if (line == NULL)
+  if (line == NULL || s.column > line_width)
     return;
 
   max_width = context->caret_max_width;
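
For context, a hypothetical reduction of the failure mode (the #line
directive names an unrelated file whose corresponding line is shorter than
the diagnostic column) might look like this; the file name and contents are
assumptions, not the original reproducer:

/* If "short.h" exists and its first line is shorter than the column of
   the error below, the caret column exceeds the width of the line
   fetched from short.h; with the patch the caret is simply not printed.  */
#line 1 "short.h"
int f (void) { return this_identifier_is_undeclared_and_far_to_the_right; }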

Marek


Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

2014-08-14 Thread Richard Henderson
On 08/13/2014 05:29 AM, Kyrill Tkachov wrote:
 Is the attached patch ok? It just moves the section as you suggested. I did a
 build of the Linux kernel with and without this patch to make sure no code-gen
 was accidentally affected.

Looks good.

 We'd need to store a mapping from constants to RTXes and every time we have a
 cache hit we'd have to tweak them to make sure the registers involved are
 correct. I had a quick play with this but ended up with LRA ICEs :(

You mis-understand how I meant to memoize.  Have a look at how we cache
expansions for multiply in synth_mult: we don't record registers or rtx's, but
we do record the operation and arguments.

So you could consider building a trie, indexed by a hash.

struct imm_algorithm
{
  HOST_WIDE_INT op1;
  imm_algorithm *prev;

  enum operation {
// op1 is accepted by aarch64_mov_operand.
// prev should be null.
mov,
// op1 is to be inserted at the given position
// to the value constructed by prev.
movk_48, movk_32, movk_16, movk_0,
// op1 is an arithmetic immediate to be applied
// to the value constructed by prev
add, sub,
// op1 is a logical immediate to be applied to
// the value constructed by prev
and, ior,
  } code;
};


r~




[patch, fortran] Fix PR 62106

2014-08-14 Thread Thomas Koenig
Hello world,

the attached patch fixes the regression by making sure we never
try to create a temporary variable from a temporary variable,
which happened in the wrong order.

Regression-tested.  OK for trunk and 4.9?

2014-08-19  Thomas Koenig  tkoe...@gcc.gnu.org

PR fortran/62106
* gfortran.h (symbol_attribute):  Add fe_temp flag.
* frontend-passes.c (is_fe_temp):  New function.
(create_var):  Don't add a temporary for an already
created variable or for a constant.
(combine_ARRAY_constructor):  Remove special handling
for constants.

2014-08-19  Thomas Koenig  tkoe...@gcc.gnu.org

PR fortran/62106
* gfortran.dg/array_constructor_49.f90:  New test.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 213778)
+++ frontend-passes.c	(Arbeitskopie)
@@ -430,11 +430,26 @@ cfe_register_funcs (gfc_expr **e, int *walk_subtre
   return 0;
 }
 
+/* Auxiliary function to check if an expression is a temporary created by
+   create var.  */
+
+static bool
+is_fe_temp (gfc_expr *e)
+{
+  if (e->expr_type != EXPR_VARIABLE)
+    return false;
+
+  return e->symtree->n.sym->attr.fe_temp;
+}
+
+
 /* Returns a new expression (a variable) to be used in place of the old one,
with an assignment statement before the current statement to set
the value of the variable. Creates a new BLOCK for the statement if
that hasn't already been done and puts the statement, plus the
-   newly created variables, in that block.  */
+   newly created variables, in that block.  Special cases:  If the
+   expression is constant or a temporary which has already
+   been created, just copy it.  */
 
 static gfc_expr*
 create_var (gfc_expr * e)
@@ -448,6 +463,9 @@ create_var (gfc_expr * e)
   gfc_namespace *ns;
   int i;
 
+  if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e))
+    return gfc_copy_expr (e);
+
   /* If the block hasn't already been created, do so.  */
   if (inserted_block == NULL)
 {
@@ -522,6 +540,7 @@ create_var (gfc_expr * e)
   symbol->attr.flavor = FL_VARIABLE;
   symbol->attr.referenced = 1;
   symbol->attr.dimension = e->rank > 0;
+  symbol->attr.fe_temp = 1;
   gfc_commit_symbol (symbol);
 
   result = gfc_get_expr ();
@@ -1082,10 +1101,7 @@ combine_array_constructor (gfc_expr *e)
   if (op2->ts.type == BT_CHARACTER)
     return false;
 
-  if (op2->expr_type == EXPR_CONSTANT)
-    scalar = gfc_copy_expr (op2);
-  else
-    scalar = create_var (gfc_copy_expr (op2));
+  scalar = create_var (gfc_copy_expr (op2));
 
   oldbase = op1-value.constructor;
   newbase = NULL;
Index: gfortran.h
===
--- gfortran.h	(Revision 213778)
+++ gfortran.h	(Arbeitskopie)
@@ -739,7 +739,7 @@ typedef struct
 optional:1, pointer:1, target:1, value:1, volatile_:1, temporary:1,
 dummy:1, result:1, assign:1, threadprivate:1, not_always_present:1,
 implied_index:1, subref_array_pointer:1, proc_pointer:1, asynchronous:1,
-contiguous:1;
+contiguous:1, fe_temp: 1;
 
   /* For CLASS containers, the pointer attribute is sometimes set internally
  even though it was not directly specified.  In this case, keep the
! { dg-do run }
! { dg-options "-ffrontend-optimize -fdump-tree-original" }
! PR 62106 - this used to give wrong results because
! of a bogus extra temporary variable.
! Original test case by Martien Hulsen
program t
  integer :: ndim=2, ndfp=4, i
  character (len=8) :: line
  write (unit=line,fmt='(4I2)'), (/ ( i, i = 1, ndfp ) /) + ndim
  if (line /= ' 3 4 5 6') call abort
end program t
! { dg-final { scan-tree-dump-times "__var" 3 "original" } }
! { dg-final { cleanup-tree-dump "original" } }


Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread Steve Ellcey
On Wed, 2014-08-13 at 11:52 +0200, Richard Biener wrote:
 On Wed, Aug 13, 2014 at 4:54 AM, Bin.Cheng amker.ch...@gmail.com wrote:
  On Wed, Aug 13, 2014 at 4:40 AM, Jeff Law l...@redhat.com wrote:
  On 08/12/14 14:23, Richard Biener wrote:
  On August 12, 2014 8:31:16 PM CEST, Jeff Law l...@redhat.com wrote:
  On 08/12/14 11:46, Steve Ellcey wrote:
 
  Try setting the header & latch fields for the loop structure to NULL,
  then call loops_set_state (LOOPS_NEED_FIXUP).
 
 
  But that is _not_ the appropriate way of keeping loops preserved!
 
  I think that's done when we've scrogged the loop beyond repair and want the
  structures rebuilt.  IIRC, that's what you recommended we do.

An update on this part of the patch.  I found that just calling
'loops_set_state (LOOPS_NEED_FIXUP)' without setting the header and
latch fields to NULL first is enough to fix the problem I had earlier.
I thought I had tried that before but either I was calling something
else or I had put the call in the wrong place but it is working now.


Steve Ellcey



[C PATCH] Allow ATOMIC_*_LOCK_FREE in #if directive (DR#458)

2014-08-14 Thread Marek Polacek
The DR#458 is about the usage of ATOMIC_*_LOCK_FREE macros defined in
stdatomic.h in the #if directives.  Proposed Technical Corrigendum of
this DR is that these macros should expand to constant expressions
suitable for use in #if preprocessing directives.  This patch does
that by mapping these macros to __GCC_ATOMIC_*_LOCK_FREE variants
defined in c-cppbuiltin.c:cpp_atomic_builtins.  (It's what libsupc++
does.)  The current implementation wouldn't work in #if directives.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2014-08-14  Marek Polacek  pola...@redhat.com

DR 458
* ginclude/stdatomic.h (__atomic_type_lock_free): Remove.
(ATOMIC_*_LOCK_FREE): Map to __GCC_ATOMIC_*_LOCK_FREE.

* gcc.dg/c11-stdatomic-2.c: New test.

diff --git gcc/ginclude/stdatomic.h gcc/ginclude/stdatomic.h
index 108259b4..26b4b6e 100644
--- gcc/ginclude/stdatomic.h
+++ gcc/ginclude/stdatomic.h
@@ -95,30 +95,16 @@ typedef _Atomic __UINTMAX_TYPE__ atomic_uintmax_t;
 #define atomic_signal_fence(MO)__atomic_signal_fence  (MO)
 #define atomic_is_lock_free(OBJ) __atomic_is_lock_free (sizeof (*(OBJ)), (OBJ))
 
-#define __atomic_type_lock_free(T) \
-  (__atomic_always_lock_free (sizeof (T), (void *) 0)  \
-   ? 2 \
-   : (__atomic_is_lock_free (sizeof (T), (void *) 0) ? 1 : 0))
-#define ATOMIC_BOOL_LOCK_FREE  \
-  __atomic_type_lock_free (atomic_bool)
-#define ATOMIC_CHAR_LOCK_FREE  \
-  __atomic_type_lock_free (atomic_char)
-#define ATOMIC_CHAR16_T_LOCK_FREE  \
-  __atomic_type_lock_free (atomic_char16_t)
-#define ATOMIC_CHAR32_T_LOCK_FREE  \
-  __atomic_type_lock_free (atomic_char32_t)
-#define ATOMIC_WCHAR_T_LOCK_FREE   \
-  __atomic_type_lock_free (atomic_wchar_t)
-#define ATOMIC_SHORT_LOCK_FREE \
-  __atomic_type_lock_free (atomic_short)
-#define ATOMIC_INT_LOCK_FREE   \
-  __atomic_type_lock_free (atomic_int)
-#define ATOMIC_LONG_LOCK_FREE  \
-  __atomic_type_lock_free (atomic_long)
-#define ATOMIC_LLONG_LOCK_FREE \
-  __atomic_type_lock_free (atomic_llong)
-#define ATOMIC_POINTER_LOCK_FREE   \
-  __atomic_type_lock_free (void * _Atomic)
+#define ATOMIC_BOOL_LOCK_FREE  __GCC_ATOMIC_BOOL_LOCK_FREE
+#define ATOMIC_CHAR_LOCK_FREE  __GCC_ATOMIC_CHAR_LOCK_FREE
+#define ATOMIC_CHAR16_T_LOCK_FREE  __GCC_ATOMIC_CHAR16_T_LOCK_FREE
+#define ATOMIC_CHAR32_T_LOCK_FREE  __GCC_ATOMIC_CHAR32_T_LOCK_FREE
+#define ATOMIC_WCHAR_T_LOCK_FREE   __GCC_ATOMIC_WCHAR_T_LOCK_FREE
+#define ATOMIC_SHORT_LOCK_FREE __GCC_ATOMIC_SHORT_LOCK_FREE
+#define ATOMIC_INT_LOCK_FREE   __GCC_ATOMIC_INT_LOCK_FREE
+#define ATOMIC_LONG_LOCK_FREE  __GCC_ATOMIC_LONG_LOCK_FREE
+#define ATOMIC_LLONG_LOCK_FREE __GCC_ATOMIC_LLONG_LOCK_FREE
+#define ATOMIC_POINTER_LOCK_FREE   __GCC_ATOMIC_POINTER_LOCK_FREE
 
 
 /* Note that these macros require __typeof__ and __auto_type to remove
diff --git gcc/testsuite/gcc.dg/c11-stdatomic-2.c 
gcc/testsuite/gcc.dg/c11-stdatomic-2.c
index e69de29..d746b5c 100644
--- gcc/testsuite/gcc.dg/c11-stdatomic-2.c
+++ gcc/testsuite/gcc.dg/c11-stdatomic-2.c
@@ -0,0 +1,27 @@
+/* Test stdatomic.h header contents.  Test that ATOMIC_*_LOCK_FREE
+   macros can be used in an #if directive (DR#458).  */
+/* { dg-do preprocess } */
+/* { dg-options "-std=c11 -pedantic-errors" } */
+
+#include <stdatomic.h>
+
+#if ATOMIC_BOOL_LOCK_FREE
+#endif
+#if ATOMIC_CHAR_LOCK_FREE
+#endif
+#if ATOMIC_CHAR16_T_LOCK_FREE
+#endif
+#if ATOMIC_CHAR32_T_LOCK_FREE
+#endif
+#if ATOMIC_WCHAR_T_LOCK_FREE
+#endif
+#if ATOMIC_SHORT_LOCK_FREE
+#endif
+#if ATOMIC_INT_LOCK_FREE
+#endif
+#if ATOMIC_LONG_LOCK_FREE
+#endif
+#if ATOMIC_LLONG_LOCK_FREE
+#endif
+#if ATOMIC_POINTER_LOCK_FREE
+#endif

Marek


Re: [PATCH] Fix PR62031

2014-08-14 Thread Sebastian Pop
Richard Biener wrote:
 
 This fixes wrong answer from data-dependence analysis by realizing
 that _all_ (even non-evolving) indirect accesses cannot be constrained
 to a full object size.  This also gets rid of that ugly
 DR_UNCONSTRAINED_BASE hack (but effectively make it always active).
 

Looks good to me.

 @@ -1404,9 +1409,21 @@ dr_may_alias_p (const struct data_refere
  build_fold_addr_expr (addr_b));
  }
else if (TREE_CODE (addr_b) == MEM_REF
 - && DR_UNCONSTRAINED_BASE (b))
 -return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
 -TREE_OPERAND (addr_b, 0));
 + && TREE_CODE (TREE_OPERAND (addr_b, 0)) == SSA_NAME)
 +{
 +  /* For true dependences we can apply TBAA.  */
 +  if (flag_strict_aliasing
 +      && DR_IS_WRITE (a) && DR_IS_READ (b)
 +      && !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
 +                                 get_alias_set (DR_REF (b))))
 + return false;
 +  if (TREE_CODE (addr_a) == MEM_REF)
 + return ptr_derefs_may_alias_p (TREE_OPERAND (addr_a, 0),
 +TREE_OPERAND (addr_b, 0));
 +  else

Remove the else here.

 + return ptr_derefs_may_alias_p (build_fold_addr_expr (addr_a),
 +TREE_OPERAND (addr_b, 0));
 +}
  
/* Otherwise DR_BASE_OBJECT is an access that covers the whole object
   that is being subsetted in the loop nest.  */
 


Re: [C PATCH] Allow ATOMIC_*_LOCK_FREE in #if directive (DR#458)

2014-08-14 Thread Joseph S. Myers
On Thu, 14 Aug 2014, Marek Polacek wrote:

 The DR#458 is about the usage of ATOMIC_*_LOCK_FREE macros defined in
 stdatomic.h in the #if directives.  Proposed Technical Corrigendum of
 this DR is that these macros should expand to constant expressions
 suitable for use in #if preprocessing directives.  This patch does
 that by mapping these macros to __GCC_ATOMIC_*_LOCK_FREE variants
 defined in c-cppbuiltin.c:cpp_atomic_builtins.  (It's what libsupc++
 does.)  The current implementation wouldn't work in #if directives.
 
 Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch, fortran] Fix PR 62106

2014-08-14 Thread Steve Kargl
On Thu, Aug 14, 2014 at 07:40:52PM +0200, Thomas Koenig wrote:
 Hello world,
 
 the attached patch fixes the regression by making sure we never
 try to create a temporary variable from a temporary variable,
 which happened in the wrong order.
 
 Regression-tested.  OK for trunk and 4.9?
 

Looks good to me.

-- 
Steve


Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting

2014-08-14 Thread Teresa Johnson
On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote:
 This bug is caused by my last patch, which did not differentiate
 between explicit section names (via attributes) and implicit section
 names (via -ffunction-section).

 This patch fixes that.

 --

 diff --git gcc/bb-reorder.c gcc/bb-reorder.c
 index 8f8c420..2115b01 100644
 --- gcc/bb-reorder.c
 +++ gcc/bb-reorder.c
 @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void)
  we are going to omit the reordering.  */
 && optimize_function_for_speed_p (cfun)
 && !DECL_ONE_ONLY (current_function_decl)
 -  && !DECL_SECTION_NAME (current_function_decl));
 +  && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
  }

  /* This function is the main 'entrance' for the optimization that
 diff --git gcc/tree.h gcc/tree.h
 index 817507f..738675a 100644
 --- gcc/tree.h
 +++ gcc/tree.h
 @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl {
  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
(DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

 +/* Speficy whether the section name was explicitly set with decl_attributes. 
 */
 +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 +  (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \
 +   !!DECL_SECTION_NAME(NODE))

Personally, I think it is clearer to simply write this as:

#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
(DECL_SECTION_NAME(NODE) != NULL_TREE \
  && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))

Teresa

 +
  struct GTY(()) tree_decl_with_vis {
   struct tree_decl_with_rtl common;
   tree assembler_name;
 --



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread Steve Ellcey
On Thu, 2014-08-14 at 10:21 -0600, Jeff Law wrote:
 On 08/14/14 10:12, David Malcolm wrote:
  On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
  On 08/14/14 04:32, Richard Biener wrote:
  You'll note in a separate thread Steve and I discussed this during 
  Cauldron
  and it was at my recommendation Steve resurrected his proof of concept
  plugin and started beating it into shape.
 
  But do we really want a pass just to help coremark?
  And that's the biggest argument against Steve's work.  In theory it
  should be applicable to other FSMs, but nobody's come forth with
  additional testcases from real world applications.
 
  Maybe a regex library?  Perhaps:
  http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
 The key is that at least some states tell you at compile time what state 
 you'll be in during the next loop iteration.  Thus instead of coming 
 around the loop, evaluating the switch condition, then doing the 
 multi-way branch, we just directly jump to the case for the next iteration.
 
 I've never looked at the PCRE code to know if it's got cases like that.
 
 jeff
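
For concreteness, a minimal made-up C loop of the shape being discussed
(not taken from CoreMark or PCRE; all names are invented): in the START
case the assignment to state fixes the next iteration's state at compile
time, so the back edge could jump straight to that case instead of
re-evaluating the switch.

#include <stdio.h>

static int
count_tokens (const char *p)
{
  enum { START, IN_NUM, IN_WORD } state = START;
  int tokens = 0;

  for (; *p; p++)
    switch (state)
      {
      case START:
	if (*p >= '0' && *p <= '9')
	  {
	    tokens++;
	    state = IN_NUM;	/* next state known at compile time */
	  }
	else if (*p != ' ')
	  {
	    tokens++;
	    state = IN_WORD;	/* likewise */
	  }
	break;
      case IN_NUM:
	if (*p == ' ')
	  state = START;
	break;
      case IN_WORD:
	if (*p == ' ')
	  state = START;
	break;
      }
  return tokens;
}

int
main (void)
{
  printf ("%d\n", count_tokens ("12 abc 34"));	/* prints 3 */
  return 0;
}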

I compiled PCRE but it never triggered this optimization (even if I
bumped up the parameters for instruction counts and paths).

I understand the desire not to add optimizations just for benchmarks but
we do know other compilers have added this optimization for coremark
(See
http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance)
 and the 13 people on the CC list for this bug certainly show interest in 
having it even if it is just for a benchmark.  Does 'competing against other 
compilers' sound better than 'optimizing for a benchmark'?

Steve Ellcey
sell...@mips.com



[Google/4_9] A couple gcov-tool fixes

2014-08-14 Thread Teresa Johnson
Fix a couple problems found during testing.

Backport from trunk (r212694) failed to fixup gcov_read_counter
invocations in google-specific code. Also, forward port
r211800 from google/4_8 to tolerate differences after COMDAT
fixup.

Passes manual testing, ok if passes regression tests?

Thanks,
Teresa

2014-08-14  Teresa Johnson  tejohn...@google.com

* libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter,
Relax the check after COMDAT fixup.
(__gcov_merge_icall_topn): Use gcov_get_counter.

Index: libgcov-merge.c
===
--- libgcov-merge.c (revision 213975)
+++ libgcov-merge.c (working copy)
@@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
   gcc_assert (!(n_counters % 2));
   for (i = 0; i < n_counters; i += 2)
 {
-  gcov_type global_id = gcov_read_counter ();
-  gcov_type call_count = gcov_read_counter ();
+  gcov_type global_id = gcov_get_counter_target ();
+  gcov_type call_count = gcov_get_counter ();

   /* Note that global id counter may never have been set if no calls were
 made from this call-site.  */
@@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
   else if (__gcov_is_gid_insane (global_id))
 global_id = counters[i];

-  gcc_assert (counters[i] == global_id);
+  /* In the case of inconsistency, use the src's target.  */
+  if (counters[i] != global_id)
+    fprintf (stderr, "Warning: Inconsistent call targets in "
+             "direct-call profile.\n");
 }
   else if (global_id)
counters[i] = global_id;
@@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi
 }

   /* Skip the number_of_eviction entry.  */
-  gcov_read_counter ();
+  gcov_get_counter ();
   for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2)
 {
   int found = 0;
-  gcov_type global_id = gcov_read_counter ();
-  gcov_type call_count = gcov_read_counter ();
+  gcov_type global_id = gcov_get_counter_target ();
+  gcov_type call_count = gcov_get_counter ();
   for (m = 0; m < j; m += 2)
 {
   if (tmp_array[m] == global_id)


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[C++ Patch] pedwarn issues

2014-08-14 Thread Paolo Carlini

Hi,

a few cases where, under SFINAE, we just go ahead and don't immediately 
return error_mark_node.  During the work I also noticed a glitch in 
cxx_sizeof_or_alignof_type where we do complain & tf_warning_or_error 
but in that case complain is a bool.


Tested x86_64-linux as usual.

Thanks,
Paolo.


2014-08-14  Paolo Carlini  paolo.carl...@oracle.com

* typeck.c (composite_pointer_type, cxx_sizeof_or_alignof_type,
cp_build_array_ref, cp_build_function_call_vec): When a
pedwarn is suppressed under SFINAE, return error_mark_node.

* typeck.c (cxx_sizeof_or_alignof_type): Fix complain &
tf_warning_or_error, where complain is a bool, glitch.
Index: typeck.c
===
--- typeck.c(revision 213976)
+++ typeck.c(working copy)
@@ -597,28 +597,34 @@ composite_pointer_type (tree t1, tree t2, tree arg
   tree attributes;
   tree result_type;
 
-  if (TYPE_PTRFN_P (t2) && (complain & tf_error))
-    {
-      switch (operation)
-	{
-	case CPO_COMPARISON:
-	  pedwarn (input_location, OPT_Wpedantic, 
-		   "ISO C++ forbids comparison between "
-		   "pointer of type %<void *%> and pointer-to-function");
-	  break;
-	case CPO_CONVERSION:
-	  pedwarn (input_location, OPT_Wpedantic,
-		   "ISO C++ forbids conversion between "
-		   "pointer of type %<void *%> and pointer-to-function");
-	  break;
-	case CPO_CONDITIONAL_EXPR:
-	  pedwarn (input_location, OPT_Wpedantic,
-		   "ISO C++ forbids conditional expression between "
-		   "pointer of type %<void *%> and pointer-to-function");
-break;
-  default:
-gcc_unreachable ();
-  }
+  if (TYPE_PTRFN_P (t2))
+   {
+	 if (complain & tf_error)
+	   {
+	     switch (operation)
+	       {
+	       case CPO_COMPARISON:
+		 pedwarn (input_location, OPT_Wpedantic, 
+			  "ISO C++ forbids comparison between pointer "
+			  "of type %<void *%> and pointer-to-function");
+		 break;
+	       case CPO_CONVERSION:
+		 pedwarn (input_location, OPT_Wpedantic,
+			  "ISO C++ forbids conversion between pointer "
+			  "of type %<void *%> and pointer-to-function");
+		 break;
+	       case CPO_CONDITIONAL_EXPR:
+		 pedwarn (input_location, OPT_Wpedantic,
+			  "ISO C++ forbids conditional expression between "
+			  "pointer of type %<void *%> and "
+			  "pointer-to-function");
+		 break;
+   default:
+ gcc_unreachable ();
+   }
+   }
+ else
+   return error_mark_node;
 }
   result_type
= cp_build_qualified_type (void_type_node,
@@ -1536,6 +1542,8 @@ cxx_sizeof_or_alignof_type (tree type, enum tree_c
 	pedwarn (input_location, OPT_Wpointer_arith, 
 		 "invalid application of %qs to a member function", 
 		 operator_name_info[(int) op].name);
+  else
+   return error_mark_node;
   value = size_one_node;
 }
 
@@ -1561,7 +1569,7 @@ cxx_sizeof_or_alignof_type (tree type, enum tree_c
   if (cxx_dialect >= cxx1y && array_of_runtime_bound_p (type)
       && (flag_iso || warn_vla > 0))
     {
-      if (complain & tf_warning_or_error)
+      if (complain)
 	pedwarn (input_location, OPT_Wvla,
 		 "taking sizeof array of runtime bound");
   else
@@ -3129,9 +3137,14 @@ cp_build_array_ref (location_t loc, tree array, tr
return error_mark_node;
}
 
-      if (!lvalue_p (array) && (complain & tf_error))
-	pedwarn (loc, OPT_Wpedantic, 
-		 "ISO C++ forbids subscripting non-lvalue array");
+      if (!lvalue_p (array))
+	{
+	  if (complain & tf_error)
+	    pedwarn (loc, OPT_Wpedantic, 
+		     "ISO C++ forbids subscripting non-lvalue array");
+	  else
+	    return error_mark_node;
+   }
 
   /* Note in C++ it is valid to subscript a `register' array, since
 it is valid to take the address of something with that
@@ -3467,10 +3480,14 @@ cp_build_function_call_vec (tree function, vectre
   fndecl = function;
 
   /* Convert anything with function type to a pointer-to-function.  */
-  if (DECL_MAIN_P (function) && (complain & tf_error))
-    pedwarn (input_location, OPT_Wpedantic, 
-	     "ISO C++ forbids calling %<::main%> from within program");
-
+  if (DECL_MAIN_P (function))
+    {
+      if (complain & tf_error)
+	pedwarn (input_location, OPT_Wpedantic, 
+  

Re: RFC: Patch for switch elimination (PR 54742)

2014-08-14 Thread Sebastian Pop
Steve Ellcey wrote:
 I understand the desire not to add optimizations just for benchmarks but
 we do know other compilers have added this optimization for coremark
 (See
 http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance)
 and the 13 people on the CC list for this bug certainly show interest in
 having it even if it is just for a benchmark.  Does 'competing against other
 compilers' sound better than 'optimizing for a benchmark'?

I definitely would like to see GCC trunk do this transform.  How about we
integrate the new pass, and then, once jump-threading manages to catch the
coremark loop, we remove the pass?

Thanks,
Sebastian


Re: [Patch] PR55189 enable -Wreturn-type by default

2014-08-14 Thread Manuel López-Ibáñez
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -693,6 +693,10 @@ gfc_handle_option (size_t scode, const char *arg,
int value,
   gfc_option.warn_line_truncation = value;
   break;

+case OPT_Wmissing_return:
+  warn_missing_return = value;
+  break;
+
 case OPT_Wrealloc_lhs:
   gfc_option.warn_realloc_lhs = value;
   break;

The entry in c.opt says this is a C/C++ option, why you need this?

+Wmissing-return
+C ObjC C++ ObjC++ Var(warn_missing_return) LangEnabledBy(C ObjC C++
ObjC++,Wreturn-type)
+Warn whenever control may reach end of non-void function

This should prevent -Wreturn-type in Fortran from trying to
enable -Wmissing-return; if it does not, that is a bug.

In any case, the work-around should be adding Wmissing-return to
fortran/lang.opt with a ??? comment, not there.


--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -472,7 +472,7 @@ C ObjC Var(warn_implicit_function_declaration)
Init(-1) Warning LangEnabledBy(C
 Warn about implicit function declarations

 Wimplicit-int
-C ObjC Var(warn_implicit_int) Warning LangEnabledBy(C ObjC,Wimplicit)
+C ObjC Var(warn_implicit_int) Warning
 Warn when a declaration does not specify a type

 Wimport
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5ae910c..3f2019a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3615,7 +3615,7 @@ This warning is enabled by @option{-Wall} in C++.
 @opindex Wimplicit-int
 @opindex Wno-implicit-int
 Warn when a declaration does not specify a type.
-This warning is enabled by @option{-Wall}.
+This warning is enabled by default.

 @item -Wimplicit-function-declaration @r{(C and Objective-C only)}
 @opindex Wimplicit-function-declaration


Does this patch actually enable -Wimplicit-int by default? The
default without Init() should be zero!

And according to this: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01367.html

we still want -Wno-implicit to disable -Wimplicit-int (and
-Werror=implicit to set -Werror=implicit-int), so the LangEnabledBy()
should stay. The documentation could say: This warning is enabled by
default and it is also controlled by -Wimplicit.

Cheers,

Manuel.


Re: [patch] No allocation for empty unordered containers

2014-08-14 Thread François Dumont

On 13/08/2014 11:50, Jonathan Wakely wrote:


Yes you can, it's conforming to replace a (non-virtual) member function
with default arguments by two or more member functions. We do it all
the time.

See 17.6.5.5 [member.functions] p2.


You should have said so sooner!  But of course no-one is supposed 
to ignore the Standard :-).


Then here is the patch to introduce a default constructor with 
compiler-computed noexcept qualification.  Note that I also made the 
allocator-aware default constructor allocation-free; however, its noexcept 
qualification has to be written manually, which I find quite a burden.  Do 
you think we should do so now?


2014-08-14  François Dumont fdum...@gcc.gnu.org

* include/bits/hashtable_policy.h (_Prime_rehash_policy): Qualify 
constructor noexcept.
(_Hash_code_base): All specialization default constructible if
possible.
(_Hashtable_base): Likewise.
* include/bits/hashtable.h (_Hashtable()): Implementation defaulted.
* include/bits/unordered_map.h (unordered_map::unordered_map()): New,
implementation defaulted.
(unordered_multimap::unordered_multimap()): Likewise.
* include/bits/unordered_set.h
(unordered_set::unordered_set()): Likewise.
(unordered_multiset::unordered_multiset()): Likewise.
* include/debug/unordered_map: Likewise.
* include/debug/unordered_set: Likewise.
* testsuite/23_containers/unordered_map/allocator/noexcept.cc
(test04()): New.
* testsuite/23_containers/unordered_multimap/allocator/noexcept.cc
(test04()): New.
* testsuite/23_containers/unordered_set/allocator/noexcept.cc
(test04()): New.
* testsuite/23_containers/unordered_multiset/allocator/noexcept.cc
(test04()): New.

I am preparing a patch for profile mode, so I will submit the modifications 
for this mode with that big patch.


Tested under Linux x86_64.

Ok to commit ?

François
Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 213780)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -460,7 +460,7 @@
   /// smallest prime that keeps the load factor small enough.
   struct _Prime_rehash_policy
   {
-_Prime_rehash_policy(float __z = 1.0)
+_Prime_rehash_policy(float __z = 1.0) noexcept
 : _M_max_load_factor(__z), _M_next_resize(0) { }
 
 float
@@ -1071,7 +1071,8 @@
   typedef void* 	__hash_code;
   typedef _Hash_node<_Value, false>			__node_type;
 
-  // We need the default constructor for the local iterators.
+  // We need the default constructor for the local iterators and _Hashtable
+  // default constructor.
   _Hash_code_base() = default;
 
   _Hash_code_base(const _ExtractKey& __ex, const _H1&, const _H2&,
@@ -1161,7 +1162,8 @@
   typedef std::size_t __hash_code;
   typedef _Hash_node<_Value, false>			__node_type;
 
-  // We need the default constructor for the local iterators.
+  // We need the default constructor for the local iterators and _Hashtable
+  // default constructor.
   _Hash_code_base() = default;
 
   _Hash_code_base(const _ExtractKey& __ex,
@@ -1250,6 +1252,8 @@
   typedef std::size_t __hash_code;
   typedef _Hash_node<_Value, true>			__node_type;
 
+  // We need the default constructor for _Hashtable default constructor.
+  _Hash_code_base() = default;
   _Hash_code_base(const _ExtractKey& __ex,
 		  const _H1& __h1, const _H2& __h2,
 		  const _Default_ranged_hash&)
@@ -1694,6 +1698,7 @@
 	__hash_code, __hash_cached::value>;
 
   protected:
+_Hashtable_base() = default;
 _Hashtable_base(const _ExtractKey& __ex, const _H1& __h1, const _H2& __h2,
 		const _Hash& __hash, const _Equal& __eq)
 : __hash_code_base(__ex, __h1, __h2, __hash), _EqualEBO(__eq)
@@ -1906,6 +1911,7 @@
 	__alloc_rebind<__node_alloc_type, __bucket_type>;
   using __bucket_alloc_traits = std::allocator_traits<__bucket_alloc_type>;
 
+  _Hashtable_alloc() = default;
   _Hashtable_alloc(const _Hashtable_alloc) = default;
   _Hashtable_alloc(_Hashtable_alloc) = default;
 
Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 213780)
+++ include/bits/hashtable.h	(working copy)
@@ -310,10 +310,10 @@
    const_local_iterator;
 
 private:
-  __bucket_type*		_M_buckets;
-  size_type			_M_bucket_count;
+  __bucket_type*		_M_buckets		= _M_single_bucket;
+  size_type			_M_bucket_count		= 1;
   __node_base		_M_before_begin;
-  size_type			_M_element_count;
+  size_type			_M_element_count	= 0;
   _RehashPolicy		_M_rehash_policy;
 
   // A single bucket used when only need for 1 bucket. Especially
@@ -322,7 +322,7 @@
   // qualified.
   // Note that we can't leave hashtable with 0 bucket without adding
   // numerous checks in the code to avoid 0 modulus.
-  __bucket_type		

Re: [C++ Patch] pedwarn issues

2014-08-14 Thread Jason Merrill

OK, thanks.

Jason


Re: [PATCH 130/236] config/bfin: Use rtx_insn

2014-08-14 Thread Jeff Law

On 08/06/14 11:21, David Malcolm wrote:

gcc/
* config/bfin/bfin-protos.h (asm_conditional_branch): Strengthen
param 1 from rtx to rtx_insn *.
* config/bfin/bfin.c (expand_prologue_reg_save): Likewise for
the various locals named insn.
(expand_epilogue_reg_restore): Likewise.
(frame_related_constant_load): Likewise.
(add_to_reg): Likewise.
(emit_link_insn): Likewise.
(do_link): Likewise.
(expand_interrupt_handler_prologue): Likewise.
(branch_dest): Likewise for param branch.
(asm_conditional_branch): Likewise for param insn.
(gen_one_bundle): Likewise for elements of param slot and local
t.
(bfin_gen_bundles): Likewise for locals insn, next and
elements of local slot.
(reorder_var_tracking_notes): Likewise for locals insn, next,
queue, next_queue, prev.
(workaround_rts_anomaly): Likewise for locals insn, first_insn.
(add_sched_insns_for_speculation): Likewise for local insn.

OK.

As are patches: #131-155


I resisted the nostalgia trip through a variety of backends I've hacked 
through the years and focused strictly on the patches  :-)


Jeff


Re: [Google/4_9] A couple gcov-tool fixes

2014-08-14 Thread Teresa Johnson
On Thu, Aug 14, 2014 at 11:36 AM, Xinliang David Li davi...@google.com wrote:
 Ok.

 The interfaces for counter reading/getting have become confusing. Should they
 be better documented somewhere so that developers know which is the right one
 to use in a given context?

I think it is documented in libgcov.h and accesses should be through
the new gcov_get_counter interface,
but these just got missed when it was ported from trunk.

Teresa


 David



 On Thu, Aug 14, 2014 at 11:27 AM, Teresa Johnson tejohn...@google.com
 wrote:

 Fix a couple problems found during testing.

 Backport from trunk (r212694) failed to fixup gcov_read_counter
 invocations in google-specific code. Also, forward port
 r211800 from google/4_8 to tolerate differences after COMDAT
 fixup.

 Passes manual testing, ok if passes regression tests?

 Thanks,
 Teresa

 2014-08-14  Teresa Johnson  tejohn...@google.com

 * libgcov-merge.c (__gcov_merge_dc): Use gcov_get_counter,
 Relax the check after COMDAT fixup.
 (__gcov_merge_icall_topn): Use gcov_get_counter.

 Index: libgcov-merge.c
 ===
 --- libgcov-merge.c (revision 213975)
 +++ libgcov-merge.c (working copy)
 @@ -95,8 +95,8 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
gcc_assert (!(n_counters % 2));
for (i = 0; i  n_counters; i += 2)
  {
 -  gcov_type global_id = gcov_read_counter ();
 -  gcov_type call_count = gcov_read_counter ();
 +  gcov_type global_id = gcov_get_counter_target ();
 +  gcov_type call_count = gcov_get_counter ();

/* Note that global id counter may never have been set if no calls
 were
  made from this call-site.  */
 @@ -108,7 +108,10 @@ __gcov_merge_dc (gcov_type *counters, unsigned n_c
else if (__gcov_is_gid_insane (global_id))
  global_id = counters[i];

 -  gcc_assert (counters[i] == global_id);
 +  /* In the case of inconsistency, use the src's target.  */
 +  if (counters[i] != global_id)
 +fprintf (stderr, Warning: Inconsistent call targets in
 +  direct-call profile.\n);
  }
else if (global_id)
 counters[i] = global_id;
 @@ -158,12 +161,12 @@ __gcov_merge_icall_topn (gcov_type *counters, unsi
  }

/* Skip the number_of_eviction entry.  */
 -  gcov_read_counter ();
 +  gcov_get_counter ();
for (k = 0; k  GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2)
  {
int found = 0;
 -  gcov_type global_id = gcov_read_counter ();
 -  gcov_type call_count = gcov_read_counter ();
 +  gcov_type global_id = gcov_get_counter_target ();
 +  gcov_type call_count = gcov_get_counter ();
for (m = 0; m  j; m += 2)
  {
if (tmp_array[m] == global_id)


 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413





-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH Fortran/Diagnostics] Move Fortran to common diagnostics machinery

2014-08-14 Thread Manuel López-Ibáñez
Hi Tobias (or any other Fortran maintainer),

Is this patch OK?

https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00488.html

Thanks,

Manuel.


Re: [PATCH 012/236] Convert DF_REF_INSN to a function for now

2014-08-14 Thread David Malcolm
On Wed, 2014-08-13 at 20:55 -0600, Jeff Law wrote:
 On 08/13/14 18:11, David Malcolm wrote:
  On Wed, 2014-08-13 at 14:34 -0600, Jeff Law wrote:
  On 08/13/14 14:28, David Malcolm wrote:
  Thanks.  Although this function gets converted back to a macro in patch
  191, I just realized that in the meantime that it's not inlined, as is
  the case for some of the other macro-function conversions in patches
  13-16.
 
  Do I need to convert them to inline functions with the appropriate
  headers, and is that regarded as a sufficiently trivial fix to the stuff
  you've already reviewed to not need re-review? (I will bootstrap & test).
  I'd just make it a follow-up. #237 ;-)
 
  Right, but these would be modifications to stuff that only exists
  between phases 1-3 of the kit, before going away in phase 4, so although
  it might be #237, it would need to be applied when the individual
  changes go in.
 If they're around just through phases #1-#3, then I wouldn't worry about 
 it.
...and indeed, having experimented with it, the inline function approach
would need to include more header files: for example, an inline
implementation of, say, BB_HEAD() in basic-block.h would look like this:

inline rtx_insn *BB_HEAD (const_basic_block bb)
{
  rtx insn = bb->il.x.head_;
  return safe_as_a <rtx_insn *> (insn);
}

but this requires everything including basic-block.h to know about
safe_as_a <rtx_insn *> (rtx) and hence have access to
is_a_helper <rtx_insn *> (rtx), and hence needs to include rtl.h, i.e.
either basic-block.h would need to include rtl.h, or everything
including it would.  Similar considerations apply to the macros in the
various other header files.  Touching the header file graph doesn't seem
like a good thing to be doing for a transient effort during phases 1-3
of applying the kit, so I'll hold off on making that change, and go with
the patches as-is.

  Transient, yes, but given the amount of time for me to simply bootstrap
  each candidate patch, the non-inlined functions could last in trunk for
  a couple of weeks (there are only 168 hours in a week, and a bootstrap
  +regrtest takes about 3 hours on my box, so for 236 patches we're
  talking ~4 weeks of compute time just for that).
 Well, I'd suggest a few things.
 
 1. For the config/ changes, a full bootstrap is not necessary.  For 
 those targets which are embedded, just build a stage1 cross to the 
 target to verify it builds and call it good.
 
 2. For targets where you can bootstrap, go ahead and do so, but just on 
 those targets.  Some of these you can probably have going in parallel.
 
 3. Some of the changes are so trivial that I'd squash them together in a 
 single build/test cycle.

Thanks.

FWIW I'm working on followup cleanups whilst waiting for the
build/tests.

 I would even consider seeing all the scaffolding go in as a single 
 chunk.  It's nice to see the individuals during the review process, but 
 I wouldn't lose any sleep if bundled those together.
 
 
 
  I guess the underlying point here is that this is a big change and I'm
  trying to be fastidious here.  Murphy's Law suggests I'm going to break
  at least *something* :(
 Understood, but we don't want to be so fastidious that the time this 
 stuff is in flux is so long that it creates more problems than it would 
 if we took some sensible and safe shortcuts.

(nods)

Thanks again for the reviews
Dave



Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting

2014-08-14 Thread Yi Yang
Patch v2.

Trunk no longer sets SECTION_NAME for implicit section names, so this
probably does not apply to trunk.  It's probably not necessary for
trunk either.

Tested for the Google 4.8 (albeit unnecessarily) and 4.9 branches.

diff --git gcc/bb-reorder.c gcc/bb-reorder.c
index a1b3e65..b9a829e 100644
--- gcc/bb-reorder.c
+++ gcc/bb-reorder.c
@@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void)
  we are going to omit the reordering.  */
&& optimize_function_for_speed_p (cfun)
&& !DECL_ONE_ONLY (current_function_decl)
-   && !DECL_SECTION_NAME (current_function_decl));
+   && !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
 }

 /* This function is the main 'entrance' for the optimization that
diff --git gcc/tree.h gcc/tree.h
index b656b7b..308eef8 100644
--- gcc/tree.h
+++ gcc/tree.h
@@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
   (DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

+/* Speficy whether the section name was explicitly set with decl_attributes. */
+#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
+  (DECL_SECTION_NAME(NODE) != NULL_TREE \
+   && !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))
+
 extern tree decl_debug_expr_lookup (tree);
 extern void decl_debug_expr_insert (tree, tree);

-- 

On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson tejohn...@google.com wrote:
 On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote:
 This bug is caused by my last patch, which did not differentiate
 between explicit section names (via attributes) and implicit section
 names (via -ffunction-section).

 This patch fixes that.

 --

 diff --git gcc/bb-reorder.c gcc/bb-reorder.c
 index 8f8c420..2115b01 100644
 --- gcc/bb-reorder.c
 +++ gcc/bb-reorder.c
 @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void)
  we are going to omit the reordering.  */
optimize_function_for_speed_p (cfun)
!DECL_ONE_ONLY (current_function_decl)
 -  !DECL_SECTION_NAME (current_function_decl));
 +  !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
  }

  /* This function is the main 'entrance' for the optimization that
 diff --git gcc/tree.h gcc/tree.h
 index 817507f..738675a 100644
 --- gcc/tree.h
 +++ gcc/tree.h
 @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl {
  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
(DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

 +/* Speficy whether the section name was explicitly set with 
 decl_attributes. */
 +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 +  (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \
 +   !!DECL_SECTION_NAME(NODE))

 Personally, I think it is clearer to simply write this as:

 #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 (DECL_SECTION_NAME(NODE) != NULL_TREE \
   !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))

 Teresa

 +
  struct GTY(()) tree_decl_with_vis {
   struct tree_decl_with_rtl common;
   tree assembler_name;
 --



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting

2014-08-14 Thread Teresa Johnson
On Thu, Aug 14, 2014 at 1:46 PM, Yi Yang ahyan...@google.com wrote:
 Patch v2.

 Trunk no longer set SECTION_NAME for implicit section names, so this
 probably does not apply to trunk. It's probably not necessary for
 trunk either.

 Tested for Google 4.8(albeit unnecessary) and 4.9 branch.

 diff --git gcc/bb-reorder.c gcc/bb-reorder.c
 index a1b3e65..b9a829e 100644
 --- gcc/bb-reorder.c
 +++ gcc/bb-reorder.c
 @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void)
   we are going to omit the reordering.  */
 optimize_function_for_speed_p (cfun)
 !DECL_ONE_ONLY (current_function_decl)
 -   !DECL_SECTION_NAME (current_function_decl));
 +   !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
  }

  /* This function is the main 'entrance' for the optimization that
 diff --git gcc/tree.h gcc/tree.h
 index b656b7b..308eef8 100644
 --- gcc/tree.h
 +++ gcc/tree.h
 @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree);
  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
(DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

 +/* Speficy whether the section name was explicitly set with decl_attributes. 
 */

Typo Specify.

Otherwise looks ok to me.

Thanks,
Teresa

 +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 +  (DECL_SECTION_NAME(NODE) != NULL_TREE \
 +!DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))
 +
  extern tree decl_debug_expr_lookup (tree);
  extern void decl_debug_expr_insert (tree, tree);

 --

 On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson tejohn...@google.com wrote:
 On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote:
 This bug is caused by my last patch, which did not differentiate
 between explicit section names (via attributes) and implicit section
 names (via -ffunction-section).

 This patch fixes that.

 --

 diff --git gcc/bb-reorder.c gcc/bb-reorder.c
 index 8f8c420..2115b01 100644
 --- gcc/bb-reorder.c
 +++ gcc/bb-reorder.c
 @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void)
  we are going to omit the reordering.  */
optimize_function_for_speed_p (cfun)
!DECL_ONE_ONLY (current_function_decl)
 -  !DECL_SECTION_NAME (current_function_decl));
 +  !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
  }

  /* This function is the main 'entrance' for the optimization that
 diff --git gcc/tree.h gcc/tree.h
 index 817507f..738675a 100644
 --- gcc/tree.h
 +++ gcc/tree.h
 @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl {
  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
(DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

 +/* Speficy whether the section name was explicitly set with 
 decl_attributes. */
 +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 +  (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \
 +   !!DECL_SECTION_NAME(NODE))

 Personally, I think it is clearer to simply write this as:

 #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 (DECL_SECTION_NAME(NODE) != NULL_TREE \
   !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))

 Teresa

 +
  struct GTY(()) tree_decl_with_vis {
   struct tree_decl_with_rtl common;
   tree assembler_name;
 --



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] cygwin: __cxa_atexit support

2014-08-14 Thread Yaakov Selkowitz
This patch implements __cxa_atexit support for Cygwin targets.  This 
requires Cygwin 1.7.32 and binutils master.


Net difference in check-c++ results on i686-pc-cygwin:
# of unexpected failures	-11
# of unexpected successes	-3
# of expected failures		-61
# of unsupported tests		-46

--
Yaakov Selkowitz
Associate Software Engineer, ARM
Red Hat, Inc.
2014-08-12  Yaakov Selkowitz  yselk...@redhat.com

gcc/
* config/i386/cygwin.h (STARTFILE_SPEC): Use crtbeginS.o if shared.
* config.gcc (*-*-cygwin*): Use __cxa_atexit by default.

libgcc/
* config/i386/cygming-crtend.c (register_frame_ctor): Move atexit
call from here...
* config/i386/cygming-crtbegin.c (__gcc_register_frame): to here.
(__dso_handle): Define on Cygwin.
* config/i386/t-cygming (crtbeginS.o): New rule.
* config.host (*-*-cygwin*): Add crtbeginS.o to extra_parts.


Index: gcc/config/i386/cygwin.h
===
--- gcc/config/i386/cygwin.h(revision 213759)
+++ gcc/config/i386/cygwin.h(working copy)
@@ -40,7 +40,7 @@
 #define STARTFILE_SPEC \
   %{!shared: %{!mdll: crt0%O%s \
   %{pg:gcrt0%O%s}}}\
-  crtbegin.o%s
+  %{shared:crtbeginS.o%s;:crtbegin.o%s}
 
 #undef ENDFILE_SPEC
 #define ENDFILE_SPEC \
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 213759)
+++ gcc/config.gcc  (working copy)
@@ -1575,6 +1575,7 @@
if test x$enable_threads = xyes; then
thread_file='posix'
fi
+   default_use_cxa_atexit=yes
use_gcc_stdint=wrap
;;
 x86_64-*-cygwin*)
@@ -1590,6 +1591,7 @@
if test x$enable_threads = xyes; then
thread_file='posix'
fi
+   default_use_cxa_atexit=yes
use_gcc_stdint=wrap
tm_defines=${tm_defines} TARGET_CYGWIN64=1
;;
Index: libgcc/config/i386/cygming-crtbegin.c
===
--- libgcc/config/i386/cygming-crtbegin.c   (revision 213759)
+++ libgcc/config/i386/cygming-crtbegin.c   (working copy)
@@ -111,6 +111,23 @@
   = { };
 #endif
 
+#ifdef __CYGWIN__
+/* Declare the __dso_handle variable.  It should have a unique value
+   in every shared-object; in a main program its value is zero.  The
+   object should in any case be protected.  This means the instance
+   in one DSO or the main program is not used in another object.  The
+   dynamic linker takes care of this.  */
+
+#ifdef CRTSTUFFS_O
+extern void *__ImageBase;
+void *__dso_handle = __ImageBase;
+#else
+void *__dso_handle = 0;
+#endif
+
+#endif /* __CYGWIN__ */
+
+
 /* Pull in references from libgcc.a(unwind-dw2-fde.o) in the
startfile. These are referenced by a ctor and dtor in crtend.o.  */
 extern void __gcc_register_frame (void);
@@ -161,6 +178,13 @@
register_class_fn (__JCR_LIST__);
 }
 #endif
+
+#if DEFAULT_USE_CXA_ATEXIT
+  /* If we use the __cxa_atexit method to register C++ dtors
+ at object construction,  also use atexit to register eh frame
+ info cleanup.  */
+  atexit(__gcc_deregister_frame);
+#endif /* DEFAULT_USE_CXA_ATEXIT */
 }
 
 void
Index: libgcc/config/i386/cygming-crtend.c
===
--- libgcc/config/i386/cygming-crtend.c (revision 213759)
+++ libgcc/config/i386/cygming-crtend.c (working copy)
@@ -70,12 +70,6 @@
 register_frame_ctor (void)
 {
   __gcc_register_frame ();
-#if DEFAULT_USE_CXA_ATEXIT
-  /* If we use the __cxa_atexit method to register C++ dtors
- at object construction,  also use atexit to register eh frame
- info cleanup.  */
-  atexit (__gcc_deregister_frame);
-#endif
 }
 
 #if !DEFAULT_USE_CXA_ATEXIT
Index: libgcc/config/i386/t-cygming
===
--- libgcc/config/i386/t-cygming(revision 213759)
+++ libgcc/config/i386/t-cygming(working copy)
@@ -8,6 +8,9 @@
 crtbegin.o: $(srcdir)/config/i386/cygming-crtbegin.c
	$(crt_compile) -fno-omit-frame-pointer  -c $<
 
+crtbeginS.o: $(srcdir)/config/i386/cygming-crtbegin.c
+	$(crt_compile) -fno-omit-frame-pointer  -c $< -DCRTSTUFFS_O
+
 # We intentionally use a implementation-reserved init priority of 0,
 # so allow the warning.
 crtend.o: $(srcdir)/config/i386/cygming-crtend.c
Index: libgcc/config.host
===
--- libgcc/config.host  (revision 213759)
+++ libgcc/config.host  (working copy)
@@ -614,7 +614,7 @@
 i[4567]86-wrs-vxworks|i[4567]86-wrs-vxworksae)
;;
 i[34567]86-*-cygwin*)
-	extra_parts="crtbegin.o crtend.o crtfastmath.o"
+	extra_parts="crtbegin.o crtbeginS.o crtend.o crtfastmath.o"
# This has to match the logic for DWARF2_UNWIND_INFO in 
gcc/config/i386/cygming.h
if test x$enable_sjlj_exceptions = xyes; then
   

Re: [GOOGLE] Fix the bug where implicit section names prevents function splitting

2014-08-14 Thread Yi Yang
Thank you. I fixed the typo and committed.

On Thu, Aug 14, 2014 at 1:49 PM, Teresa Johnson tejohn...@google.com wrote:
 On Thu, Aug 14, 2014 at 1:46 PM, Yi Yang ahyan...@google.com wrote:
 Patch v2.

 Trunk no longer set SECTION_NAME for implicit section names, so this
 probably does not apply to trunk. It's probably not necessary for
 trunk either.

 Tested for Google 4.8(albeit unnecessary) and 4.9 branch.

 diff --git gcc/bb-reorder.c gcc/bb-reorder.c
 index a1b3e65..b9a829e 100644
 --- gcc/bb-reorder.c
 +++ gcc/bb-reorder.c
 @@ -2554,7 +2554,7 @@ gate_handle_partition_blocks (void)
   we are going to omit the reordering.  */
 optimize_function_for_speed_p (cfun)
 !DECL_ONE_ONLY (current_function_decl)
 -   !DECL_SECTION_NAME (current_function_decl));
 +   !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
  }

  /* This function is the main 'entrance' for the optimization that
 diff --git gcc/tree.h gcc/tree.h
 index b656b7b..308eef8 100644
 --- gcc/tree.h
 +++ gcc/tree.h
 @@ -2405,6 +2405,11 @@ extern void decl_value_expr_insert (tree, tree);
  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
(DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

 +/* Speficy whether the section name was explicitly set with 
 decl_attributes. */

 Typo Specify.

 Otherwise looks ok to me.

 Thanks,
 Teresa

 +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 +  (DECL_SECTION_NAME(NODE) != NULL_TREE \
 +!DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))
 +
  extern tree decl_debug_expr_lookup (tree);
  extern void decl_debug_expr_insert (tree, tree);

 --

 On Thu, Aug 14, 2014 at 11:25 AM, Teresa Johnson tejohn...@google.com 
 wrote:
 On Wed, Aug 13, 2014 at 9:03 PM, Yi Yang ahyan...@google.com wrote:
 This bug is caused by my last patch, which did not differentiate
 between explicit section names (via attributes) and implicit section
 names (via -ffunction-section).

 This patch fixes that.

 --

 diff --git gcc/bb-reorder.c gcc/bb-reorder.c
 index 8f8c420..2115b01 100644
 --- gcc/bb-reorder.c
 +++ gcc/bb-reorder.c
 @@ -2505,7 +2505,7 @@ gate_handle_partition_blocks (void)
  we are going to omit the reordering.  */
optimize_function_for_speed_p (cfun)
!DECL_ONE_ONLY (current_function_decl)
 -  !DECL_SECTION_NAME (current_function_decl));
 +  !DECL_HAS_EXPLICIT_SECTION_NAME_P(current_function_decl));
  }

  /* This function is the main 'entrance' for the optimization that
 diff --git gcc/tree.h gcc/tree.h
 index 817507f..738675a 100644
 --- gcc/tree.h
 +++ gcc/tree.h
 @@ -3201,6 +3201,11 @@ struct GTY(()) tree_parm_decl {
  #define DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE) \
(DECL_WITH_VIS_CHECK (NODE)-decl_with_vis.implicit_section_name_p)

 +/* Speficy whether the section name was explicitly set with 
 decl_attributes. */
 +#define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 +  (DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE)? false: \
 +   !!DECL_SECTION_NAME(NODE))

 Personally, I think it is clearer to simply write this as:

 #define DECL_HAS_EXPLICIT_SECTION_NAME_P(NODE) \
 (DECL_SECTION_NAME(NODE) != NULL_TREE \
   !DECL_HAS_IMPLICIT_SECTION_NAME_P(NODE))

 Teresa

 +
  struct GTY(()) tree_decl_with_vis {
   struct tree_decl_with_rtl common;
   tree assembler_name;
 --



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH] _cxa_thread_atexit fixes for Cygwin/MinGW-w64

2014-08-14 Thread Yaakov Selkowitz
The attached patch implements premature DLL unloading prevention in 
__cxa_thread_atexit for Cygwin and MinGW-w64 targets.  The mingw.org 
target is welcome to do the same in their os_defines.h, but this code 
does require Windows XP/2003, and they have historically catered to 
older platforms.
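
As a rough, self-contained sketch of the pinning pattern the patch relies
on (pin_module and unpin_module are invented wrapper names; GetModuleHandleExW
and FreeLibrary are the real Win32 calls):

/* Illustration only -- not the patch itself.  Pin the module that owns a
   callback so it cannot be unloaded while the callback is still
   registered, then drop the reference once the callback has run.  */

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

static HMODULE
pin_module (void (*callback) (void *))
{
  HMODULE dll = NULL;
  /* FROM_ADDRESS treats the second argument as an address inside the
     module; without UNCHANGED_REFCOUNT the module's refcount is bumped.  */
  GetModuleHandleExW (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
		      (LPCWSTR) callback, &dll);
  return dll;
}

static void
unpin_module (HMODULE dll)
{
  if (dll)
    FreeLibrary (dll);	/* drop the reference taken above */
}

A __cxa_thread_atexit implementation would call something like pin_module
when a destructor is registered and unpin_module after running it, which
is what the hunks below do with new_elt->dll.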


MinGW cannot practically implement __cxa_thread_atexit_impl because it 
has no control over the contents of its libc (msvcrt), and implementing 
this in the (static) mingw32 support library would be of little benefit 
because it would still end up statically linked into libstdc++.  Also, 
the GPL-with-exceptions libstdc++ implementation cannot be copied as a 
basis of our own __cxa_thread_atexit_impl due to licensing conflicts 
with both Cygwin (copyright assignment) and MinGW (public domain).


The testsuite shows no regressions with this patch.

--
Yaakov Selkowitz
Associate Software Engineer, ARM
Red Hat, Inc.
2014-08-13  Yaakov Selkowitz  yselk...@redhat.com

libstdc++-v3/
* config/os/mingw32-w64/os_defines.h (_GLIBCXX_THREAD_ATEXIT_WIN32):
Define.
* config/os/newlib/os_defines.h (_GLIBCXX_THREAD_ATEXIT_WIN32):
Ditto.
* libsupc++/atexit_thread.cc [_GLIBCXX_THREAD_ATEXIT_WIN32]:
#include <windows.h>.
(struct elt): Add dll member.
(run): Decrement dll refcount.
(__cxxabiv1::__cxa_thread_atexit): Increment dll refcount.

Index: libstdc++-v3/config/os/mingw32-w64/os_defines.h
===
--- libstdc++-v3/config/os/mingw32-w64/os_defines.h (revision 213759)
+++ libstdc++-v3/config/os/mingw32-w64/os_defines.h (working copy)
@@ -78,4 +78,9 @@
 #define _GLIBCXX_LLP64 1
 #endif
 
+// Enable use of GetModuleHandleEx (requires Windows XP/2003) in
+// __cxa_thread_atexit to prevent modules from being unloaded before
+// their dtors are called
+#define _GLIBCXX_THREAD_ATEXIT_WIN32 1
+
 #endif
Index: libstdc++-v3/config/os/newlib/os_defines.h
===
--- libstdc++-v3/config/os/newlib/os_defines.h  (revision 213759)
+++ libstdc++-v3/config/os/newlib/os_defines.h  (working copy)
@@ -47,6 +47,12 @@
 
 // See libstdc++/20806.
 #define _GLIBCXX_HAVE_DOS_BASED_FILESYSTEM 1
+
+// Enable use of GetModuleHandleEx (requires Windows XP/2003) in
+// __cxa_thread_atexit to prevent modules from being unloaded before
+// their dtors are called
+#define _GLIBCXX_THREAD_ATEXIT_WIN32 1
+
 #endif
 
 #endif
Index: libstdc++-v3/libsupc++/atexit_thread.cc
===
--- libstdc++-v3/libsupc++/atexit_thread.cc (revision 213759)
+++ libstdc++-v3/libsupc++/atexit_thread.cc (working copy)
@@ -25,6 +25,10 @@
 #include <cstdlib>
 #include <new>
 #include <bits/gthr.h>
+#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#endif
 
 #if _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL
 
@@ -47,6 +51,9 @@
 void (*destructor)(void *);
 void *object;
 elt *next;
+#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32
+HMODULE dll;
+#endif
   };
 
   // Keep a per-thread list of cleanups in gthread_key storage.
@@ -62,6 +69,11 @@
   {
elt *old_e = e;
e-destructor (e-object);
+#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32
+   /* Decrement DLL count */
+   if (e-dll)
+ FreeLibrary (e-dll);
+#endif
e = e-next;
delete (old_e);
   }
@@ -133,6 +145,14 @@
   new_elt->destructor = dtor;
   new_elt->object = obj;
   new_elt->next = first;
+#ifdef _GLIBCXX_THREAD_ATEXIT_WIN32
+  /* Store the DLL address for a later call to FreeLibrary in new_elt and
+     increment DLL load count.  This blocks the unloading of the DLL
+     before the thread-local dtors have been called.  This does NOT help
+     if FreeLibrary/dlclose is called in excess. */
+  GetModuleHandleExW (GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
+		      (LPCWSTR) dtor, &new_elt->dll);
+#endif
 
   if (__gthread_active_p ())
 __gthread_setspecific (key, new_elt);


Re: [PATCH 007/236] New function: for_each_rtx_in_insn

2014-08-14 Thread David Malcolm
On Tue, 2014-08-12 at 15:08 -0600, Jeff Law wrote:
 On 08/06/14 11:19, David Malcolm wrote:
  gcc/
  * rtl.h (for_each_rtx_in_insn): New function.
  * rtlanal.c (for_each_rtx_in_insn): Likewise.
 OK.  Note that we're moving away from for_each_rtx...   I haven't 
 looked, but there's a reasonable chance we may not need it after Richard 
 S.'s work is committed.  Something to watch.

CCing Richard S, as a heads up.

Richard, for context, the patch in this thread is this one:
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00494.html

which adds a for_each_rtx_in_insn function, to work on a top-level
insn node, as part of a big cleanup I'm working on that distinguishes
between insn nodes vs arbitrary rtx nodes:
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00498.html

I believe that your rtl-iter reworking:
 https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00133.html
hasn't landed in trunk yet (I don't see an rtl-iter.h there yet).

Summary of notes below: I think our patch series are mostly compatible,
but I think there's a conflict between them in rtlanal.c to do with
for_each_inc_dec, at patch 176 in my kit vs patch 40 of your kit.  I
think it's resolvable, though.

More detailed notes follow:

My reading of Richard's patch series is that it optimizes many specific
for_each_rtx implementations, but keeps the existing generic one around.

Some notes on all of the places in which I've made use of the new
for_each_rtx_in_insn in my patchkit:

cfgcleanup.c:1728:for_each_rtx_in_insn (BB_END (bb1), 
replace_label, rr);
cfgcleanup.c:1742:for_each_rtx_in_insn (BB_END (bb1), 
replace_label, rr);
  (both in outgoing_edges_match; I don't think Richard's patchkit
touches this one).


cfgcleanup.c:2002:  for_each_rtx_in_insn (insn, replace_label, 
rr);
  (in try_crossjump_to_edge; likewise, I don't think Richard's kit
touches this)


ira.c:3350: for_each_rtx_in_insn (insn, set_paradoxical_subreg,
  ...in update_equiv_regs, the use added in my
[PATCH 191/236] Remove DF_REF_INSN scaffolding:
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00554.html

It looks like Richard's
[PATCH 25/50] ira.c:set_paradoxical_subreg: 
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00161.html
optimizes the current code, and my reading of his patch is that it's
compatible with my patchkit; the for_each_rtx_in_insn can be eliminated.
Hence, depending on whose patch goes in first, the other patch will need
some fixup, but I think it will be trivial to fixup.


rtlanal.c:3103:  return for_each_rtx_in_insn (insn, for_each_inc_dec_find_mem, 
data);
  ...in for_each_inc_dec, the use added in my:
[PATCH 176/236] cselib and incdec:
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00711.html

Looks like Richard's
[PATCH 40/50] rtlanal.c:for_each_inc_dec:
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00177.html
optimizes this.

Merging these looks more involved: in my patch series I establish that
the first param to for_each_inc_dec is an rtx_insn **, which may cause
type issues in Richard's new code as is - but perhaps may allow some
simplification?  Given that the loc passed in is an insn, am I right in
thinking we're only interested in MEM within the PATTERN of insn, right?
Also, am I right in thinking that that particular callback can't modify
the ptr that's passed in (I don't see writes back to *r)?  If so, that
suggests that we can lose an indirection from the first param, and
simply work on PATTERN (insn).


config/i386/i386.c:46886:for_each_rtx_in_insn (insn, 
(rtx_function) ix86_loop_memcount,
  (in ix86_loop_unroll_adjust)

config/s390/s390.c:11820:for_each_rtx_in_insn (insn, 
(rtx_function) check_dpu, mem_count);
  (in s390_loop_unroll_adjust)

(as far as I can see Richard's patches don't touch the config subdirs).


Hope this is sane
Dave




  1   2   >