Re: [PATCH, i386]: Fix PR 67484 (version 2)

2015-12-11 Thread Richard Biener
On Fri, Dec 11, 2015 at 10:10 AM, Martin Liška  wrote:
> Hello.
>
> I've just applied the change that Richi proposed.
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> Moreover, the memory leak/invalid read is gone.
>
> Ready for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Martin


[PATCH] Remove redundant main_size field from LTO header

2015-12-11 Thread Richard Biener

LTO bootstrapped on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-11  Richard Biener  

* lto-streamer.h (lto_simple_header_with_strings): Remove
main_size field already in lto_simple_header.

Index: gcc/lto-streamer.h
===
--- gcc/lto-streamer.h  (revision 231552)
+++ gcc/lto-streamer.h  (working copy)
@@ -407,9 +407,6 @@ struct lto_simple_header : lto_header
 
 struct lto_simple_header_with_strings : lto_simple_header
 {
-  /* Size of main gimple body of function.  */
-  int32_t main_size;
-
   /* Size of the string table.  */
   int32_t string_size;
 };
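
For readers unfamiliar with why the duplicate field was harmful, here is a
minimal sketch of member hiding in C++.  The struct names are simplified,
hypothetical stand-ins and not the real lto-streamer.h definitions; the point
is only that a redeclared member in a derived struct hides the base member
and costs extra space.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the base header.
struct simple_header
{
  int32_t main_size;
};

// Before: the derived struct redeclares main_size and hides the base field.
struct with_strings_before : simple_header
{
  int32_t main_size;
  int32_t string_size;
};

// After: only the inherited main_size remains.
struct with_strings_after : simple_header
{
  int32_t string_size;
};

// Returns true when a write through the derived type leaves the base
// field untouched, which is what member hiding implies.
static bool
derived_write_misses_base ()
{
  with_strings_before h = {};
  h.main_size = 42;               // writes the derived field only
  const simple_header &base = h;  // view the same object as its base
  return base.main_size == 0 && h.main_size == 42;
}
```

Code that reads the size through a base reference would therefore see a
different value than code that writes it through the derived type.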


Re: [PATCH 1/2] [graphite] document minimal required version for isl

2015-12-11 Thread Richard Biener
On Thu, Dec 10, 2015 at 6:05 PM, Sebastian Pop  wrote:
> also update ISL to isl as requested by its author Sven Verdoolaege.

Ok.  Please always post ChangeLog entries as well.

Richard.

> ---
>  gcc/doc/install.texi | 9 +
>  gcc/doc/invoke.texi  | 4 ++--
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 0b71bef..b43a3ec 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -383,14 +383,15 @@ installed but it is not in your default library search path, the
>  @option{--with-mpc} configure option should be used.  See also
>  @option{--with-mpc-lib} and @option{--with-mpc-include}.
>
> -@item ISL Library version 0.15 or 0.14.
> +@item isl Library version 0.15 or 0.14.
>
> +Minimal isl version supported is 0.14 and it is highly recommended to use 0.15.
>  Necessary to build GCC with the Graphite loop optimizations.
>  It can be downloaded from @uref{ftp://gcc.gnu.org/pub/gcc/infrastructure/}.
> -If an ISL source distribution is found
> +If an isl source distribution is found
>  in a subdirectory of your GCC sources named @file{isl}, it will be
>  built together with GCC.  Alternatively, the @option{--with-isl} configure
> -option should be used if ISL is not installed in your default library
> +option should be used if isl is not installed in your default library
>  search path.
>
>  @end table
> @@ -1850,7 +1851,7 @@ a cross compiler, they will not be used to configure target libraries.
>  @item --with-isl=@var{pathname}
>  @itemx --with-isl-include=@var{pathname}
>  @itemx --with-isl-lib=@var{pathname}
> -If you do not have the ISL library installed in a standard location and you
> +If you do not have the isl library installed in a standard location and you
>  want to build GCC, you can explicitly specify the directory where it is
>  installed (@samp{--with-isl=@/@var{islinstalldir}}). The
>  @option{--with-isl=@/@var{islinstalldir}} option is shorthand for
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5256031..7ae0849 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8939,12 +8939,12 @@ Enable the identity transformation for graphite.  For every SCoP we generate
>  the polyhedral representation and transform it back to gimple.  Using
>  @option{-fgraphite-identity} we can check the costs or benefits of the
>  GIMPLE -> GRAPHITE -> GIMPLE transformation.  Some minimal optimizations
> -are also performed by the code generator ISL, like index splitting and
> +are also performed by the code generator isl, like index splitting and
>  dead code elimination in loops.
>
>  @item -floop-nest-optimize
>  @opindex floop-nest-optimize
> -Enable the ISL based loop nest optimizer.  This is a generic loop nest
> +Enable the isl based loop nest optimizer.  This is a generic loop nest
>  optimizer based on the Pluto optimization algorithms.  It calculates a loop
>  structure optimized for data-locality and parallelism.  This option
>  is experimental.
> --
> 1.9.1
>


Re: Prune TYPE_FIELDS lists more in free_lang_data

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> > 
> > We explicitly do not use debug-info-level tests in free-lang-data
> > to allow mixing -g and -g0 objects.  Are you sure doing the above
> > doesn't mess up tree merging enough to effectively enlarge WPA
> > memory use and the merged decl sections?
> > 
> > [I'm quite sure firefox build system manages to mess up -g vs. -g0
> > in some places ;)]
> 
> Hmm, I will try the debug build with firefox on this.  -fdump-ipa-devirt
> now dumps all main variants that are duplicates of one ODR type.
> We definitely have types with hundreds of duplicates, so there are
> quite common cases where tree merging does not fire.
> > 
> > > +  return (!DECL_IGNORED_P (decl) && !is_redundant_typedef (decl));
> > > +}
> > > +
> > 
> > The patch would be ok if you simply export is_redundant_typedef
> > and inline the DECL_IGNORED_P check into free-lang-data.
> 
> OK, I had that originally; I will bring it back.
> is_redundant_typedef is declared inline.  Putting it in tree.h drags in
> a bit too many dwarf2out internals, but I suppose it is OK to just
> make it non-inline.  It is the kind of function where the inliner should
> be able to decide.

Yeah.

Richard.

> Honza
> > 
> > Thanks,
> > Richard.


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-11 Thread Richard Biener
On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law  wrote:
> On 12/03/2015 07:38 AM, Richard Biener wrote:
>>
>> This pass is now enabled by default with -Os but has no limits on the
>> amount of stmts it copies.
>
> The more statements it copies, the more likely it is that the path splitting
> will turn out to be useful!  It's counter-intuitive.

Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer is enabled
with -fprofile-use (but it is also properly driven to only trace hot paths)
and otherwise not by default at any optimization level.

> The primary benefit AFAICT with path splitting is that it exposes additional
> CSE, DCE, etc opportunities.
>
> IIRC Ajit posited that it could help with live/conflict analysis; I never
> saw that, and with the changes to push splitting deeper into the pipeline
> I'd further life/conflict analysis since that work also involved preserving
> the single latch property.
>
>
>
>> It also will make all loops with this shape have at least two
>> exits (if the resulting loop will be disambiguated the inner loop will
>> have two exits).
>> Having more than one exit will disable almost all loop optimizations
>> after it.
>
> Hmmm, the updated code keeps the single latch property, but I'm pretty sure
> it won't keep a single exit policy.
>
> To keep a single exit policy would require keeping an additional block
> around.  Each of the split paths would unconditionally transfer to this new
> block.  The new block would then either transfer to the latch block or out
> of the loop.

Don't see how this would work for the CFG pattern it operates on unless you
duplicate the exit condition into that new block, creating an even more
obfuscated CFG.

>
>>
>> The pass itself documents the transform it does but does zero to
>> motivate it.
>>
>> What's the benefit of this pass (apart from disrupting further
>> optimizations)?
>
> It's essentially building superblocks in a special case to enable additional
> CSE, DCE and the like.
>
> Unfortunately what is missing is heuristics and de-duplication.  The
> former to drive cases when it's not useful and the latter to reduce codesize
> for any statements that did not participate in optimizations when they were
> duplicated.
>
> The de-duplication is the "sink-statements-through-phi" problems, cross
> jumping, tail merging and the like class of problems.
>
> It was only after I approved this code after twiddling it for Ajit that I
> came across Honza's tracer implementation, which may in fact be
> retargetable to these loops and do a better job.  I haven't experimented
> with that.

Well, I originally suggested to merge this with the tracer pass...

>> I can see a _single_ case where duplicating the latch will allow threading
>> one of the paths through the loop header to eliminate the original exit.
>> Then disambiguation may create a nice nested loop out of this.  Of course
>> that is only profitable again if you know the remaining single exit of the
>> inner loop (exiting to the outer one) is executed infrequently (thus the
>> inner loop actually loops).
>
> It wasn't ever about threading.

I see.

>>
>> But no checks other than on the CFG shape exist (oh, it checks it will
>> at _least_ copy two stmts!).
>
> Again, the more statements it copies the more likely it is to be profitable.
> Think superblocks to expose CSE, DCE and the like.

Ok, so similar to tracer (where I think the main benefit is actually increasing
scheduling opportunities for architectures where it matters).

Note that both passes are placed quite late and thus won't see much
of the GIMPLE optimizations (DOM mainly).  I wonder why they were
not placed adjacent to each other.

>>
>> Given the profitability constraints above (well, correct me if I am
>> wrong on these) it looks like the whole transform should be done within
>> the FSM threading code which might be able to compute whether there will
>> be an inner loop with a single exit only.
>
> While it shares some concepts with jump threading, I don't think the
> transformation belongs in jump threading.
>
>>
>> I'm inclined to request the pass to be removed again or at least disabled
>> by default.
>
> I wouldn't lose any sleep if we disabled by default or removed, particularly
> if we can repurpose Honza's code.  In fact, I might strongly support the
> former until we hear back from Ajit on performance data.

See above for what we do with -ftracer.  path-splitting should at _least_
restrict itself to operate on optimize_loop_for_speed_p () loops.

It should also (even if counter-intuitive) limit the amount of stmt copying
it does - after all there is something like an instruction cache size, and
exceeding it for loops will never be a good idea (there are even smaller
special loop caches on some archs).
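
As an editorial illustration of the two restrictions suggested above, a
profitability gate could look roughly like the following.  This is only a
sketch under stated assumptions: split_candidate, max_copied_stmts and
worth_splitting are hypothetical names, the cap of 16 is an arbitrary
placeholder, and none of this is the actual pass code.

```cpp
#include <cassert>

// Hypothetical summary of a path-splitting candidate; not a GCC type.
struct split_candidate
{
  bool loop_optimized_for_speed;  // stand-in for optimize_loop_for_speed_p ()
  int stmts_to_copy;              // statements in the block to be duplicated
};

// Assumed upper bound on copied statements; a real limit would need tuning.
const int max_copied_stmts = 16;

// Accept a candidate only for loops optimized for speed, and only when the
// amount of copying is above the trivial case and below the cap.
static bool
worth_splitting (const split_candidate &c)
{
  if (!c.loop_optimized_for_speed)
    return false;
  if (c.stmts_to_copy < 2)  // mirrors the "at least two stmts" check
    return false;
  return c.stmts_to_copy <= max_copied_stmts;
}
```

With such a gate, -Os code and cold loops would never be duplicated, and hot
loops would stop growing once the assumed cap is reached.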

Note that a better heuristic than "at least more than one stmt" would be
to have at least one PHI in the merger block.  Otherwise I don't see how
CSE opportunities could exist we don't 

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-11 Thread Andreas Krebbel
On 12/04/2015 06:15 PM, Dominik Vogt wrote:
> Version 5 with the latest requested changes.  Seems to work now.
> I've dropped the extra patch and rather marked the failing tests
> as "xfail".
> 
> Ciao
> 
> Dominik ^_^  ^_^
> 

Patch applied with minor changes:

> +   ; Convert Pmode to BLKmode
> +   UNSPEC_REPLICATE_BYTE

That comment did not really fit after changing the name of the unspec.

> -(define_expand "setmem_long"
> +(define_expand "setmem_long_"
>[(parallel
>  [(clobber (match_dup 1))
>   (set (match_operand:BLK 0 "memory_operand" "")
> -  (match_operand 2 "shift_count_or_setmem_operand" ""))
> - (use (match_operand 1 "general_operand" ""))
> +   (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")

Superfluous constraint removed.

Thanks!

-Andreas-



[PATCH] Add pass parameter to TERMINATE_PASS_LIST

2015-12-11 Thread Tom de Vries

Hi,

This patch adds a parameter to TERMINATE_PASS_LIST, that should match 
the pass list it's supposed to terminate.


The intention of the patch is that it:
- makes it easier to understand the top-level hierarchy of the pass
  list (given that the top-level list may be quite long).
- ensures that INSERT_PASSES_AFTER and TERMINATE_PASS_LIST are paired.

OK for stage3/stage1 trunk, if bootstrap and reg-test succeed?

Thanks,
- Tom
Add pass parameter to TERMINATE_PASS_LIST

2015-12-11  Tom de Vries  

	* pass_manager.h (TERMINATE_PASS_LIST): Add pass argument.
	* passes.c (pass_manager::pass_manager): Declare and init p_start in
	INSERT_PASSES_AFTER.  Add pass parameter to TERMINATE_PASS_LIST, and
	check if it's equal to p_start.
	* passes.def: Add arguments to TERMINATE_PASS_LISTs.

---
 gcc/pass_manager.h |  2 +-
 gcc/passes.c   | 14 +-
 gcc/passes.def | 12 ++--
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index a8199e2..9340820 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -121,7 +121,7 @@ private:
 #define POP_INSERT_PASSES()
 #define NEXT_PASS(PASS, NUM) opt_pass *PASS ## _ ## NUM
 #define NEXT_PASS_WITH_ARG(PASS, NUM, ARG) NEXT_PASS (PASS, NUM)
-#define TERMINATE_PASS_LIST()
+#define TERMINATE_PASS_LIST(PASS)
 
 #include "pass-instances.def"
 
diff --git a/gcc/passes.c b/gcc/passes.c
index ba9bfc2..4266673 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1555,8 +1555,15 @@ pass_manager::pass_manager (context *ctxt)
 
   /* Build the tree of passes.  */
 
-#define INSERT_PASSES_AFTER(PASS) \
-  p = &(PASS);
+#define INSERT_PASSES_AFTER(PASS)		\
+  {		\
+opt_pass **p_start;\
+p_start = p = &(PASS);
+
+#define TERMINATE_PASS_LIST(PASS)		\
+gcc_assert (p_start == &PASS);		\
+*p = NULL;	\
+  }
 
 #define PUSH_INSERT_PASSES_WITHIN(PASS) \
   { \
@@ -1584,9 +1591,6 @@ pass_manager::pass_manager (context *ctxt)
   PASS ## _ ## NUM->set_pass_param (0, ARG);	\
 } while (0)
 
-#define TERMINATE_PASS_LIST() \
-  *p = NULL;
-
 #include "pass-instances.def"
 
 #undef INSERT_PASSES_AFTER
diff --git a/gcc/passes.def b/gcc/passes.def
index 43ce3d5..fde4690 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
PUSH_INSERT_PASSES_WITHIN (PASS)
POP_INSERT_PASSES ()
NEXT_PASS (PASS)
-   TERMINATE_PASS_LIST ()
+   TERMINATE_PASS_LIST (PASS)
  */
 
  /* All passes needed to lower the function into shape optimizers can
@@ -43,7 +43,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_expand_omp);
   NEXT_PASS (pass_build_cgraph_edges);
-  TERMINATE_PASS_LIST ()
+  TERMINATE_PASS_LIST (all_lowering_passes)
 
   /* Interprocedural optimization passes.  */
   INSERT_PASSES_AFTER (all_small_ipa_passes)
@@ -134,7 +134,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_increase_alignment);
   NEXT_PASS (pass_ipa_tm);
   NEXT_PASS (pass_ipa_lower_emutls);
-  TERMINATE_PASS_LIST ()
+  TERMINATE_PASS_LIST (all_small_ipa_passes)
 
   INSERT_PASSES_AFTER (all_regular_ipa_passes)
   NEXT_PASS (pass_ipa_whole_program_visibility);
@@ -153,7 +153,7 @@ along with GCC; see the file COPYING3.  If not see
  symbols are not allowed outside of the comdat group.  Privatizing early
  would result in missed optimizations due to this restriction.  */
   NEXT_PASS (pass_ipa_comdats);
-  TERMINATE_PASS_LIST ()
+  TERMINATE_PASS_LIST (all_regular_ipa_passes)
 
   /* Simple IPA passes executed after the regular passes.  In WHOPR mode the
  passes are executed after partitioning and thus see just parts of the
@@ -162,7 +162,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_pta);
   NEXT_PASS (pass_dispatcher_calls);
   NEXT_PASS (pass_omp_simd_clone);
-  TERMINATE_PASS_LIST ()
+  TERMINATE_PASS_LIST (all_late_ipa_passes)
 
   /* These passes are run after IPA passes on every function that is being
  output to the assembler file.  */
@@ -482,4 +482,4 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_df_finish);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_clean_state);
-  TERMINATE_PASS_LIST ()
+  TERMINATE_PASS_LIST (all_passes)
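
The pairing trick in this patch can be shown in isolation.  The following is
a minimal sketch, assuming hypothetical macro names (BEGIN_LIST, ADD_NODE,
END_LIST) and a toy pass_node type in place of the real pass manager: the
opening macro starts a scope and records the list head, and the closing
macro asserts that it was given the same head before closing the scope.

```cpp
#include <cassert>
#include <cstddef>

struct pass_node
{
  const char *name;
  pass_node *next;
};

// Hypothetical stand-ins for INSERT_PASSES_AFTER, NEXT_PASS and
// TERMINATE_PASS_LIST.  BEGIN_LIST opens a scope that only END_LIST
// closes, so an unpaired use does not compile, and a mismatched
// argument trips the assert at run time.
#define BEGIN_LIST(HEAD)            \
  {                                 \
    pass_node **p_start;            \
    pass_node **p;                  \
    p_start = p = &(HEAD);

#define ADD_NODE(NODE)              \
    *p = &(NODE);                   \
    p = &(NODE).next;

#define END_LIST(HEAD)              \
    assert (p_start == &(HEAD));    \
    *p = NULL;                      \
  }

static pass_node *lowering_list;
static pass_node *ipa_list;
static pass_node n1 = { "n1", NULL };
static pass_node n2 = { "n2", NULL };
static pass_node n3 = { "n3", NULL };

// Build two lists, each with a matching begin/end pair.
static void
build_lists ()
{
  BEGIN_LIST (lowering_list)
  ADD_NODE (n1)
  ADD_NODE (n2)
  END_LIST (lowering_list)

  BEGIN_LIST (ipa_list)
  ADD_NODE (n3)
  END_LIST (ipa_list)
}

// Count the nodes reachable from HEAD.
static int
list_length (const pass_node *head)
{
  int len = 0;
  for (; head; head = head->next)
    ++len;
  return len;
}
```

Writing END_LIST (ipa_list) after BEGIN_LIST (lowering_list) in this sketch
would still compile but would fail the assert, which is the kind of mistake
the extra parameter is meant to catch.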


Re: [PATCH][AArch64] Properly cost zero_extend+ashift forms of ubfi[xz]

2015-12-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00526.html

Thanks,
Kyrill

On 04/12/15 09:30, Kyrill Tkachov wrote:

Hi all,

We don't properly handle the patterns for the [us]bfiz and [us]bfx instructions
when they have an extend+ashift form, for example the
*_ashl pattern.
This leads to rtx costs recursing into the extend and assigning a cost to these
patterns that is too large.

This patch fixes that oversight.
I stumbled across this when working on a different combine patch and ended up
matching the above pattern, only to have it rejected for -mcpu=cortex-a53 due
to the erroneous cost.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-12-04  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_extend_bitfield_pattern_p):
New function.
(aarch64_rtx_costs, ZERO_EXTEND, SIGN_EXTEND cases): Use the above
to handle extend+shift rtxes.
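
To see why an extend+shift rtx deserves the cost of a single instruction,
the following sketch checks in plain C++ that zero-extending a 32-bit left
shift gives the same value as an unsigned bitfield insert in zero.  The
helper names emulate_ubfiz and zero_extend_of_shift are hypothetical and
exist only for this illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Emulates UBFIZ Xd, Xn, #lsb, #width: take the low WIDTH bits of SRC
// and place them at bit LSB of an otherwise zero result.
// Assumes 0 < width and lsb + width <= 64.
static uint64_t
emulate_ubfiz (uint64_t src, unsigned lsb, unsigned width)
{
  uint64_t mask = width >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return (src & mask) << lsb;
}

// The shape (zero_extend:DI (ashift:SI x shift)) written in C++.
// Assumes 0 < shift < 32.
static uint64_t
zero_extend_of_shift (uint32_t x, unsigned shift)
{
  return static_cast<uint64_t> (static_cast<uint32_t> (x << shift));
}
```

For a shift amount s, the two agree when the bitfield width is 32 - s, so a
cost function that recurses into the extend counts work the hardware does
not perform separately.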




[PR 68064] Testcase and an assert for an already fixed bug

2015-12-11 Thread Martin Jambor
Hi,

PR 68064 has been fixed by Richi's revision 231246.  I would still
like to add the testcase to the testsuite and add a checking assert
so that if we ever get zero alignment again, we catch it in the analysis
part of IPA-CP (which with LTO means in the compilation and not the linking
phase, which makes a big difference for debugging).

I have tossed this into a bootstrap and test run on x86_64-linux
and found no issues.  I believe the patch is quite obvious and so will
go ahead and commit it to trunk.

Thanks,

Martin


Add assert and testcase for PR 68064

2015-12-09  Martin Jambor  

* ipa-prop.c (ipa_compute_jump_functions_for_edge): Add checking
assert that align is nonzero.

testsuite/
* g++.dg/torture/pr68064.C: New test.
---
 gcc/ipa-prop.c |  1 +
 gcc/testsuite/g++.dg/torture/pr68064.C | 35 ++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr68064.C

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index f379ea7..d0a3501 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1646,6 +1646,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
  && align % BITS_PER_UNIT == 0
  && hwi_bitpos % BITS_PER_UNIT == 0)
{
+ gcc_checking_assert (align != 0);
  jfunc->alignment.known = true;
  jfunc->alignment.align = align / BITS_PER_UNIT;
  jfunc->alignment.misalign = hwi_bitpos / BITS_PER_UNIT;
diff --git a/gcc/testsuite/g++.dg/torture/pr68064.C b/gcc/testsuite/g++.dg/torture/pr68064.C
new file mode 100644
index 000..59b6897
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr68064.C
@@ -0,0 +1,35 @@
+// { dg-do compile }
+
+template  class A {
+public:
+  class B;
+  typedef typename Config::template D::type TypeHandle;
+  static A *Tagged() { return B::New(B::kTagged); }
+  static TypeHandle Union(TypeHandle);
+  static TypeHandle Representation(TypeHandle, typename Config::Region *);
+  bool Is();
+};
+
+template  class A::B {
+  friend A;
+  enum { kTaggedPointer = 1 << 31, kTagged = kTaggedPointer };
+  static A *New(int p1) { return Config::from_bitset(p1); }
+};
+
+struct C {
+  typedef int Region;
+  template  struct D { typedef A *type; };
+  static A *from_bitset(unsigned);
+};
+A *C::from_bitset(unsigned p1) { return reinterpret_cast(p1); }
+
+namespace {
+int *a;
+void fn1(A *p1) { A::Union(A::Representation(p1, a)); }
+}
+
+void fn2() {
+  A b;
+  A *c = b.Is() ? 0 : A::Tagged();
+  fn1(c);
+}
-- 
2.6.3



Re: ipa-cp heuristics fixes

2015-12-11 Thread Martin Jambor
On Thu, Dec 10, 2015 at 05:56:26PM +0100, Jan Hubicka wrote:
> > Is this really necessary, is it not enough to remove the assignment to
> > ret below?  If the parameter is not used, devirtualization time bonus,
> > which you then rely on estimate_local_effects, should be zero for it.
> > 
> > It is a very minor point, I suppose, but if the function gets cloned
> > for a different reason, it might still be beneficial to have as much
> > context-independent information for it as possible too, because that
> > can then be used in a callee (see the second call of
> > gather_context_independent_values).
> > 
> > Other than that, all the changes seem like a clear improvement.
> 
> The cutoff is there mainly for the rest of the function:
>   if (known_aggs)
> {
>   vec *agg_items;
>   struct ipa_agg_jump_function *ajf;
> 
>   agg_items = context_independent_aggregate_values (plats);
>   ajf = &(*known_aggs)[i];
>   ajf->items = agg_items;
>   ajf->by_ref = plats->aggs_by_ref;
>   ret |= agg_items != NULL;
> }
> I did not want ret to become true if we manage to propagate into an unused
> aggregate parameter.

I see, it makes sense.

Thanks,

Martin



Re: [PATCH] Avoid integer vector used as a vector mask

2015-12-11 Thread Richard Biener
On Fri, Dec 11, 2015 at 10:58 AM, Ilya Enkovich  wrote:
> Hi,
>
> Currently, when MASK_LOAD and MASK_STORE are vectorized, we check the
> scalar type of a mask but don't check its vector type.  This means
> we may vectorize it when the mask was just loaded from an array of
> booleans.  This happens e.g. for the following test:
>
> SUBROUTINE TEST (x, y, z, mask, ims, ime)
>   IMPLICIT NONE
>   INTEGER ims, ime, i
>   LOGICAL mask (ims:ime)
>   REAL x(ims:ime), y(ims:ime), z(ims:ime)
>
>   DO 812 i=ims,ime
>  IF (mask(i)) x(i) = y(i) - z(i)
> 812  CONTINUE
>   END
>
> Produced GIMPLE:
>
>   vect__15.30_76 = MEM[base: _91, offset: 0B];
>   vect__17.32_80 = MASK_LOAD (vectp_y.33_78, 4B, vect__15.30_76);
>   vect__19.35_83 = MASK_LOAD (vectp_z.36_81, 4B, vect__15.30_76);
>   vect__20.38_84 = vect__17.32_80 - vect__19.35_83;
>   MASK_STORE (vectp_x.39_85, 4B, vect__15.30_76, vect__20.38_84);
>
> The loaded values don't have the required all-0s or all-1s values and
> shouldn't be used as a mask.  This patch checks the vector type of
> a mask used for MASK_LOAD and MASK_STORE.  Bootstrapped and
> regtested for x86_64-pc-linux-gnu.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-12-11  Ilya Enkovich  
>
> * tree-vect-stmts.c (vectorizable_mask_load_store): Check
> mask vectype.
>
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 5377d15..abcd9a4 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1780,7 +1780,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
>if (!mask_vectype)
>  mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
>
> -  if (!mask_vectype)
> +  if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype))
>  return false;
>
>if (is_store)
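
One way to see why a vector of raw booleans is not a valid mask is a scalar
model of a single lane.  The sketch below is an analogy under stated
assumptions, not the vectorizer's code: select_lane and lane_mask_from_bool
are hypothetical helpers, and the bitwise select stands in for whatever the
target's masked operation does with each lane.

```cpp
#include <cassert>
#include <cstdint>

// Lane-wise select in the bitwise style that mask-consuming operations
// assume: every bit of MASK picks the matching bit of A or of B.
static uint32_t
select_lane (uint32_t mask, uint32_t a, uint32_t b)
{
  return (mask & a) | (~mask & b);
}

// A proper mask lane: all ones when the condition holds, else all zeros.
static uint32_t
lane_mask_from_bool (bool cond)
{
  return cond ? ~UINT32_C (0) : UINT32_C (0);
}
```

Passing the raw boolean value 1 as the mask keeps only the lowest bit of the
selected lane, whereas the all-ones mask keeps the whole lane.  That is the
difference the VECTOR_BOOLEAN_TYPE_P check guards against.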


Re: [PATCH] Remove unpaired TERMINATE_PASS_LIST in passes.def

2015-12-11 Thread Richard Biener
On Fri, Dec 11, 2015 at 11:00 AM, Tom de Vries  wrote:
> Hi,
>
> this patch removes a TERMINATE_PASS_LIST from passes.def that is not paired
> with any INSERT_PASSES_AFTER.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for stage3 trunk?

Ok.

Richard.

> Thanks,
> - Tom


Re: [PATCH, PR67627][RFC] broken libatomic multilib parallel build

2015-12-11 Thread Szabolcs Nagy

On 04/12/15 12:39, Szabolcs Nagy wrote:

As described in pr other/67627, the all-multi target can be
built in parallel with the %_.lo targets which generate make
dependencies that are parsed during the build of all-multi.

gcc -MD does not generate the makefile dependencies in an
atomic way so make can fail if it concurrently parses those
half-written files.
(not observed on x86, but happens on arm native builds.)

this workaround forces all-multi to only run after the *_.lo
targets are done, but there might be a better solution using
automake properly. (automake should know about the generated
make dependency files that are included into the makefile so
no manual tinkering is needed to get the right build order,
but i don't know how to do that.)

2015-12-04  Szabolcs Nagy 

 PR other/67627
 * Makefile.am (all-multi): Add dependency.
 * Makefile.in: Regenerate.



ping
and cc rth (as his name is on this makefile).



diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index bd0ab29..38c635f 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -139,3 +139,10 @@ endif

  libatomic_convenience_la_SOURCES = $(libatomic_la_SOURCES)
  libatomic_convenience_la_LIBADD = $(libatomic_la_LIBADD)
+
+# Override the automake generated all-multi rule to guarantee that all-multi
+# is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
+# makefile fragments to avoid broken *.Ppo getting included into the Makefile
+# when it is reloaded during the build of all-multi.
+all-multi: $(libatomic_la_LIBADD)
+   $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index b696d55..a083d87 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -496,12 +496,6 @@ clean-libtool:

  distclean-libtool:
-rm -f libtool config.lt
-
-# GNU Make needs to see an explicit $(MAKE) variable in the command it
-# runs to enable its job server during parallel builds.  Hence the
-# comments below.
-all-multi:
-   $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
  install-multi:
$(MULTIDO) $(AM_MAKEFLAGS) DO=install multi-do # $(MAKE)

@@ -800,6 +794,13 @@ vpath % $(strip $(search_path))
  %_.lo: Makefile
$(LTCOMPILE) $(M_DEPS) $(M_SIZE) $(M_IFUNC) -c -o $@ $(M_SRC)

+# Override the automake generated all-multi rule to guarantee that all-multi
+# is not run in parallel with the %_.lo rules which generate $(DEPDIR)/*.Ppo
+# makefile fragments to avoid broken *.Ppo getting included into the Makefile
+# when it is reloaded during the build of all-multi.
+all-multi: $(libatomic_la_LIBADD)
+   $(MULTIDO) $(AM_MAKEFLAGS) DO=all multi-do # $(MAKE)
+
  # Tell versions [3.59,3.63) of GNU make to not export all variables.
  # Otherwise a system limit (for SysV at least) may be exceeded.
  .NOEXPORT:





Re: [v3 PATCH] PR libstdc++/68139

2015-12-11 Thread Jonathan Wakely

On 11/12/15 09:53 +0200, Ville Voutilainen wrote:

On 11 December 2015 at 09:52, Marc Glisse  wrote:

   /libstdc++-v3
   * libsupc++/nested_exception.h (_S_rethrow): Use __std::addressof.

Typo.

I must be blind, but I don't see what you mean. :)

Shouldn't the underscores apply to addressof, not to the namespace?



Hah! Yes, thanks, I will fix that. :)


And we don't have separate changelogs for the libstdc++ testsuite, so
it should be just:

 * libsupc++/nested_exception.h: ...
 * testsuite/18_support/nested_exception/68139.cc: ...

Do we already include  in freestanding implementations?
Even if we don't, I think it's OK to do so, so the patch is OK with a
corrected changelog.



Re: [PATCH 4/4] Add -Wmisleading-indentation to -Wall

2015-12-11 Thread Iain Sandoe

> On 11 Dec 2015, at 11:25, Dominique d'Humières  wrote:
> 
> This breaks bootstrap on darwin:
> 
> ../../work/gcc/config/darwin.c: In function 'bool darwin_use_anchors_for_symbol_p(const_rtx)':
> ../../work/gcc/config/darwin.c:3016:9: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
> return default_use_anchors_for_symbol_p (symbol);
> ^~
> 
> ../../work/gcc/config/darwin.c:3012:7: note: ...this 'if' clause, but it is not
>   if (sect->common.flags & SECTION_NO_ANCHOR)
>   ^~
> 
> cc1plus: all warnings being treated as errors
> 
> Fixed by the following patch
> 
> --- ../_clean/gcc/config/darwin.c 2015-10-16 22:46:35.0 +0200
> +++ gcc/config/darwin.c   2015-12-11 12:17:40.0 +0100
> @@ -3012,8 +3012,8 @@ darwin_use_anchors_for_symbol_p (const_r
>   if (sect->common.flags & SECTION_NO_ANCHOR)
>   return false;
> 
> -/* Also check the normal reasons for suppressing.  */
> -return default_use_anchors_for_symbol_p (symbol);
> +  /* Also check the normal reasons for suppressing.  */
> +  return default_use_anchors_for_symbol_p (symbol);
> }
>   else
> return false;

I think you can apply this as obvious
Iain



Re: Do not decompress functions sections when copying them to ltrans

2015-12-11 Thread Jan Hubicka
> > For now I added the information whether a section is compressed into
> > decl_state.  I am not thrilled by this but it is the only way I found w/o
> > wasting 4 bytes per every lto section (because the lto header is not
> > really extensible and the stream is assumed to be aligned).
> 
> So this trick now only applies to decl sections?  I think you

Only function/variable sections are copied verbatim by WPA, so yes.
Everything else is re-streamed from scratch (and I do not see what else
can be just copied through anyway).

> could have stolen a bit from lto_simple_header::main_size
> (oddly lto_simple_header_with_strings adds its own main_size,
> hiding the simple-header one - huh).
> 
> Changing lto_header itself into
> 
>   int16_t major_version
>   int8_t minor_version
>   int8_t flags
> 
> would be another possibility (and bump the major version).  I think

This seems better to me - we could steal just a little from main_size, but
I think we can be quite fine with only 256 minor versions.  I will update
the patch.
> we have no sections produced with just lto_header but always
> lto_simple_header (from grepping).  Some sections have no header
> (lto.opts).

lto.opts is never compressed. Also the symbol table used by lto-plugin
goes w/o headers.
> 
> So would the patch be a lot more difficult if you go down either of
> the routes above?  (I think I prefer changing lto_header rather
> than making main_size a bitfield)

Agreed ;)

Honza
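
The header change agreed on above can be sanity-checked with a small sketch.
It assumes the existing header consists of two 16-bit version fields, which
is what the "only 256 minor versions" remark implies; the struct names and
the FLAG_COMPRESSED bit are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Layout assumed for the existing header: two 16-bit version fields.
struct header_before
{
  int16_t major_version;
  int16_t minor_version;
};

// Layout proposed in the thread: narrow the minor version to 8 bits and
// use the freed byte for flags.
struct header_after
{
  int16_t major_version;
  int8_t minor_version;
  int8_t flags;
};

// The point of the proposal: the header does not grow.
static_assert (sizeof (header_before) == sizeof (header_after),
               "header size must not grow");

// Hypothetical flag bit; the thread does not name one.
const int8_t FLAG_COMPRESSED = 1;

static bool
section_is_compressed (const header_after &h)
{
  return (h.flags & FLAG_COMPRESSED) != 0;
}
```

Since the layout of the version fields changes, readers of old objects would
misinterpret the bytes, which is presumably why bumping the major version
was suggested alongside.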
> 
> Richard.
> 
> > The whole low-level lto streaming code is a grand mess; I hope we will clean
> > this up and get more sane headers in the foreseeable future.  Until that
> > time this solution does not waste extra space, as it is easy to pickle the
> > flag as part of the reference.
> > 
> > The patch saves about 7% of WPA time for firefox:
> > 
> >  phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
> >  phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
> >  phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
> >  ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
> >  ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
> >  ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
> >  lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
> >  ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
> >  ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
> >  ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
> >  ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
> >  ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
> >  ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
> >  ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
> >  TOTAL                   : 195.27            12.37           207.64            4103297 kB
> > 
> > 
> > 
> > 
> >  phase opt and generate  :  79.00 (38%) usr   1.61 (13%) sys  80.61 (36%) wall 1000597 kB (24%) ggc
> >  phase stream in         :  33.93 (16%) usr   1.91 (15%) sys  35.83 (16%) wall 3242293 kB (76%) ggc
> >  phase stream out        :  96.90 (46%) usr   9.19 (72%) sys 106.09 (48%) wall      52 kB ( 0%) ggc
> >  garbage collection      :   2.94 ( 1%) usr   0.00 ( 0%) sys   2.93 ( 1%) wall       0 kB ( 0%) ggc
> >  ipa dead code removal   :   4.60 ( 2%) usr   0.04 ( 0%) sys   4.53 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa virtual call target :  24.48 (12%) usr   0.14 ( 1%) sys  24.76 (11%) wall       0 kB ( 0%) ggc
> >  ipa cp                  :   4.92 ( 2%) usr   0.41 ( 3%) sys   5.31 ( 2%) wall  502843 kB (12%) ggc
> >  ipa inlining heuristics :  23.72 (11%) usr   0.23 ( 2%) sys  23.92 (11%) wall  490927 kB (12%) ggc
> >  lto stream inflate      :  14.35 ( 7%) usr   0.35 ( 3%) sys  15.22 ( 7%) wall       0 kB ( 0%) ggc
> >  ipa lto gimple in   :   1.79 ( 1%) usr   0.57 ( 4%) sys   2.46 ( 1%) 
> > 

Re: Request permission to delete gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c

2015-12-11 Thread Richard Biener
On Thu, Dec 10, 2015 at 8:33 PM, David Edelsohn  wrote:
> On Thu, Dec 10, 2015 at 2:23 PM, Bill Schmidt
>  wrote:
>> Hi,
>>
>> The subject test case has been failing as follows:
>>
>> FAIL: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c 
>> scan-tree-dump-times vect "vectorization not profitable" 1
>>
>> The test has been failing since r223528, which is:
>>
>> 2015-05-22  Richard Biener  
>>
>> PR tree-optimization/65701
>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
>> Move peeling cost models into one place.  Peel for alignment
>> for single loads only if an aligned load is cheaper than
>> an unaligned load.
>>
>> Thus with that modification, gcc now vectorizes the loop that was
>> previously deemed unprofitable to vectorize.  As a result, the test case
>> no longer has any reason to exist, and I would like to delete it.

Just curious - why was it not profitable before but is now?  The only
thing that has changed is we no longer require peeling for gaps(?)

Thus, did you check with -fno-vect-cost-model before/after the rev.?

We might also do outer loop vectorization if the inner loop is not unrolled?

Richard.

>> Ok for trunk?
>>
>> Thanks,
>> Bill
>>
>>
>> [gcc/testsuite]
>>
>> 2015-12-10  Bill Schmidt  
>>
>> * gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c:
>> Delete.
>
> Okay with me.
>
> Thanks, David


Re: Do not decompress functions sections when copying them to ltrans

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch makes WPA copy sections w/o decompressing them.  This leads
> to a nice /tmp usage for GCC bootstrap (about 70%) and little for Firefox.
> In GCC about 5% of the ltrans object file is the global decl section, while
> for Firefox it is 85%.  I will try to figure out if there is something
> terribly stupid pickled there.
> 
> The patch simply adds raw section i/o to lto-section-in.c and
> lto-section-out.c, which is used by copy_function_or_variable.  The catch
> is that WPA->ltrans streaming is not compressed and this fact is not
> represented in the object file at all.  We simply test flag_wpa and
> flag_ltrans.  Now function sections born at WPA time are uncompressed,
> while function sections just copied are compressed and we do not know how
> to read them.
> 
> I tried to simply turn off the non-compressed path and set compression level
> to minimal and then to none (which works despite the apparently outdated FIXME
> comments I removed).  Sadly zlib manages to burn about 16% of WPA time
> at minimal level and about 7% at none because it computes the checksum.
> Clearly next stage1 it is time to switch to a better compression backend.
> 
> For now I added the information whether a section is compressed into
> decl_state.  I am not thrilled by this but it is the only way I found w/o
> wasting 4 bytes per every lto section (because the lto header is not
> really extensible and the stream is assumed to be aligned).

So this trick now only applies to decl sections?  I think you
could have stolen a bit from lto_simple_header::main_size
(oddly lto_simple_header_with_strings adds its own main_size,
hiding the simple-header one - huh).

Changing lto_header itself into

  int16_t major_version
  int8_t minor_version
  int8_t flags

would be another possibility (and bump the major version).  I think
we have no sections produced with just lto_header but always
lto_simple_header (from grepping).  Some sections have no header
(lto.opts).

So would the patch be a lot more difficult if you go down either of
the routes above?  (I think I prefer changing lto_header rather
than making main_size a bitfield)
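
For illustration, the lto_header variant suggested above could look roughly
like the sketch below.  It assumes the existing header consists of two 16-bit
version fields, so shrinking minor_version to 8 bits frees one byte for flags
without growing the header.  The names lto_header_sketch,
LTO_SKETCH_COMPRESSED and section_compressed_p are hypothetical and not part
of lto-streamer.h.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical layout following the suggestion above; this is NOT the
   actual definition in lto-streamer.h.  */
struct lto_header_sketch
{
  int16_t major_version;
  int8_t minor_version;
  int8_t flags;
};

/* Hypothetical flag bit: the section body is compressed.  */
const int8_t LTO_SKETCH_COMPRESSED = 1 << 0;

/* The header stays at 4 bytes, so stream alignment is unaffected.  */
static_assert (sizeof (lto_header_sketch) == 4, "header must stay 4 bytes");

/* Return true if the section described by H is marked as compressed.  */
inline bool
section_compressed_p (const lto_header_sketch &h)
{
  return (h.flags & LTO_SKETCH_COMPRESSED) != 0;
}
```

A reader would then test the flag from the header instead of inferring the
compression state from flag_wpa and flag_ltrans.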

Richard.

> The whole low-level lto streaming code is a grand mess; I hope we will clean
> this up and get more sane headers in the foreseeable future.  Until that time
> this solution does not waste extra space as it is easy to pickle the flag as
> part of the reference.
> 
> The patch saves about 7% of WPA time for firefox:
> 
>  phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
>  phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
>  phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
>  ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
>  ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
>  ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
>  ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
>  lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
>  ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
>  ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
>  ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
>  ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
>  ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
>  ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
>  ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
>  TOTAL                   : 195.27             12.37             207.64            4103297 kB
>
>  phase opt and generate  :  79.00 (38%) usr   1.61 (13%) sys  80.61 (36%) wall 1000597 kB (24%) ggc
>  phase stream in         :  33.93 (16%) usr   1.91 (15%) sys  35.83 (16%) wall 3242293 kB (76%) ggc
>  phase stream out        :  96.90 (46%) usr   9.19 (72%) sys 106.09 (48%) wall      52 kB ( 0%) ggc
>  garbage collection      :   2.94 ( 1%) usr   0.00 ( 0%) sys   2.93 ( 1%) wall       0 kB ( 0%) ggc
>  ipa dead code 

Re: [PATCH 2/2] [graphite] update required isl versions

2015-12-11 Thread Richard Biener
On Thu, Dec 10, 2015 at 6:05 PM, Sebastian Pop  wrote:
> we now check the isl version, as there are no real differences in existing
> files in between isl 0.14 and isl 0.15.

I thought ISL 0.15 has some new features you could check?  Also using a run test
is bad for cross compiling.

I'd simply change the "compatible" test to check for isl_ctx_get_max_operations
or whatever is needed to implement the compute bound and not worry about
warning about using a deprecated ISL during configure.

Maybe add the ISL version to the list of dependency versions we print in
toplev.c:print_version.

Richard.

> ---
>  config/isl.m4 |  41 +++--
>  configure | 112 
> --
>  2 files changed, 123 insertions(+), 30 deletions(-)
>
> diff --git a/config/isl.m4 b/config/isl.m4
> index 459fac1..886b0e4 100644
> --- a/config/isl.m4
> +++ b/config/isl.m4
> @@ -19,23 +19,23 @@
>
>  # ISL_INIT_FLAGS ()
>  # -
> -# Provide configure switches for ISL support.
> +# Provide configure switches for isl support.
>  # Initialize isllibs/islinc according to the user input.
>  AC_DEFUN([ISL_INIT_FLAGS],
>  [
>AC_ARG_WITH([isl-include],
>  [AS_HELP_STRING(
>[--with-isl-include=PATH],
> -  [Specify directory for installed ISL include files])])
> +  [Specify directory for installed isl include files])])
>AC_ARG_WITH([isl-lib],
>  [AS_HELP_STRING(
>[--with-isl-lib=PATH],
> -  [Specify the directory for the installed ISL library])])
> +  [Specify the directory for the installed isl library])])
>
>AC_ARG_ENABLE(isl-version-check,
>  [AS_HELP_STRING(
>[--disable-isl-version-check],
> -  [disable check for ISL version])],
> +  [disable check for isl version])],
>  ENABLE_ISL_CHECK=$enableval,
>  ENABLE_ISL_CHECK=yes)
>
> @@ -58,15 +58,15 @@ AC_DEFUN([ISL_INIT_FLAGS],
>if test "x${with_isl_lib}" != x; then
>  isllibs="-L$with_isl_lib"
>fi
> -  dnl If no --with-isl flag was specified and there is in-tree ISL
> +  dnl If no --with-isl flag was specified and there is in-tree isl
>dnl source, set up flags to use that and skip any version tests
> -  dnl as we cannot run them before building ISL.
> +  dnl as we cannot run them before building isl.
>if test "x${islinc}" = x && test "x${isllibs}" = x \
>   && test -d ${srcdir}/isl; then
>  isllibs='-L$$r/$(HOST_SUBDIR)/isl/'"$lt_cv_objdir"' '
>  islinc='-I$$r/$(HOST_SUBDIR)/isl/include -I$$s/isl/include'
>  ENABLE_ISL_CHECK=no
> -AC_MSG_WARN([using in-tree ISL, disabling version check])
> +AC_MSG_WARN([using in-tree isl, disabling version check])
>fi
>
>isllibs="${isllibs} -lisl"
> @@ -75,7 +75,7 @@ AC_DEFUN([ISL_INIT_FLAGS],
>
>  # ISL_REQUESTED (ACTION-IF-REQUESTED, ACTION-IF-NOT)
>  # 
> -# Provide actions for failed ISL detection.
> +# Provide actions for failed isl detection.
>  AC_DEFUN([ISL_REQUESTED],
>  [
>AC_REQUIRE([ISL_INIT_FLAGS])
> @@ -106,12 +106,31 @@ AC_DEFUN([ISL_CHECK_VERSION],
>  LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
>  LIBS="${_isl_saved_LIBS} -lisl"
>
> -AC_MSG_CHECKING([for compatible ISL])
> -AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include ]], [[;]])],
> +AC_MSG_CHECKING([for recommended isl 0.15])
> +AC_TRY_RUN([#include 
> +#include 
> +int main() {
> +  if (strncmp (_GENERATED_STDINT_H, "isl 0.15", 8))
> +return 1;
> +  return 0;
> +}],
> [gcc_cv_isl=yes],
> -   [gcc_cv_isl=no])
> +   [gcc_cv_isl=no], [gcc_cv_isl=no])
>  AC_MSG_RESULT([$gcc_cv_isl])
>
> +if test "${gcc_cv_isl}" = no ; then
> +   AC_MSG_CHECKING([for deprecated isl 0.14])
> +   AC_TRY_RUN([#include 
> +   #include 
> +   int main() {
> + if (strncmp (_GENERATED_STDINT_H, "isl 0.14", 8))
> +   return 1;
> + return 0;
> +   }],
> +   [gcc_cv_isl=yes],
> +   [gcc_cv_isl=no], [gcc_cv_isl=no])
> +AC_MSG_RESULT([$gcc_cv_isl, recommended isl version is 0.15, minimum 
> required isl version 0.14 is deprecated])
> +fi
>  CFLAGS=$_isl_saved_CFLAGS
>  LDFLAGS=$_isl_saved_LDFLAGS
>  LIBS=$_isl_saved_LIBS
> diff --git a/configure b/configure
> index 090615f..4284ba7 100755
> --- a/configure
> +++ b/configure
> @@ -1492,7 +1492,7 @@ Optional Features:
>build static libjava [default=no]
>--enable-bootstrap  enable bootstrapping [yes if native build]
>--disable-isl-version-check
> -  disable check for ISL version
> +  disable check for isl version
>--enable-ltoenable link time optimization support
>

[PATCH] Avoid integer vector used as a vector mask

2015-12-11 Thread Ilya Enkovich
Hi,

Currently when MASK_LOAD and MASK_STORE are vectorized we check the
scalar type of a mask but don't check its vector type.  It means
we may vectorize it when the mask was just loaded from an array of
booleans.  This happens e.g. for the following test:

SUBROUTINE TEST (x, y, z, mask, ims, ime)
  IMPLICIT NONE
  INTEGER ims, ime, i
  LOGICAL mask (ims:ime)
  REAL x(ims:ime), y(ims:ime), z(ims:ime)

  DO 812 i=ims,ime
 IF (mask(i)) x(i) = y(i) - z(i)
812  CONTINUE
  END

Produced GIMPLE:

  vect__15.30_76 = MEM[base: _91, offset: 0B];
  vect__17.32_80 = MASK_LOAD (vectp_y.33_78, 4B, vect__15.30_76);
  vect__19.35_83 = MASK_LOAD (vectp_z.36_81, 4B, vect__15.30_76);
  vect__20.38_84 = vect__17.32_80 - vect__19.35_83;
  MASK_STORE (vectp_x.39_85, 4B, vect__15.30_76, vect__20.38_84);

Loaded values are not guaranteed to be the all-0s or all-1s values a
mask requires, so they shouldn't be used as a mask.  This patch checks
the vector type of a mask used for MASK_LOAD and MASK_STORE.
Bootstrapped and regtested for x86_64-pc-linux-gnu.  OK for trunk?
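
A small scalar model of why such a loaded value is not a usable mask may
help.  It assumes a target whose masked stores test the most significant bit
of each mask lane (as AVX vmaskmov does) and a LOGICAL representation of 0/1.
All function names below are hypothetical and only model the behaviour; they
are not GCC internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Return true if LANE is a canonical mask value (all 0s or all 1s).  */
inline bool
canonical_mask_lane_p (int32_t lane)
{
  return lane == 0 || lane == -1;
}

/* Model of a masked store: lane I of DST is written only if the most
   significant bit of MASK[I] is set (assumption, see above).  */
inline void
masked_store_sketch (float *dst, const float *src, const int32_t *mask,
		     std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    if (mask[i] < 0)
      dst[i] = src[i];
}

/* Turn 0/1 logical values into canonical all-0s/all-1s mask lanes.  */
inline void
logical_to_mask_sketch (const int32_t *logical, int32_t *mask, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    mask[i] = logical[i] != 0 ? -1 : 0;
}
```

In this model, passing the raw logical values as the mask stores nothing,
while the converted mask stores exactly the lanes whose logical value is true.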

Thanks,
Ilya
--
gcc/

2015-12-11  Ilya Enkovich  

* tree-vect-stmts.c (vectorizable_mask_load_store): Check
mask vectype.


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 5377d15..abcd9a4 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1780,7 +1780,7 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   if (!mask_vectype)
 mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
 
-  if (!mask_vectype)
+  if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype))
 return false;
 
   if (is_store)


[PATCH] Remove unpaired TERMINATE_PASS_LIST in passes.def

2015-12-11 Thread Tom de Vries

Hi,

this patch removes a TERMINATE_PASS_LIST from passes.def that is not 
paired with any INSERT_PASSES_AFTER.


Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom
Remove unpaired TERMINATE_PASS_LIST in passes.def

2015-12-11  Tom de Vries  

	* passes.def: Remove unpaired TERMINATE_PASS_LIST.

---
 gcc/passes.def | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 624d121..43ce3d5 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -410,7 +410,6 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_rtl_unroll_loops);
 	  NEXT_PASS (pass_rtl_doloop);
 	  NEXT_PASS (pass_rtl_loop_done);
-	  TERMINATE_PASS_LIST ()
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_web);
   NEXT_PASS (pass_rtl_cprop);


Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-12-11 Thread Andreas Krebbel
On 12/04/2015 06:12 PM, Dominik Vogt wrote:
> Version 6 with another fix.  This should work now.

Applied. Thanks!

-Andreas-




Re: [Patch, Fortran] PR68815 - replace '%s' quotes by %< ... %>

2015-12-11 Thread Tobias Burnus
On Fri, Dec 11, 2015 at 12:03:26AM +, Joseph Myers wrote:
> On Thu, 10 Dec 2015, Manuel López-Ibáñez wrote:
> > On 12/09/2015 03:53 PM, Tobias Burnus wrote:
> > > In principle, %<%c%> and %<%d%> should be convertable to %qc and
> > > %qd (as the code is more readable), but the current function
> > > annotation prevent this, telling that the q flag is not valid for
> > > %c and %d. As %< is fine, I didn't dig into it.
> > 
> > You need to edit the gcc_gfc_* variables in c-family/c-format.c. [...]
> 
> Put "q" in the first flags string for those formats in gcc_gfc_char_table.


Thanks! The attached patch works :-)

Built and regtested on x86-64-gnu-linux.
I will commit it tomorrow, unless there are comments or objections.

Tobias



gcc/c-family/
PR fortran/68815
* c-format.c (gcc_gfc_char_table): Add 'q' flag to remaining
specifiers (%d, %i,%u and %c).

gcc/fortran/
PR fortran/68815
* check.c (gfc_check_reshape): Replace %<%d%> by %qd.
* matchexp.c (gfc_match_defined_op_name): Use %qc.
* symbol.c (gfc_add_new_implicit_range,
gfc_merge_new_implicit): Ditto.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 6e37265..de07b6c 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -809,9 +809,9 @@ static const format_char_info gcc_cxxdiag_char_table[] =
 static const format_char_info gcc_gfc_char_table[] =
 {
   /* C89 conversion specifiers.  */
-  { "di",  0, STD_C89, { T89_I,   BADLEN,  BADLEN,  T89_L,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "", "", NULL },
-  { "u",   0, STD_C89, { T89_UI,  BADLEN,  BADLEN,  T89_UL,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "", "", NULL },
-  { "c",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "", "", NULL },
+  { "di",  0, STD_C89, { T89_I,   BADLEN,  BADLEN,  T89_L,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "q", "", NULL },
+  { "u",   0, STD_C89, { T89_UI,  BADLEN,  BADLEN,  T89_UL,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "q", "", NULL },
+  { "c",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "q", "", NULL },
   { "s",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "q", "cR", NULL },
 
   /* gfc conversion specifiers.  */
diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 6dc7f3e..3f1bdd3 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -3863,7 +3863,7 @@ gfc_check_reshape (gfc_expr *source, gfc_expr *shape,
{
  gfc_error ("%qs argument of %qs intrinsic at %L has "
 "invalid permutation of dimensions (dimension "
-"%<%d%> duplicated)",
+"%qd duplicated)",
 gfc_current_intrinsic_arg[3]->name,
 gfc_current_intrinsic, >where, dim);
  return false;
diff --git a/gcc/fortran/matchexp.c b/gcc/fortran/matchexp.c
index 02f43a0..c14ef59 100644
--- a/gcc/fortran/matchexp.c
+++ b/gcc/fortran/matchexp.c
@@ -69,7 +69,7 @@ gfc_match_defined_op_name (char *result, int error_flag)
   for (i = 0; name[i]; i++)
 if (!ISALPHA (name[i]))
   {
-   gfc_error ("Bad character %<%c%> in OPERATOR name at %C", name[i]);
+   gfc_error ("Bad character %qc in OPERATOR name at %C", name[i]);
return MATCH_ERROR;
   }
 
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 221fef3..d241bc0 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -166,7 +166,7 @@ gfc_add_new_implicit_range (int c1, int c2)
 {
   if (new_flag[i])
{
- gfc_error ("Letter %<%c%> already set in IMPLICIT statement at %C",
+ gfc_error ("Letter %qc already set in IMPLICIT statement at %C",
 i + 'A');
  return false;
}
@@ -198,7 +198,7 @@ gfc_merge_new_implicit (gfc_typespec *ts)
{
  if (gfc_current_ns->set_flag[i])
{
- gfc_error ("Letter %c already has an IMPLICIT type at %C",
+ gfc_error ("Letter %qc already has an IMPLICIT type at %C",
 i + 'A');
  return false;
}


Re: [PATCH 4/4] Add -Wmisleading-indentation to -Wall

2015-12-11 Thread Dominique d'Humières
This breaks bootstrap on darwin:

../../work/gcc/config/darwin.c: In function 'bool 
darwin_use_anchors_for_symbol_p(const_rtx)':
../../work/gcc/config/darwin.c:3016:9: error: statement is indented as if it 
were guarded by... [-Werror=misleading-indentation]
 return default_use_anchors_for_symbol_p (symbol);
 ^~

../../work/gcc/config/darwin.c:3012:7: note: ...this 'if' clause, but it is not
   if (sect->common.flags & SECTION_NO_ANCHOR)
   ^~

cc1plus: all warnings being treated as errors

Fixed by the following patch

--- ../_clean/gcc/config/darwin.c   2015-10-16 22:46:35.0 +0200
+++ gcc/config/darwin.c 2015-12-11 12:17:40.0 +0100
@@ -3012,8 +3012,8 @@ darwin_use_anchors_for_symbol_p (const_r
   if (sect->common.flags & SECTION_NO_ANCHOR)
return false;
 
-/* Also check the normal reasons for suppressing.  */
-return default_use_anchors_for_symbol_p (symbol);
+  /* Also check the normal reasons for suppressing.  */
+  return default_use_anchors_for_symbol_p (symbol);
 }
   else
 return false;

TIA

Dominique



[PATCH] Handle sizes and kinds params of GOACC_parallel in find_func_clobbers

2015-12-11 Thread Tom de Vries

Hi,

while testing the oacc kernels patch series on top of trunk, using the
optimal handling of BUILT_IN_GOACC_PARALLEL in fipa-pta, I ran into a
failure where the stores to the omp_data_sizes array were removed by dse.


The call bb in the failing testcase normally looks like this:
...
  :
  .omp_data_arr.10.D.2550 = c.2_18;
  .omp_data_arr.10.c = 
  .omp_data_arr.10.D.2553 = b.1_15;
  .omp_data_arr.10.b = 
  .omp_data_arr.10.D.2556 = a.0_11;
  .omp_data_arr.10.a = 
   D.2572 = n_6(D);
  .omp_data_arr.10.n = 
  .omp_data_sizes.11[0] = _8;
  .omp_data_sizes.11[1] = 0;
  .omp_data_sizes.11[2] = _8;
  .omp_data_sizes.11[3] = 0;
  .omp_data_sizes.11[4] = _8;
  .omp_data_sizes.11[5] = 0;
  .omp_data_sizes.11[6] = 4;
  __builtin_GOACC_parallel_keyed (-1, foo._omp_fn.0, 7,
  &.omp_data_arr.10,
  &.omp_data_sizes.11,
  &.omp_data_kinds.12, 0);
...

Dse removed the stores, because omp_data_sizes was not marked as used
by __builtin_GOACC_parallel_keyed.


We pretend in fipa-pta that __builtin_GOACC_parallel_keyed is never 
called, and instead handle the call foo._omp_fn.0 (&.omp_data_arr.10). 
That means the use of omp_data_sizes by __builtin_GOACC_parallel_keyed 
is ignored.


This patch fixes that (for both sizes and kinds arrays), as confirmed 
with a test run of target-libgomp c.exp on the accelerator.
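
The effect can be modelled with a toy dead-store check, given here only as a
rough sketch: a store counts as dead when no later statement lists its target
among its uses.  The types and the function dead_stores_sketch are
hypothetical and far simpler than the real dse pass or the fipa-pta
constraints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

/* Hypothetical statement model: either a store to TARGET or a call
   that uses the variables listed in USES.  */
struct stmt_sketch
{
  bool is_store;
  std::string target;
  std::vector<std::string> uses;
};

/* Return the indices of stores whose target is not used by any later
   statement.  */
inline std::vector<std::size_t>
dead_stores_sketch (const std::vector<stmt_sketch> &stmts)
{
  std::vector<std::size_t> dead;
  for (std::size_t i = 0; i < stmts.size (); ++i)
    {
      if (!stmts[i].is_store)
	continue;
      bool used = false;
      for (std::size_t j = i + 1; j < stmts.size () && !used; ++j)
	used = std::find (stmts[j].uses.begin (), stmts[j].uses.end (),
			  stmts[i].target) != stmts[j].uses.end ();
      if (!used)
	dead.push_back (i);
    }
  return dead;
}
```

If the call only records omp_data_arr as used, the store to omp_data_sizes is
reported dead; once the sizes and kinds arrays are recorded as uses as well,
it is kept.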


OK for stage3 if bootstrap and reg-test succeeds?

Thanks,
- Tom
Handle sizes and kinds params of GOACC_parallel in find_func_clobbers

2015-12-11  Tom de Vries  

	* tree-ssa-structalias.c (find_func_clobbers): Handle sizes and kinds
	parameters of GOACC_parallel.

---
 gcc/tree-ssa-structalias.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index dfc0422..98d7d7b 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5089,6 +5089,8 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	  case BUILT_IN_GOACC_PARALLEL:
 	{
 	  unsigned int fnpos, argpos;
+	  unsigned int implicit_use_args[2];
+	  unsigned int num_implicit_use_args = 0;
 	  switch (DECL_FUNCTION_CODE (decl))
 		{
 		case BUILT_IN_GOMP_PARALLEL:
@@ -5101,6 +5103,8 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	   sizes, kinds, ...).  */
 		  fnpos = 1;
 		  argpos = 3;
+		  implicit_use_args[num_implicit_use_args++] = 4;
+		  implicit_use_args[num_implicit_use_args++] = 5;
 		  break;
 		default:
 		  gcc_unreachable ();
@@ -5121,6 +5125,18 @@ find_func_clobbers (struct function *fn, gimple *origt)
 		process_constraint (new_constraint (lhs, *rhsp));
 	  rhsc.truncate (0);
 
+	  /* Handle parameters used by the call, but not used in cfi, as
+		 implicitly used by cfi.  */
+	  lhs = get_function_part_constraint (cfi, fi_uses);
+	  for (unsigned i = 0; i < num_implicit_use_args; ++i)
+		{
+		  tree arg = gimple_call_arg (t, implicit_use_args[i]);
+		  get_constraint_for (arg, );
+		  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+		process_constraint (new_constraint (lhs, *rhsp));
+		  rhsc.truncate (0);
+		}
+
 	  /* The caller clobbers what the callee does.  */
 	  lhs = get_function_part_constraint (fi, fi_clobbers);
 	  rhs = get_function_part_constraint (cfi, fi_clobbers);


[PATCH, CHKP, PR middle-end/68697] Add bounds support for VA_ARG calls

2015-12-11 Thread Ilya Enkovich
Hi,

This patch adds Pointer Bounds Checker support for VA_ARG calls.  I added 
bndret call for VA_ARG and corresponding bndret replacement when VA_ARG is 
expanded.  This fixes all vararg tests from MPX testsuite.  Bootstrapped and 
tested on x86_64-pc-linux-gnu.  Will commit after additional testing on 
benchmarks.

Thanks,
Ilya
--
gcc/

2015-12-10  Ilya Enkovich  

* tree-chkp.c (chkp_call_returns_bounds_p): Return true
for VA_ARG call.
(chkp_fixup_inlined_call): New.
* tree-chkp.h (chkp_fixup_inlined_call): New.
* tree-stdarg.c: Include tree-chkp.h.
(expand_ifn_va_arg_1): Fixup bndret calls for removed
VA_ARG calls.


diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 8b6381f..b666e97 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -2157,7 +2157,11 @@ static bool
 chkp_call_returns_bounds_p (gcall *call)
 {
   if (gimple_call_internal_p (call))
-return false;
+{
+  if (gimple_call_internal_fn (call) == IFN_VA_ARG)
+   return true;
+  return false;
+}
 
   if (gimple_call_builtin_p (call, BUILT_IN_CHKP_NARROW_PTR_BOUNDS)
   || chkp_gimple_call_builtin_p (call, BUILT_IN_CHKP_NARROW))
@@ -2490,6 +2494,69 @@ chkp_build_bndstx (tree addr, tree ptr, tree bounds,
 }
 }
 
+/* This function is called when call statement
+   is inlined and therefore we can't use bndret
+   for its LHS anymore.  Function fixes bndret
+   call using new RHS value if possible.  */
+void
+chkp_fixup_inlined_call (tree lhs, tree rhs)
+{
+  tree addr, bounds;
+  gcall *retbnd, *bndldx;
+
+  if (!BOUNDED_P (lhs))
+return;
+
+  /* Search for retbnd call.  */
+  retbnd = chkp_retbnd_call_by_val (lhs);
+  if (!retbnd)
+return;
+
+  /* Currently only handle cases when call is replaced
+ with a memory access.  In this case bndret call
+ may be replaced with bndldx call.  Otherwise we
+ have to search for bounds which may cause wrong
+ result due to various optimizations applied.  */
+  switch (TREE_CODE (rhs))
+{
+case VAR_DECL:
+  if (DECL_REGISTER (rhs))
+   return;
+  break;
+
+case MEM_REF:
+  break;
+
+case ARRAY_REF:
+case COMPONENT_REF:
+  addr = get_base_address (rhs);
+  if (!DECL_P (addr)
+ && TREE_CODE (addr) != MEM_REF)
+   return;
+  if (DECL_P (addr) && DECL_REGISTER (addr))
+   return;
+  break;
+
+default:
+  return;
+}
+
+  /* Create a new statements sequence with bndldx call.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (retbnd);
+  addr = build_fold_addr_expr (rhs);
+  chkp_build_bndldx (addr, lhs, );
+  bndldx = as_a  (gsi_stmt (gsi));
+
+  /* Remove bndret call.  */
+  bounds = gimple_call_lhs (retbnd);
+  gsi = gsi_for_stmt (retbnd);
+  gsi_remove (, true);
+
+  /* Link new bndldx call.  */
+  gimple_call_set_lhs (bndldx, bounds);
+  update_stmt (bndldx);
+}
+
 /* Compute bounds for pointer NODE which was assigned in
assignment statement ASSIGN.  Return computed bounds.  */
 static tree
diff --git a/gcc/tree-chkp.h b/gcc/tree-chkp.h
index cc24858..9337eb7 100644
--- a/gcc/tree-chkp.h
+++ b/gcc/tree-chkp.h
@@ -59,5 +59,6 @@ extern tree chkp_insert_retbnd_call (tree bndval, tree retval,
 gimple_stmt_iterator *gsi);
 extern gcall *chkp_copy_call_skip_bounds (gcall *call);
 extern bool chkp_redirect_edge (cgraph_edge *e);
+extern void chkp_fixup_inlined_call (tree lhs, tree rhs);
 
 #endif /* GCC_TREE_CHKP_H */
diff --git a/gcc/tree-stdarg.c b/gcc/tree-stdarg.c
index f205ccb..ea2ef1c 100644
--- a/gcc/tree-stdarg.c
+++ b/gcc/tree-stdarg.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "tree-cfg.h"
 #include "tree-stdarg.h"
+#include "tree-chkp.h"
 
 /* A simple pass that attempts to optimize stdarg functions on architectures
that need to save register arguments to stack on entry to stdarg functions.
@@ -1047,6 +1048,11 @@ expand_ifn_va_arg_1 (function *fun)
unsigned int nargs = gimple_call_num_args (stmt);
gcc_assert (useless_type_conversion_p (TREE_TYPE (lhs), type));
 
+   /* We replace call with a new expr.  This may require
+  corresponding bndret call fixup.  */
+   if (chkp_function_instrumented_p (fun->decl))
+ chkp_fixup_inlined_call (lhs, expr);
+
if (nargs == 3)
  {
/* We've transported the size of with WITH_SIZE_EXPR here as


Re: RFA (hash-*): PATCH for c++/68309

2015-12-11 Thread Richard Biener
On Thu, Dec 10, 2015 at 11:03 PM, Jason Merrill  wrote:
> The C++ front end uses a temporary hash table to remember specializations of
> local variables during template instantiations.  In a nested function such
> as a lambda or local class member function, we need to retain the elements
> from the enclosing function's local_specializations table; otherwise the
> testcase crashes because we don't find a local specialization for the
> non-captured use of 'args' in the decltype.
>
> This patch addresses that by making a copy of the enclosing
> local_specializations table if it exists; to enable that I've added copy
> constructors to hash_table and hash_map.
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?

I don't think you can copy the elements with memcpy; they may be C++ classes
that are not copyable.  Also watch out for the bool gather_mem_stats = true
to bool gather_mem_stats = GATHER_STATISTICS change if that crosses your
change.

I also think copying hash tables should be discouraged ;)  I wonder if you
can get around the copying by adding a generation count (to easily "backtrack")
instead.
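
One possible reading of the generation-count idea, as a minimal sketch only:
every entry remembers the generation in which it was added, a nested scope
bumps the generation, and leaving the scope drops whatever that generation
added.  The class generation_map_sketch is hypothetical and built on
std::unordered_map for brevity; it is not GCC's hash_map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

/* Hypothetical map with backtracking by generation.  */
template <typename K, typename V>
class generation_map_sketch
{
  struct entry { unsigned gen; V value; };
  std::unordered_map<K, std::vector<entry> > map_;
  unsigned gen_ = 0;

public:
  /* Enter a nested scope.  */
  void push_generation () { ++gen_; }

  /* Leave the current scope, dropping everything it added.  */
  void pop_generation ()
  {
    if (gen_ == 0)
      return;
    for (auto it = map_.begin (); it != map_.end ();)
      {
	std::vector<entry> &stack = it->second;
	while (!stack.empty () && stack.back ().gen == gen_)
	  stack.pop_back ();
	if (stack.empty ())
	  it = map_.erase (it);
	else
	  ++it;
      }
    --gen_;
  }

  /* Add or overwrite K in the current generation; older generations
     are shadowed, not destroyed.  */
  void put (const K &k, const V &v)
  {
    std::vector<entry> &stack = map_[k];
    if (!stack.empty () && stack.back ().gen == gen_)
      stack.back ().value = v;
    else
      stack.push_back (entry { gen_, v });
  }

  /* Return the innermost value for K, or nullptr if there is none.  */
  const V *get (const K &k) const
  {
    auto it = map_.find (k);
    if (it == map_.end () || it->second.empty ())
      return nullptr;
    return &it->second.back ().value;
  }
};
```

With this shape the nested function still sees the enclosing entries, and
popping the generation restores the outer state without copying the table.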

Richard.


RE: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.

2015-12-11 Thread Claudiu Zissulescu
Hi Joern,

> Or have some target hook to make it not even bother filling delay slots
> speculatively; for targets that can fully unexpose the delay slot, like SH and
> ARC >= ARC700, this aspect of fill_eager_delay_slots only mucks up
> schedules and increases code size.

I propose to solve the dwarf2 issues during epilogue by using the
TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P hook for ARC processors. Hence, we do
not need to emit the blockage instruction during epilogue. So, I have
refactored the patch into two patches as follows:
- The 0001-Refurbish-emitting-DWARF2-related-information-when-e.patch
is the initial patch without emitting the blockage instruction during epilogue.
- The 0002-ARC-Use-TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P-hook.patch
adds TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P hook for ARC.

Both patches are attached.

> 
> More relevant ways to get data would be comparing the object files (from a
> whole toolchain library set and/or one or more big application(s)) built
> with/without the blockage insn emitted, or to benchmark it.

I did some testing here. For size, I used the CSiBE testbench, and for speed, I
used coremark and dhrystone. Using a blockage or not doesn't affect the size
or speed figures. However, using the TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P hook
improves the size figures (not much, just .1%), and improves coremark by 2%
and Dhrystone by 1%.

Hence, in the light of the new figures, I favor the above two patch solution. 
Both patches are checked using dg.exp and compile.exp. Ok to submit?

Thanks,
Claudiu


0001-Refurbish-emitting-DWARF2-related-information-when-e.patch
Description: 0001-Refurbish-emitting-DWARF2-related-information-when-e.patch


0002-ARC-Use-TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P-hook.patch
Description: 0002-ARC-Use-TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P-hook.patch


Re: Register pressure aware loop unrolling

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Shiva Chen wrote:

> Hi all,
> 
> Loop unrolling can decrease performance in some cases, because unrolling
> loops with high register pressure produces lots of spill code.
> 
> I tried to implement register pressure aware loop unrolling
> (-funroll-loops-pressure-aware) to avoid unrolling high register pressure
> loops.
> 
> The idea is to calculate the number of live registers for each basic block
> to estimate register pressure.  If the loop contains a basic block with high
> register pressure, don't unroll it.
> 
> Register pressure is estimated to be high if
> (live_reg_num > available_hard_reg_num), as ira and gcse do.
> 
> However, loop unrolling may increase the register pressure.
> Therefore, when the live register number is near the available hard
> register number, it's not profitable to unroll the loop.
> E.g. live_reg_num = 12, available_hard_reg_num = 14
> 
> To estimate the register pressure increment after unrolling, I added the
> parameter PARAM_LOOP_UNROLL_PRESSURE_INCREMENT.
> 
> The equation becomes
> 
> high_reg_pressure_p = true if (live_reg_num +
> PARAM_LOOP_UNROLL_PRESSURE_INCREMENT > available_hard_reg_num)
> 
> Bootstrapped and tested on x86-64.
> 
> Any suggestion ?
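
The check described in the quoted proposal can be sketched as a single
predicate.  The function name is hypothetical, and the increment value used
in the note below is purely illustrative, since the mail does not state the
default of PARAM_LOOP_UNROLL_PRESSURE_INCREMENT.

```cpp
#include <cassert>

/* Hypothetical helper mirroring the quoted equation: the loop is
   treated as high pressure if the live register count plus the
   expected increment from unrolling exceeds the available hard
   registers.  */
inline bool
high_reg_pressure_sketch_p (int live_reg_num, int pressure_increment,
			    int available_hard_reg_num)
{
  return live_reg_num + pressure_increment > available_hard_reg_num;
}
```

For the quoted example (12 live registers, 14 available), an increment of 0
or 2 leaves the loop eligible for unrolling, while an increment of 3 marks it
as high pressure.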

I think the general RTL loop unrolling pass is just "bad".  You
unroll to expose scheduling freedom (which in turn may increase
register pressure).  Unrolling itself doesn't increase register
pressure.

So to me the correct thing to do is to use one of the
loop-aware scheduling algorithms (that end up performing
unrolling) like -fmodulo-sched or to make scheduling
register pressure aware which is already supported with -fsched-pressure
(only relevant for pre-RA scheduling of course).

So to me the patch attacks the wrong pass and instead modulo
scheduling should be improved (like no longer depend on
do-loop patterns) and eventually move the thing to GIMPLE
to improve its dependence analysis, making it a scheduling-driven
GIMPLE unrolling pass.

disclaimer: I didn't look at your patch.

Richard.

> Thanks,
> Shiva
> 
> 
> 2015-12-11  Shiva Chen  
> 
>   * cfgloop.h (struct loop): Add high_reg_pressure_p field.
>   (UAP_UNROLL_PRESSURE_AWARE): New enums.
>   * reg-pressure.h (struct bb_data): Data stored for each basic block.
>   * common.opt: Add new flag -funroll-loops-pressure-aware.
>   * params.def: Add parameter PARAM_LOOP_UNROLL_PRESSURE_INCREMENT.
>   * loop-init.c (pass_rtl_unroll_loops::gate):
>   Enable unrolling when -funroll-loops-pressure-aware is enabled.
>   (pass_rtl_unroll_loops::execute): Likewise.
>   * toplev.c (process_options): Likewise.
>   * loop-unroll.c (decide_unroll_constant_iterations):
>   Do not unroll loops with high register pressure
>   if -funroll-loops-pressure-aware is enabled.
>   (decide_unroll_runtime_iterations): Likewise.
>   (decide_unroll_stupid): Likewise.
>   * reg-pressure.c (get_regno_pressure_class): Get pressure class.
>   (change_pressure): Change register pressure while scan basic blocks.
>   (calculate_bb_reg_pressure): Calculate register pressure
>   for each basic block.
>   (check_register_pressure): Determine high_reg_pressure_p for each loop.
>   (calculate_loop_pressure): Calculate register pressure for each loop.
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index d2d09f6..6c62b63 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1384,6 +1384,7 @@ OBJS = \
> reginfo.o \
> regrename.o \
> regstat.o \
> +   reg-pressure.o \
> reload.o \
> reload1.o \
> reorg.o \
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index ee73bf9..c4a06e7 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
>/* True if we should try harder to vectorize this loop.  */
>bool force_vectorize;
> 
> +  /* True if the loop have high register pressure.  */
> +  bool high_reg_pressure_p;
> +
>/* True if the loop is part of an oacc kernels region.  */
>bool in_oacc_kernels_region;
> 
> @@ -742,7 +745,8 @@ loop_optimizer_finalize ()
>  enum
>  {
>UAP_UNROLL = 1,  /* Enables unrolling of loops if it seems profitable.  */
> -  UAP_UNROLL_ALL = 2   /* Enables unrolling of all loops.  */
> +  UAP_UNROLL_ALL = 2,  /* Enables unrolling of all loops.  */
> +  UAP_UNROLL_PRESSURE_AWARE = 4/* Enables unrolling of loops with pressure aware.  */
>  };
> 
>  extern void doloop_optimize_loops (void);
> diff --git a/gcc/common.opt b/gcc/common.opt
> index e593631..a44c7dc 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2425,6 +2425,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
> 
> +funroll-loops-pressure-aware
> +Common Report Var(flag_unroll_loops_pressure_aware) Optimization
> +Perform loop unrolling for low register pressure loops
> +
>  ; Nonzero means that loop 

Re: [PATCH 1/6] [DJGPP] libstdc++-v3/config/os/djgpp/error_constants.h: Update according to errno codes available for DJGPP

2015-12-11 Thread Jonathan Wakely

On 10/12/15 17:58 -0500, DJ Delorie wrote:



I can't really judge this one.  Either DJ or Jon would need to chime in
on this.


Looks OK to me, although in the default configuration (plain DJGPP)
the #ifdefs will always be false (omitted), which is harmless.


Is there a non-default configuration where they are true?

I've been wavering on approving this patch, not because there's
anything wrong with it, but because I'm still ruminating on
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01458.html

I think I'm going to make that change for mingw-w64, but either way
the DJGPP patch is OK for trunk (assuming there is some non-default
configuration where it actually does something useful).


[v4] avoid alignment of static variables affecting stack's

2015-12-11 Thread Jan Beulich
Function (or narrower) scope static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment. I first noticed the issue with Linux's dynamic_pr_debug()
construct using an 8-byte aligned sub-file-scope local variable.

According to my checking, the bad behavior started with 4.6.x (4.5.3 was
still okay), but generated code got quite a bit worse as of 4.9.0.

[v4: Bail early, using is_global_var(), as requested by Bernd.]
[v3: Re-base to current trunk.]
[v2: Drop inclusion of hard register variables, as requested by
 Jakub and Richard.]

gcc/
2015-12-11  Jan Beulich  

* cfgexpand.c (expand_one_var): Exit early for static and
external variables when adjusting stack alignment related.

gcc/testsuite/
2015-12-11  Jan Beulich  

* gcc.c-torture/execute/stkalign.c: New.

--- 2015-12-09/gcc/cfgexpand.c
+++ 2015-12-09/gcc/cfgexpand.c
@@ -1550,6 +1550,9 @@ expand_one_var (tree var, bool toplevel,
 
   if (TREE_TYPE (var) != error_mark_node && TREE_CODE (var) == VAR_DECL)
 {
+  if (is_global_var (var))
+   return 0;
+
   /* Because we don't know if VAR will be in register or on stack,
 we conservatively assume it will be on stack even if VAR is
 eventually put into register after RA pass.  For non-automatic
--- 2015-12-09/gcc/testsuite/gcc.c-torture/execute/stkalign.c
+++ 2015-12-09/gcc/testsuite/gcc.c-torture/execute/stkalign.c
@@ -0,0 +1,26 @@
+/* { dg-options "-fno-inline" } */
+
+#include <assert.h>
+
+#define ALIGNMENT 64
+
+unsigned test(unsigned n, unsigned p)
+{
+  static struct { char __attribute__((__aligned__(ALIGNMENT))) c; } s;
+  unsigned x;
+
+  assert(__alignof__(s) == ALIGNMENT);
+  asm ("" : "=g" (x), "+m" (s) : "0" ());
+
+  return n ? test(n - 1, x) : (x ^ p);
+}
+
+int main (int argc, char *argv[] __attribute__((unused)))
+{
+  unsigned int x = test(argc, 0);
+
+  x |= test(argc + 1, 0);
+  x |= test(argc + 2, 0);
+
+  return !(x & (ALIGNMENT - 1));
+}





[PATCH] S/390: Wide int support.

2015-12-11 Thread Dominik Vogt
The attached patch introduces wide int support to S/390 in order
to resolve a test case failure in gcc.dg/pr68129_1.c that is
caused by an assertion in
simplify-rtx.c:simplify_const_binary_operation().

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_rtx_costs)
(s390_cannot_force_const_mem, legitimate_pic_operand_p)
(s390_preferred_reload_class, s390_reload_symref_address)
(legitimate_reload_constant_p, print_operand): Wide int support.
* config/s390/predicates.md ("const0_operand", "constm1_operand")
("consttable_operand"): Likewise.
("larl_operand"): Add a comment.
* config/s390/s390.h (TARGET_SUPPORTS_WIDE_INT): Enable wide int
support.
From a2909e33789375d0217957defd335491d341918b Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 10 Dec 2015 11:51:08 +0100
Subject: [PATCH] S/390: Wide int support.

This fixes the assertion in simplify-rtx.c:simplify_const_binary_operation()
triggered by gcc.dg/pr68129_1.c.
---
 gcc/config/s390/predicates.md |  9 ++---
 gcc/config/s390/s390.c        | 26 --
 gcc/config/s390/s390.h        |  2 ++
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 6a5ebbb..75cecf0 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -26,12 +26,12 @@
 
 ;; Return true if OP a const 0 operand (int/float/vector).
 (define_predicate "const0_operand"
-  (and (match_code "const_int,const_double,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (mode)")))
 
 ;; Return true if OP an all ones operand (int/float/vector).
 (define_predicate "constm1_operand"
-  (and (match_code "const_int, const_double,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONSTM1_RTX (mode)")))
 
 ;; Return true if OP is a 4 bit mask operand
@@ -42,7 +42,7 @@
 ;; Return true if OP is constant.
 
 (define_special_predicate "consttable_operand"
-  (and (match_code "symbol_ref, label_ref, const, const_int, const_double, const_vector")
+  (and (match_code "symbol_ref, label_ref, const, const_int, const_wide_int, const_double, const_vector")
(match_test "CONSTANT_P (op)")))
 
 ;; Return true if OP is a valid S-type operand.
@@ -121,6 +121,9 @@
 ;;  Return true if OP a valid operand for the LARL instruction.
 
 (define_predicate "larl_operand"
+; Note: Although CONST_INT and CONST_DOUBLE are not handled in this predicate,
+; at least one of them needs to appear or otherwise safe_predicate_mode will
+; assume that a DImode LABEL_REF is not accepted either (see genrecog.c).
   (match_code "label_ref, symbol_ref, const, const_int, const_double")
 {
   /* Allow labels and local symbols.  */
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index f8928b9..bed58d8 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1323,7 +1323,7 @@ s390_tm_ccmode (rtx op1, rtx op2, bool mixed)
 {
   int bit0, bit1;
 
-  /* ??? Fixme: should work on CONST_DOUBLE as well.  */
+  /* ??? Fixme: should work on CONST_WIDE_INT as well.  */
   if (GET_CODE (op1) != CONST_INT || GET_CODE (op2) != CONST_INT)
 return VOIDmode;
 
@@ -3355,6 +3355,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
 case LABEL_REF:
 case SYMBOL_REF:
 case CONST_DOUBLE:
+case CONST_WIDE_INT:
 case MEM:
   *total = 0;
   return true;
@@ -3662,7 +3663,7 @@ tls_symbolic_reference_mentioned_p (rtx op)
 
 /* Return true if OP is a legitimate general operand when
generating PIC code.  It is given that flag_pic is on
-   and that OP satisfies CONSTANT_P or is a CONST_DOUBLE.  */
+   and that OP satisfies CONSTANT_P.  */
 
 int
 legitimate_pic_operand_p (rtx op)
@@ -3677,7 +3678,7 @@ legitimate_pic_operand_p (rtx op)
 }
 
 /* Returns true if the constant value OP is a legitimate general operand.
-   It is given that OP satisfies CONSTANT_P or is a CONST_DOUBLE.  */
+   It is given that OP satisfies CONSTANT_P.  */
 
 static bool
 s390_legitimate_constant_p (machine_mode mode, rtx op)
@@ -3731,6 +3732,7 @@ s390_cannot_force_const_mem (machine_mode mode, rtx x)
 {
 case CONST_INT:
 case CONST_DOUBLE:
+case CONST_WIDE_INT:
 case CONST_VECTOR:
   /* Accept all non-symbolic constants.  */
   return false;
@@ -3831,8 +3833,9 @@ legitimate_reload_constant_p (rtx op)
 return true;
 
   /* Accept double-word operands that can be split.  */
-  if (GET_CODE (op) == CONST_INT
-  && trunc_int_for_mode (INTVAL (op), word_mode) != INTVAL (op))
+  if (GET_CODE (op) == CONST_WIDE_INT
+  || (GET_CODE (op) == CONST_INT
+	  && trunc_int_for_mode (INTVAL (op), word_mode) != INTVAL (op)))
 {
   machine_mode dword_mode = word_mode == SImode ? DImode : 

Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-11 Thread Tom de Vries

On 13/11/15 12:39, Jakub Jelinek wrote:

We simply have some compiler internal interface between the caller and
callee of the outlined regions, each interface in between those has
its own structure type used to communicate the info;
we can attach attributes on the fields, or some flags to indicate some
properties interesting from aliasing POV.  We don't really need to perform
full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
the relationship in between such callers and callees (for offloading regions
we already have "omp target entrypoint" attribute on the callee and a
single caller), tell LTO if possible not to split those into different
partitions if easily possible, and then just for these pairs perform
aliasing/points-to analysis in the caller and the result record using
cliques/special attributes/whatever to the callee side, so that the callee
(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.


Hi,

This work-in-progress patch allows me to use IPA PTA information in the 
kernels pass group.


Since:
- I'm running IPA PTA before ealias, and IPA PTA does not interpret
  restrict, and
- compute_may_aliases doesn't run if IPA PTA information is present,
I needed to convince ealias to do the restrict clique/base annotation.

It would be more logical to fit IPA PTA after ealias, but one is an IPA 
pass, the other a regular one-function pass, so I would have to split 
the containing pass groups pass_all_early_optimizations and 
pass_local_optimization_passes. I'll give that a try now.


Any comments?

Thanks,
- Tom
Run pass_ipa_pta before pass_local_optimization_passes

---
 gcc/gimple-ssa.h   |  2 ++
 gcc/passes.def |  1 +
 gcc/tree-pass.h|  1 +
 gcc/tree-ssa-structalias.c | 60 +++---
 4 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-ssa.h b/gcc/gimple-ssa.h
index 39551da..aff2fb7 100644
--- a/gcc/gimple-ssa.h
+++ b/gcc/gimple-ssa.h
@@ -83,6 +83,8 @@ struct GTY(()) gimple_df {
   /* The PTA solution for the ESCAPED artificial variable.  */
   struct pt_solution escaped;
 
+  bool clique_base_annotation_done;
+
   /* A map of decls to artificial ssa-names that point to the partition
  of the decl.  */
   hash_map<tree, tree> * GTY((skip(""))) decls_to_pointers;
diff --git a/gcc/passes.def b/gcc/passes.def
index 678a900..5293be0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_rebuild_cgraph_edges);
   POP_INSERT_PASSES ()
 
+  NEXT_PASS (pass_ipa_pta_oacc_kernels);
   NEXT_PASS (pass_local_optimization_passes);
   PUSH_INSERT_PASSES_WITHIN (pass_local_optimization_passes)
   NEXT_PASS (pass_fixup_cfg);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 4566d33..980922e 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -497,6 +497,7 @@ extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_ipa_pta_oacc_kernels (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_target_clone (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_dispatcher_calls (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 7420ce1..dfc0422 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -6939,7 +6939,7 @@ solve_constraints (void)
at the start of the file for an algorithmic overview.  */
 
 static void
-compute_points_to_sets (void)
+compute_points_to_sets (bool set_points_to_info)
 {
   basic_block bb;
   unsigned i;
@@ -6981,6 +6981,9 @@ compute_points_to_sets (void)
   /* From the constraints compute the points-to sets.  */
   solve_constraints ();
 
+  if (!set_points_to_info)
+goto done;
+
   /* Compute the points-to set for ESCAPED used for call-clobber analysis.  */
   cfun->gimple_df->escaped = find_what_var_points_to (cfun->decl,
 		  get_varinfo (escaped_id));
@@ -7057,6 +7060,7 @@ compute_points_to_sets (void)
 	}
 }
 
+ done:
   timevar_pop (TV_TREE_PTA);
 }
 
@@ -7289,6 +7293,8 @@ compute_dependence_clique (void)
 unsigned int
 compute_may_aliases (void)
 {
+  bool set_points_to_info = true;
+
   if (cfun->gimple_df->ipa_pta)
 {
   if (dump_file)
@@ -7300,13 +7306,16 @@ compute_may_aliases (void)
 	  dump_alias_info (dump_file);
 	}
 
-  return 0;
+  if (cfun->gimple_df->clique_base_annotation_done)
+	return 0;
+
+  set_points_to_info = false;
 }
 
   /* For each pointer P_i, determine the sets of variables that P_i may
  point-to.  Compute the reachability set of escaped and call-used
  variables.  */
-  

Re: fix scheduling antideps

2015-12-11 Thread Jeff Law

On 12/11/2015 02:22 AM, Eric Botcazou wrote:

This patch allows a target to increase the cost of anti-deps to better
reflect the actual cost on the machine.


But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?

And can't this be done with define_bypass as well?

Jeff



[PTX] TImode initializers

2015-12-11 Thread Nathan Sidwell
I noticed a C++ test made the compiler ICE when trying to output a 128-bit enum 
initializer.  The fix is crazy simple -- have nvptx_assemble_integer return 
false rather than ICE.  Defining TARGET_SUPPORTS_WIDE_INT is helpful, but not necessary.
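
To illustrate why a 128-bit initializer needs special care, here is a small sketch of splitting such a value into two 64-bit words, which is roughly what one would expect a fallback path to do when the target hook declines the value. This is my own illustration, not the GCC implementation: `split_ti` is a hypothetical name, and it assumes a host compiler that provides `unsigned __int128`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split the 128-bit value V into two 64-bit
   words, least significant word first.  */
void
split_ti (unsigned __int128 v, uint64_t out[2])
{
  out[0] = (uint64_t) v;          /* low 64 bits */
  out[1] = (uint64_t) (v >> 64);  /* high 64 bits */
}
```

Each resulting word fits in a single 64-bit element, which the existing CONST_INT path can emit.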


While adding the testcase, I noticed I'd missed the opening '{' of the existing 
dg-final lines, so they weren't being run.  Fixed that up while I was there.


nathan
2015-12-11  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.h (TARGET_SUPPORTS_WIDE_INT): Define.
	* config/nvptx/nvptx.c (nvptx_assemble_integer): Return false for
	unrecognizable RTX.

	gcc/testsuite/
	* gcc.target/nvptx/ary-init.c: Repair dg-final syntax.
	* gcc.target/nvptx/decl-init.c: Likewise.  Add TI case.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 231563)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -1570,7 +1570,9 @@ nvptx_assemble_integer (rtx x, unsigned
   switch (GET_CODE (x))
 {
 default:
-  gcc_unreachable ();
+  /* Let the generic machinery figure it out, usually for a
+	 CONST_WIDE_INT.  */
+  return false;
 
 case CONST_INT:
   nvptx_assemble_value (INTVAL (x), size);
Index: gcc/config/nvptx/nvptx.h
===
--- gcc/config/nvptx/nvptx.h	(revision 231563)
+++ gcc/config/nvptx/nvptx.h	(working copy)
@@ -69,6 +69,7 @@
 #define FLOAT_TYPE_SIZE 32
 #define DOUBLE_TYPE_SIZE 64
 #define LONG_DOUBLE_TYPE_SIZE 64
+#define TARGET_SUPPORTS_WIDE_INT 1
 
 #undef SIZE_TYPE
 #define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
Index: gcc/testsuite/gcc.target/nvptx/ary-init.c
===
--- gcc/testsuite/gcc.target/nvptx/ary-init.c	(revision 231563)
+++ gcc/testsuite/gcc.target/nvptx/ary-init.c	(working copy)
@@ -1,21 +1,25 @@
 /* { dg-additional-options "-Wno-long-long" } */
 
 char ca1[2] = {'a', 'b'};
+/* { dg-final { scan-assembler " .align 1 .u8 ca1\\\[2\\\] = { 97, 98 };" } } */
+
 short sa1[2] = { 1, 2 };
+/* { dg-final { scan-assembler " .align 2 .u16 sa1\\\[2\\\] = { 1, 2 };" } } */
+
 int ia1[2] = { 3, 4 };
+/* { dg-final { scan-assembler " .align 4 .u32 ia1\\\[2\\\] = { 3, 4 };" } } */
+
 long long la1[2] = { 5, 6 };
+/* { dg-final { scan-assembler " .align 8 .u64 la1\\\[2\\\] = { 5, 6 };" } } */
 
 char ca2[2][2] = {'A', 'B', 'C', 'D'};
+/* { dg-final { scan-assembler " .align 1 .u8 ca2\\\[4\\\] = { 65, 66, 67, 68 };" } } */
+
 short sa2[2][2] = { 7, 8, 9, 10 };
+/* { dg-final { scan-assembler " .align 2 .u16 sa2\\\[4\\\] = { 7, 8, 9, 10 };" } } */
+
 int ia2[2][2] = { 11, 12, 13, 14 };
-long long la2[2][2] = { 15, 16, 17, 18 };
+/* { dg-final { scan-assembler " .align 4 .u32 ia2\\\[4\\\] = { 11, 12, 13, 14 };" } } */
 
-/* dg-final { scan-assembler " .align 8 .u64 la1\\\[2\\\] = { 5, 6 };" } } */
-/* dg-final { scan-assembler " .align 4 .u32 ia1\\\[2\\\] = { 3, 4 };" } } */
-/* dg-final { scan-assembler " .align 2 .u16 sa1\\\[2\\\] = { 1, 2 };" } } */
-/* dg-final { scan-assembler " .align 1 .u8 ca1\\\[2\\\] = { 97, 98 };" } } */
-
-/* dg-final { scan-assembler " .align 8 .u64 la2\\\[4\\\] = { 15, 16, 17, 18 };" } } */
-/* dg-final { scan-assembler " .align 4 .u32 ia2\\\[4\\\] = { 11, 12, 13, 14 };" } } */
-/* dg-final { scan-assembler " .align 2 .u16 sa2\\\[4\\\] = { 7, 8, 9, 10 };" } } */
-/* dg-final { scan-assembler " .align 1 .u8 ca2\\\[4\\\] = { 65, 66, 67, 68 };" } } */
+long long la2[2][2] = { 15, 16, 17, 18 };
+/* { dg-final { scan-assembler " .align 8 .u64 la2\\\[4\\\] = { 15, 16, 17, 18 };" } } */
Index: gcc/testsuite/gcc.target/nvptx/decl-init.c
===
--- gcc/testsuite/gcc.target/nvptx/decl-init.c	(revision 231563)
+++ gcc/testsuite/gcc.target/nvptx/decl-init.c	(working copy)
@@ -2,11 +2,15 @@
 /* { dg-additional-options "-Wno-long-long" } */
 
 __extension__ _Complex float cf = 1.0f + 2.0if;
+/* { dg-final { scan-assembler ".align 4 .u32 cf\\\[2\\\] = { 1065353216, 1073741824 };" } } */
+
 __extension__ _Complex double cd = 3.0 + 4.0i;
+/* { dg-final { scan-assembler ".align 8 .u64 cd\\\[2\\\] = { 4613937818241073152, 4616189618054758400 };" } } */
 
 long long la[2] = 
   {0x0102030405060708ll,
0x1112131415161718ll};
+/* { dg-final { scan-assembler ".align 8 .u64 la\\\[2\\\] = { 72623859790382856, 1230066625199609624 };" } } */
 
 struct six 
 {
@@ -15,23 +19,27 @@ struct six
 };
 
 struct six six1 = {1, 2, 3};
+/* { dg-final { scan-assembler ".align 2 .u16 six1\\\[3\\\] = { 1, 2, 3 };" } } */
+
 struct six six2[2] = {{4, 5, 6}, {7, 8, 9}};
+/* { dg-final { scan-assembler ".align 2 .u16 six2\\\[6\\\] = { 4, 5, 6, 7, 8, 9 };" } } */
 
 struct __attribute__((packed)) five 
 {
   char a;
   int b;
 };
+
 struct five five1 = {10, 11};
+/* { dg-final { scan-assembler ".align 1 .u8 five1\\\[5\\\] = { 10, 11, 0, 0, 0 };" } } */
+
 struct five five2[2] 

Re: [PATCH] S/390: Wide int support.

2015-12-11 Thread Ulrich Weigand
Dominik Vogt wrote:

> +; Note: Although CONST_INT and CONST_DOUBLE are not handled in this predicate,
> +; at least one of them needs to appear or otherwise safe_predicate_mode will
> +; assume that a DImode LABEL_REF is not accepted either (see genrecog.c).

The problem is not DImode LABEL_REFs, but rather VOIDmode LABEL_REFs when
matched against a match_operand:DI.

Otherwise, this patch is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [patch] Improve generated libstdc++ API docs

2015-12-11 Thread Jonathan Wakely

On 10/12/15 14:02 +, Jonathan Wakely wrote:

This adjusts some Doxygen comments and updates the Doxygen config file
to ensure all headers are processed (previously doxygen was ignoring
filenames without an extension, which is a lot of the library!)

There's a workaround in include/std/bitset for the bug 
https://bugzilla.gnome.org/show_bug.cgi?id=759242

and another in include/ext/pb_ds/detail/bin_search_tree_/traits.hpp
for https://bugzilla.gnome.org/show_bug.cgi?id=759241

Tested with 'make doc-man-doxygen doc-html-doxygen' and committed to
trunk.


And I've committed this one to gcc-5-branch.


commit d83ebcff2ca95ad3ff3832bd8d61bb1e7cc8cc34
Author: Jonathan Wakely 
Date:   Thu Dec 10 15:03:49 2015 +

Improve generated libstdc++ API docs

	* doc/doxygen/user.cfg.in: Use EXTENSION_MAPPING tag. Add new headers
	to INPUT. Remove obsolete XML_SCHEMA and XML_DTD tags. Update
	PREDEFINED macros. Set BRIEF_MEMBER_DESC for man-pages.
	* include/backward/strstream: Correct @file comment.
	* include/bits/forward_list.h: Improve Doxygen comments.
	* include/bits/locale_facets_nonio.h: Likewise.
	* include/debug/vector (_Safe_vector): Add @brief section to comment.
	* include/experimental/fs_fwd.h: Correct @file comment.
	* include/experimental/fs_ops.h: Likewise.
	* include/experimental/string_view.tcc: Likewise.
	* include/experimental/optional: Document experimental status.
	* include/experimental/string_view: Correct @file comment.
	* include/ext/pb_ds/detail/bin_search_tree_/traits.hpp: Reduce
	whitespace to avoid Doxygen bug.
	* include/std/bitset: Remove redundant @class Doxygen command. Add
	parentheses to avoid Doxygen bug.
	* include/std/mutex: Improve Doxygen comments.
	* include/tr2/dynamic_bitset: Add missing @param documentation.
	* scripts/run_doxygen: Rename man pages for std::experimental types.

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index ff2db48..ccd5fbb 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -272,7 +272,7 @@ OPTIMIZE_OUTPUT_VHDL   = NO
 # Note that for custom extensions you also need to set FILE_PATTERNS otherwise
 # the files are not read by doxygen.
 
-EXTENSION_MAPPING  =
+EXTENSION_MAPPING  = no_extension=C++ .h=C++ .tcc=C++ .hpp=C++
 
 # If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments
 # according to the Markdown format, which allows for more readable
@@ -757,6 +757,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/bitset \
  include/chrono \
  include/complex \
+ include/codecvt \
  include/condition_variable \
  include/deque \
  include/forward_list \
@@ -812,6 +813,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/cmath \
  include/csetjmp \
  include/csignal \
+ include/cstdalign \
  include/cstdarg \
  include/cstdbool \
  include/cstddef \
@@ -831,6 +833,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/backward/hash_set \
  include/backward/strstream \
  include/debug \
+ include/debug/array \
  include/debug/bitset \
  include/debug/deque \
  include/debug/forward_list \
@@ -853,6 +856,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/profile/unordered_set \
  include/profile/vector \
  include/ext/algorithm \
+ include/ext/cmath \
  include/ext/functional \
  include/ext/iterator \
  include/ext/memory \
@@ -886,9 +890,18 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/tr2/ratio \
  include/tr2/type_traits \
  include/decimal/decimal \
+ include/experimental \
+ include/experimental/algorithm \
  include/experimental/any \
+ include/experimental/chrono \
+ include/experimental/filesystem \
+ include/experimental/functional \
  include/experimental/optional \
+ include/experimental/ratio \
  include/experimental/string_view \
+ 

Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Tom de Vries wrote:

> On 13/11/15 12:39, Jakub Jelinek wrote:
> > We simply have some compiler internal interface between the caller and
> > callee of the outlined regions, each interface in between those has
> > its own structure type used to communicate the info;
> > we can attach attributes on the fields, or some flags to indicate some
> > properties interesting from aliasing POV.  We don't really need to perform
> > full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
> > the relationship in between such callers and callees (for offloading regions
> > we already have "omp target entrypoint" attribute on the callee and a
> > single caller), tell LTO if possible not to split those into different
> > partitions if easily possible, and then just for these pairs perform
> > aliasing/points-to analysis in the caller and the result record using
> > cliques/special attributes/whatever to the callee side, so that the callee
> > (outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.
> 
> Hi,
> 
> This work-in-progress patch allows me to use IPA PTA information in the
> kernels pass group.
> 
> Since:
> - I'm running IPA PTA before ealias, and IPA PTA does not interpret
>   restrict, and
> - compute_may_aliases doesn't run if IPA PTA information is present,
> I needed to convince ealias to do the restrict clique/base annotation.
> 
> It would be more logical to fit IPA PTA after ealias, but one is an IPA pass,
> the other a regular one-function pass, so I would have to split the containing
> pass groups pass_all_early_optimizations and pass_local_optimization_passes.
> I'll give that a try now.
> 
> Any comments?

I don't think you want to run IPA PTA before early
optimizations, it (and ealias) rely on some initial cleanup to
do anything meaningful with well-spent resources.

The local PTA "hack" also looks more like a waste of resources, but well 
... teaching IPA PTA to honor restrict might be an impossible task
though I didn't think much about it other than handling it only for
nonlocal_p functions (for others we should see all incoming args
if IPA PTA works optimally).  The restrict tags will leak all over
the place of course and in the end no meaningful cliques may remain.

Richard.


Re: [PATCH] S/390: Wide int support.

2015-12-11 Thread Andreas Krebbel
On 12/11/2015 03:20 PM, Dominik Vogt wrote:
> The attached patch introduces wide int support to S/390 in order
> to resolve a test case failure in gcc.dg/pr68129_1.c that is
> caused by an assertion in
> simplify-rtx.c:simplify_const_binary_operation().

Applied with the change suggested by Uli.  Thanks!

-Andreas-




Re: [PATCH] Handle sizes and kinds params of GOACC_paralllel in find_func_clobbers

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Tom de Vries wrote:

> Hi,
> 
> while testing the oacc kernels patch series on top of trunk, using the optimal
> handling of BUILT_IN_GOACC_PARALLEL in fipa-pta, I ran into a failure where
> the stores to the omp_data_sizes array were removed by dse.
> 
> The call bb in the failing testcase normally looks like this:
> ...
>   :
>   .omp_data_arr.10.D.2550 = c.2_18;
>   .omp_data_arr.10.c = 
>   .omp_data_arr.10.D.2553 = b.1_15;
>   .omp_data_arr.10.b = 
>   .omp_data_arr.10.D.2556 = a.0_11;
>   .omp_data_arr.10.a = 
>D.2572 = n_6(D);
>   .omp_data_arr.10.n = 
>   .omp_data_sizes.11[0] = _8;
>   .omp_data_sizes.11[1] = 0;
>   .omp_data_sizes.11[2] = _8;
>   .omp_data_sizes.11[3] = 0;
>   .omp_data_sizes.11[4] = _8;
>   .omp_data_sizes.11[5] = 0;
>   .omp_data_sizes.11[6] = 4;
>   __builtin_GOACC_parallel_keyed (-1, foo._omp_fn.0, 7,
>   &.omp_data_arr.10,
>   &.omp_data_sizes.11,
>   &.omp_data_kinds.12, 0);
> ...
> 
> Dse removed the stores, because omp_data_sizes was not marked as used by
> __builtin_GOACC_parallel_keyed.
> 
> We pretend in fipa-pta that __builtin_GOACC_parallel_keyed is never called,
> and instead handle the call foo._omp_fn.0 (&.omp_data_arr.10). That means the
> use of omp_data_sizes by __builtin_GOACC_parallel_keyed is ignored.
> 
> This patch fixes that (for both sizes and kinds arrays), as confirmed with a
> test run of target-libgomp c.exp on the accelerator.
> 
> OK for stage3 if bootstrap and reg-test succeeds?

Ok, though technically they are used by the OMP runtime (but this we
could only represent by letting them escape).  I wonder what can of
worms we'd open if you LTO the OMP runtime in ... (and thus
builtins map to real functions!)

Thanks,
Richard.
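
As a toy illustration of the problem discussed above (not the actual libgomp interface): if the analysis models the launch call only as a direct call to the outlined function, the stores to the sizes array appear dead even though the launcher reads them. All names below (`launch`, `kernel`, `caller`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime entry point: it reads SIZES
   itself before invoking FN on DATA.  */
size_t
launch (void (*fn) (void *), void *data, const size_t *sizes, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += sizes[i];
  fn (data);
  return total;
}

/* Hypothetical outlined region: it only sees DATA, never SIZES.  */
void
kernel (void *data)
{
  *(int *) data += 1;
}

size_t
caller (int *value)
{
  size_t sizes[2];
  sizes[0] = sizeof *value;  /* live: read by launch, not by kernel */
  sizes[1] = 0;
  return launch (kernel, value, sizes, 2);
}
```

If `launch (kernel, value, sizes, 2)` were treated as just `kernel (value)`, nothing would appear to read `sizes`, and dead store elimination could drop the two assignments.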


Re: [patch] Improve generated libstdc++ API docs

2015-12-11 Thread Jonathan Wakely

On 11/12/15 12:47 +, Jonathan Wakely wrote:

* doc/doxygen/user.cfg.in: Use EXTENSION_MAPPING tag. Add new headers
to INPUT. Remove obsolete XML_SCHEMA and XML_DTD tags. Update
PREDEFINED macros. Set BRIEF_MEMBER_DESC for man-pages.


Oops, that changelog is not quite right, I dropped the
BRIEF_MEMBER_DESC part on the branch. I set that variable for man
pages on trunk so that we get:

NAME
  std::forward_list< _Tp, _Alloc > - A standard container with linear time 
access to elements, and fixed time insertion/deletion at any point in the sequence.

rather than:

NAME
  std::forward_list< _Tp, _Alloc > -

But it also adds the brief descriptions to every member function:

  Public Member Functions
  forward_list () noexcept(is_nothrow_default_constructible< _Node_alloc_type 
>::value)
  Creates a forward_list with no elements.
  forward_list (const _Alloc &__al) noexcept
  Creates a forward_list with no elements.


We want it on the NAME, but I don't think it's good on every member
(so might revert that bit on trunk). I think Doxygen's behaviour is
wrong here, see https://bugzilla.gnome.org/show_bug.cgi?id=759275


Re: [PATCH PR68542]

2015-12-11 Thread Yuri Rumyantsev
Richard.
Thanks for your review.
I redesigned the fix for the assert by adding additional checks for vector
comparison with boolean result to fold_binary_op_with_conditional_arg
and removing the early exit in combine_cond_expr_cond.
Unfortunately, I am not able to provide you with test-case since it is
in my second patch related to back-end patch which I sent earlier
(12-08).

Bootstrapping and regression testing did not show any new failures.
Is it OK for trunk?

ChangeLog:
2015-12-11  Yuri Rumyantsev  

PR middle-end/68542
* fold-const.c (fold_binary_op_with_conditional_arg): Add checks on
vector comparison with boolean result to avoid ICE.
(fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
vector types.

2015-12-10 16:36 GMT+03:00 Richard Biener :
> On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev  wrote:
>> Hi Richard.
>>
>> Thanks a lot for your review.
>> Below are my answers.
>>
>> You asked why I inserted additional check to
>> ++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>>gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> +  /* Do not perform combining if types are not compatible.  */
>> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> +  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> +return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> This is because without it I've got assert in fold_convert_loc
>>   gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>>
>> since it tries to convert vector of bool to scalar bool.
>> Here is essential part of call-stack:
>>
>> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
>> at ../../gcc/diagnostic.c:1259
>> #1  0x01743ada in fancy_abort (
>> file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
>> function=0x184b9d0 > tree_node*)::__FUNCTION__> "fold_convert_loc") at
>> ../../gcc/diagnostic.c:1332
>> #2  0x009c8330 in fold_convert_loc (loc=0, type=0x718a9d20,
>> arg=0x71a7f488) at ../../gcc/fold-const.c:2216
>> #3  0x009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
>> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
>> op2=0x718c2030) at ../../gcc/fold-const.c:11453
>> #4  0x009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
>> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
>> op2=0x718c2030) at ../../gcc/fold-const.c:12394
>> #5  0x009d870c in fold_binary_op_with_conditional_arg (loc=0,
>> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
>> op1=0x71a48780, cond=0x71a7f460, arg=0x71a48780,
>> cond_first_p=1) at ../../gcc/fold-const.c:6465
>> #6  0x009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
>> type=0x718a9d20, op0=0x71a7f460, op1=0x71a48780)
>> at ../../gcc/fold-const.c:9211
>> #7  0x00ecb8fa in combine_cond_expr_cond (stmt=0x71a487d0,
>> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
>> op1=0x71a48780, invariant_only=true)
>> at ../../gcc/tree-ssa-forwprop.c:382
>
> Ok, but that only shows that
>
>   /* Convert A ? 1 : 0 to simply A.  */
>   if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
>  : (integer_onep (op1)
> && !VECTOR_TYPE_P (type)))
>   && integer_zerop (op2)
>   /* If we try to convert OP0 to our type, the
>  call to fold will try to move the conversion inside
>  a COND, which will recurse.  In that case, the COND_EXPR
>  is probably the best choice, so leave it alone.  */
>   && type == TREE_TYPE (arg0))
> return pedantic_non_lvalue_loc (loc, arg0);
>
>   /* Convert A ? 0 : 1 to !A.  This prefers the use of NOT_EXPR
>  over COND_EXPR in cases such as floating point comparisons.  */
>   if (integer_zerop (op1)
>   && (code == VEC_COND_EXPR ? integer_all_onesp (op2)
> : (integer_onep (op2)
>&& !VECTOR_TYPE_P (type)))
>   && truth_value_p (TREE_CODE (arg0)))
> return pedantic_non_lvalue_loc (loc,
> fold_convert_loc (loc, type,
>   

Re: [PATCH] New version of libmpx with new memmove wrapper

2015-12-11 Thread Ilya Enkovich
On 08 Dec 13:53, Aleksandra Tsvetkova wrote:
> Wrong version of patch was attached.
> 
> On Tue, Dec 8, 2015 at 1:46 PM, Aleksandra Tsvetkova  
> wrote:
> > gcc/testsuite/ChangeLog
> > 2015-10-27  Tsvetkova Alexandra  
> >
> > * gcc.target/i386/mpx/memmove-1.c: New test for __mpx_wrapper_memmove.
> > * gcc.target/i386/mpx/memmove-2.c: New test covering fail on spec.

memmove-2.c has Windows-style end of lines.

> +  /* Not necessary to copy bounds if size is less then size of pointer
> + or SRC=DST.  */
> +  if ((n >= sizeof (void *)) || (src != dst))
> +move_bounds (dst, src, n);

Condition is still incorrect.
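
For illustration only: a minimal sketch of the test the quoted comment describes, assuming the intent is to skip the bounds copy when either of the two conditions holds.  should_move_bounds is a hypothetical name, and this is not necessarily the exact form that was committed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.  The comment in the patch
   says bounds need not be copied when N is smaller than a pointer or when
   SRC == DST.  Negating "n < sizeof (void *) || src == dst" by De Morgan
   gives a conjunction, not the disjunction used in the quoted hunk.  */
bool
should_move_bounds (const void *dst, const void *src, size_t n)
{
  return n >= sizeof (void *) && src != dst;
}
```

With the disjunction from the quoted hunk, a call with src == dst and a large n would still copy bounds, which is the case the comment says can be skipped.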

I fixed it, bootstrapped, regtested and applied to trunk.  Here is the
committed version.

Thanks,
Ilya
--
libmpx/

2015-12-11  Tsvetkova Alexandra  

* mpxrt/Makefile.am (libmpx_la_LDFLAGS): Add -version-info
option.
* libmpxwrap/Makefile.am (libmpx_la_LDFLAGS): Likewise and
fix include path.
* libmpx/Makefile.in: Regenerate.
* mpxrt/Makefile.in: Regenerate.
* libmpxwrap/Makefile.in: Regenerate.
* mpxrt/libtool-version: New version.
* libmpxwrap/libtool-version: Likewise.
* mpxrt/libmpx.map: Add new version and a new symbol.
* mpxrt/mpxrt.h: New file.
* mpxrt/mpxrt.c (NUM_L1_BITS): Moved to mpxrt.h.
(REG_IP_IDX): Moved to mpxrt.h.
(REX_PREFIX): Moved to mpxrt.h.
(XSAVE_OFFSET_IN_FPMEM): Moved to mpxrt.h.
(MPX_L1_SIZE): Moved to mpxrt.h.
* libmpxwrap/mpx_wrappers.c (mpx_pointer): New type.
(mpx_bt_entry): New type.
(alloc_bt): New function.
(get_bt): New function.
(copy_if_possible): New function.
(copy_if_possible_from_end): New function.
(move_bounds): New function.
(__mpx_wrapper_memmove): Use move_bounds to copy bounds.

gcc/testsuite/

2015-12-11  Tsvetkova Alexandra  

* gcc.target/i386/mpx/memmove-1.c: New test.
* gcc.target/i386/mpx/memmove-2.c: New test.


diff --git a/gcc/testsuite/gcc.target/i386/mpx/memmove-1.c 
b/gcc/testsuite/gcc.target/i386/mpx/memmove-1.c
new file mode 100755
index 000..0efd030
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mpx/memmove-1.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+/* { dg-options "-fcheck-pointer-bounds -mmpx" } */
+
+
+#include 
+#include 
+#include "mpx-check.h"
+
+#ifdef __i386__
+/* i386 directory size is 4MB.  */
+#define MPX_NUM_L2_BITS 10
+#define MPX_NUM_IGN_BITS 2
+#else /* __i386__ */
+/* x86_64 directory size is 2GB.  */
+#define MPX_NUM_L2_BITS 17
+#define MPX_NUM_IGN_BITS 3
+#endif /* !__i386__ */
+
+
+/* bt_num_of_elems is the number of elements in bounds table.  */
+unsigned long bt_num_of_elems = (1UL << MPX_NUM_L2_BITS);
+/* Function to test MPX wrapper of memmove function.
+   src_bigger_dst determines which address is bigger, can be 0 or 1.
+   src_bt_index and dst_bt_index are bt indexes
+   from the beginning of the page.
+   bd_index_end is the bd index of the last element of src if we define
+   bd index of the first element as 0.
+   src_bt_index_end is the bt index of the last element of src.
+   pointers_inside determines whether the array being copied includes pointers.
+   src_align and dst_align are alignments of src and dst.
+   Arrays may contain unaligned pointers.  */
+int
+test (int src_bigger_dst, int src_bt_index, int dst_bt_index,
+  int bd_index_end, int src_bt_index_end, int pointers_inside,
+  int src_align, int dst_align)
+{
+  const int n =
+src_bt_index_end - src_bt_index + bd_index_end * bt_num_of_elems;
+  if (n < 0)
+{
+  return 0;
+}
+  const int num_of_pointers = (bd_index_end + 2) * bt_num_of_elems;
+  void **arr = 0;
+  posix_memalign ((void **) (&arr),
+   1UL << (MPX_NUM_L2_BITS + MPX_NUM_IGN_BITS),
+   num_of_pointers * sizeof (void *));
+  void **src = arr, **dst = arr;
+  if ((src_bigger_dst) && (src_bt_index < dst_bt_index))
+src_bt_index += bt_num_of_elems;
+  if (!(src_bigger_dst) && (src_bt_index > dst_bt_index))
+dst_bt_index += bt_num_of_elems;
+  src += src_bt_index;
+  dst += dst_bt_index;
+  char *realign = (char *) src;
+  realign += src_align;
+  src = (void **) realign;
+  realign = (char *) dst;
+  realign += src_align;
+  dst = (void **) realign;
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+src[i] = __bnd_set_ptr_bounds (arr + i, i * sizeof (void *) + 1);
+}
+  memmove (dst, src, n * sizeof (void *));
+  if (pointers_inside)
+{
+  for (int i = 0; i < n; i++)
+{
+  if (dst[i] != arr + i)
+abort ();
+  if (__bnd_get_ptr_lbound (dst[i]) != arr + i)
+abort ();
+  if (__bnd_get_ptr_ubound (dst[i]) != arr + 2 * i)
+abort ();
+}
+}
+  free (arr);
+  return 0;
+}
+
+/* Call testall to test common cases 

RE: [PATCH 2/2] [graphite] update required isl versions

2015-12-11 Thread Sebastian Paul Pop
Good points.
I will send an updated patch following all your recommendations.

Sebastian

-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Friday, December 11, 2015 3:42 AM
To: Sebastian Pop
Cc: Sebastian Pop; GCC Patches; hiradi...@msn.com
Subject: Re: [PATCH 2/2] [graphite] update required isl versions

On Thu, Dec 10, 2015 at 6:05 PM, Sebastian Pop  wrote:
> we now check the isl version, as there are no real differences in existing 
> files
> in between isl 0.14 and isl 0.15.

I thought ISL 0.15 has some new features you could check?  Also using a run test
is bad for cross compiling.

I'd simply change the "compatible" test to check for isl_ctx_get_max_operations
or whatever is needed to implement the compute bound and not worry about
warning about using a deprecated ISL during configure.

Maybe add the ISL version to the list of dependency versions we print in
toplev.c:print_version.

Richard.

> ---
>  config/isl.m4 |  41 +++--
>  configure | 112 
> --
>  2 files changed, 123 insertions(+), 30 deletions(-)
>
> diff --git a/config/isl.m4 b/config/isl.m4
> index 459fac1..886b0e4 100644
> --- a/config/isl.m4
> +++ b/config/isl.m4
> @@ -19,23 +19,23 @@
>
>  # ISL_INIT_FLAGS ()
>  # -
> -# Provide configure switches for ISL support.
> +# Provide configure switches for isl support.
>  # Initialize isllibs/islinc according to the user input.
>  AC_DEFUN([ISL_INIT_FLAGS],
>  [
>AC_ARG_WITH([isl-include],
>  [AS_HELP_STRING(
>[--with-isl-include=PATH],
> -  [Specify directory for installed ISL include files])])
> +  [Specify directory for installed isl include files])])
>AC_ARG_WITH([isl-lib],
>  [AS_HELP_STRING(
>[--with-isl-lib=PATH],
> -  [Specify the directory for the installed ISL library])])
> +  [Specify the directory for the installed isl library])])
>
>AC_ARG_ENABLE(isl-version-check,
>  [AS_HELP_STRING(
>[--disable-isl-version-check],
> -  [disable check for ISL version])],
> +  [disable check for isl version])],
>  ENABLE_ISL_CHECK=$enableval,
>  ENABLE_ISL_CHECK=yes)
>
> @@ -58,15 +58,15 @@ AC_DEFUN([ISL_INIT_FLAGS],
>if test "x${with_isl_lib}" != x; then
>  isllibs="-L$with_isl_lib"
>fi
> -  dnl If no --with-isl flag was specified and there is in-tree ISL
> +  dnl If no --with-isl flag was specified and there is in-tree isl
>dnl source, set up flags to use that and skip any version tests
> -  dnl as we cannot run them before building ISL.
> +  dnl as we cannot run them before building isl.
>if test "x${islinc}" = x && test "x${isllibs}" = x \
>   && test -d ${srcdir}/isl; then
>  isllibs='-L$$r/$(HOST_SUBDIR)/isl/'"$lt_cv_objdir"' '
>  islinc='-I$$r/$(HOST_SUBDIR)/isl/include -I$$s/isl/include'
>  ENABLE_ISL_CHECK=no
> -AC_MSG_WARN([using in-tree ISL, disabling version check])
> +AC_MSG_WARN([using in-tree isl, disabling version check])
>fi
>
>isllibs="${isllibs} -lisl"
> @@ -75,7 +75,7 @@ AC_DEFUN([ISL_INIT_FLAGS],
>
>  # ISL_REQUESTED (ACTION-IF-REQUESTED, ACTION-IF-NOT)
>  # 
> -# Provide actions for failed ISL detection.
> +# Provide actions for failed isl detection.
>  AC_DEFUN([ISL_REQUESTED],
>  [
>AC_REQUIRE([ISL_INIT_FLAGS])
> @@ -106,12 +106,31 @@ AC_DEFUN([ISL_CHECK_VERSION],
>  LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
>  LIBS="${_isl_saved_LIBS} -lisl"
>
> -AC_MSG_CHECKING([for compatible ISL])
> -AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include ]], [[;]])],
> +AC_MSG_CHECKING([for recommended isl 0.15])
> +AC_TRY_RUN([#include 
> +#include 
> +int main() {
> +  if (strncmp (_GENERATED_STDINT_H, "isl 0.15", 8))
> +return 1;
> +  return 0;
> +}],
> [gcc_cv_isl=yes],
> -   [gcc_cv_isl=no])
> +   [gcc_cv_isl=no], [gcc_cv_isl=no])
>  AC_MSG_RESULT([$gcc_cv_isl])
>
> +if test "${gcc_cv_isl}" = no ; then
> +   AC_MSG_CHECKING([for deprecated isl 0.14])
> +   AC_TRY_RUN([#include 
> +   #include 
> +   int main() {
> + if (strncmp (_GENERATED_STDINT_H, "isl 0.14", 8))
> +   return 1;
> + return 0;
> +   }],
> +   [gcc_cv_isl=yes],
> +   [gcc_cv_isl=no], [gcc_cv_isl=no])
> +AC_MSG_RESULT([$gcc_cv_isl, recommended isl version is 0.15, minimum 
> required isl version 0.14 is deprecated])
> +fi
>  CFLAGS=$_isl_saved_CFLAGS
>  LDFLAGS=$_isl_saved_LDFLAGS
>  LIBS=$_isl_saved_LIBS
> diff --git a/configure b/configure
> index 090615f..4284ba7 100755
> --- a/configure
> +++ b/configure
> @@ -1492,7 +1492,7 @@ Optional Features:
>build static 

[PATCH] S/390: Allow to use r1 to r4 as literal pool base.

2015-12-11 Thread Dominik Vogt
The attached patch enables using r1 to r4 as the literal pool base pointer if
one of them is unused in a leaf function.  The unpatched code supports only r5
and r13.
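
The selection order can be modelled outside the compiler roughly as follows.  This is only a sketch: pick_literal_pool_base and the live[] array are hypothetical stand-ins for the s390 back-end state, and r13 as the fallback register is an assumption taken from the description above.

```c
#include <stdbool.h>

/* Assumption: r13 is the default literal pool base register, as the
   description above implies.  */
#define DEFAULT_BASE_REG 13

/* Hypothetical stand-alone model of the selection; LIVE[i] says whether
   register i is used anywhere in the function.  */
int
pick_literal_pool_base (bool is_leaf, const bool live[16])
{
  int br = 0;

  if (is_leaf)
    /* Try r5 first, then r4 down to r1; stop at the first free one.  */
    for (br = 5; br >= 1 && live[br]; br--)
      ;

  /* br is 0 for non-leaf functions or when r1..r5 are all in use.  */
  return br > 0 ? br : DEFAULT_BASE_REG;
}
```

In this model a non-leaf function always gets the default register, and a leaf function only falls back to it when r1 to r5 are all live.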

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_init_frame_layout): Allow to use r1, ..., r4
as base register for the literal pool too.
From f2838924c4c02eeabfdf7d8b79bd7a7ce8228e06 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 11 Dec 2015 11:33:23 +0100
Subject: [PATCH] S/390: Allow to use r1 to r4 as literal pool base
 pointer.

The old code only considered r5 and r13.
---
 gcc/config/s390/s390.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index bc6f05b..b195ad3d 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -9543,10 +9543,17 @@ s390_init_frame_layout (void)
 	 as base register to avoid save/restore overhead.  */
   if (!base_used)
 	cfun->machine->base_reg = NULL_RTX;
-  else if (crtl->is_leaf && !df_regs_ever_live_p (5))
-	cfun->machine->base_reg = gen_rtx_REG (Pmode, 5);
   else
-	cfun->machine->base_reg = gen_rtx_REG (Pmode, BASE_REGNUM);
+	{
+	  int br = 0;
+
+	  if (crtl->is_leaf)
+	/* Prefer r5 (most likely to be free).  */
+	for (br = 5; br >= 1 && df_regs_ever_live_p (br); br--)
+	  ;
+	  cfun->machine->base_reg =
+	gen_rtx_REG (Pmode, (br > 0) ? br : BASE_REGNUM);
+	}
 
   s390_register_info ();
   s390_frame_info ();
-- 
2.3.0



[RFC] Dump ssaname info for default defs

2015-12-11 Thread Tom de Vries

Hi,

atm, we dump ssa-name info for lhs-es of statements. That leaves out the 
ssa names with default defs.


This proof-of-concept patch prints the ssa-name info for default defs, 
in the following format:

...
__attribute__((noclone, noinline))
bar (intD.6 * cD.1755, intD.6 * dD.1756)
# PT = nonlocal
# DEFAULT_DEF c_2(D)
# PT = { D.1762 } (nonlocal)
# ALIGN = 4, MISALIGN = 0
# DEFAULT_DEF d_4(D)
{
;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
;;prev block 0, next block 1, flags: (NEW, REACHABLE)
;;pred:   ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  # .MEM_3 = VDEF <.MEM_1(D)>
  *c_2(D) = 1;
  # .MEM_5 = VDEF <.MEM_3>
  *d_4(D) = 2;
  # VUSE <.MEM_5>
  return;
;;succ:   EXIT [100.0%]  (EXECUTABLE)

}
...

Good idea? Any further comments, f.i. on formatting?

Thanks,
- Tom
Dump ssaname info for default defs

---
 gcc/gimple-pretty-print.c | 11 +++
 gcc/tree-cfg.c| 16 
 2 files changed, 27 insertions(+)

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index f1abf5c..75a3036 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1887,6 +1887,17 @@ dump_ssaname_info (pretty_printer *buffer, tree node, int spc)
 }
 }
 
+extern void dump_ssaname_info_to_file (FILE *, tree);
+
+void
+dump_ssaname_info_to_file (FILE *file, tree node)
+{
+  pretty_printer buffer;
+  pp_needs_newline (&buffer) = true;
+  buffer.buffer->stream = file;
+  dump_ssaname_info (&buffer, node, 0);
+  pp_flush (&buffer);
+}
 
 /* Dump a PHI node PHI.  BUFFER, SPC and FLAGS are as in pp_gimple_stmt_1.
The caller is responsible for calling pp_flush on BUFFER to finalize
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 0c624aa..21cf7a1 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -7312,6 +7312,7 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
   return bb;
 }
 
+extern void dump_ssaname_info_to_file (FILE *, tree);
 
 /* Dump FUNCTION_DECL FN to file FILE using FLAGS (see TDF_* in dumpfile.h)
*/
@@ -7369,6 +7370,21 @@ dump_function_to_file (tree fndecl, FILE *file, int flags)
 }
   fprintf (file, ")\n");
 
+  if (gimple_in_ssa_p (fun))
+{
+  arg = DECL_ARGUMENTS (fndecl);
+  while (arg)
+	{
+	  tree def = ssa_default_def (fun, arg);
+	  if (flags & TDF_ALIAS)
+	dump_ssaname_info_to_file (file, def);
+	  fprintf (file, "# DEFAULT_DEF ");
+	  print_generic_expr (file, def, dump_flags);
+	  fprintf (file, "\n");
+	  arg = DECL_CHAIN (arg);
+	}
+}
+
   if (flags & TDF_VERBOSE)
 print_node (file, "", fndecl, 2);
 


Re: [committed 5/5] Fix -Wmisleading-indentation warning in graphite-optimize-isl.c

2015-12-11 Thread Sebastian Pop
David Malcolm wrote:
> gcc/ChangeLog:
>   * graphite-optimize-isl.c (scop_get_domains): Fix indentation.
> ---
>  gcc/graphite-optimize-isl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
> index 8727e39..0c6a971 100644
> --- a/gcc/graphite-optimize-isl.c
> +++ b/gcc/graphite-optimize-isl.c
> @@ -359,7 +359,7 @@ scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
>FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
>  res = isl_union_set_add_set (res, isl_set_copy (pbb->domain));
>  
> -return res;
> +  return res;

Thanks!
Sebastian



Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-11 Thread Daniel Kahn Gillmor
On Thu 2015-12-10 19:12:57 -0500, Daniel Kahn Gillmor wrote:
> On Thu 2015-12-10 18:59:33 -0500, Joseph Myers wrote:
>> On Thu, 10 Dec 2015, Daniel Kahn Gillmor wrote:
>>
>>> Specifically, if the first character of the "old" argument is a
>>> literal $, then gcc will treat it as an environment variable name, and
>>> use the value of the env var for prefix mapping.
>>
>> I don't think a literal $ in option arguments is a good idea; it's far too 
>> hard to pass through a sequence of shells and makefiles that you typically 
>> get in recursive make.  You end up with things like 
>> '-Wl,-rpath,'\''\\\$$\$$\\\$$\$$ORIGIN'\''/../' (part of a process for 
>> using $ORIGIN when linking GDB) if you try.
>
> yow, that's truly monstrous!
>
> Is there a different symbol or string you'd be OK using instead for the
> same approach?  What about looking for an "ENV:" prefix?
>
> so something like:
>
>  -fdebug-prefix-map=ENV:SOURCE_BUILD_DIR=/usr/src
>
> wdyt?  I could rework the patch pretty easily if that seems acceptable.

I've re-rolled the patch (attached below, here) to use the ENV: prefix
instead of the $.

I've also updated the patch on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68848 to match this
approach.

Thanks for the input, Joseph.

Regards,

--dkg

From a1cc9f9ec219c06b06d3de0ce9d055a2e02422c3 Mon Sep 17 00:00:00 2001
From: Daniel Kahn Gillmor 
Date: Thu, 10 Dec 2015 12:09:45 -0500
Subject: [PATCH v2] gcc: read -fdebug-prefix-map OLD from environment
 (improved reproducibility)

Work on the reproducible-builds project [0] has identified that build
paths are one cause of output variation between builds.  This
changeset allows users to avoid this variation when building C objects
with debug symbols, while leaving the default behavior unchanged.

Background
--

gcc includes the build path in any generated DWARF debugging symbols,
specifically in DW_AT_comp_dir, but allows the embedded path to be
changed via -fdebug-prefix-map.

When -fdebug-prefix-map is used with the current build path, it
removes the build path from DW_AT_comp_dir but places it instead in
DW_AT_producer, so the reproducibility problem isn't resolved.

When building software for binary redistribution, the actual build
path on the build machine is irrelevant, and doesn't need to be
exposed in the debug symbols.

Resolution
--

This patch extends the first argument to -fdebug-prefix-map ("old") to
be able to read from the environment, which allows a packager to avoid
embedded build paths in the debugging symbols with something like:

  export SOURCE_BUILD_DIR="$(pwd)"
  gcc -fdebug-prefix-map=ENV:SOURCE_BUILD_DIR=/usr/src

Details
---

Specifically, if the "old" argument starts with a literal "ENV:", then
gcc will treat it as an environment variable name, and use the value
of the env var for prefix mapping.

As a result, DW_AT_producer contains the literal envvar name,
DW_AT_comp_dir contains the transformed build path, and the actual
build path is not at all present in the generated object file.
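
As a rough stand-alone illustration of the lookup described here (not the code in this patch): resolve_old_prefix is a hypothetical helper, and treating an unset variable or an overlong value as an error is an assumption made only for this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define ENV_PREFIX "ENV:"
#define ENV_PREFIX_LEN (sizeof (ENV_PREFIX) - 1)

/* Hypothetical helper, for illustration only.  ARG has the form OLD=NEW.
   Copy the effective OLD prefix into OUT: the value of the named
   environment variable if OLD starts with "ENV:", otherwise OLD itself.
   Return 0 on success and -1 if ARG is malformed, the variable is unset,
   or the result does not fit in OUT (assumed error policy).  */
int
resolve_old_prefix (const char *arg, char *out, size_t outlen)
{
  const char *eq = strchr (arg, '=');
  if (!eq)
    return -1;

  size_t oldlen = (size_t) (eq - arg);
  if (oldlen >= ENV_PREFIX_LEN
      && strncmp (arg, ENV_PREFIX, ENV_PREFIX_LEN) == 0)
    {
      char name[256];
      size_t namelen = oldlen - ENV_PREFIX_LEN;

      if (namelen == 0 || namelen >= sizeof name)
        return -1;
      memcpy (name, arg + ENV_PREFIX_LEN, namelen);
      name[namelen] = '\0';

      const char *value = getenv (name);
      if (!value || strlen (value) >= outlen)
        return -1;
      strcpy (out, value);
      return 0;
    }

  /* No "ENV:" prefix: OLD is used literally.  */
  if (oldlen >= outlen)
    return -1;
  memcpy (out, arg, oldlen);
  out[oldlen] = '\0';
  return 0;
}
```

With SOURCE_BUILD_DIR set to the build directory, the argument ENV:SOURCE_BUILD_DIR=/usr/src resolves OLD to that directory, while the command line itself only ever contains the variable name.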

This has been tested successfully on amd64 machines, and I see no
reason why it would be platform-specific.

More discussion of alternate approaches considered and discarded in
the development of this change can be found at [1] for those
interested.

Feedback welcome!

[0] https://reproducible-builds.org
[1] https://lists.alioth.debian.org/pipermail/reproducible-builds/Week-of-Mon-20151130/004051.html
---
 gcc/doc/invoke.texi |  4 +++-
 gcc/final.c | 30 --
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5256031..3f76e03 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6440,7 +6440,9 @@ link processing time.  Merging is enabled by default.
 @item -fdebug-prefix-map=@var{old}=@var{new}
 @opindex fdebug-prefix-map
 When compiling files in directory @file{@var{old}}, record debugging
-information describing them as in @file{@var{new}} instead.
+information describing them as in @file{@var{new}} instead.  If
+@file{@var{old}} starts with a @samp{ENV:}, the corresponding environment
+variable will be dereferenced, and its value will be used instead.
 
 @item -fno-dwarf2-cfi-asm
 @opindex fdwarf2-cfi-asm
diff --git a/gcc/final.c b/gcc/final.c
index 8cb5533..1800184 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -1520,11 +1520,17 @@ static debug_prefix_map *debug_prefix_maps;
 /* Record a debug file prefix mapping.  ARG is the argument to
-fdebug-prefix-map and must be of the form OLD=NEW.  */
 
+#define ENV_PREFIX "ENV:"
+#define ENV_PREFIX_OFFSET (sizeof(ENV_PREFIX) - 1)
+
 void
 add_debug_prefix_map (const char *arg)
 {
   debug_prefix_map *map;
   const char *p;
+  char *env;
+  const char *old;
+  size_t oldlen;
 
   p = strchr (arg, '=');
   if (!p)
@@ -1532,9 +1538,29 @@ add_debug_prefix_map (const char *arg)
   error ("invalid argument %qs to -fdebug-prefix-map", arg);
 

Re: [PATCH 4/4] Add -Wmisleading-indentation to -Wall

2015-12-11 Thread Dominique d'Humières
Revision r231571 with Jan-Benedict Glaw’s fix for trailing whitespace.

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 231570)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,13 @@
+2015-12-11  Jan-Benedict Glaw  
+   Dominique d'Humieres  
+
+   PR target/26427
+   PR target/33120
+   PR testsuite/35710
+
+   * config/darwin.c (darwin_use_anchors_for_symbol_p): Fix indention and
+   trailing whitespace.
+
 2015-12-11  Jan Beulich  
 
* cfgexpand.c (expand_one_var): Exit early for static and
Index: gcc/config/darwin.c
===
--- gcc/config/darwin.c (revision 231570)
+++ gcc/config/darwin.c (working copy)
@@ -2997,23 +2997,23 @@
   SYMBOL_REF_BLOCK_OFFSET (symbol));
 }
 
-/* Disable section anchoring on any section containing a zero-sized 
+/* Disable section anchoring on any section containing a zero-sized
object.  */
 bool
 darwin_use_anchors_for_symbol_p (const_rtx symbol)
 {
-  if (DARWIN_SECTION_ANCHORS && flag_section_anchors) 
+  if (DARWIN_SECTION_ANCHORS && flag_section_anchors)
 {
   section *sect;
   /* If the section contains a zero-sized object it's ineligible.  */
   sect = SYMBOL_REF_BLOCK (symbol)->sect;
   /* This should have the effect of disabling anchors for vars that follow
- any zero-sized one, in a given section.  */ 
+ any zero-sized one, in a given section.  */
   if (sect->common.flags & SECTION_NO_ANCHOR)
return false;
 
-/* Also check the normal reasons for suppressing.  */
-return default_use_anchors_for_symbol_p (symbol);
+  /* Also check the normal reasons for suppressing.  */
+  return default_use_anchors_for_symbol_p (symbol);
 }
   else
 return false;

Dominique

> I think you can apply this as obvious
> Iain



Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-11 Thread Bernd Schmidt

On 12/11/2015 05:49 PM, Daniel Kahn Gillmor wrote:

I've re-rolled the patch (attached below, here) to use the ENV: prefix
instead of the $.


It might be irrelevant at this point, but the "ENV:" prefix is used in 
AmigaOS and could be part of a filename.



+  if (0 == strncmp(ENV_PREFIX, arg, ENV_PREFIX_OFFSET))
+{
+  env = xstrndup (arg+ENV_PREFIX_OFFSET, p - (arg+ENV_PREFIX_OFFSET));


Spaces before ( and around operators like +. Please review our coding 
guidelines and have a look at the surrounding code.


Wouldn't it be simpler just to special-case -fdebug-prefix-map in 
gen_producer_string? The environment variable thing strikes me as 
unnecessary.



Bernd


Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [4/3] v2

2015-12-11 Thread Sebastian Pop
Uros Bizjak wrote:
> > Finally the mechanical changes necessary due to the API change in the 
> > walker.
> 
> You forgot to change the graphite part, as in the attached patch.
> 
> 2015-12-10  Uros Bizjak  
> 
> PR tree-optimization/68619
> * graphite-scop-detection.c (gather_bbs::before_dom_children):
> Change return type to an edge.  Always return NULL.
> 
> OK for mainline after successful bootstrap and regtest?

Yes.  Thanks for the patch.

Sebastian


Re: [PR 66616] Check for thunks when adding extra constants to clones

2015-12-11 Thread Jan Hubicka
> Hi,
> 
> PR 66616 happens because in find_more_scalar_values_for_callers_subset
> we do not do the same thunk checks like we do in
> propagate_constants_accross_call.  I am in the process of
> bootstrapping and testing the following patch to fix it.  OK if it
> passes?
> 
> Thanks,
> 
> Martin
> 
> 
> 2015-12-11  Martin Jambor  
> 
>   PR ipa/66616
>   * ipa-cp.c (propagate_constants_accross_call): Move thunk check...
>   (call_passes_through_thunk_p): ...here.
>   (find_more_scalar_values_for_callers_subset): Perform thunk checks
>   like propagate_constants_accross_call does.
> 
> testsuite/
>   * g++.dg/ipa/pr66616.C: New test.

OK,
Honza


[patch] Fix PR middle-end/68215

2015-12-11 Thread Eric Botcazou
Hi,

this is the regression of c-c++-common/opaque-vector.c on 32-bit targets where 
'long double' is 128-bit large, for example PowerPC and SPARC, with an ICE in 
the RTL expander because emit_store_flag is invoked with TImode.

As noted by Ilya, the underlying issue (ICE because emit_store_flag is invoked 
with TImode) is not a regression, but the test now runs into it because the 
veclower pass generates:

  int128_t _15;
   _16;

  _15 = BIT_FIELD_REF <_14, 128, 0>;
  _16 = _15 != 0;
  _17 = _16 ? s.0_4 : s.1_6;

The problematic line is the second one: it's a store flag for int128_t.

Now the veclower pass also generates for the same testcase:

  int128_t _12;

  _12 = BIT_FIELD_REF ;
  _13 = _12 != 0 ? -1 : 0;

which works fine because the predicate is embedded in the condition.

That's why the attached patch changes the veclower pass to embed the predicate 
in the former case too; this is sufficient to fix the regression.

Tested on x86-64/Linux and SPARC/Solaris, OK for the mainline?


2015-12-11  Eric Botcazou  

PR middle-end/68215
* tree-vect-generic.c (tree_vec_extract): Remove GSI parameter.
Do not gimplify the result.
(do_unop): Adjust call to tree_vec_extract.
(do_binop): Likewise.
(do_compare): Likewise.
(do_plus_minus): Likewise.
(do_negate): Likewise.
(expand_vector_condition): Likewise.
(do_cond): Likewise.

-- 
Eric Botcazou
Index: tree-vect-generic.c
===
--- tree-vect-generic.c	(revision 231488)
+++ tree-vect-generic.c	(working copy)
@@ -103,8 +103,7 @@ typedef tree (*elem_op_func) (gimple_stm
 			  tree);
 
 static inline tree
-tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
-		  tree t, tree bitsize, tree bitpos)
+tree_vec_extract (tree type, tree t, tree bitsize, tree bitpos)
 {
   if (TREE_CODE (t) == SSA_NAME)
 {
@@ -115,22 +114,21 @@ tree_vec_extract (gimple_stmt_iterator *
 		  && gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR)))
 	t = gimple_assign_rhs1 (def_stmt);
 }
+
   if (bitpos)
 {
   if (TREE_CODE (type) == BOOLEAN_TYPE)
 	{
 	  tree itype
 	= build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
-	  tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
-	bitsize, bitpos);
-	  return gimplify_build2 (gsi, NE_EXPR, type, field,
-  build_zero_cst (itype));
+	  tree field = fold_build3 (BIT_FIELD_REF, itype, t, bitsize, bitpos);
+	  return fold_build2 (NE_EXPR, type, field, build_zero_cst (itype));
 	}
-  else
-	return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+ 
+  return fold_build3 (BIT_FIELD_REF, type, t, bitsize, bitpos);
 }
-  else
-return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
+
+  return fold_build1 (VIEW_CONVERT_EXPR, type, t);
 }
 
 static tree
@@ -138,7 +136,7 @@ do_unop (gimple_stmt_iterator *gsi, tree
 	 tree b ATTRIBUTE_UNUSED, tree bitpos, tree bitsize,
 	 enum tree_code code, tree type ATTRIBUTE_UNUSED)
 {
-  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  a = tree_vec_extract (inner_type, a, bitsize, bitpos);
   return gimplify_build1 (gsi, code, inner_type, a);
 }
 
@@ -148,9 +146,9 @@ do_binop (gimple_stmt_iterator *gsi, tre
 	  tree type ATTRIBUTE_UNUSED)
 {
   if (TREE_CODE (TREE_TYPE (a)) == VECTOR_TYPE)
-a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+a = tree_vec_extract (inner_type, a, bitsize, bitpos);
   if (TREE_CODE (TREE_TYPE (b)) == VECTOR_TYPE)
-b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+b = tree_vec_extract (inner_type, b, bitsize, bitpos);
   return gimplify_build2 (gsi, code, inner_type, a, b);
 }
 
@@ -169,8 +167,8 @@ do_compare (gimple_stmt_iterator *gsi, t
   tree cst_true = build_all_ones_cst (stype);
   tree cmp;
 
-  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
-  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+  a = tree_vec_extract (inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (inner_type, b, bitsize, bitpos);
 
   cmp = build2 (code, boolean_type_node, a, b);
   return gimplify_build3 (gsi, COND_EXPR, stype, cmp, cst_true, cst_false);
@@ -202,8 +200,8 @@ do_plus_minus (gimple_stmt_iterator *gsi
   low_bits = build_replicated_const (word_type, inner_type, max >> 1);
   high_bits = build_replicated_const (word_type, inner_type, max & ~(max >> 1));
 
-  a = tree_vec_extract (gsi, word_type, a, bitsize, bitpos);
-  b = tree_vec_extract (gsi, word_type, b, bitsize, bitpos);
+  a = tree_vec_extract (word_type, a, bitsize, bitpos);
+  b = tree_vec_extract (word_type, b, bitsize, bitpos);
 
   signs = gimplify_build2 (gsi, BIT_XOR_EXPR, word_type, a, b);
   b_low = gimplify_build2 (gsi, BIT_AND_EXPR, word_type, b, low_bits);
@@ -235,7 +233,7 @@ do_negate (gimple_stmt_iterator *gsi, tr
   low_bits = build_replicated_const (word_type, inner_type, max >> 1);
  

[PR 66616] Check for thunks when adding extra constants to clones

2015-12-11 Thread Martin Jambor
Hi,

PR 66616 happens because find_more_scalar_values_for_callers_subset
does not perform the same thunk checks that
propagate_constants_accross_call does.  I am in the process of
bootstrapping and testing the following patch to fix it.  OK if it
passes?
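
For readers who do not have the thunk issue in their head, here is a minimal sketch of why the first (0th) parameter is special.  The names Pad, Base and Derived are hypothetical and only mirror the shape of the test case below.  It assumes a typical ABI (e.g. Itanium) where a second polymorphic base sits at a non-zero offset, so a virtual call through the base pointer has to adjust "this" on the way:

```cpp
// Hypothetical types, for illustration only.
struct Pad
{
  char fc[8];
  virtual ~Pad () {}
};

struct Base
{
  int fi;
  Base () : fi (0) {}
  virtual ~Base () {}
  virtual const void *self () const = 0;
};

struct Derived : Pad, Base
{
  // When called through a Base*, this override is typically reached via a
  // this-adjusting thunk, because Base is not the first base of Derived.
  virtual const void *self () const { return this; }
};

// Returns true if the Base subobject lives at a different address than the
// complete Derived object, i.e. the 0th argument is changed by the thunk.
bool
this_needs_adjustment ()
{
  Derived d;
  const Base *p = &d;
  return static_cast<const void *> (p) != p->self ();
}
```

Since the value the caller passes as "this" is not the value the real callee sees, a constant discovered for parameter 0 at the call site cannot simply be propagated into the clone.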

Thanks,

Martin


2015-12-11  Martin Jambor  

PR ipa/66616
* ipa-cp.c (propagate_constants_accross_call): Move thunk check...
(call_passes_through_thunk_p): ...here.
(find_more_scalar_values_for_callers_subset): Perform thunk checks
like propagate_constants_accross_call does.

testsuite/
* g++.dg/ipa/pr66616.C: New test.
---
 gcc/ipa-cp.c   | 25 +-
 gcc/testsuite/g++.dg/ipa/pr66616.C | 54 ++
 2 files changed, 73 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr66616.C

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 6ba2f14..f0dcdf5 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1877,6 +1877,18 @@ propagate_aggs_accross_jump_function (struct cgraph_edge 
*cs,
   return ret;
 }
 
+/* Return true if on the way from CS->caller to the final (non-alias and
+   non-thunk) destination, the call passes through a thunk.  */
+
+static bool
+call_passes_through_thunk_p (cgraph_edge *cs)
+{
+  cgraph_node *alias_or_thunk = cs->callee;
+  while (alias_or_thunk->alias)
+alias_or_thunk = alias_or_thunk->get_alias_target ();
+  return alias_or_thunk->thunk.thunk_p;
+}
+
 /* Propagate constants from the caller to the callee of CS.  INFO describes the
caller.  */
 
@@ -1885,7 +1897,7 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
 {
   struct ipa_node_params *callee_info;
   enum availability availability;
-  struct cgraph_node *callee, *alias_or_thunk;
+  cgraph_node *callee;
   struct ipa_edge_args *args;
   bool ret = false;
   int i, args_count, parms_count;
@@ -1923,10 +1935,7 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
   /* If this call goes through a thunk we must not propagate to the first (0th)
  parameter.  However, we might need to uncover a thunk from below a series
  of aliases first.  */
-  alias_or_thunk = cs->callee;
-  while (alias_or_thunk->alias)
-alias_or_thunk = alias_or_thunk->get_alias_target ();
-  if (alias_or_thunk->thunk.thunk_p)
+  if (call_passes_through_thunk_p (cs))
 {
   ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info,
   0));
@@ -3493,7 +3502,11 @@ find_more_scalar_values_for_callers_subset (struct 
cgraph_node *node,
  struct ipa_jump_func *jump_func;
  tree t;
 
-  if (i >= ipa_get_cs_argument_count (IPA_EDGE_REF (cs)))
+  if (i >= ipa_get_cs_argument_count (IPA_EDGE_REF (cs))
+ || (i == 0
+ && call_passes_through_thunk_p (cs))
+ || (!cs->callee->instrumentation_clone
+ && cs->callee->function_symbol ()->instrumentation_clone))
 {
   newval = NULL_TREE;
   break;
diff --git a/gcc/testsuite/g++.dg/ipa/pr66616.C 
b/gcc/testsuite/g++.dg/ipa/pr66616.C
new file mode 100644
index 000..440ea6c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr66616.C
@@ -0,0 +1,54 @@
+// { dg-do run }
+// { dg-options "-O2 -fipa-cp-clone" }
+
+struct Distraction
+{
+  char fc[8];
+  virtual Distraction * return_self ()
+  { return this; }
+};
+
+static int go;
+
+struct A;
+
+struct A
+{
+  int fi;
+
+  A () : fi(0) {}
+  A (int pi) : fi (pi) {}
+  virtual void foo (int p) = 0;
+};
+
+struct B;
+
+struct B : public Distraction, A
+{
+  B () : Distraction(), A() { }
+  B (int pi) : Distraction (), A (pi) {}
+  virtual void foo (int p)
+  {
+int o = fi;
+for (int i = 0; i < p; i++)
+  o += i + i * i;
+go = o;
+  }
+};
+
+struct B gb2 (2);
+
+extern "C" void abort (void);
+
+int
+main (void)
+{
+  for (int i = 0; i < 2; i++)
+{
+  struct A *p = &gb2;
+  p->foo (0);
+  if (go != 2)
+   abort ();
+}
+  return 0;
+}
-- 
2.6.3



Re: [PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-11 Thread Patrick Palka
On Thu, Dec 10, 2015 at 4:43 PM, Patrick Palka  wrote:
> This patch fixes name-lookup of operators in template definitions whose
> operands are non-dependent expressions, i.e. PR c++/21802 (and
> incidentally 53223).
>
> The approach that this patch takes is to detect when build_new_op()
> returns a call to an overloaded function and to store a call to this
> overload into the template AST instead of storing the raw operator
> (an operator would be erroneously subject to overload resolution during
> instantiation).
>
> The new function build_min_non_dep_op_overload is the workhorse of the
> patch.  It reconstructs the CALL_EXPR that would have been built had an
> explicit operator+, operator* etc call been used, i.e. had the overload
> gone through finish_call_expr() / build_new_method_call() instead of
> through build_new_op().  The parameter OVERLOAD of this new function is
> probably not strictly necessary -- one can probably just look at the
> CALL_EXPR_FN of the parameter NON_DEP to figure out the overload to use
> -- but since the requisite plumbing from build_new_op() already existed
> to conveniently get at the overload information I thought I might as
> well use it.
>
> I have also created a test case that hopefully exercises all the changes
> that were made and to verify that these operator calls are being built
> correctly.
>
> Does this approach seem adequate?  Bootstrap and regtesting in progress
> on x86_64, OK to commit if testing succeeds?

Unfortunately this patch doesn't work properly on operator overloads
that are defined as friend functions.  E.g. the following now fails to
compile:

struct A
{
  friend int operator* (A);
};

template <typename T>
void func(T t)
{
  A x;
  int y = *x;
}

int main()
{
  func(0);
}

I think this happens because KOENIG_LOOKUP_P is not being properly set
in the CALL_EXPR we are reconstructing.  Not sure how to fix that
yet.
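
For context, the behavior the patch is after can be sketched with a small standalone example.  The function names are made up for illustration and this is not one of the new tests.  The operands of "+" inside the template do not depend on T, so under two-stage lookup the operator should be bound when the template is defined, and an overload declared afterwards should not be considered:

```cpp
struct A {};

// Declared before the template: the only operator+ visible at its definition.
int operator+ (A, double) { return 1; }

template <typename T>
int
add_in_template (T)
{
  A a;
  // Neither operand depends on T, so operator+ is bound here.
  return a + 0;
}

// A better match for (A, int), but declared too late for the template body.
int operator+ (A, int) { return 2; }

int
add_outside_template ()
{
  A a;
  return a + 0;   // ordinary lookup at this point sees both overloads
}
```

A compiler that redoes overload resolution for the raw operator at instantiation time would pick the second overload inside the template as well, which is the bug described in the PR.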


>
> gcc/cp/ChangeLog:
>
> PR c++/21802
> PR c++/53223
> * cp-tree.h (build_min_non_dep_op_overload): Declare.
> * tree.c (build_min_non_dep_op_overload): Define.
> * typeck.c (build_x_indirect_ref): Use
> build_min_non_dep_op_overload when the given expression
> has been resolved to an operator overload.
> (build_x_binary_op): Likewise.
> (build_x_array_ref): Likewise.
> (build_x_unary_op): Likewise.
> (build_x_compound_expr): Likewise.
> (build_x_modify_expr): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/21802
> PR c++/53223
> * g++.dg/cpp0x/pr53223.C: New test.
> * g++.dg/lookup/pr21802.C: New test.
> * g++.dg/lookup/two-stage4.C: Remove XFAIL.
> ---
>  gcc/cp/cp-tree.h |   1 +
>  gcc/cp/tree.c|  64 
>  gcc/cp/typeck.c  | 100 +---
>  gcc/testsuite/g++.dg/cpp0x/pr53223.C |  35 
>  gcc/testsuite/g++.dg/lookup/pr21802.C| 271 
> +++
>  gcc/testsuite/g++.dg/lookup/two-stage4.C |   2 +-
>  6 files changed, 453 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr53223.C
>  create mode 100644 gcc/testsuite/g++.dg/lookup/pr21802.C
>
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 6190f4e..3487d77 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -6513,6 +6513,7 @@ extern tree build_min (enum 
> tree_code, tree, ...);
>  extern tree build_min_nt_loc   (location_t, enum tree_code,
>  ...);
>  extern tree build_min_non_dep  (enum tree_code, tree, ...);
> +extern tree build_min_non_dep_op_overload  (enum tree_code, tree, tree, 
> ...);
>  extern tree build_min_non_dep_call_vec (tree, tree, vec<tree, va_gc> *);
>  extern tree build_cplus_new(tree, tree, tsubst_flags_t);
>  extern tree build_aggr_init_expr   (tree, tree);
> diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
> index 5dad0a7..2635736 100644
> --- a/gcc/cp/tree.c
> +++ b/gcc/cp/tree.c
> @@ -2744,6 +2744,70 @@ build_min_non_dep_call_vec (tree non_dep, tree fn, 
> vec<tree, va_gc> *argvec)
>return convert_from_reference (t);
>  }
>
> +/* Similar to build_min_non_dep, but for expressions that have been resolved 
> to
> +   a call to an operator overload.  OP is the operator that has been
> +   overloaded.  NON_DEP is the non-dependent expression that's been built,
> +   which should be a CALL_EXPR or an INDIRECT_REF to a CALL_EXPR.  OVERLOAD 
> is
> +   the overload that NON_DEP is calling.  */
> +
> +tree
> +build_min_non_dep_op_overload (enum tree_code op,
> +  tree non_dep,
> +  tree overload, ...)
> +{
> +  va_list p;
> +  int nargs;
> +  tree fn, call;
> +  vec<tree, va_gc> *args;
> +
> +  if (REFERENCE_REF_P (non_dep))
> +non_dep = 

[patch] libstdc++/59768 Fix std::invoke support for reference_wrappers

2015-12-11 Thread Jonathan Wakely

My attempt to implement LWG 2219 wasn't very good and resulted in
ambiguous overloads for the testcase in PR 59768. This fixes it, and
makes sure that const reference_wrappers can be used too.

Tested powerpc64le-linux, committed to trunk.
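
As a rough illustration of what should work after this change, here is a small sketch that uses only the public std::mem_fn and std::ref interfaces.  The Counter type and the function name are hypothetical and not taken from the new test:

```cpp
#include <functional>

// Hypothetical type, for illustration only.
struct Counter
{
  int n;
  int get () const { return n; }
};

int
call_through_wrappers ()
{
  Counter c = { 20 };
  std::reference_wrapper<Counter> r = std::ref (c);
  const std::reference_wrapper<Counter> cr = std::ref (c);

  // Pointer to member function, invoked on a plain and a const wrapper.
  int a = std::mem_fn (&Counter::get) (r);
  int b = std::mem_fn (&Counter::get) (cr);

  // Pointer to data member: the result refers to c.n itself.
  std::mem_fn (&Counter::n) (cr) += 2;

  return a + b + c.n;
}
```

In each call the wrapper is unwrapped to an lvalue Counter& before the pointer to member is applied, which is the conversion __invfwd performs internally.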


commit 9082b4be70fe2e4993ae0c765608fcd8da544bc5
Author: Jonathan Wakely 
Date:   Fri Dec 11 19:46:49 2015 +

Fix std::invoke support for reference_wrappers

	PR libstdc++/59768
	* include/std/functional (_Unwrap, __invfwd): Define.
	(__invoke_impl): Remove reference_wrapper overloads and use __invfwd.
	* include/std/type_traits (__result_of_memobj, __result_of_memfun):
	Add partial specializations for const reference_wrappers and simplify.
	* testsuite/20_util/bind/ref_neg.cc: Use dg-excess-errors.
	* testsuite/20_util/function_objects/invoke/59768.cc: New.

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index f1dc839..19caa96 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -184,6 +184,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _Weak_result_type_impl::type>
 { };
 
+  template::type>
+struct _Unwrap
+{
+  using type = _Tp&&;
+
+  // Equivalent to std::forward<_Tp>
+  static constexpr _Tp&&
+  _S_fwd(_Tp& __t) noexcept { return static_cast<_Tp&&>(__t); }
+};
+
+  template<typename _Tp, typename _Up>
+struct _Unwrap<_Tp, reference_wrapper<_Up>>
+{
+  using type = _Up&;
+
+  // Get an lvalue-reference from a reference_wrapper.
+  static _Up&
+  _S_fwd(const _Tp& __t) noexcept { return __t.get(); }
+};
+
+  // Used by __invoke_impl instead of std::forward<_Tp> so that a
+  // reference_wrapper is converted to an lvalue-reference.
+  template<typename _Tp>
+typename _Unwrap<_Tp>::type
+__invfwd(typename remove_reference<_Tp>::type& __t) noexcept
+{ return _Unwrap<_Tp>::_S_fwd(__t); }
+
   template
 inline _Res
 __invoke_impl(__invoke_other, _Fn&& __f, _Args&&... __args)
@@ -194,15 +221,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline _Res
 __invoke_impl(__invoke_memfun_ref, _MemFun&& __f, _Tp&& __t,
 		  _Args&&... __args)
-noexcept(noexcept((forward<_Tp>(__t).*__f)(forward<_Args>(__args)...)))
-{ return (forward<_Tp>(__t).*__f)(forward<_Args>(__args)...); }
-
-  template
-inline _Res
-__invoke_impl(__invoke_memfun_ref, _MemFun&& __f,
-		  reference_wrapper<_Tp> __t, _Args&&... __args)
-noexcept(noexcept((__t.get().*__f)(forward<_Args>(__args)...)))
-{ return (__t.get().*__f)(forward<_Args>(__args)...); }
+noexcept(noexcept((__invfwd<_Tp>(__t).*__f)(forward<_Args>(__args)...)))
+{ return (__invfwd<_Tp>(__t).*__f)(forward<_Args>(__args)...); }
 
   template
 inline _Res
@@ -214,15 +234,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 inline _Res
 __invoke_impl(__invoke_memobj_ref, _MemFun&& __f, _Tp&& __t)
-noexcept(noexcept(forward<_Tp>(__t).*__f))
-{ return forward<_Tp>(__t).*__f; }
-
-  template
-inline _Res
-__invoke_impl(__invoke_memobj_ref, _MemFun&& __f,
-		  reference_wrapper<_Tp> __t)
-noexcept(noexcept(__t.get().*__f))
-{ return __t.get().*__f; }
+noexcept(noexcept(__invfwd<_Tp>(__t).*__f))
+{ return __invfwd<_Tp>(__t).*__f; }
 
   template
 inline _Res
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index e5102de..5b4d073 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2391,44 +2391,59 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2219.  INVOKE-ing a pointer to member with a reference_wrapper
   //as the object expression
-  template struct reference_wrapper;
 
   template
 struct __result_of_memobj<_Res _Class::*, reference_wrapper<_Arg>>
-: __result_of_memobj<_Res _Class::*, _Arg>
-{
-  typedef typename
-	__result_of_memobj_ref<_Res _Class::*, _Arg&>::type type;
-};
+: __result_of_memobj_ref<_Res _Class::*, _Arg&>
+{ };
 
   template
 struct __result_of_memobj<_Res _Class::*, reference_wrapper<_Arg>&>
-: __result_of_memobj<_Res _Class::*, reference_wrapper<_Arg>>
+: __result_of_memobj_ref<_Res _Class::*, _Arg&>
+{ };
+
+  template
+struct __result_of_memobj<_Res _Class::*, const reference_wrapper<_Arg>&>
+: __result_of_memobj_ref<_Res _Class::*, _Arg&>
 { };
 
   template
 struct __result_of_memobj<_Res _Class::*, reference_wrapper<_Arg>&&>
-: __result_of_memobj<_Res _Class::*, reference_wrapper<_Arg>>
+: __result_of_memobj_ref<_Res _Class::*, _Arg&>
+{ };
+
+  template
+struct __result_of_memobj<_Res _Class::*, const reference_wrapper<_Arg>&&>
+: __result_of_memobj_ref<_Res _Class::*, _Arg&>
 { };
 
   template
 struct __result_of_memfun<_Res _Class::*, reference_wrapper<_Arg>, _Args...>
-: __result_of_memfun<_Res _Class::*, _Arg&, _Args...>
-{
-  typedef typename
-	

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2015-12-11 Thread H.J. Lu
On Thu, Dec 10, 2015 at 3:24 AM, Richard Biener
 wrote:
> On Wed, Dec 9, 2015 at 10:31 PM, Markus Trippelsdorf
>  wrote:
>> On 2015.12.09 at 10:53 -0800, H.J. Lu wrote:
>>>
>>> Empty C++ class is a corner case which is covered by neither the psABI
>>> nor the C++ ABI.  There is no mention of "empty record" in GCC
>>> documentation.  But there are plenty of "empty class" in gcc/cp.  This
>>> change affects all targets.  The C++ ABI should specify how it should
>>> be passed.
>>
>> There is a C++ ABI mailinglist, where you could discuss this issue:
>> http://sourcerytools.com/cgi-bin/mailman/listinfo/cxx-abi-dev
>
> Yep.  As long as the ABI doesn't state how to pass those I'd rather _not_
> change GCC's way.

It is agreed that GCC is wrong on this:

http://sourcerytools.com/pipermail/cxx-abi-dev/2015-December/002876.html
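
To make the case concrete, here is a minimal sketch of the kind of argument under discussion.  The names Empty and take are made up for illustration.  The size check reflects what common ABIs do and is an assumption, not something the C++ standard fixes at exactly one byte:

```cpp
#include <type_traits>

// Hypothetical empty class: no data members, no virtual functions.
struct Empty {};

// An empty record that is not the last argument of the call.
int
take (Empty, int x)
{
  return x;
}

static_assert (std::is_empty<Empty>::value, "Empty has no data members");
static_assert (sizeof (Empty) == 1, "yet it still occupies one byte here");
```

The open question is whether such an argument should consume an argument slot at all.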

Here is the updated patch.  I updated -Wpsabi to warn about empty
records that are passed in a variable argument list or aren't the last
arguments.  The warnings are triggered in:

/export/build/gnu/gcc-x32/build-x86_64-linux/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/hashtable.h:1507:7:
note: the ABI of passing empty record has changed in GCC 6
/export/build/gnu/gcc-x32/build-x86_64-linux/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/bits/hashtable_policy.h:901:67:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:238:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:266:26:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:273:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:289:61:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:296:11:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:304:11:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:312:11:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:320:11:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:328:11:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:341:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:375:19:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:375:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:390:19:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:390:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:415:16:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:425:12:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:442:29:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:449:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:457:4:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:500:5:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:529:5:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:547:5:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:569:5:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:617:5:
note: the ABI of passing empty record has changed in GCC 6
/export/gnu/import/git/sources/gcc/libstdc++-v3/src/c++11/cxx11-shim_facets.cc:637:5:
note: the ABI of passing 

Re: [patch] Fix PR middle-end/68215

2015-12-11 Thread Eric Botcazou
> Presumably we know the code we're generating here is always gimple.

In fact all the callers of tree_vec_extract pass the result to gimplify_build 
routines, so everything is gimplified if need be; it was over-gimplification.

> OK for the trunk.

Thanks.

-- 
Eric Botcazou


Re: [Patch, MIPS] Remove definition of TARGET_PROMOTE_PROTOTYPES

2015-12-11 Thread Steve Ellcey
Patch ping.

Steve Ellcey
sell...@imgtec.com


On Tue, 2015-11-10 at 15:57 -0800, Steve Ellcey wrote:
> This patch removes the definition of TARGET_PROMOTE_PROTOTYPES from MIPS,
> where it was defined as true, so that it now defaults to false.
> 
> Currently MIPS does prototype promotion in the caller and the callee and this
> patch removes the TARGET_PROMOTE_PROTOTYPES macro definition so that
> the promotion is only done in the caller (due to PROMOTE_MODE being defined).
> This does not break the ABI which requires the caller to do promotions anyway.
> (See https://gcc.gnu.org/ml/gcc/2015-10/msg00223.html).  This change also
> causes GCC to match what the LLVM and Greenhills compilers already do on MIPS.
> 
> After removing this macro I had three regressions, two were just tests that
> needed changing but one was a bug (gcc.dg/fixed-point/convert-sat.c).
> This test was calling a library function to convert a signed char into an
> unsigned fixed type and because we don't have tree type information about
> libcalls GCC cannot do the ABI required type promotion on those calls that it
> does on normal user defined calls.  In fact promote_mode in explow.c explicitly
> returns without doing anything if no type is given to it.  Before this change it
> didn't matter on MIPS because the callee did the same promotion that the
> caller was supposed to have done before using the argument.  Now that callee code is
> gone we depend on the caller doing the correct promotion and that was not
> happening.
> 
> I submitted and checked in another patch to optabs.c
> (See https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00704.html) to provide
> me with the infrastructure to do the correct type promotion in expand_fixed
> and this patch redefines TARGET_PROMOTE_FUNCTION_MODE to return the needed
> promotion mode even when type is NULL_TREE.  When type is set it does
> the same thing as it used to do.  This change allows me to remove the
> definition of TARGET_PROMOTE_PROTOTYPES without the convert-sat.c test
> failing.
> 
> The two tests that I changed are gcc.dg/tree-ssa/ssa-fre-4.c and
> gcc.target/mips/ext-2.c.  ssa-fre-4.c no longer applies to MIPS now
> that we do not define TARGET_PROMOTE_PROTOTYPES so I removed the MIPS
> target from it.  ext-2.c now generates an srl instruction instead of a
> dext instruction but the number of instructions has not changed and I
> updated the scan checks.
> 
> Tested on mips-mti-linux-gnu with no unfixed regressions.  OK to check in?
> 
> Steve Ellcey
> sell...@imgtec.com
> 
> 
> 2015-11-10  Steve Ellcey  
> 
>   * config/mips/mips.c (mips_promote_function_mode): New function.
>   (TARGET_PROMOTE_FUNCTION_MODE): Define as above function.
>   (TARGET_PROMOTE_PROTOTYPES): Remove.
> 
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 9880b23..e9c3830 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -19760,6 +19760,32 @@ mips_ira_change_pseudo_allocno_class (int regno, 
> reg_class_t allocno_class)
>  return GR_REGS;
>return allocno_class;
>  }
> +
> +/* Implement TARGET_PROMOTE_FUNCTION_MODE */
> +
> +/* This function is equivalent to 
> default_promote_function_mode_always_promote
> +   except that it returns a promoted mode even if type is NULL_TREE.  This is
> +   needed by libcalls which have no type (only a mode) such as fixed 
> conversion
> +   routines that take a signed or unsigned char/short argument and convert it
> +   to a fixed type.  */
> +
> +static machine_mode
> +mips_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
> +machine_mode mode,
> +int *punsignedp ATTRIBUTE_UNUSED,
> +const_tree fntype ATTRIBUTE_UNUSED,
> +int for_return ATTRIBUTE_UNUSED)
> +{
> +  int unsignedp;
> +
> +  if (type != NULL_TREE)
> +return promote_mode (type, mode, punsignedp);
> +
> +  unsignedp = *punsignedp;
> +  PROMOTE_MODE (mode, unsignedp, type);
> +  *punsignedp = unsignedp;
> +  return mode;
> +}
>  
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
> @@ -19864,10 +19890,7 @@ mips_ira_change_pseudo_allocno_class (int regno, 
> reg_class_t allocno_class)
>  #define TARGET_GIMPLIFY_VA_ARG_EXPR mips_gimplify_va_arg_expr
>  
>  #undef  TARGET_PROMOTE_FUNCTION_MODE
> -#define TARGET_PROMOTE_FUNCTION_MODE 
> default_promote_function_mode_always_promote
> -#undef TARGET_PROMOTE_PROTOTYPES
> -#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
> -
> +#define TARGET_PROMOTE_FUNCTION_MODE mips_promote_function_mode
>  #undef TARGET_FUNCTION_VALUE
>  #define TARGET_FUNCTION_VALUE mips_function_value
>  #undef TARGET_LIBCALL_VALUE
> 
> 
> 
> 
> 2015-11-10  Steve Ellcey  
> 
>   * gcc.dg/tree-ssa/ssa-fre-4.c: Remove mips*-*-* target.
>   * gcc.target/mips/ext-2.c: Update scan checks.
> 
> 
> diff --git 

[PATCH][PR tree-optimization/68844] Fix testcase expected output

2015-12-11 Thread Jeff Law


As detailed thoroughly in the BZ entry, DOM was changed to not rely 
on the jump threader to handle trivial conditionals.


Those changes twiddle the number of discovered & realized jump threads 
on some targets.  I've gone through the resultant dumps and verified 
that we're doing the right thing for each of the jump threads that we no 
longer realize (essentially DOM optimizes those paths on its own and 
doesn't need the threader).


Installed on the trunk after verifying ppc64le passes again.

Jeff
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index af682a9..3877b19 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-12-11  Jeff Law  
+
+   PR tree-optimization/68844
+   * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Update expected output.
+
 2015-12-11  Nathan Sidwell  
 
* gcc.dg/pr59605-1.c: Reduce iterations for nvptx.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
index 77ba74c..4258fb5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
@@ -74,6 +74,10 @@ bitmap_ior_and_compl (bitmap dst, const_bitmap a, 
const_bitmap b,
2x "kill_elt->indx >= b_elt->indx" in the first "while" loop
   -> "kill_elt->indx == b_elt->indx" in the second condition,
 skipping the known-true "b_elt && kill_elt" in the second
-condition.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 4 "dom2" { target 
logical_op_short_circuit } } } */
+condition.
+
+   However, 3 of those 4 opportunities are ultimately eliminated by
+   DOM optimizing away conditionals.  So there's only one jump threading
+   opportunity left.  */
+/* { dg-final { scan-tree-dump-times "Threaded" 1 "dom2" { target 
logical_op_short_circuit } } } */
 


adding -Wshadow-local and -Wshadow-compatible-local ?

2015-12-11 Thread Jim Meyering
Hi Diego,

I noticed this patch that adds support for improved -Wshadow-related options:

  [google] Add two new -Wshadow warnings (issue4452058)
   https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02317.html
   https://codereview.appspot.com/4452058/

Here are the proposed descriptions:

-Wshadow-local which warns if a local variable shadows another local
variable or parameter,

-Wshadow-compatible-local which warns if a local variable shadows another
local variable or parameter whose type is compatible with that of the
shadowing variable.

Yet, I see no further discussion of them, other than Jason's review feedback.
Was this change deemed unsuitable for upstream gcc?

Thanks,
Jim


Re: ipa-cp heuristics fixes

2015-12-11 Thread Jan Hubicka
Actually I added the
  if (!ipa_is_param_used (info, i))
    continue;
shortcut to gather_context_independent_values which prevents
us from recording context_independent_aggregate_values for unused
aggregate parameters.  Perhaps that is causing the issue?
We can simply record them and just avoid returning true if
all propagations happen to those.


Re: [PATCH v2] Do not sanitize left shifts for -fwrapv (PR68418)

2015-12-11 Thread Jeff Law

On 12/09/2015 10:08 AM, Paolo Bonzini wrote:

A left shift into the sign bit is a kind of overflow, and the
standard chooses to treat left shifts of negative values the
same way.

However, the -fwrapv option modifies the language to one where
integers are defined as two's complement---which also defines
entirely the behavior of shifts.  Disable sanitization of left
shifts when -fwrapv is in effect, using the same logic as
instrument_si_overflow.  The same change was proposed
for LLVM at https://llvm.org/bugs/show_bug.cgi?id=25552.
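
As a rough illustration of the two's complement shift semantics meant here, written so that it does not itself rely on -fwrapv, the shift can be done in the unsigned domain and converted back.  The helper name wrapping_shl is made up.  The final conversion is modular on GCC and is required to be so from C++20 on, which is an assumption for older standards and other compilers:

```cpp
#include <cstdint>

// Shift in the unsigned domain, where overflow wraps by definition, then
// convert back to the signed type.  The shift count is masked to 0..31.
int32_t
wrapping_shl (int32_t x, unsigned n)
{
  return static_cast<int32_t> (static_cast<uint32_t> (x) << (n & 31));
}
```

With these semantics, shifting 1 left by 31 yields the most negative value and shifting -1 left by 1 yields -2, so there is nothing left for the sanitizer to report.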

Bootstrapped/regtested x86_64-pc-linux-gnu.  Ok for trunk, and for
GCC 5 branch after 5.3 is released?

Thanks,

Paolo

gcc:
PR sanitizer/68418
* c-family/c-ubsan.c (ubsan_instrument_shift): Disable
sanitization of left shifts for wrapping signed types as well.

gcc/testsuite:
PR sanitizer/68418
* gcc.dg/ubsan/c99-wrapv-shift-1.c,
gcc.dg/ubsan/c99-wrapv-shift-2.c: New testcases.

Thanks for the pointers to the earlier code that constrains the types.

FWIW Jan Beulich is twiddling the code leading to the 
ubsan_instrument_shift call.  In fact, your change may make Jan's change 
safe :-)



OK for the trunk.

Thanks,
Jeff




Re: [patch] Fix PR middle-end/68215

2015-12-11 Thread Jeff Law

On 12/11/2015 10:02 AM, Eric Botcazou wrote:

Hi,

this is the regression of c-c++-common/opaque-vector.c on 32-bit targets where
'long double' is 128 bits wide, for example PowerPC and SPARC, with an ICE in
the RTL expander because emit_store_flag is invoked with TImode.

As noted by Ilya, the underlying issue (ICE because emit_store_flag is invoked
with TImode) is not a regression, but the test now runs into it because the
veclower pass generates:

   int128_t _15;
_16;

   _15 = BIT_FIELD_REF <_14, 128, 0>;
   _16 = _15 != 0;
   _17 = _16 ? s.0_4 : s.1_6;

The problematic line is the second one: it's a store flag for int128_t.

Now the veclower pass also generates for the same testcase:

   int128_t _12;

   _12 = BIT_FIELD_REF ;
   _13 = _12 != 0 ? -1 : 0;

which works fine because the predicate is embedded in the condition.

That's why the attached patch changes the veclower pass to embed the predicate
in the former case too; this is sufficient to fix the regression.

Tested on x86-64/Linux and SPARC/Solaris, OK for the mainline?


2015-12-11  Eric Botcazou  

PR middle-end/68215
* tree-vect-generic.c (tree_vec_extract): Remove GSI parameter.
Do not gimplify the result.
(do_unop): Adjust call to tree_vec_extract.
(do_binop): Likewise.
(do_compare): Likewise.
(do_plus_minus): Likewise.
(do_negate): Likewise.
(expand_vector_condition): Likewise.
(do_cond): Likewise.


Presumably we know the code we're generating here is always gimple.

OK for the trunk.

jeff


Re: [RFC] Request for comments on ivopts patch

2015-12-11 Thread Steve Ellcey
On Wed, 2015-12-09 at 11:24 +0100, Richard Biener wrote:

> > This second case (without the preference for the original IV)
> > generates better code on MIPS because the final assembly
> > has the increment instructions between the loads and the tests
> > of the values being loaded and so there is no delay (or less delay)
> > between the load and use.  It seems like this could easily be
> > the case for other platforms too so I was wondering what people
> > thought of this patch:
> 
> You don't comment on the comment you remove ... debugging
> programs is also important!
> 
> So if then the cost of both cases should be distinguished
> somewhere else, like granting a bonus for increment before
> exit test or so.
> 
> Richard.

Here is a new patch that tries to do that.  It accomplishes the same thing
as my original patch but by checking different features.  Basically, for
machines with no autoinc/autodec it has a preference for IVs that don't
change during the loop (i.e. var_before == var_after).

What do you think about this approach?

Steve Ellcey
sell...@imgtec.com


2015-12-11  Steve Ellcey  

* tree-ssa-loop-ivopts.c (determine_iv_cost): Add cost to ivs that
need to be updated during loop.


diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 98dc451..ecf9737 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5826,6 +5826,14 @@ determine_iv_cost (struct ivopts_data *data, struct 
iv_cand *cand)
   || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
 cost++;
 
+  /* If we are not using autoincrement or autodecrement, prefer ivs that
+ do not have to be incremented/decremented during the loop.  This can
+ move loads ahead of the instructions that update the address.  */
+  if (cand->pos != IP_BEFORE_USE
+  && cand->pos != IP_AFTER_USE
+  && cand->var_before != cand->var_after)
+cost++;
+
   /* Prefer not to insert statements into latch unless there are some
  already (so that we do not create unnecessary jumps).  */
   if (cand->pos == IP_END




Re: [Patch, libstdc++/68863] Let lookahead regex use captured contents

2015-12-11 Thread Tim Shen
On Fri, Dec 11, 2015 at 10:08 PM, Tim Shen  wrote:
> This is a one-line quick fix for correctness.
>
> I bootstrapped trunk and tested on x86_64-pc-linux-gnu, but I wish I
> can backport it at least to gcc-5-branch.
>

Sorry, I didn't actually write the changelog :P. Updated.


-- 
Regards,
Tim Shen
commit d4bd253408c31f71adb2df6641df0f4d798855c9
Author: Tim Shen 
Date:   Fri Dec 11 21:34:38 2015 -0800

2015-12-12  Tim Shen  

PR libstdc++/68863
* include/bits/regex_executor.tcc (_Executor::_M_lookahead):
Copy the captured content for lookahead, so that the backreferences
inside can refer to them.
* testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc:
New testcase.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index a13f0d5..f5be4d7 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -147,7 +147,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
 _M_lookahead(_StateIdT __next)
 {
-  _ResultsVec __what(_M_cur_results.size());
+  // Backreferences may refer to captured content.
+  // We may want to make this faster by not copying,
+  // but let's not be clever prematurely.
+  _ResultsVec __what(_M_cur_results);
   _Executor __sub(_M_current, _M_end, __what, _M_re, _M_flags);
   __sub._M_states._M_start = __next;
   if (__sub._M_search_from_first())
diff --git 
a/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc 
b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc
new file mode 100644
index 000..9e7a9a7
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc
@@ -0,0 +1,43 @@
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// 28.11.2 regex_match
+
+#include 
+#include 
+#include 
+
+using namespace __gnu_test;
+using namespace std;
+
+// libstdc++/68863
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  VERIFY(!std::regex_match("aa", std::regex("(.)(?!\\1).")));
+}
+
+int
+main()
+{
+  test01();
+  return 0;
+}


[Patch, libstdc++/68863] Let lookahead regex use captured contents

2015-12-11 Thread Tim Shen
This is a one-line quick fix for correctness.

I bootstrapped trunk and tested on x86_64-pc-linux-gnu, but I would also like
to backport it to at least gcc-5-branch.

Thanks!
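
For readers who want to try the behaviour outside the testsuite, here is a
minimal sketch.  The helper name two_different_chars is hypothetical; the
pattern is the one used in the new testcase.  A backreference inside a
negative lookahead has to see the group captured before the lookahead, so
"aa" must not match while "ab" should.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when the whole string consists of exactly two characters
// and the second one differs from the first.  The lookahead (?!\1) rejects
// a second character equal to the one captured by group 1.
bool two_different_chars(const std::string &s) {
  static const std::regex re("(.)(?!\\1).");
  return std::regex_match(s, re);
}

// Small self-check that can be called from a test driver.
void check_two_different_chars() {
  assert(!two_different_chars("aa"));
  assert(two_different_chars("ab"));
}
```

On a library without the fix the lookahead sub-executor starts with empty
captures, which is why the first assertion would be expected to fail there.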


-- 
Regards,
Tim Shen
commit 46b13f280fcbec6293ad614fb8f30f5882c7106d
Author: Tim Shen 
Date:   Fri Dec 11 21:34:38 2015 -0800

2015-12-12  Tim Shen  

PR libstdc++/68863
* include/bits/regex_executor.tcc
* testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index a13f0d5..f5be4d7 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -147,7 +147,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
 _M_lookahead(_StateIdT __next)
 {
-  _ResultsVec __what(_M_cur_results.size());
+  // Backreferences may refer to captured content.
+  // We may want to make this faster by not copying,
+  // but let's not be clever prematurely.
+  _ResultsVec __what(_M_cur_results);
   _Executor __sub(_M_current, _M_end, __what, _M_re, _M_flags);
   __sub._M_states._M_start = __next;
   if (__sub._M_search_from_first())
diff --git 
a/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc 
b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc
new file mode 100644
index 000..9e7a9a7
--- /dev/null
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc
@@ -0,0 +1,43 @@
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// 28.11.2 regex_match
+
+#include 
+#include 
+#include 
+
+using namespace __gnu_test;
+using namespace std;
+
+// libstdc++/68863
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  VERIFY(!std::regex_match("aa", std::regex("(.)(?!\\1).")));
+}
+
+int
+main()
+{
+  test01();
+  return 0;
+}


[PTX] reduce testcase time

2015-12-11 Thread Nathan Sidwell
This test can time out on PTX during execution.  Reducing the number of
iterations, as if PTX were a simulator, resolves that.


nathan
2015-12-11  Nathan Sidwell  

	* gcc.dg/pr59605-1.c: Reduce iterations for nvptx.

Index: gcc/testsuite/gcc.dg/pr59605-2.c
===
--- gcc/testsuite/gcc.dg/pr59605-2.c	(revision 231566)
+++ gcc/testsuite/gcc.dg/pr59605-2.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
-/* { dg-additional-options "-DMAX_COPY=1025" { target simulator } } */
+/* { dg-additional-options "-DMAX_COPY=1025" { target { { simulator } || { nvptx-*-* } } } } */
 /* { dg-additional-options "-minline-stringops-dynamically" { target { i?86-*-* x86_64-*-* } } } */
 
 #include "pr59605.c"


Prune TYPE_FIELDS lists more in free_lang_data

2015-12-11 Thread Jan Hubicka
Hi,
this patch further reduces the memory use and time of the WPA stage, especially
without -g, from
 phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
 phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
 phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
 ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
 ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
 ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
 lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
 ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
 ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
 ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
 ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
 ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
 TOTAL                   : 195.27            12.37           207.64            4103297 kB

to:

 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1996 kB ( 0%) ggc
 phase opt and generate  :  77.17 (53%) usr   1.69 ( 9%) sys  79.45 (48%) wall  856874 kB (26%) ggc
 phase stream in         :  25.92 (18%) usr   1.75 (10%) sys  27.66 (17%) wall 2418654 kB (74%) ggc
 phase stream out        :  39.90 (27%) usr  14.74 (81%) sys  54.82 (33%) wall      50 kB ( 0%) ggc
 phase finalize          :   2.52 ( 2%) usr   0.11 ( 1%) sys   2.63 ( 2%) wall       0 kB ( 0%) ggc
 garbage collection      :   4.56 ( 3%) usr   0.01 ( 0%) sys   4.56 ( 3%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   4.32 ( 3%) usr   0.03 ( 0%) sys   4.59 ( 3%) wall       2 kB ( 0%) ggc
 ipa virtual call target :  23.19 (16%) usr   0.18 ( 1%) sys  23.31 (14%) wall       0 kB ( 0%) ggc
 ipa cp                  :   4.06 ( 3%) usr   0.18 ( 1%) sys   4.10 ( 2%) wall  339974 kB (10%) ggc
 ipa inlining heuristics :  25.05 (17%) usr   0.32 ( 2%) sys  25.86 (16%) wall  500986 kB (15%) ggc
 lto stream inflate      :   5.50 ( 4%) usr   0.42 ( 2%) sys   5.73 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.97 ( 1%) usr   0.51 ( 3%) sys   2.70 ( 2%) wall  324937 kB (10%) ggc
 ipa lto gimple out      :   9.00 ( 6%) usr   1.59 ( 9%) sys  10.22 ( 6%) wall      50 kB ( 0%) ggc
 ipa lto decl in         :  14.29 (10%) usr   0.73 ( 4%) sys  15.18 ( 9%) wall 1522854 kB (46%) ggc
 ipa lto decl out        :  25.35 (17%) usr   0.59 ( 3%) sys  25.91 (16%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.48 ( 1%) usr   0.51 ( 3%) sys   2.38 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.74 ( 1%) usr   0.22 ( 1%) sys   0.97 ( 1%) wall  408576 kB (12%) ggc
 ipa lto decl merge      :   1.94 ( 1%) usr   0.00 ( 0%) sys   1.95 ( 1%) wall   13556 kB ( 0%) ggc
 whopr wpa I/O           :   2.95 ( 2%) usr  12.03 (66%) sys  15.17 ( 9%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.99 ( 3%) usr   0.03 ( 0%) sys   4.01 ( 2%) wall   13619 kB ( 0%) ggc
 ipa reference           :   2.45 ( 2%) usr   0.01 ( 0%) sys   2.46 ( 1%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.30 ( 2%) usr   0.03 ( 0%) sys   2.33 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :   8.30 ( 6%) usr   0.26 ( 1%) sys   8.37 ( 5%) wall   19276 kB ( 1%) ggc
 TOTAL                   : 145.51            18.29           164.57            3277576 kB

With debug output the numbers are not that impressive, but still about 17% down
from decl in.
It also leads to about 63% code size reduction for global decl streams.

I built WPA with -flto-partition=max and looked into one of the partitions that
seemed most absurd.
We used about 180k type decls to produce about 700 lines of assembler that
mostly contained calls to various methods.  The thing is that each method
brought in a lot of declarations, so I looked into why and noticed that
TYPE_FIELDS contains TYPE_DECLs that are mostly ignored by the back end except
for dwarf2out, and dwarf2out actually ignores a good portion of them, too.

I thus made a predicate to tell what decls 

Re: Reduce global decl stream

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch saves about 30% of global decl stream size in firefox.  While
> implementing the lto sections for initializers I put very stupid heuristics
> into get_symbol_initial_value deciding whether the initializer is better
> streamed inline or offline.  This ignores strings and may get a bit out of
> hand.
> 
> With this patch and the compression, the largest ltrans unit is 
> 118479156 bytes and 103584016 out of that is a global decl stream.

So that's still 87% global decl stream.  At least for this ltrans
unit there couldn't have been a 30% saving.

Btw, the separate initializer sections will get separate string
encoders, right?  So we might end up with larger files due
to less string sharing which would happen when the strings go into
the decl section.

> Bootstrapped/regtested x86_64-linux OK?
> 
> Honza
> 
>   * lto-streamer-out.c (subtract_estimated_size): New function
>   (get_symbol_initial_value): Use it.
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 231546)
> +++ lto-streamer-out.c(working copy)
> @@ -309,6 +309,33 @@ lto_is_streamable (tree expr)
>|| TREE_CODE_CLASS (code) != tcc_statement);
>  }
>  
> +/* Very rough estimate of streaming size of the initializer.  If we ignored
> +   presence of strings, we could simply just count number of non-indexable
> +   tree nodes and number of references to indexable nodes.  Strings however
> +   may be very large and we do not want to dump them into the global stream.
> +
> +   Count the size of initializer until the size in DATA is positive.  */
> +
> +static tree
> +subtract_estimated_size (tree *tp, int *ws, void *data)
> +{
> +  long *sum = (long *)data;
> +  if (tree_is_indexable (*tp))
> +{
> +  *sum -= 4;
> +  *ws = 0;
> +}
> +  if (TREE_CODE (*tp) == STRING_CST)
> +*sum -= TREE_STRING_LENGTH (*tp) + 8;
> +  if (TREE_CODE (*tp) == IDENTIFIER_NODE)
> +*sum -= IDENTIFIER_LENGTH (*tp) + 8;

I doubt we can ever see those.

> +  else
> +*sum -= 16;
> +  if (*sum < 0)
> +return *tp;
> +  return NULL_TREE;
> +}
> +

I'd like to see an explanation for the magic constants.  Also
a FE might construct

 int *a = &CONST_DECL;

with CONST_DECL having a large array initializer.  walk_tree doesn't
traverse CONST_DECLs DECL_INITIAL.

[insert rant about STRING_CSTs being special and not tied to a CONST_DECL]

>  /* For EXPR lookup and return what we want to stream to OB as DECL_INITIAL.  
> */
>  
> @@ -329,10 +356,16 @@ get_symbol_initial_value (lto_symtab_enc
>varpool_node *vnode;
>/* Extra section needs about 30 bytes; do not produce it for simple
>scalar values.  */
> -  if (TREE_CODE (DECL_INITIAL (expr)) == CONSTRUCTOR
> -   || !(vnode = varpool_node::get (expr))
> +  if (!(vnode = varpool_node::get (expr))
> || !lto_symtab_encoder_encode_initializer_p (encoder, vnode))
>  initial = error_mark_node;
> +  if (initial != error_mark_node)
> + {
> +   long max_size = 30;
> +   if (walk_tree (&initial, subtract_estimated_size, (void *)&max_size,
> +  NULL))
> + initial = error_mark_node;
> + }
>  }
>return initial;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH, i386]: Fix PR 67484 (version 2)

2015-12-11 Thread Martin Liška
Hello.

I've just applied the change that Richi suggested.
The patch bootstraps on x86_64-linux-gnu and survives regression tests.
Moreover, the memory leak/invalid read is gone.

Ready for trunk?
Thanks,
Martin
From f0eca2d3efd711498fbc8b7e8980fdf32f0abfd0 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 10 Dec 2015 14:02:13 +0100
Subject: [PATCH] Fix PR target/67484

gcc/ChangeLog:

2015-12-10  Martin Liska  
	Uros Bizjak  

	PR target/67484
	* config/i386/i386.c (ix86_valid_target_attribute_tree):
	Use ggc_strdup to copy option_strings to opts->x_ix86_arch_string and
	opts->x_ix86_tune_string.
---
 gcc/config/i386/i386.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d30fbff..2eacea7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6152,7 +6152,7 @@ ix86_valid_target_attribute_tree (tree args,
   if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH])
 	{
 	  opts->x_ix86_arch_string
-	= option_strings[IX86_FUNCTION_SPECIFIC_ARCH];
+	= ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]);
 
 	  /* If arch= is set,  clear all bits in x_ix86_isa_flags,
 	 except for ISA_64BIT, ABI_64, ABI_X32, and CODE16.  */
@@ -6166,7 +6166,8 @@ ix86_valid_target_attribute_tree (tree args,
 	opts->x_ix86_arch_string = NULL;
 
   if (option_strings[IX86_FUNCTION_SPECIFIC_TUNE])
-	opts->x_ix86_tune_string = option_strings[IX86_FUNCTION_SPECIFIC_TUNE];
+	opts->x_ix86_tune_string
+	  = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_TUNE]);
   else if (orig_tune_defaulted)
 	opts->x_ix86_tune_string = NULL;
 
-- 
2.6.3
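
The ownership issue behind the fix can be sketched in isolation.  This is a
simplified model, not the GCC code: TargetOpts, dup_string and
parse_attribute are hypothetical names, dup_string stands in for ggc_strdup,
and the sketch assumes the temporary option strings are released once the
attribute has been processed.  Storing a copy keeps the options valid after
that release; storing the temporary pointer itself would leave it dangling.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the options structure; only one field is modelled.
struct TargetOpts {
  char *arch_string = nullptr;
};

// Hypothetical stand-in for ggc_strdup: returns a copy whose lifetime does
// not depend on the source buffer.
char *dup_string(const char *s) {
  char *copy = static_cast<char *>(std::malloc(std::strlen(s) + 1));
  if (!copy)
    std::abort();
  std::strcpy(copy, s);
  return copy;
}

// Models the attribute parsing: a temporary string is produced, the options
// keep their own copy, and the temporary is then released.
void parse_attribute(TargetOpts &opts, const char *arch) {
  char *tmp = dup_string(arch);        // plays the role of option_strings[i]
  opts.arch_string = dup_string(tmp);  // copy instead of aliasing tmp
  std::free(tmp);                      // opts.arch_string stays valid
}

// Small self-check that can be called from a test driver.
void check_parse_attribute() {
  TargetOpts opts;
  parse_attribute(opts, "haswell");
  assert(std::strcmp(opts.arch_string, "haswell") == 0);
  std::free(opts.arch_string);
}
```

In the compiler the copy is garbage-collected memory, so no explicit free of
the stored string is needed there; the sketch frees it by hand only because
it uses malloc.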



RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-11 Thread Ajit Kumar Agarwal
Hello Jeff:

Sorry for the delay in sending the benchmarks run with Split-Path change.

Here is the summary of the results.

SPEC CPU 2000 INT benchmarks ( Target i386)
( Geomean Score without Split-Paths changes vs Geomean Score with Split-Path 
changes  =  3740.789 vs 3745.193).

SPEC CPU 2000 FP benchmarks. ( Target i386)
( Geomean Score without Split-Paths changes vs Geomean Score with Split-Path 
changes  =  4721.655 vs 4741.825).

Mibench/EEMBC benchmarks (Target Microblaze)

Automotive_qsort1 (4.03%), Office_ispell (4.29%), Office_stringsearch1 (3.5%),
Telecom_adpcm_d (1.37%), ospfv2_lite (1.35%).

We are seeing minor regressions that are mainly noise (less than 0.5%).

Thanks & Regards
Ajit
-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Friday, December 11, 2015 1:39 AM
To: Richard Biener
Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya Gupta; 
Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 12/03/2015 07:38 AM, Richard Biener wrote:
> This pass is now enabled by default with -Os but has no limits on the 
> amount of stmts it copies.
The more statements it copies, the more likely it is that the path splitting 
will turn out to be useful!  It's counter-intuitive.

The primary benefit AFAICT with path splitting is that it exposes additional 
CSE, DCE, etc opportunities.
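
A source-level illustration of that point, as a sketch only: the pass works
on GIMPLE, and the functions before_split and after_split are hypothetical.
The first function has a join block shared by both arms of the branch.  The
second shows what later passes could produce once the join block has been
duplicated into each arm, where the value of x is known.

```cpp
#include <cassert>

// Original shape: both arms of the branch merge into a shared join block.
int before_split(const int *a, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    int x;
    if (a[i] > 0)
      x = a[i] * 2;
    else
      x = 0;
    // Join block: x is not known here, so nothing can be folded.
    sum += x * 3 + 1;
  }
  return sum;
}

// After duplicating the join block into each arm and simplifying: the
// multiplications combine on one path and fold away on the other.
int after_split(const int *a, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    if (a[i] > 0)
      sum += a[i] * 6 + 1;  // (a[i] * 2) * 3 + 1
    else
      sum += 1;             // 0 * 3 + 1
  }
  return sum;
}

// Small self-check that can be called from a test driver.
void check_split_equivalence() {
  const int data[] = {1, -2, 3, 0};
  assert(before_split(data, 4) == after_split(data, 4));
}
```

The price is the duplicated join block, which is where the code-size and
de-duplication concerns raised in this thread come from.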

IIRC  Ajit posited that it could help with live/conflict analysis, I never saw 
that, and with the changes to push splitting deeper into the pipeline I'd 
further life/conflict analysis since that work also involved preserving the 
single latch property.



> It also will make all loops with this shape have at least two
> exits (if the resulting loop will be disambiguated the inner loop will 
> have two exits).
> Having more than one exit will disable almost all loop optimizations after it.
Hmmm, the updated code keeps the single latch property, but I'm pretty sure it 
won't keep a single exit policy.

To keep a single exit policy would require keeping an additional block around.  
Each of the split paths would unconditionally transfer to this new block.  The 
new block would then either transfer to the latch block or out of the loop.


>
> The pass itself documents the transform it does but does zero to motivate it.
>
> What's the benefit of this pass (apart from disrupting further optimizations)?
It's essentially building superblocks in a special case to enable additional 
CSE, DCE and the like.

Unfortunately what is missing is heuristics and de-duplication.  The former 
to drive cases when it's not useful and the latter to reduce codesize for any 
statements that did not participate in optimizations when they were duplicated.

The de-duplication belongs to the "sink-statements-through-phi", cross 
jumping, tail merging and similar class of problems.

It was only after I approved this code after twiddling it for Ajit that I came 
across Honza's tracer implementation, which may in fact be retargetable to 
these loops and do a better job.  I haven't experimented with that.



>
> I can see a _single_ case where duplicating the latch will allow 
> threading one of the paths through the loop header to eliminate the 
> original exit.  Then disambiguation may create a nice nested loop out 
> of this.  Of course that is only profitable again if you know the 
> remaining single exit of the inner loop (exiting to the outer one) is 
> executed infrequently (thus the inner loop actually loops).
It wasn't ever about threading.

>
> But no checks other than on the CFG shape exist (oh, it checks it will 
> at _least_ copy two stmts!).
Again, the more statements it copies the more likely it is to be profitable.  
Think superblocks to expose CSE, DCE and the like.

>
> Given the profitability constraints above (well, correct me if I am 
> wrong on these) it looks like the whole transform should be done 
> within the FSM threading code which might be able to compute whether 
> there will be an inner loop with a single exit only.
While it shares some concepts with jump threading, I don't think the 
transformation belongs in jump threading.

>
> I'm inclined to request the pass to be removed again or at least 
> disabled by default.
I wouldn't lose any sleep if we disabled by default or removed, particularly if 
we can repurpose Honza's code.  In fact, I might strongly support the former 
until we hear back from Ajit on performance data.

I also keep coming back to Click's paper on code motion -- in that context, 
copying statements would be a way to break dependencies and give the global 
code motion algorithm more freedom.  The advantage of doing it in a framework 
like Click's is it's got a built-in sinking step.


>
> What closed source benchmark was this transform invented for?
I think it was EEMBC or Coremark.  Ajit should know for sure.  I was actually 
still hoping to see benchmark results from Ajit to 

Re: Prune TYPE_FIELDS lists more in free_lang_data

2015-12-11 Thread Richard Biener
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch further reduce memory use and time of WPA stage, especially 
> without -g
>  phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
>  phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
>  phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
>  ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
>  ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
>  ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
>  ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
>  lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
>  ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
>  ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
>  ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
>  ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
>  ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
>  ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
>  ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
>  TOTAL                   : 195.27            12.37           207.64            4103297 kB
> 
> to:
> 
>  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1996 kB ( 0%) ggc
>  phase opt and generate  :  77.17 (53%) usr   1.69 ( 9%) sys  79.45 (48%) wall  856874 kB (26%) ggc
>  phase stream in         :  25.92 (18%) usr   1.75 (10%) sys  27.66 (17%) wall 2418654 kB (74%) ggc
>  phase stream out        :  39.90 (27%) usr  14.74 (81%) sys  54.82 (33%) wall      50 kB ( 0%) ggc
>  phase finalize          :   2.52 ( 2%) usr   0.11 ( 1%) sys   2.63 ( 2%) wall       0 kB ( 0%) ggc
>  garbage collection      :   4.56 ( 3%) usr   0.01 ( 0%) sys   4.56 ( 3%) wall       0 kB ( 0%) ggc
>  ipa dead code removal   :   4.32 ( 3%) usr   0.03 ( 0%) sys   4.59 ( 3%) wall       2 kB ( 0%) ggc
>  ipa virtual call target :  23.19 (16%) usr   0.18 ( 1%) sys  23.31 (14%) wall       0 kB ( 0%) ggc
>  ipa cp                  :   4.06 ( 3%) usr   0.18 ( 1%) sys   4.10 ( 2%) wall  339974 kB (10%) ggc
>  ipa inlining heuristics :  25.05 (17%) usr   0.32 ( 2%) sys  25.86 (16%) wall  500986 kB (15%) ggc
>  lto stream inflate      :   5.50 ( 4%) usr   0.42 ( 2%) sys   5.73 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto gimple in       :   1.97 ( 1%) usr   0.51 ( 3%) sys   2.70 ( 2%) wall  324937 kB (10%) ggc
>  ipa lto gimple out      :   9.00 ( 6%) usr   1.59 ( 9%) sys  10.22 ( 6%) wall      50 kB ( 0%) ggc
>  ipa lto decl in         :  14.29 (10%) usr   0.73 ( 4%) sys  15.18 ( 9%) wall 1522854 kB (46%) ggc
>  ipa lto decl out        :  25.35 (17%) usr   0.59 ( 3%) sys  25.91 (16%) wall       0 kB ( 0%) ggc
>  ipa lto constructors out:   1.48 ( 1%) usr   0.51 ( 3%) sys   2.38 ( 1%) wall       0 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.74 ( 1%) usr   0.22 ( 1%) sys   0.97 ( 1%) wall  408576 kB (12%) ggc
>  ipa lto decl merge      :   1.94 ( 1%) usr   0.00 ( 0%) sys   1.95 ( 1%) wall   13556 kB ( 0%) ggc
>  whopr wpa I/O           :   2.95 ( 2%) usr  12.03 (66%) sys  15.17 ( 9%) wall       0 kB ( 0%) ggc
>  whopr partitioning      :   3.99 ( 3%) usr   0.03 ( 0%) sys   4.01 ( 2%) wall   13619 kB ( 0%) ggc
>  ipa reference           :   2.45 ( 2%) usr   0.01 ( 0%) sys   2.46 ( 1%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   2.30 ( 2%) usr   0.03 ( 0%) sys   2.33 ( 1%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   8.30 ( 6%) usr   0.26 ( 1%) sys   8.37 ( 5%) wall   19276 kB ( 1%) ggc
>  TOTAL                   : 145.51            18.29           164.57            3277576 kB
> 
> With debug output the numbers are not that impressive, but still about 17% 
> down from decl in.
> It also leads to about 63% code size reduction for global decl streams.
> 
> I built WPA with -flto-partition=max and looked into one of the partitions
> that seemed most absurd.
> We used about 180k type decls to produce about 700 lines of assembler that 
> mostly contained
> calls to various methods. The thing is that each method brought in a lot of 
> 

Re: Reduce global decl stream

2015-12-11 Thread Jan Hubicka
> On Fri, 11 Dec 2015, Jan Hubicka wrote:
> 
> > Hi,
> > this patch saves about 30% of global decl stream size in firefox.  While
> > implementing the lto sections for initializers I put very stupid heuristics
> > into get_symbol_initial_value deciding whether the initializer is better
> > streamed inline or offline.  This ignores strings and may get a bit out of
> > hand.
> > 
> > With this patch and the compression, the largest ltrans unit is 
> > 118479156 bytes and 103584016 out of that is a global decl stream.
> 
> So that's still 87% global decl stream.  At least for this ltrans
> unit there couldn't have been a 30% saving.

Yeah, 30% saving was the overall size of global decls streams produced
by WPA, so this one had bad luck. 
> 
> Btw, the separate initializer sections will get separate string
> encoders, right?  So we might end up with larger files due
> to less string sharing which would happen when the strings go into
> the decl section.

Yep, if we get many duplicated strings in decl sections, then we will
end up with larger files.  The same thing happens if you use the string
in many function initializers.
> > +  if (TREE_CODE (*tp) == STRING_CST)
> > +*sum -= TREE_STRING_LENGTH (*tp) + 8;
> > +  if (TREE_CODE (*tp) == IDENTIFIER_NODE)
> > +*sum -= IDENTIFIER_LENGTH (*tp) + 8;
> 
> I doubt we can ever see those.

Me too, I just went through the variable length trees for completeness.
Can just add gcc_unreachable.
> 
> > +  else
> > +*sum -= 16;
> > +  if (*sum < 0)
> > +return *tp;
> > +  return NULL_TREE;
> > +}
> > +
> 
> I'd like to see an explanation for the magic constants.  Also

OK, nothing really scientific behind them.  I simply divided size
of the stream by number of trees in it from the lto stats and picked
nearest power of 2.
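
To make the role of those constants concrete, here is a standalone sketch of
a budget-based estimate.  It is not the patch itself: Kind, Node,
exceeds_budget and fits_inline are hypothetical, the tree is a toy structure,
and the branches are written as a clean if/else chain.  The costs (4 for a
reference to an indexable node, string length plus 8 for a string, 16 for any
other node) and the budget of 30 are the values from the patch under
discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of an initializer tree.
enum class Kind { Indexable, String, Other };

struct Node {
  Kind kind;
  std::size_t string_length;  // only meaningful for Kind::String
  std::vector<Node> children;
};

// Subtracts a rough streaming cost for N and its subtree from BUDGET and
// returns true as soon as the budget is exhausted, so the walk stops early.
bool exceeds_budget(const Node &n, long &budget) {
  if (n.kind == Kind::Indexable) {
    budget -= 4;        // streamed as a reference only
    return budget < 0;  // do not walk into indexable nodes
  }
  if (n.kind == Kind::String)
    budget -= static_cast<long>(n.string_length) + 8;
  else
    budget -= 16;
  if (budget < 0)
    return true;
  for (const Node &child : n.children)
    if (exceeds_budget(child, budget))
      return true;
  return false;
}

// True when the initializer looks small enough to be streamed inline.
bool fits_inline(const Node &init, long budget = 30) {
  return !exceeds_budget(init, budget);
}

// Small self-check that can be called from a test driver.
void check_fits_inline() {
  assert(fits_inline(Node{Kind::Other, 0, {}}));
  assert(!fits_inline(Node{Kind::String, 100, {}}));
}
```

With these numbers a single scalar fits, a constructor with a few references
to indexable nodes fits, and anything with a second non-indexable node or a
string of more than 22 bytes goes to a separate section.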

> a FE might construct
> 
>  int *a = &CONST_DECL;
> 
> with CONST_DECL having a large array initializer.  walk_tree doesn't
> traverse CONST_DECLs DECL_INITIAL.

CONST_DECL is indexable, so in this case we will always have it in the global
stream. We may want to stream initializers for those, but for that we need the
const decl to be in the symbol table (it should be), for which we need to
sanitize the visibility bits... We have an open PR for that.

Honza


Re: fix scheduling antideps

2015-12-11 Thread Eric Botcazou
> This patch allows a target to increase the cost of anti-deps to better
> reflect the actual cost on the machine.

But it can already do it via the TARGET_SCHED_ADJUST_COST hook, can't it?

-- 
Eric Botcazou


Re: Prune TYPE_FIELDS lists more in free_lang_data

2015-12-11 Thread Jan Hubicka
> 
> We explicitly do not use debug-info-level tests in free-lang-data
> to allow mixing -g and -g0 objects.  Are you sure doing the above
> doesn't mess up tree merging enough to effectively enlarge WPA
> memory use and the merged decl sections?
> 
> [I'm quite sure firefox build system manages to mess up -g vs. -g0
> in some places ;)]

Hmm, I will try the debug build with firefox on this.  -fdump-ipa-devirt
now dumps all main variants that are duplicates of one ODR type.
We definitely have types with hundreds of duplicates, so there are
quite common cases where tree merging does not fire.
> 
> > +  return (!DECL_IGNORED_P (decl) && !is_redundant_typedef (decl));
> > +}
> > +
> 
> The patch would be ok if you simply export is_redundant_typedef
> and inline the DECL_IGNORED_P check into free-lang-data.

OK, I had that originally, will go back to that.
is_redundant_typedef is declared inline.  Putting it into tree.h drags in
a bit too many dwarf2out internals, but I suppose it is OK to just
turn it non-inline.  It is the type of function where the inliner should be
able to decide.

Honza
> 
> Thanks,
> Richard.


[PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-11 Thread Patrick Palka
> Unfortunately this patch doesn't work properly on operator overloads
> that are defined as friend functions.  E.g. the following now fails to
> compile:
>
> struct A
> {
>  friend int operator* (A);
> };
>
> template <typename T>
> void func(T t)
> {
>  A x;
>  int y = *x;
> }
>
> int main()
> {
>  func(0);
> }
>
> I think this happens because KOENIG_LOOKUP_P is not being properly set
> in the CALL_EXPR we are reconstructing.  Not sure how to fix that
> yet.
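
For reference, a self-contained variant of the quoted case, as a sketch: the
friend operator is given a body returning an arbitrary value, func returns
the result, and the template parameter is written as typename T, all of which
are assumptions made for illustration.  The operator is a hidden friend, so
it is found only through argument-dependent lookup, which is why the rebuilt
call has to request that lookup again at instantiation time.

```cpp
#include <cassert>

struct A {
  // Hidden friend: declared only inside the class, so ordinary unqualified
  // lookup does not see it, while argument-dependent lookup does.
  friend int operator*(A) { return 42; }
};

template <typename T>
int func(T) {
  A x;
  return *x;  // non-dependent expression resolved to the hidden friend
}

// Small self-check that can be called from a test driver.
void check_hidden_friend() {
  assert(func(0) == 42);
}
```

A conforming compiler accepts this; the regression described above made the
equivalent code fail to compile inside the template.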

Here is what I came up with.  We need to specify to perform ADL during
instantiation (by setting the KOENIG_LOOKUP_P flag) if the candidate
operator overload we chose is a friend function.  Here's an incremental
diff, followed by patch v2 (test case is updated too):

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 117dd79..a1c0b14 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -5630,6 +5630,17 @@ build_new_op_1 (location_t loc, enum tree_code code, int 
flags, tree arg1,
result = error_mark_node;
  else
result = build_over_call (cand, LOOKUP_NORMAL, complain);
+
+ if (processing_template_decl
+ && result != NULL_TREE
+ && result != error_mark_node
+ && DECL_HIDDEN_FRIEND_P (cand->fn))
+   {
+ tree call = result;
+ if (REFERENCE_REF_P (call))
+   call = TREE_OPERAND (call, 0);
+ KOENIG_LOOKUP_P (call) = 1;
+   }
}
   else
{
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 2635736..4194b1a 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -2741,6 +2741,7 @@ build_min_non_dep_call_vec (tree non_dep, tree fn, 
vec *argvec)
 non_dep = TREE_OPERAND (non_dep, 0);
   TREE_TYPE (t) = TREE_TYPE (non_dep);
   TREE_SIDE_EFFECTS (t) = TREE_SIDE_EFFECTS (non_dep);
+  KOENIG_LOOKUP_P (t) = KOENIG_LOOKUP_P (non_dep);
   return convert_from_reference (t);
 }

gcc/cp/ChangeLog:

PR c++/21802
PR c++/53223
* cp-tree.h (build_min_non_dep_op_overload): Declare.
* tree.c (build_min_non_dep_op_overload): Define.
(build_min_non_dep_call_vec): Copy KOENIG_LOOKUP_P flag.
* typeck.c (build_x_indirect_ref): Use
build_min_non_dep_op_overload when the given expression
has been resolved to an operator overload.
(build_x_binary_op): Likewise.
(build_x_array_ref): Likewise.
(build_x_unary_op): Likewise.
(build_x_compound_expr): Likewise.
(build_x_modify_expr): Likewise.
* call.c (build_new_op_1): If during template processing we
chose an operator overload that is a hidden friend function, set
the call's KOENIG_LOOKUP_P flag to 1.

gcc/testsuite/ChangeLog:

PR c++/21802
PR c++/53223
* g++.dg/cpp0x/pr53223.C: New test.
* g++.dg/lookup/pr21802.C: New test.
* g++.dg/lookup/two-stage4.C: Remove XFAIL.
---
 gcc/cp/call.c|  11 ++
 gcc/cp/cp-tree.h |   1 +
 gcc/cp/tree.c|  65 
 gcc/cp/typeck.c  | 100 ---
 gcc/testsuite/g++.dg/cpp0x/pr53223.C |  35 
 gcc/testsuite/g++.dg/lookup/pr21802.C| 276 +++
 gcc/testsuite/g++.dg/lookup/two-stage4.C |   2 +-
 7 files changed, 470 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr53223.C
 create mode 100644 gcc/testsuite/g++.dg/lookup/pr21802.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 117dd79..a1c0b14 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -5630,6 +5630,17 @@ build_new_op_1 (location_t loc, enum tree_code code, int 
flags, tree arg1,
result = error_mark_node;
  else
result = build_over_call (cand, LOOKUP_NORMAL, complain);
+
+ if (processing_template_decl
+ && result != NULL_TREE
+ && result != error_mark_node
+ && DECL_HIDDEN_FRIEND_P (cand->fn))
+   {
+ tree call = result;
+ if (REFERENCE_REF_P (call))
+   call = TREE_OPERAND (call, 0);
+ KOENIG_LOOKUP_P (call) = 1;
+   }
}
   else
{
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6190f4e..3487d77 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6513,6 +6513,7 @@ extern tree build_min (enum 
tree_code, tree, ...);
 extern tree build_min_nt_loc   (location_t, enum tree_code,
 ...);
 extern tree build_min_non_dep  (enum tree_code, tree, ...);
+extern tree build_min_non_dep_op_overload  (enum tree_code, tree, tree, 
...);
 extern tree build_min_non_dep_call_vec (tree, tree, vec 
*);
 extern tree build_cplus_new(tree, tree, tsubst_flags_t);
 extern tree build_aggr_init_expr   (tree, tree);
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c

Re: Fix PR21273

2015-12-11 Thread Jeff Law

On 12/11/2015 12:29 PM, Bernd Schmidt wrote:

Maybe not the most important PR in the database, but we might as well
fix and close it. Count the number of alternatives in a MATCH_SCRATCH
against the max.

Bootstrapped and tested on x86_64-linux (one testcase timed out, almost
certainly because another test went heavily into swap at one point). Ok?


Bernd


supscratch.diff


PR middle-end/21273
* gensupport.c (collect_insn_data): Look for number of alternatives
in MATCH_SCRATCH.

OK.  I haven't been watching closely, but you might be in the running 
for oldest bug fixed in the gcc-6 release :-)


Thanks,
Jeff



[PATCH/AARCH64] Fix -mcpu/arch=native support for LSE

2015-12-11 Thread Andrew Pinski
Hi,
  The Linux kernel reports LSE as "atomics" in /proc/cpuinfo.  We should
change aarch64-option-extensions.def to take that into account.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions
and tested with -mcpu=native on ThunderX T88 pass 2 with Linux 4.4 to
see if lse gets enabled.

Thanks,
Andrew Pinski

ChangeLog:
 * config/aarch64/aarch64-option-extensions.def (LSE): Change
FEAT_STRING to "atomics".
Index: aarch64-option-extensions.def
===
--- aarch64-option-extensions.def   (revision 231572)
+++ aarch64-option-extensions.def   (working copy)
@@ -40,4 +40,4 @@ AARCH64_OPT_EXTENSION ("simd", AARCH64_F
   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  
AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",   AARCH64_FL_CRC, 
AARCH64_FL_CRC,"crc32")
-AARCH64_OPT_EXTENSION("lse",   AARCH64_FL_LSE, 
AARCH64_FL_LSE,"lse")
+AARCH64_OPT_EXTENSION("lse",   AARCH64_FL_LSE, 
AARCH64_FL_LSE,"atomics")


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-11 Thread Ilya Verbin
On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> > @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
> >  {
> >struct gomp_device_descr *acc_dev = _dev[i];
> >gomp_mutex_lock (_dev->lock);
> > -  if (acc_dev->is_initialized)
> > +  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
> >  {
> >   devices_active = true;
> > - gomp_fini_device (acc_dev);
> > + acc_dev->fini_device_func (acc_dev->target_id);
> > + acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
> > }
> >gomp_mutex_unlock (_dev->lock);
> >  }
> 
> I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
> but I'd leave that to the OpenACC folks to do that incrementally
> once they test it and/or decide what to do.

libgomp/testsuite/libgomp.oacc-c-c++-common/lib-5.c contains a call to acc_init,
then acc_shutdown, and then acc_init again, so I guess that OpenACC allows the
device to be initialized again after acc_shutdown, whereas GOMP_DEVICE_FINALIZED
means that it's terminally finalized.
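
The distinction between the two teardown paths can be sketched as a small
state machine.  This is only an illustration under stated assumptions, not
libgomp code: the state names mirror the ones in the patch, while
struct device, device_init, device_shutdown and device_finalize are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names mirror the patch; everything else here is hypothetical.  */
enum device_state
{
  DEVICE_UNINITIALIZED,
  DEVICE_INITIALIZED,
  DEVICE_FINALIZED
};

struct device
{
  enum device_state state;
};

/* Initialize the device.  Fails once the device is terminally finalized.  */
static bool
device_init (struct device *dev)
{
  assert (dev != NULL);
  if (dev->state == DEVICE_FINALIZED)
    return false;
  dev->state = DEVICE_INITIALIZED;
  return true;
}

/* acc_shutdown-style teardown: the device may be initialized again.  */
static void
device_shutdown (struct device *dev)
{
  assert (dev != NULL);
  if (dev->state == DEVICE_INITIALIZED)
    dev->state = DEVICE_UNINITIALIZED;
}

/* Exit-time teardown: no further initialization is allowed.  */
static void
device_finalize (struct device *dev)
{
  assert (dev != NULL);
  if (dev->state == DEVICE_INITIALIZED)
    dev->state = DEVICE_FINALIZED;
}
```

With this shape, an init/shutdown/init sequence like the one in lib-5.c
keeps working, while an init attempted after device_finalize is refused.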

> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, 
> > size_t mapnum,
> >  }
> >  
> >gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (>lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.
> 
> > +  return NULL;
> > +}
> >  
> >for (i = 0; i < mapnum; i++)
> >  {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
> > do_copyfrom)
> >  }
> >  
> >gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +{
> > +  gomp_mutex_unlock (>lock);
> > +  return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.
> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.
> 
> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +{
> > +  struct gomp_device_descr *devicep = [i];
> > +  gomp_mutex_lock (>lock);
> > +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +   {
> > + devicep->fini_device_func (devicep->target_id);
> > + devicep->state = GOMP_DEVICE_FINALIZED;
> > +   }
> > +  gomp_mutex_unlock (>lock);
> > +}
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.
> 
> What is the state of the link clause implementation patch?  Does it depend
> on this?

It's ready, but it depends on this.  I will retest and resend the "link" patch
after checking in the "init/fini" patch.

  -- Ilya


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-11 Thread Jakub Jelinek
On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
>  {
>struct gomp_device_descr *acc_dev = _dev[i];
>gomp_mutex_lock (_dev->lock);
> -  if (acc_dev->is_initialized)
> +  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
>  {
> devices_active = true;
> -   gomp_fini_device (acc_dev);
> +   acc_dev->fini_device_func (acc_dev->target_id);
> +   acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
>   }
>gomp_mutex_unlock (_dev->lock);
>  }

I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
but I'd leave that to the OpenACC folks to do that incrementally
once they test it and/or decide what to do.

> @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
> mapnum,
>  }
>  
>gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> +{
> +  gomp_mutex_unlock (>lock);

You need to free (tgt); here I think to avoid leaking memory.

> +  return NULL;
> +}
>  
>for (i = 0; i < mapnum; i++)
>  {
> @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool 
> do_copyfrom)
>  }
>  
>gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> +{
> +  gomp_mutex_unlock (>lock);
> +  return;

Supposedly you want at least free (tgt->array); free (tgt); here.
Plus the question is if the mappings shouldn't be removed from the splay tree
before that.
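
The two comments about freeing on the early-return paths boil down to one
pattern: anything allocated before the lock is taken has to be released
again when the finalized check bails out.  A minimal sketch, assuming
hypothetical stand-in types (struct device, struct mapping) and a counter in
place of the real mutex; it does not touch the splay tree question raised
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the device and mapping descriptors.  */
struct device
{
  bool finalized;
  int lock_depth;     /* models the mutex: 1 while held, 0 otherwise */
};

struct mapping
{
  size_t count;
  int *array;
};

static void
device_lock (struct device *dev)
{
  dev->lock_depth++;
}

static void
device_unlock (struct device *dev)
{
  dev->lock_depth--;
}

/* Allocate a mapping for COUNT entries, or return NULL if allocation fails
   or the device has already been finalized.  Every early return releases
   the lock and whatever was allocated so far.  */
static struct mapping *
map_vars (struct device *dev, size_t count)
{
  struct mapping *tgt;

  assert (dev != NULL && count > 0);
  tgt = malloc (sizeof *tgt);
  if (tgt == NULL)
    return NULL;
  tgt->count = count;
  tgt->array = calloc (count, sizeof *tgt->array);
  if (tgt->array == NULL)
    {
      free (tgt);
      return NULL;
    }

  device_lock (dev);
  if (dev->finalized)
    {
      device_unlock (dev);
      free (tgt->array);
      free (tgt);
      return NULL;
    }
  /* ... the real work of establishing the mappings would go here ... */
  device_unlock (dev);
  return tgt;
}
```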

> +/* This function finalizes all initialized devices.  */
> +
> +static void
> +gomp_target_fini (void)
> +{
> +  int i;
> +  for (i = 0; i < num_devices; i++)
> +{
> +  struct gomp_device_descr *devicep = [i];
> +  gomp_mutex_lock (>lock);
> +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> + {
> +   devicep->fini_device_func (devicep->target_id);
> +   devicep->state = GOMP_DEVICE_FINALIZED;
> + }
> +  gomp_mutex_unlock (>lock);
> +}
> +}

The question is what will this do if there are async target tasks still
running on some of the devices at this point (forgotten #pragma omp taskwait
or similar if target nowait regions are started outside of parallel region,
or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
Also there is the question that the 
So I think the patch is ok with the above mentioned changes.

What is the state of the link clause implementation patch?  Does it depend
on this?

Jakub


Re: [hsa 1/10] Configury changes and new options

2015-12-11 Thread Jakub Jelinek
On Thu, Dec 10, 2015 at 06:52:07PM +0100, Martin Jambor wrote:
> Good catch.  I have modified this code so that it never leaves any
> holes in offload_names[i].
> 
> > names[i] is null-terminated, so it looks like you're deliberately
> > allowing anything that starts with "hsa" here, but:
> 
> Right, and that was probably a mistake, I have changed the check to
> simple strcmp.

LGTM (and thanks to Richard for reviewing that).
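
For reference, the two fixes discussed here (exact match instead of prefix
match, and no holes in the output array) can be sketched in isolation.  The
helper name filter_names is hypothetical and this is not the lto-wrapper
code itself, just the same idea on plain string arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy every entry of NAMES except those equal to SKIP into OUT, packing
   the survivors at the front so OUT has no holes.  Returns the number of
   entries written.  Hypothetical helper, for illustration only.  */
static size_t
filter_names (const char *const *names, size_t n, const char *skip,
              const char **out)
{
  size_t next = 0;

  assert (names != NULL && skip != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      /* strcmp matches only the exact name; strncmp (names[i], skip, 3)
         would also match any longer name that starts with SKIP.  */
      if (strcmp (names[i], skip) == 0)
        continue;
      out[next++] = names[i];
    }
  return next;
}
```

Using a separate write index is what keeps the output dense: a skipped
entry no longer leaves an unset slot behind it.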

> 2015-12-09  Martin Jambor  
> 
>   * lto-wrapper.c (compile_images_for_offload_targets): Do not leave
>   holes in offload_names.  Use strcmp instead strncmp.
>   * doc/install.texi (--with-hsa-runtime): Fix typo.
> ---
>  gcc/doc/install.texi | 2 +-
>  gcc/lto-wrapper.c| 8 +---
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index afd891c..a85a063 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1993,7 +1993,7 @@ compiler will emit the accelerator code, no path should 
> be specified.
>  
>  If you configure GCC with HSA offloading but do not have the HSA
>  run-time library installed in a standard location then you can
> -explicitely specify the directory where they are installed.  The
> +explicitly specify the directory where they are installed.  The
>  @option{--with-hsa-runtime=@/@var{hsainstalldir}} option is a
>  shorthand for
>  @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 5609207..5b58fd6 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -736,6 +736,7 @@ compile_images_for_offload_targets (unsigned in_argc, 
> char *in_argv[],
>  return;
>unsigned num_targets = parse_env_var (target_names, , NULL);
>  
> +  int next_name_entry = 0;
>const char *compiler_path = getenv ("COMPILER_PATH");
>if (!compiler_path)
>  goto out;
> @@ -747,16 +748,17 @@ compile_images_for_offload_targets (unsigned in_argc, 
> char *in_argv[],
>  {
>/* HSA does not use LTO-like streaming and a different compiler, skip
>it. */
> -  if (strncmp(names[i], "hsa", 3) == 0)
> +  if (strcmp (names[i], "hsa") == 0)
>   continue;
>  
> -  offload_names[i]
> +  offload_names[next_name_entry]
>   = compile_offload_image (names[i], compiler_path, in_argc, in_argv,
>compiler_opts, compiler_opt_count,
>linker_opts, linker_opt_count);
> -  if (!offload_names[i])
> +  if (!offload_names[next_name_entry])
>   fatal_error (input_location,
>"problem with building target image for %s\n", names[i]);
> +  next_name_entry++;
>  }
>  
>   out:
> -- 
> 2.6.3

Jakub


[PTX] some more cleanups

2015-12-11 Thread Nathan Sidwell

This patch:
1) removes unused hard register defines

2) removes unused ASM_OUTPUT_COMMON, ASM_OUTPUT_LOCAL defines (we define the 
aligned variants)


3) names the static chain registers.

4) applies formatting and simplification to a bunch of function argument/return 
hooks.  In particular I found nvptx_function_arg_boundary quite confusing due to 
its mixing of BITS_PER_UNIT and BITS_PER_WORD.
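
As a reading aid, here is the reworked boundary computation lifted out of
the hook into a standalone function.  It is a sketch under assumptions: the
values chosen for BITS_PER_UNIT and BITS_PER_WORD are illustrative, and
arg_boundary with its plain parameters is a hypothetical stand-in for the
hook, which takes a mode and a type.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration: 8-bit units and 32-bit words.  */
#define BITS_PER_UNIT 8
#define BITS_PER_WORD 32

/* ALIGN_BITS is the type alignment (or mode size) in bits; IS_BLK says
   whether the argument has BLKmode; SIZE_BYTES is its size in bytes and is
   only consulted for BLKmode arguments.  The result is in bits.  */
static unsigned int
arg_boundary (unsigned int align_bits, bool is_blk, long size_bytes)
{
  unsigned int boundary = align_bits;

  if (boundary > BITS_PER_WORD)
    boundary = 2 * BITS_PER_WORD;
  else if (is_blk)
    {
      assert (size_bytes >= 0);
      if (size_bytes > 4)
        boundary = 8 * BITS_PER_UNIT;
      else if (size_bytes > 2)
        boundary = 4 * BITS_PER_UNIT;
      else
        boundary = (unsigned int) size_bytes * BITS_PER_UNIT;
    }

  return boundary;
}
```

Inside the BLKmode branch every result is now expressed in BITS_PER_UNIT,
which is the unit mixing the description above complains about.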


nathan
2015-12-11  Nathan Sidwell  

	* config/nvptx/nvptx.h (RETURN_ADDR_REGNO): Delete.
	(OUTGOING_ARG_POINTER_REGNUM): Delete.
	(ASM_OUTPUT_COMMON, ASM_OUTPUT_LOCAL): Delete.
	(REGISTER_NAMES): Name static chain regs.
	* config/nvptx/nvptx.c (nvptx_function_arg): Add ARG_UNUSED, merge
	ifs.
	(nvptx_incoming_arg): Merge ifs.
	(nvptx_function_arg_boundary): Reimplement to avoid mixing units.
	(nvptx_function_value): Tail call nvptx_libcall_value.
	(nvptx_pass_by_reference): Add ARG_UNUSED.
	(nvptx_static_chain): Use conditional op.
	(nvptx_handle_kernel_attribute): Use VOID_TYPE_P.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 231564)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -392,15 +392,13 @@ arg_promotion (machine_mode mode)
 /* Implement TARGET_FUNCTION_ARG.  */
 
 static rtx
-nvptx_function_arg (cumulative_args_t, machine_mode mode,
+nvptx_function_arg (cumulative_args_t ARG_UNUSED (cum_v), machine_mode mode,
 		const_tree, bool named)
 {
-  if (mode == VOIDmode)
+  if (mode == VOIDmode || !named)
 return NULL_RTX;
 
-  if (named)
-return gen_reg_rtx (mode);
-  return NULL_RTX;
+  return gen_reg_rtx (mode);
 }
 
 /* Implement TARGET_FUNCTION_INCOMING_ARG.  */
@@ -410,10 +408,8 @@ nvptx_function_incoming_arg (cumulative_
 			 const_tree, bool named)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
-  if (mode == VOIDmode)
-return NULL_RTX;
 
-  if (!named)
+  if (mode == VOIDmode || !named)
 return NULL_RTX;
 
   /* No need to deal with split modes here, the only case that can
@@ -433,6 +429,7 @@ nvptx_function_arg_advance (cumulative_a
 			bool ARG_UNUSED (named))
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+
   cum->count++;
 }
 
@@ -449,6 +446,7 @@ static bool
 nvptx_strict_argument_naming (cumulative_args_t cum_v)
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+
   return cum->fntype == NULL_TREE || stdarg_p (cum->fntype);
 }
 
@@ -460,21 +458,19 @@ nvptx_function_arg_boundary (machine_mod
   unsigned int boundary = type ? TYPE_ALIGN (type) : GET_MODE_BITSIZE (mode);
 
   if (boundary > BITS_PER_WORD)
-return 2 * BITS_PER_WORD;
-
-  if (mode == BLKmode)
+boundary = 2 * BITS_PER_WORD;
+  else if (mode == BLKmode)
 {
   HOST_WIDE_INT size = int_size_in_bytes (type);
+
   if (size > 4)
-return 2 * BITS_PER_WORD;
-  if (boundary < BITS_PER_WORD)
-{
-  if (size >= 3)
-return BITS_PER_WORD;
-  if (size >= 2)
-return 2 * BITS_PER_UNIT;
-}
+boundary = 8 * BITS_PER_UNIT;
+  else if (size > 2)
+	boundary = 4 * BITS_PER_UNIT;
+  else
+	boundary = size * BITS_PER_UNIT;
 }
+
   return boundary;
 }
 
@@ -487,6 +483,7 @@ nvptx_libcall_value (machine_mode mode,
 /* Pretend to return in a hard reg for early uses before pseudos can be
generated.  */
 return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+
   return gen_reg_rtx (mode);
 }
 
@@ -503,11 +500,8 @@ nvptx_function_value (const_tree type, c
 	 , NULL_TREE, 1);
   if (outgoing)
 return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
-  if (cfun->machine->start_call == NULL_RTX)
-/* Pretend to return in a hard reg for early uses before pseudos can be
-   generated.  */
-return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
-  return gen_reg_rtx (mode);
+
+  return nvptx_libcall_value (mode, NULL_RTX);
 }
 
 /* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
@@ -522,8 +516,8 @@ nvptx_function_value_regno_p (const unsi
reference in memory.  */
 
 static bool
-nvptx_pass_by_reference (cumulative_args_t, machine_mode mode,
-			 const_tree type, bool)
+nvptx_pass_by_reference (cumulative_args_t ARG_UNUSED (cum), machine_mode mode,
+			 const_tree type, bool ARG_UNUSED (named))
 {
   return !PASS_IN_REG_P (mode, type);
 }
@@ -572,10 +566,9 @@ nvptx_static_chain (const_tree fndecl, b
   if (!DECL_STATIC_CHAIN (fndecl))
 return NULL;
 
-  if (incoming_p)
-return gen_rtx_REG (Pmode, STATIC_CHAIN_REGNUM);
-  else
-return gen_rtx_REG (Pmode, OUTGOING_STATIC_CHAIN_REGNUM);
+
+  return gen_rtx_REG (Pmode, (incoming_p ? STATIC_CHAIN_REGNUM
+			  : OUTGOING_STATIC_CHAIN_REGNUM));
 }
 
 /* Helper for write_arg.  Emit a single PTX argument of MODE, either
@@ -3829,8 +3822,7 @@ nvptx_handle_kernel_attribute (tree *nod
   error ("%qE attribute only applies to functions", name);
   *no_add_attrs = true;
 }
-
-  else if (TREE_TYPE (TREE_TYPE (decl)) != 

Re: Request permission to delete gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c

2015-12-11 Thread Bill Schmidt
On Fri, 2015-12-11 at 10:47 +0100, Richard Biener wrote:
> On Thu, Dec 10, 2015 at 8:33 PM, David Edelsohn  wrote:
> > On Thu, Dec 10, 2015 at 2:23 PM, Bill Schmidt
> >  wrote:
> >> Hi,
> >>
> >> The subject test case has been failing as follows:
> >>
> >> FAIL: gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c 
> >> scan-tree-dump-times vect "vectorization not profitable" 1
> >>
> >> The test has been failing since r223528, which is:
> >>
> >> 2015-05-22  Richard Biener  
> >>
> >> PR tree-optimization/65701
> >> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
> >> Move peeling cost models into one place.  Peel for alignment
> >> for single loads only if an aligned load is cheaper than
> >> an unaligned load.
> >>
> >> Thus with that modification, gcc now vectorizes the loop that was
> >> previously deemed unprofitable to vectorize.  As a result, the test case
> >> no longer has any reason to exist, and I would like to delete it.
> 
> Just curious - why was it not profitable before but is now?  The only
> thing that has changed is we no longer require peeling for gaps(?)
>
> Thus, did you check with -fno-vect-cost-model before/after the rev.?
> 

Right -- so, with the cost model disabled, before and after we vectorize
differently.  Previously we would vectorize by applying peeling to force
alignment.  After the change, we vectorize because the unaligned
accesses are recognized as supported by hardware.  So it's the "Peel for
alignment for single loads only if an aligned load is cheaper than an
unaligned load" that's kicking in here, I imagine.

Note that I almost always am testing on POWER8 hardware these days, for
which unaligned vector accesses are cost-effective.  With earlier
hardware, not so much, and the cost modeling reflects this.  So testing
with earlier machines I expect that the test would continue to "succeed"
by not vectorizing.  Alternatively, we could change the test to require
inefficient unaligned access and keep it around, but eventually that
would mean that it just becomes obsolete and nobody notices it.  Thus
I'd prefer to just kill the test.

On POWER8, the resulting code with r223528 is much tighter than with
r223527, because we no longer have the unnecessary loop peeling.  With
-fno-vect-cost-model for both before and after, static instruction
counts drop from 85 to 30, and the loop body is also much cleaner.

Thanks,
Bill

> We might also do outer loop vectorization if the inner loop is not unrolled?
> 
> Richard.
> 
> >> Ok for trunk?
> >>
> >> Thanks,
> >> Bill
> >>
> >>
> >> [gcc/testsuite]
> >>
> >> 2015-12-10  Bill Schmidt  
> >>
> >> * gcc.dg/vect/costmodel/ppc/costmodel-fast-math-vect-pr29925.c:
> >> Delete.
> >
> > Okay with me.
> >
> > Thanks, David
> 




update_vtable_references segfault

2015-12-11 Thread Nathan Sidwell

Jan,
it looks like your recent changes to function_and_variable_visibility and 
friends cause regressions on targets that do not support aliases (PTX for example).


Specifically, we get a segfault in update_vtable_references (ipa-visibility.c) 
at
*tp = symtab_node::get (*tp)->noninterposable_alias ()->decl;

because symtab_node::noninterposable_alias (symtab.c) returns NULL.  It does 
this on targets that do not support aliases:


#ifndef ASM_OUTPUT_DEF
  /* If aliases aren't supported by the assembler, fail.  */
  return NULL;
#endif

and update_vtable_references doesn't anticipate that happening, blindly 
dereferencing the returned value.  Is  the fix:


a) add  a check in update_vtable_references on noninterposable_alias's return 
value?

b) augment can_replace_by_local_alias_in_vtable to check whether aliases can be 
created?


c) augment function_and_variable_visibility to not attempt such replacement:
  /* Update virtual tables to point to local aliases where possible.  */
  if (DECL_VIRTUAL_P (vnode->decl)
  && !DECL_EXTERNAL (vnode->decl))
...
d) something else?

nathan


Re: [hsa 2/10] Modifications to libgomp proper

2015-12-11 Thread Jakub Jelinek
On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote:
> I see, I prefer the clean approach, even if it is more work, this
> interface looks like it is going to be extended in the future.  But I
> am wondering whether embedding the value into the identifier element
> is actually worth it.  The passed array is going to be a small local
> variable and I wonder whether there is going to be any benefit in it
> having two elements instead of four (or four instead of six for
> gridified kernels), especially if it means introducing control flow on
> the part of the caller.  But if you really want it that way, I will
> implement that.

I'm fine with implementing the two (num_threads and thread_limit) always
as separate argument, or perhaps what you could do is if the argument
is constant and fits into the signed 16 bits on 32-bit arches (or any
constant, that fits into 48 bits), use embedded argument (then there is no
extra runtime cost), and if it is variable and you'd need to shift,
put it as separate argument.
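
To make the two encodings concrete, here is a rough sketch.  The bit layout
(VALUE_SHIFT, SUBSEQUENT_FLAG) and the helper names are assumptions made up
for this example, not the libgomp interface, and the fits check is done at
run time only to keep the example small; in the compiler the choice would be
made at compile time for constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low 15 bits identify the argument, bit 15 says
   that the value follows in the next array slot, and an embedded value
   occupies the bits above VALUE_SHIFT.  */
#define VALUE_SHIFT 16
#define SUBSEQUENT_FLAG (1 << 15)

/* True if VALUE survives being embedded above VALUE_SHIFT in an intptr_t:
   16 signed bits with 32-bit pointers, 48 with 64-bit pointers.  */
static bool
fits_embedded (intptr_t value)
{
  int bits = (int) (sizeof (intptr_t) * 8) - VALUE_SHIFT;
  intptr_t max = ((intptr_t) 1 << (bits - 1)) - 1;
  intptr_t min = -max - 1;
  return value >= min && value <= max;
}

/* Append argument ID with VALUE to ARGS at index *N, using one slot when
   the value can be embedded and two slots otherwise.  */
static void
push_arg (intptr_t *args, int *n, int id, intptr_t value)
{
  assert (id >= 0 && id < SUBSEQUENT_FLAG);
  if (fits_embedded (value))
    args[(*n)++] = (intptr_t) ((uintptr_t) value << VALUE_SHIFT) | id;
  else
    {
      args[(*n)++] = id | SUBSEQUENT_FLAG;
      args[(*n)++] = value;
    }
}

/* Decode the argument starting at ARGS[*POS], advancing *POS past it.  */
static intptr_t
pop_arg (const intptr_t *args, int *pos, int *id)
{
  intptr_t word = args[(*pos)++];
  *id = (int) (word & (SUBSEQUENT_FLAG - 1));
  if (word & SUBSEQUENT_FLAG)
    return args[(*pos)++];
  /* Relies on >> being an arithmetic shift for negative values.  */
  return word >> VALUE_SHIFT;
}
```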

> OK, so if I understand this correctly, I should not be re-queing the
> task in gomp_target_task_completion.  I have left the check to be
> NULLness of ttask->tgt but can test the capability if you prefer (but
> I at least hope the two options are equivalent).

> > > --- a/libgomp/task.c
> > > +++ b/libgomp/task.c
> > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
> > >gomp_mutex_unlock (>task_lock);
> > >  }
> > >ttask->state = GOMP_TARGET_TASK_FINISHED;
> > > +  free (ttask->firstprivate_copies);
> > >gomp_target_task_completion (team, task);
> > >gomp_mutex_unlock (>task_lock);
> > >  }
> > 
> > So, this function should have a special case for the SHARED_MEM case, handle
> > it closely to say how GOMP_taskgroup_end handles the finish_cancelled:
> > case.  Just note that the target task is missing from certain queues at that
> > point.
> 
> I'm afraid I need some help here.  I do not quite understand how is
> finish_cancelled in GOMP_taskgroup_end similar, it seems to be doing
> much more than freeing one pointer.  What is exactly the issue with
> the above?
> 
> Nevertheless, after reading through bits of task.c again, I wonder
> whether any copying (for both shared memory target and the host) in
> gomp_target_task_fn is actually necessary because it seems to be also
> done in gomp_create_target_task.  Does that not apply somehow?

The target task is scheduled for the first action as normal task, and the
scheduling of it already removes it from some of the queues (each task is
put into 1-3 queues), i.e. actions performed mostly by
gomp_task_run_pre.  Then the team task lock is unlocked and the task is run.
Finally, for normal tasks, gomp_task_run_post_handle_depend,
gomp_task_run_post_remove_parent, etc. are run.  Now, for async target tasks
that have something running on some other device at that point, we don't do
that, but instead make it GOMP_TASK_ASYNC_RUNNING, and continue with other
stuff until gomp_target_task_completion is run.
For non-shared mem, that needs to re-add the task to the queues, so
that it will be scheduled again.  But you don't need that for shared mem
target tasks; they can just free the firstprivate_copies and finalize the
task.
At the time gomp_target_task_completion is called, the task is pretty much
in the same state as it is around the finish_cancelled:; label.
So instead of what the gomp_target_task_completion function does,
you would for SHARED_MEM do something like:
  size_t new_tasks
= gomp_task_run_post_handle_depend (task, team);
  gomp_task_run_post_remove_parent (task);
  gomp_clear_parent (>children_queue);
  gomp_task_run_post_remove_taskgroup (task);
  team->task_count--;
  do_wake = 0;
  if (new_tasks > 1)
{
  do_wake = team->nthreads - team->task_running_count
- !task->in_tied_task;
  if (do_wake > new_tasks)
do_wake = new_tasks;
}
// Unlike other places, the following will be also run with the
// task_lock held, but I'm afraid there is nothing to do about it.
// See the comment in gomp_target_task_completion.
  gomp_finish_task (task);
  free (task);
  if (do_wake)
gomp_team_barrier_wake (>barrier, do_wake);

Jakub


Re: update_vtable_references segfault

2015-12-11 Thread Jan Hubicka
> Jan,
> it looks like your recent changes to
> function_and_variable_visibility and friends causes regressions in
> targets that do not support aliases (PTX for example).
> 
> specifically, we get a segfault  in update_vtable_references 
> (ipa-visibility.c) at
>   *tp = symtab_node::get (*tp)->noninterposable_alias ()->decl;
> 
> because symtab_node::noninterposable_alias (symtab.c) returns NULL.
> It does this on targets that do not support aliases:
> 
> #ifndef ASM_OUTPUT_DEF
>   /* If aliases aren't supported by the assembler, fail.  */
>   return NULL;
> #endif
> 
> and update_vtable_references doesn't anticipate that happening,
> blindly dereferencing the returned value.  Is  the fix:
> 
> a) add  a check in update_vtable_references on noninterposable_alias's return 
> value?
> 
> b) augment can_replace_by_local_alias_in_vtable to check whether
> aliases can be created?

I think this is best: can_replace_by_local_alias_in_vtable exists to prevent the
body walk in cases where we are not going to create the alias.  This is because
in LTO we may need to stream in the constructor from the object file, which is
not completely free, and thus it is better to not touch it unless necessary.
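
A toy model of that arrangement, with the gate also accounting for alias
support so that the replacement step never sees a NULL alias.  Everything
here (struct symbol, target_supports_aliases, the function bodies) is a
hypothetical stand-in, not the symtab API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not GCC's symtab types.  */
struct symbol
{
  const char *name;
  struct symbol *local_alias;   /* NULL when no alias can be created.  */
};

/* Set to false to model a target whose assembler lacks alias support.  */
static bool target_supports_aliases = true;

static struct symbol *
noninterposable_alias (struct symbol *sym)
{
  if (!target_supports_aliases)
    return NULL;
  return sym->local_alias;
}

/* Gate: decide up front whether replacement can work at all.  */
static bool
can_replace_by_local_alias (struct symbol *sym)
{
  return target_supports_aliases && sym->local_alias != NULL;
}

/* Replace *SLOT with its local alias if allowed; returns true on change.  */
static bool
update_reference (struct symbol **slot)
{
  assert (slot != NULL && *slot != NULL);
  if (!can_replace_by_local_alias (*slot))
    return false;
  *slot = noninterposable_alias (*slot);
  return true;
}
```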

Thanks for looking into this!
Honza
> 
> c) augment function_and_variable_visibility to not attempt such replacement:
>   /* Update virtual tables to point to local aliases where possible.  */
>   if (DECL_VIRTUAL_P (vnode->decl)
> && !DECL_EXTERNAL (vnode->decl))
>   ...
> d) something else?
> 
> nathan


fix scheduling antideps

2015-12-11 Thread Mike Stump
This patch allows a target to increase the cost of anti-deps to better reflect 
the actual cost on the machine.

This gets me 5% more performance on an important inner loop by exposing the 
actual cost of long dep chains that have lots of anti-deps in them.  By 
scheduling the longer chain first, we have more opportunities to fill in the 
holes with content from the other less critical chains.

I’m unsure if all machines should have a cost of 1, or just some machines.  I 
suspect that OOO can hide the dep chains well enough so that the value 0 is 
more appropriate.
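
To illustrate why the constant matters for chain priority, here is a toy
cost model.  It is a simplified sketch, not the haifa-sched.c logic: only
the macro name and its default of 0 come from the patch, while enum
dep_kind, dep_cost and chain_length are hypothetical.

```c
#include <assert.h>

/* A target header would define this before the default is applied; the
   fallback of 0 matches the patch.  */
#ifndef TARGET_ANTI_DEP_COST
#define TARGET_ANTI_DEP_COST 0
#endif

enum dep_kind { DEP_TRUE, DEP_ANTI };

/* Simplified stand-in for the scheduler's dependence cost: a true
   dependence costs the producer's latency, an anti-dependence costs the
   target-chosen constant.  */
static int
dep_cost (enum dep_kind kind, int producer_latency)
{
  assert (producer_latency >= 0);
  if (kind == DEP_ANTI)
    return TARGET_ANTI_DEP_COST;
  return producer_latency;
}

/* Length of a chain whose link I has kind KINDS[I] and producer latency
   LAT[I].  */
static int
chain_length (const enum dep_kind *kinds, const int *lat, int n)
{
  int total = 0;

  for (int i = 0; i < n; i++)
    total += dep_cost (kinds[i], lat[i]);
  return total;
}
```

With a cost of 0, a chain made mostly of anti-deps looks no longer than its
true dependences alone; with a cost of 1, each anti-dep adds a cycle and the
chain is ranked as more critical.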

Ok?


Index: defaults.h
===
--- defaults.h  (revision 231539)
+++ defaults.h  (working copy)
@@ -1486,6 +1486,10 @@
 #define TARGET_VTABLE_USES_DESCRIPTORS 0
 #endif
 
+#ifndef TARGET_ANTI_DEP_COST
+#define TARGET_ANTI_DEP_COST 0
+#endif
+
 #endif /* GCC_INSN_FLAGS_H  */
 
 #endif  /* ! GCC_DEFAULTS_H */
Index: doc/tm.texi
===
--- doc/tm.texi (revision 231539)
+++ doc/tm.texi (working copy)
@@ -6970,6 +6970,13 @@
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@defmac TARGET_ANTI_DEP_COST
+The cost in cycles for an anti-dependency.  Defaults to 0.  On non-OOO
+multi-issue machines that can't issue instructions that have
+overlapping registers in the same cycle, a value of 1 will better
+reflect the actual cost of the instruction sequence.
+@end defmac
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: doc/tm.texi.in
===
--- doc/tm.texi.in  (revision 231539)
+++ doc/tm.texi.in  (working copy)
@@ -4852,6 +4852,13 @@
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@defmac TARGET_ANTI_DEP_COST
+The cost in cycles for an anti-dependency.  Defaults to 0.  On non-OOO
+multi-issue machines that can't issue instructions that have
+overlapping registers in the same cycle, a value of 1 will better
+reflect the actual cost of the instruction sequence.
+@end defmac
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: haifa-sched.c
===
--- haifa-sched.c   (revision 231539)
+++ haifa-sched.c   (working copy)
@@ -1470,7 +1470,7 @@
   if (INSN_CODE (insn) >= 0)
{
  if (dep_type == REG_DEP_ANTI)
-   cost = 0;
+   cost = TARGET_ANTI_DEP_COST;
  else if (dep_type == REG_DEP_OUTPUT)
{
  cost = (insn_default_latency (insn)



Re: [v4] avoid alignment of static variables affecting stack's

2015-12-11 Thread Bernd Schmidt

On 12/11/2015 02:48 PM, Jan Beulich wrote:

Function (or more narrow) scope static variables (as well as others not
placed on the stack) should also not have any effect on the stack
alignment. I noticed the issue first with Linux'es dynamic_pr_debug()
construct using an 8-byte aligned sub-file-scope local variable.

According to my checking bad behavior started with 4.6.x (4.5.3 was
still okay), but generated code got quite a bit worse as of 4.9.0.

[v4: Bail early, using is_global_var(), as requested by Bernd.]


In case I haven't made it obvious, this is OK.


Bernd


Re: [PATCH] gcc: read -fdebug-prefix-map OLD from environment (improved reproducibility)

2015-12-11 Thread Daniel Kahn Gillmor
On Fri 2015-12-11 12:03:28 -0500, Bernd Schmidt wrote:
> On 12/11/2015 05:49 PM, Daniel Kahn Gillmor wrote:
>> I've re-rolled the patch (attached below, here) to use the ENV: prefix
>> instead of the $.
>
> It might be irrelevant at this point, but the "ENV:" prefix is used in 
> AmigaOS and could be part of a filename.

As a full-path prefix?

>> +  if (0 == strncmp(ENV_PREFIX, arg, ENV_PREFIX_OFFSET))
>> +{
>> +  env = xstrndup (arg+ENV_PREFIX_OFFSET, p - (arg+ENV_PREFIX_OFFSET));
>
> Spaces before ( and around operators like +. Please review our coding 
> guidelines and have a look at the surrounding code.

right, sorry about that.  Attached at the bottom of this mail is a patch
that i think should clean this up.  (i've also updated bugzilla
correctly after a couple tries -- sorry for the noise there)

> Wouldn't it be simpler just to special-case -fdebug-prefix-map in 
> gen_producer_string? The environment variable thing strikes me as 
> unnecessary.

I think you mean so that we would just ignore -fdebug-prefix-map
entirely when writing DW_AT_producer, so that you could build
reproducibly with (for example):

 gcc -fdebug-prefix-map=$(pwd)=/usr/src

We'd considered and discarded this approach in the past out of concern
for possible build systems that can easily vary the environment, but
work with a static list of CFLAGS to pass to the compiler.  On further
inspection, it's not clear that anyone has a concrete example of a build
system with this constraint.

Here's a one-liner patch for this approach (also at
https://gcc.gnu.org/bugzilla/attachment.cgi?id=37007):

From 13ca37337660f5385295052b3c1aebfc4e29092c Mon Sep 17 00:00:00 2001
From: Daniel Kahn Gillmor 
Date: Fri, 11 Dec 2015 13:47:45 -0500
Subject: [PATCH] gcc: ignore -fdebug-prefix-map in DW_AT_producer (improved
 reproducibility)

Work on the reproducible-builds project [0] has identified that build
paths are one cause of output variation between builds.  This
changeset allows users to avoid this variation when building C objects
with debug symbols, while leaving the default behavior unchanged.

Background
--

gcc includes the build path in any generated DWARF debugging symbols,
specifically in DW_AT_comp_dir, but allows the embedded path to be
changed via -fdebug-prefix-map.

When -fdebug-prefix-map is used with the current build path, it
removes the build path from DW_AT_comp_dir but places it instead in
DW_AT_producer, so the reproducibility problem isn't resolved.

When building software for binary redistribution, the actual build
path on the build machine is irrelevant, and doesn't need to be
exposed in the debug symbols.

Resolution
--

This patch resolves the problem by never including -fdebug-prefix-map
in the DW_AT_producer attribute.

People interested in building software reproducibly irrespective of
build path can do so by passing the build path itself as the first
argument to -fdebug-prefix-map.

[0] https://reproducible-builds.org
---
 gcc/dwarf2out.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 6af57b5..1a75ed7 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -20151,6 +20151,7 @@ gen_producer_string (void)
   case OPT_fpreprocessed:
   case OPT_fltrans_output_list_:
   case OPT_fresolution_:
+  case OPT_fdebug_prefix_map_:
 	/* Ignore these.  */
 	continue;
   default:
-- 
2.6.2



And here's the revised ENV:-prefixed approach with corrected coding
conventions (also at
https://gcc.gnu.org/bugzilla/attachment.cgi?id=37005):


>From 41d3bcf94ecd9b1ce2c4bccc0039bdff0b464315 Mon Sep 17 00:00:00 2001
From: Daniel Kahn Gillmor 
Date: Thu, 10 Dec 2015 12:09:45 -0500
Subject: [PATCH v3] gcc: read -fdebug-prefix-map OLD from environment
 (improved reproducibility)

Work on the reproducible-builds project [0] has identified that build
paths are one cause of output variation between builds.  This
changeset allows users to avoid this variation when building C objects
with debug symbols, while leaving the default behavior unchanged.

Background
--

gcc includes the build path in any generated DWARF debugging symbols,
specifically in DW_AT_comp_dir, but allows the embedded path to be
changed via -fdebug-prefix-map.

When -fdebug-prefix-map is used with the current build path, it
removes the build path from DW_AT_comp_dir but places it instead in
DW_AT_producer, so the reproducibility problem isn't resolved.

When building software for binary redistribution, the actual build
path on the build machine is irrelevant, and doesn't need to be
exposed in the debug symbols.

Resolution
--

This patch extends the first argument to -fdebug-prefix-map ("old") to
be able to read from the environment, which allows a packager to avoid
embedded build paths in the debugging symbols with something like:

  export SOURCE_BUILD_DIR="$(pwd)"
  gcc -fdebug-prefix-map=ENV:SOURCE_BUILD_DIR=/usr/src
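
The ENV: lookup described above can be sketched roughly as follows.  This
is only an illustration, not the code from the patch: ENV_PREFIX and
ENV_PREFIX_OFFSET are the names used in the hunk quoted earlier in this
mail, while resolve_old_prefix is a hypothetical helper, and plain malloc
stands in for the allocation routines gcc itself would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ENV_PREFIX "ENV:"
#define ENV_PREFIX_OFFSET (sizeof (ENV_PREFIX) - 1)

/* Hypothetical helper: resolve the OLD half of an OLD=NEW mapping.
   ARG points at OLD and LEN is its length (the '=' is not included).
   If OLD is "ENV:NAME", return a copy of the value of the environment
   variable NAME; otherwise return a copy of OLD itself.  A bare "ENV:"
   with no name is copied verbatim.  Returns NULL if the variable is
   unset or if allocation fails.  The caller frees the result.  */
static char *
resolve_old_prefix (const char *arg, size_t len)
{
  const char *src = arg;
  size_t n = len;
  char *out;

  assert (arg != NULL);

  if (len > ENV_PREFIX_OFFSET
      && strncmp (ENV_PREFIX, arg, ENV_PREFIX_OFFSET) == 0)
    {
      size_t name_len = len - ENV_PREFIX_OFFSET;
      char *name = malloc (name_len + 1);

      if (name == NULL)
        return NULL;
      memcpy (name, arg + ENV_PREFIX_OFFSET, name_len);
      name[name_len] = '\0';

      src = getenv (name);
      free (name);
      if (src == NULL)
        return NULL;
      n = strlen (src);
    }

  out = malloc (n + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, src, n);
  out[n] = '\0';
  return out;
}
```

With SOURCE_BUILD_DIR exported as in the example above, a call with
"ENV:SOURCE_BUILD_DIR" would yield the build directory, so the path itself
never has to appear on the command line.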


Fix PR21273

2015-12-11 Thread Bernd Schmidt
Maybe not the most important PR in the database, but we might as well 
fix and close it. Count the number of alternatives in a MATCH_SCRATCH 
against the max.


Bootstrapped and tested on x86_64-linux (one testcase timed out, almost 
certainly because another test went heavily into swap at one point). Ok?



Bernd

	PR middle-end/21273
	* gensupport.c (collect_insn_data): Look for number of alternatives
	in MATCH_SCRATCH.

Index: gcc/gensupport.c
===
--- gcc/gensupport.c	(revision 231532)
+++ gcc/gensupport.c	(working copy)
@@ -1068,12 +1068,12 @@ collect_insn_data (rtx pattern, int *pal
   switch (code)
 {
 case MATCH_OPERAND:
-  i = n_alternatives (XSTR (pattern, 2));
+case MATCH_SCRATCH:
+  i = n_alternatives (XSTR (pattern, code == MATCH_SCRATCH ? 1 : 2));
   *palt = (i > *palt ? i : *palt);
   /* Fall through.  */
 
 case MATCH_OPERATOR:
-case MATCH_SCRATCH:
 case MATCH_PARALLEL:
   i = XINT (pattern, 0);
   if (i > *pmax)
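
For readers not familiar with this code: the hunk above makes
MATCH_SCRATCH contribute to the alternative count the same way
MATCH_OPERAND does, just reading the constraint string from a different
operand.  A minimal sketch of that counting, assuming that alternatives
in a constraint string are separated by commas; count_alternatives and
note_alternatives are hypothetical names used only for illustration, not
the functions in gensupport.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a constraint string such as "=r,m" has one more
   alternative than it has commas.  A NULL or empty string counts as a
   single alternative.  */
static int
count_alternatives (const char *constraints)
{
  int n = 1;

  if (constraints != NULL)
    for (; *constraints != '\0'; constraints++)
      if (*constraints == ',')
        n++;
  return n;
}

/* Hypothetical helper: record in *PALT the largest alternative count
   seen so far, mirroring the "*palt = (i > *palt ? i : *palt)" line in
   the hunk above.  */
static void
note_alternatives (const char *constraints, int *palt)
{
  int i;

  assert (palt != NULL);
  i = count_alternatives (constraints);
  if (i > *palt)
    *palt = i;
}
```

The point of the fix is that a scratch with, say, three alternatives
should raise the recorded maximum just as an ordinary operand with three
alternatives would.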


Re: ipa-cp heuristics fixes

2015-12-11 Thread Jakub Jelinek
On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?

I think this patch (just a guess, but certainly ipa-cp related during last
24 hours) significantly regressed guality/pr36728-*.c on x86_64.
Previously we have not turned foo into foo.constprop*, now we do, and pass
just arg7 instead of arg1..arg7.  That is fine, but we really should be
emitting the debug info stuff for that case, that was added to fix PR47858,
but for whatever reason it doesn't happen in this case.  Does it take some
other path in ipa-prop.c, or bypass ipa-prop, something different?

>   * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
>   (good_cloning_opportunity_p): Likewise.
>   (gather_context_independent_values): Do not return true when
>   polymorphic call context is known or when we have known aggregate
>   value of unused parameter.
>   (estimate_local_effects): Try to create clone for all context
>   when either some params are substituted or devirtualization is possible
>   or some params can be removed; use local flag instead of
>   node->will_be_removed_from_program_if_no_direct_calls_p.
>   (identify_dead_nodes): Likewise.

Jakub


Re: [PATCH] Fix ICE with -fno-if-conversion (PR rtl-optimization/68730)

2015-12-11 Thread Jakub Jelinek
On Fri, Dec 11, 2015 at 07:30:59AM +0100, Richard Biener wrote:
> >So, to fix ICE on the following testcase, we can either do what the
> >patch
> >does, or could conditionalize both the calculate_dominance_info and
> >free_dominance_info in the convert_scalars_to_vector function (stv
> >pass)
> >on the dominance info not being computed (like other places in gcc do),
> >or we could stick free_dominance_info into all passes that break the
> >dominators just in case it would be computed (out_of_cfglayout is one
> >example).
> 
> We rely on this everywhere else so that would be preferred.

So like this instead?  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2015-12-10  Jakub Jelinek  

PR rtl-optimization/68730
* cfgrtl.c (cfg_layout_finalize): Free dominators.

* gcc.dg/pr68730.c: New test.

--- gcc/cfgrtl.c.jj 2015-11-04 11:12:20.0 +0100
+++ gcc/cfgrtl.c2015-12-11 09:34:58.864893607 +0100
@@ -4299,6 +4299,7 @@ void
 cfg_layout_finalize (void)
 {
   checking_verify_flow_info ();
+  free_dominance_info (CDI_DOMINATORS);
   force_one_exit_fallthru ();
   rtl_register_cfg_hooks ();
   if (reload_completed && !targetm.have_epilogue ())
--- gcc/testsuite/gcc.dg/pr68730.c.jj   2015-12-11 09:25:06.022268146 +0100
+++ gcc/testsuite/gcc.dg/pr68730.c  2015-12-11 09:25:06.022268146 +0100
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/68730 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-if-conversion" } */
+/* { dg-additional-options "-march=x86-64" { target { i?86-*-* x86_64-*-* } } } */
+
+int b, d, e;
+unsigned long long c = 4100543410106915;
+
+void
+foo (void)
+{
+  short f, g = 4 % c;
+  int h = c;
+  if (h)
+{
+  int i = ~c;
+  if (~c)
+   i = 25662;
+  f = g = i;
+  h = c - g + ~-f;
+  c = ~(c * h - f);
+}
+  f = g;
+  unsigned long long k = g || c;
+  short l = c ^ g ^ k;
+  if (g > 25662 || c == 74074520320 || !(g < 2))
+{
+  k = c;
+  l = g;
+  c = ~((k && c) + ~l);
+  f = ~(f * (c ^ k) | l);
+  if (c > k)
+   __builtin_printf ("%d\n", f);
+}
+  short m = -f;
+  unsigned long long n = c;
+  c = m * f | n % c;
+  if (n)
+__builtin_printf ("%d\n", f);
+  while (f < -31807)
+;
+  c = ~(n | c) | f;
+  if (n < c)
+__builtin_printf ("%lld\n", (long long) f);
+  for (; d;)
+for (; e;)
+  for (;;)
+   ;
+  c = h;
+  c = l % c;
+}


Jakub


[PATCH] Fix -Werror= handling with aliases (PR c/68833)

2015-12-11 Thread Jakub Jelinek
Hi!

On Tue, Dec 08, 2015 at 12:56:20PM +0100, Bernd Schmidt wrote:
> On 12/07/2015 11:41 PM, Jakub Jelinek wrote:
> >On Mon, Dec 07, 2015 at 04:11:48PM +0100, Bernd Schmidt wrote:
> >>Let's document arguments; for the ones identical to read_cmdline_option an
> >>explicit pointer there is sufficient, but errors is new.
> >
> >>This also needs an update to the function comment.
> >>
> >>Other than that I'm ok with this. This area could probably be restructured a
> >>bit but for now I think this is good enough.
> >
> >So like this?  Bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> Yes, thanks.

Unfortunately, my patch broke some cases with warning aliases that happened
to work (by accident) and left some other warning alias cases broken.

This patch attempts to fix that (and add Warning keyword to two warning
aliases that didn't have it), so that -Werror= works even for them again.
As we do nothing beyond cancelling -Werror for -Wno-error=, there is no need
to deal with neg_alias_arg, just alias_arg is enough.
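
To illustrate the intended behaviour, here is a small stand-alone sketch
of resolving an alias together with its implied argument.  It is a toy
model only: struct toy_option, toy_options, NO_ALIAS and resolve_alias
are made-up names, and the table holds just enough entries to show the
idea; the real change is the opts-common.c hunk below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NO_ALIAS ((size_t) -1)

/* Toy stand-in for an option table entry.  */
struct toy_option
{
  const char *name;
  size_t alias_target;    /* Index of the aliased option, or NO_ALIAS.  */
  const char *alias_arg;  /* Argument implied by the alias, or NULL.  */
};

static const struct toy_option toy_options[] = {
  { "Wsuggest-attribute=", NO_ALIAS, NULL },
  { "Wmissing-noreturn", 0, "noreturn" },
  { "Wmissing-format-attribute", 0, "format" },
  { "Wodr", NO_ALIAS, NULL },
};

#define N_TOY_OPTIONS (sizeof (toy_options) / sizeof (toy_options[0]))

/* If *OPT_INDEX names an alias, redirect it to the target option and,
   when the alias implies an argument, store that argument in *ARG.
   Options that are not aliases are left untouched.  */
static void
resolve_alias (size_t *opt_index, const char **arg)
{
  const struct toy_option *opt;

  assert (opt_index != NULL && arg != NULL);
  assert (*opt_index < N_TOY_OPTIONS);

  opt = &toy_options[*opt_index];
  if (opt->alias_target != NO_ALIAS)
    {
      if (opt->alias_arg != NULL)
        *arg = opt->alias_arg;
      *opt_index = opt->alias_target;
    }
}
```

In this model -Werror=missing-noreturn ends up controlling the target
option with the argument "noreturn", which is the effect the patch is
after for the real aliases.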

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-11  Jakub Jelinek  

PR c/68833
* common.opt (Wmissing-noreturn): Add Warning option.
* opts-common.c (control_warning_option): If opt is
alias_target with alias_arg, set arg to it.

* c.opt (Wmissing-format-attribute, Wnormalized): Add Warning option.

* c-c++-common/pr68833-1.c: New test.
* c-c++-common/pr68833-2.c: New test.

--- gcc/common.opt.jj   2015-12-08 14:25:19.0 +0100
+++ gcc/common.opt  2015-12-11 16:29:30.257521500 +0100
@@ -612,7 +612,7 @@ Common Var(warn_unsafe_loop_optimization
 Warn if the loop cannot be optimized due to nontrivial assumptions.
 
 Wmissing-noreturn
-Common Alias(Wsuggest-attribute=noreturn)
+Common Warning Alias(Wsuggest-attribute=noreturn)
 
 Wodr
 Common Var(warn_odr_violations) Init(1) Warning
--- gcc/opts-common.c.jj2015-12-08 14:25:19.0 +0100
+++ gcc/opts-common.c   2015-12-11 16:11:16.499755495 +0100
@@ -1361,7 +1361,13 @@ control_warning_option (unsigned int opt
diagnostic_context *dc)
 {
   if (cl_options[opt_index].alias_target != N_OPTS)
-opt_index = cl_options[opt_index].alias_target;
+{
+  gcc_assert (!cl_options[opt_index].cl_separate_alias
+ && !cl_options[opt_index].cl_negative_alias);
+  if (cl_options[opt_index].alias_arg)
+   arg = cl_options[opt_index].alias_arg;
+  opt_index = cl_options[opt_index].alias_target;
+}
   if (opt_index == OPT_SPECIAL_ignore)
 return;
   if (dc)
--- gcc/c-family/c.opt.jj   2015-12-11 09:24:38.0 +0100
+++ gcc/c-family/c.opt  2015-12-11 16:30:40.150547817 +0100
@@ -627,7 +627,7 @@ C++ ObjC++ Var(warn_templates) Warning
 Warn on primary template declaration.
 
 Wmissing-format-attribute
-C ObjC C++ ObjC++ Alias(Wsuggest-attribute=format)
+C ObjC C++ ObjC++ Warning Alias(Wsuggest-attribute=format)
 ;
 
 Wmissing-include-dirs
@@ -678,7 +678,7 @@ C ObjC C++ ObjC++ LangEnabledBy(C ObjC C
 ;
 
 Wnormalized
-C ObjC C++ ObjC++ Alias(Wnormalized=,nfc,none)
+C ObjC C++ ObjC++ Warning Alias(Wnormalized=,nfc,none)
 ;
 
 Wnormalized=
--- gcc/testsuite/c-c++-common/pr68833-1.c.jj   2015-12-11 16:15:14.053446857 +0100
+++ gcc/testsuite/c-c++-common/pr68833-1.c  2015-12-11 16:33:41.593020133 +0100
@@ -0,0 +1,22 @@
+/* PR c/68833 */
+/* { dg-do compile } */
+/* { dg-options "-Werror=larger-than-65536 -Werror=format -Werror=missing-noreturn" } */
+
+int a[131072]; /* { dg-error "size of 'a' is \[1-9]\[0-9]* bytes" } */
+int b[1024];   /* { dg-bogus "size of 'b' is \[1-9]\[0-9]* bytes" } */
+
+void
+f1 (const char *fmt)
+{
+  __builtin_printf ("%d\n", 1.2);  /* { dg-error "expects argument of type" } */
+  __builtin_printf (fmt, 1.2); /* { dg-bogus "format not a string literal, argument types not checked" } */
+}
+
+extern void f2 (void);
+void
+f2 (void) /* { dg-error "candidate for attribute 'noreturn'" "detect noreturn candidate" } */
+{
+  __builtin_exit (0);
+}
+
+/* { dg-prune-output "treated as errors" } */
--- gcc/testsuite/c-c++-common/pr68833-2.c.jj   2015-12-11 16:27:32.571160818 +0100
+++ gcc/testsuite/c-c++-common/pr68833-2.c  2015-12-11 16:28:15.296565741 +0100
@@ -0,0 +1,16 @@
+/* PR c/68833 */
+/* { dg-do compile } */
+/* { dg-options "-Werror=missing-format-attribute" } */
+
+#include <stdarg.h>
+
+void
+foo (const char *fmt, ...)
+{
+  va_list ap;
+  va_start (ap, fmt);
+  __builtin_vprintf (fmt, ap); /* { dg-error "candidate" "printf attribute warning" } */
+  va_end (ap);
+}
+
+/* { dg-prune-output "treated as errors" } */
--- gcc/testsuite/c-c++-common/pr68833-3.c.jj   2015-12-11 16:38:53.803670711 +0100
+++ gcc/testsuite/c-c++-common/pr68833-3.c  2015-12-11 16:41:28.699512849 +0100
@@ -0,0 +1,7 @@
+/* PR c/68833 */
+/* { dg-do preprocess } */
+/* { dg-options "-Werror=normalized" } */
+
+\u0F43  // { dg-error "`.U0f43' 

Re: [PATCHES, PING*5] Enhance standard DWARF for Ada

2015-12-11 Thread Jason Merrill

On 11/26/2015 07:34 AM, Pierre-Marie de Rodat wrote:

On 11/25/2015 07:35 PM, Jason Merrill wrote:

Actually, even though my patches introduce DWARF procedures for only one
case (size functions from stor-layout.c), they don’t necessarily come
from code generation (GENERIC): they are just a way to factorize common
DWARF operations. Thinking more about it, it may be more sound to store
stack slot diffs instead of FUNCTION_DECL nodes in
dwarf_proc_decl_table.


Makes sense.


Done! (I replaced the dwarf_proc_decl_table hash table with a
dwarf_proc_stack_usage_map hash_map) Here's an update for the only
affected patch. Regtested again on x86_64-linux.


Hmm, can we generate the DWARF procedures during finalize_size_functions 
to avoid the need for preserve_body?


Jason



Re: [PATCH] Add pass parameter to TERMINATE_PASS_LIST

2015-12-11 Thread Jeff Law

On 12/11/2015 03:59 AM, Tom de Vries wrote:

Hi,

This patch adds a parameter to TERMINATE_PASS_LIST, that should match
the pass list it's supposed to terminate.

The intention of the patch is that it:
- makes it easier to understand the top-level hierarchy of the pass
   list (given that the top-level list may be quite long).
- ensures that INSERT_PASSES_AFTER and TERMINATE_PASS_LIST are paired.

OK for stage3/stage1 trunk, if bootstrap and reg-test succeeds?

I'd prefer to wait for stage1 on this.  It feels like we've got way too 
much macro-magic going on in here.  I guess as long as we have the 
macro-magic in here, verification like this is OK.
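
As an aside, one possible way to get the pairing check discussed above,
sketched outside of gcc.  This is not the mechanism used by the patch,
only an assumption about how such a check could look: opening a list
declares a marker variable named after the list, and terminating it
refers to that marker, so a mismatched name fails to compile.  The list
and pass names are used purely as labels here, and count_toy_passes is a
hypothetical function.

```c
#include <assert.h>

/* Opening a list starts a block and declares a marker named after the
   list.  Terminating it references the marker and closes the block, so
   TERMINATE_PASS_LIST (other_name) would be an undeclared identifier.  */
#define INSERT_PASSES_AFTER(LIST) \
  { int open_##LIST = 1; assert (open_##LIST);
#define NEXT_PASS(PASS) n_passes++;
#define TERMINATE_PASS_LIST(LIST) \
    (void) open_##LIST; }

/* Walk two toy pass lists and return how many passes they contain.  */
static int
count_toy_passes (void)
{
  int n_passes = 0;

  INSERT_PASSES_AFTER (all_lowering_passes)
  NEXT_PASS (pass_lower_cf)
  NEXT_PASS (pass_build_cfg)
  TERMINATE_PASS_LIST (all_lowering_passes)

  INSERT_PASSES_AFTER (all_passes)
  NEXT_PASS (pass_expand)
  TERMINATE_PASS_LIST (all_passes)

  return n_passes;
}
```

Changing the first TERMINATE_PASS_LIST argument to all_passes would make
the sketch fail to build, which is the kind of verification in question.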



Jeff


Re: extend shift count warnings to vector types

2015-12-11 Thread Jeff Law

On 12/11/2015 12:28 AM, Jan Beulich wrote:

gcc/c/
2015-12-10  Jan Beulich  

* c-fold.c (c_fully_fold_internal): Also emit shift count
warnings for vector types.
* c-typeck.c (build_binary_op): Likewise.

Needs testcases for the added warnings.

My additional concern here would be that in build_binary_op, after your 
change, we'll be setting doing_shift to true.  That in turn will enable 
ubsan instrumentation of the shift.  Does ubsan work properly for vector 
shifts?



Jeff


Re: ipa-cp heuristics fixes

2015-12-11 Thread Jan Hubicka
> On Thu, Dec 10, 2015 at 08:30:37AM +0100, Jan Hubicka wrote:
> > I am bootstrapping/regtesting this on x86_64-linux, does it seem OK?
> 
> I think this patch (just a guess, but certainly ipa-cp related during last
> 24 hours) significantly regressed guality/pr36728-*.c on x86_64.
> Previously we have not turned foo into foo.constprop*, now we do, and pass
> just arg7 instead of arg1..arg7.  That is fine, but we really should be

Yes, I changed the heuristics to consider it a win to drop the argument even
if no constant propagation is done.

> emitting the debug info stuff for that case, that was added to fix PR47858,
> but for whatever reason it doesn't happen in this case.  Does it take some
> other path in ipa-prop.c, or bypass ipa-prop, something different?

No, it is the same cloning as any other. Just the decision heuristics changed.
We would get the same issue with guality/pr36728- if one of arg1...arg7 was
constant.  I will try to take a look at your patch for PR47858 and see if I can
work out what goes wrong.

Honza
> 
> > * ipa-cp.c (ipcp_cloning_candidate_p): Use node->optimize_for_size_p.
> > (good_cloning_opportunity_p): Likewise.
> > (gather_context_independent_values): Do not return true when
> > polymorphic call context is known or when we have known aggregate
> > value of unused parameter.
> > (estimate_local_effects): Try to create clone for all context
> > when either some params are substituted or devirtualization is possible
> > or some params can be removed; use local flag instead of
> > node->will_be_removed_from_program_if_no_direct_calls_p.
> > (identify_dead_nodes): Likewise.
> 
>   Jakub


Re: [PATCH] Fix PR c++/21802 (two-stage name lookup fails for operators)

2015-12-11 Thread Jason Merrill

On 12/11/2015 02:43 PM, Patrick Palka wrote:

+ if (processing_template_decl
+ && result != NULL_TREE
+ && result != error_mark_node
+ && DECL_HIDDEN_FRIEND_P (cand->fn))
+   {
+ tree call = result;
+ if (REFERENCE_REF_P (call))
+   call = TREE_OPERAND (call, 0);
+ KOENIG_LOOKUP_P (call) = 1;
+   }


This should have a comment explaining that this prevents 
build_new_function_call from discarding the function.



+  if (op == PREINCREMENT_EXPR
+  || op == PREDECREMENT_EXPR)
+gcc_assert (nargs == 1);
+  else if (op == MODOP_EXPR)
+gcc_assert (nargs == 2);
+  else
+gcc_assert (nargs == TREE_CODE_LENGTH (op));


This should use cp_tree_operand_length.

Jason



[PATCH 2/3] Fix range/location terminology

2015-12-11 Thread David Malcolm
The terminology within rich_location has become muddled, and
with the simplifications of the previous patch, I'd like to
rename things to better reflect what's going on.

A rich_location can contain one or more nested locations, each
of which can have a start, a finish, and optionally a caret.
These nested locations are essentially a location_t plus a flag,
but for historical reasons the current code confusingly refers to
them as "ranges".

This patch consolidates the terminology throughout rich_location
and diagnostic_show_locus; I believe it's a significant
clarification.

"struct location_range" becomes "struct nested_location"
Various "range" within rich_location become "location" e.g.
"add_range" becomes "add_location".

I didn't rename class layout_range within diagnostic-show-locus.c
as this is relatively localized, and I can't think of a better name
for it.

gcc/c-family/ChangeLog:
* c-common.c (c_cpp_error): Update for renaming of
rich_location::set_range to set_location.

gcc/ChangeLog:
* diagnostic-show-locus.c (class layout_range): Update
for "range" to "location" renamings.
(layout::layout): Likewise, updating wording of comments.
* diagnostic.c (diagnostic_initialize): Likewise.
* diagnostic.h (struct diagnostic_context): Likewise.
* gcc-rich-location.c (gcc_rich_location::add_expr): Likewise.
(gcc_rich_location::maybe_add_expr): Likewise.
* pretty-print.c (text_info::set_location): Likewise.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_show_trees.c
(gcc_rich_location::add_expr): Update for "range" to "location"
renamings.
(show_tree): Drop redundant local "range".
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c (add_range):
Rename to...
(add_location): ...this.  Update for corresponding renaming of
rich_location method.
(test_show_locus): Update for renamings.

libcpp/ChangeLog:
* include/line-map.h (struct location_range): Rename to...
(struct nested_location): ...this, and update wording of comment.
(class rich_location): Update comment.
(rich_location::add_range): Rename to...
(rich_location::add_location): ...this.
(rich_location::set_range): Rename to...
(rich_location::set_location): ...this.
(rich_location::get_num_locations): Update for renamings.
(rich_location::get_range): Rename to...
(rich_location::get_location): ...this.
(rich_location::MAX_RANGES): Rename to...
(rich_location::MAX_LOCATIONS): ...this.
(rich_location::m_num_ranges): Rename to...
(rich_location::m_num_locations): ...this.
(rich_location::m_locations): Update for renaming of
location_range to nested_location.
* line-map.c (rich_location::rich_location): Update for renamings.
(rich_location::get_loc): Update for renamings.
(rich_location::get_expanded_location): Update comment.
(rich_location::override_column): Likewise.
(rich_location::add_range): Rename to...
(rich_location::add_location): ...this, updating for renamings,
and renaming local "range" to "nested_loc".
(rich_location::set_range): Rename to...
(rich_location::set_location): ...this, updating for renamings,
renaming local "locrange" to "nested_loc".
---
 gcc/c-family/c-common.c|  2 +-
 gcc/diagnostic-show-locus.c| 24 -
 gcc/diagnostic.c   |  2 +-
 gcc/diagnostic.h   |  2 +-
 gcc/gcc-rich-location.c|  6 +--
 gcc/pretty-print.c |  4 +-
 .../gcc.dg/plugin/diagnostic_plugin_show_trees.c   |  3 +-
 .../plugin/diagnostic_plugin_test_show_locus.c | 24 -
 libcpp/include/line-map.h  | 61 +++---
 libcpp/line-map.c  | 47 +
 10 files changed, 88 insertions(+), 87 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 509a0ca..ab61031 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -10161,7 +10161,7 @@ c_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int level, int reason,
   gcc_unreachable ();
 }
   if (done_lexing)
-richloc->set_range (0, input_location, true);
+richloc->set_location (0, input_location, true);
   diagnostic_set_info_translated (, msg, ap,
  richloc, dlevel);
   diagnostic_override_option_index (,
diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 16004d8..f279019 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -112,7 +112,7 @@ class layout_point
   int m_column;
 };
 
-/* A class for use by "class layout" below: a filtered location_range.  
