Re: Splitting up gcc/omp-low.c?

2016-05-24 Thread Thomas Schwinge
Hi!

Ping.

Given that we conceptually agreed about this task, but apparently nobody
is now interested in reviewing my proposed changes (and telling me how
they'd like me to submit the patch for review), should I maybe just
execute the steps?

On Wed, 18 May 2016 13:42:37 +0200, Thomas Schwinge wrote:
> Ping.
> 
> On Wed, 11 May 2016 15:44:14 +0200, I wrote:
> > Ping.
> > 
> > On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> > > On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > > > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > > > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek wrote:
> > > > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > > > > >how about we split up gcc/omp-low.c into several
> > > > > > > > >files?  Would it make sense (I have not yet looked in detail) to do so
> > > > > > > > >along the borders of the several passes defined therein?
> > > > 
> > > > > > > > I suspect a split along the ompexp/omplow boundary would be quite
> > > > > > > > easy to achieve.
> > > > 
> > > > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > > > "expand" instead of "exp" to avoid confusion as "exp" might also be
> > > > short for "expression"; OK?) [...]
> > > 
> > > That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> > > released.  How would you like me to submit the patch for review?  (It's
> > > huge, obviously.)
> > > 
> > > A few high-level comments, and questions that remain to be answered:
> > > 
> > > > Stuff that does not relate to OMP lowering, I did not move stuff out of
> > > > omp-low.c (into a new omp.c, or omp-misc.c, for example) so far, but
> > > > instead just left all that in omp-low.c.  We'll see how far we get.
> > > > 
> > > > One thing I noticed is that there sometimes is more than one suitable
> > > > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > > > passes, and omp-offload.c -- at least in part -- [would be] about the
> > > > orthogonal "offloading" category.  For example, see the OMPTODO "struct
> > > > oacc_loop and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how
> > > > that goes.
> > > 
> > > > Some more comments, to help review:
> > > 
> > > > As I don't know how this is usually done: is it appropriate to remove
> > > > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > > > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > > > contributing a ton of other stuff since omp-low.c has been created), or
> > > > does this line stay in omp-low.c, or do I even duplicate it into the new
> > > > files?
> > > > 
> > > > I tried not to re-order stuff when moving.  But: we may actually want to
> > > > reorder stuff, to put it into a more sensible order.  Any suggestions?
> > > 
> > > > I had to export a small number of functions (see the prototypes not
> > > > moved but added to the header files).
> > > > 
> > > > Because it's also used in omp-expand.c, I moved the one-line static
> > > > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > > > to omp_is_reference because of the very generic name.  Similar functions
> > > > stay in omp-low.c however, so they're no longer defined next to each
> > > > other.  OK, or does this need a different solution?


Grüße
 Thomas


Re: [PATCH] Fix up Yr constraint

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 9:02 PM, Jakub Jelinek  wrote:
> On Tue, May 24, 2016 at 08:35:12PM +0200, Uros Bizjak wrote:
>> On Tue, May 24, 2016 at 6:55 PM, Jakub Jelinek  wrote:
>> > Hi!
>> >
>> > The Yr constraint contrary to what has been said when it has been submitted
>> > actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
>> > the RA restriction to only the first 8 regs is done no matter what we tune
>> > for.
>> >
>> > This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
>> > value (59), rather than actually checking the tune flag.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >
>> > 2016-05-24  Jakub Jelinek  
>> >
>> > * config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
>> > * config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
>> > rather than X86_TUNE_AVOID_4BYTE_PREFIXES.
>>
>> Uh, another brown-paper bag bug...
>>
>> OK everywhere.
>
> I fear it might be too dangerous for -mavx512* for the branches; I went
> through all the Yr uses on the trunk, but not on the branches.
> Would you be ok with using
> "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : SSE_REGS) : NO_REGS"
> on the branches instead?
> Or I guess we could use it on the trunk too, it should make no difference
> there (because on the trunk it is only used when !TARGET_AVX).
> Or maybe even
> "TARGET_SSE ? ((TARGET_AVOID_4BYTE_PREFIXES && !TARGET_AVX) ? NO_REX_SSE_REGS : SSE_REGS) : NO_REGS"
> (again, should make zero difference on the trunk, but might be better for
> the branches).

Indeed, let's play safe and go with the latter version on branches.
Please also add a small comment, to avoid head-scratching in the
future.

Uros.
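
For readers of the archive, the bug discussed in this thread can be sketched outside of GCC as follows.  This is only an illustration under stated assumptions: target_state, its members and the yr_* helpers are hypothetical stand-ins; only X86_TUNE_AVOID_4BYTE_PREFIXES (value 59), TARGET_AVOID_4BYTE_PREFIXES and the register class names come from the messages above.

```cpp
// Hypothetical stand-ins for the i386 tuning table and register classes.
enum tune_index { X86_TUNE_AVOID_4BYTE_PREFIXES = 59, X86_TUNE_LAST };
enum reg_class { NO_REGS, NO_REX_SSE_REGS, SSE_REGS };

struct target_state {
  bool sse;                  // stands in for TARGET_SSE
  bool avx;                  // stands in for TARGET_AVX
  bool tune[X86_TUNE_LAST];  // per-CPU tuning flags, indexed by tune_index
};

// Buggy form: the enum constant itself is tested.  59 is nonzero, so the
// SSE_REGS arm can never be selected, whatever CPU is being tuned for.
inline reg_class yr_buggy(const target_state &t) {
  return t.sse ? ((X86_TUNE_AVOID_4BYTE_PREFIXES != 0) ? NO_REX_SSE_REGS
                                                       : SSE_REGS)
               : NO_REGS;
}

// Fixed form: look the flag up in the tuning table (what a
// TARGET_AVOID_4BYTE_PREFIXES macro would do).
inline reg_class yr_fixed(const target_state &t) {
  return t.sse ? (t.tune[X86_TUNE_AVOID_4BYTE_PREFIXES] ? NO_REX_SSE_REGS
                                                        : SSE_REGS)
               : NO_REGS;
}

// Variant proposed for the branches: never restrict registers under AVX.
inline reg_class yr_branch_safe(const target_state &t) {
  return t.sse ? ((t.tune[X86_TUNE_AVOID_4BYTE_PREFIXES] && !t.avx)
                      ? NO_REX_SSE_REGS
                      : SSE_REGS)
               : NO_REGS;
}
```

With SSE enabled and the tune flag clear, yr_buggy still yields NO_REX_SSE_REGS while yr_fixed yields SSE_REGS, which is the difference the patch is about.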


Re: libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to test

2016-05-24 Thread Thomas Schwinge
Hi!

Ping...

On Wed, 18 May 2016 13:41:25 +0200, I wrote:
> Ping.
> 
> On Wed, 11 May 2016 15:45:13 +0200, I wrote:
> > Ping.
> > 
> > On Mon, 02 May 2016 11:54:27 +0200, I wrote:
> > > > On Fri, 29 Apr 2016 09:43:41 +0200, Jakub Jelinek wrote:
> > > > On Thu, Apr 28, 2016 at 12:43:43PM +0200, Thomas Schwinge wrote:
> > > > > commit 3b521f3e35fdb4b320e95b5f6a82b8d89399481a
> > > > > Author: Thomas Schwinge 
> > > > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > > > 
> > > > > libgomp: Unconfuse offload plugins vs. offload targets
> > > > 
> > > > I don't like this patch at all, rather than unconfusing stuff it
> > > > makes stuff confusing.  Plugins are just a way to support various
> > > > offloading targets.
> > > 
> > > Huh; my patch exactly clarifies that the offload_targets variable does
> > > not actually list offload target names, but does list libgomp offload
> > > plugin names...
> > > 
> > > > Can you please post just a short patch without all those changes
> > > > > that does what you want, rather than renaming everything at the same
> > > > > time?
> > > 
> > > I thought incremental, self-contained patches were easier to review.
> > > Anyway, here's the three patches merged into one:
> > > 
> > > commit 8060ae3474072eef685381d80f566d1c0942c603
> > > Author: Thomas Schwinge 
> > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > 
> > > > libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to test
> > > 
> > >   libgomp/
> > >   * plugin/configfrag.ac (offload_targets): Actually enumerate
> > >   offload targets, and add...
> > >   (offload_plugins): ... this one to enumerate offload plugins.
> > >   (OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
> > >   * target.c (gomp_target_init): Adjust to that.
> > >   * testsuite/lib/libgomp.exp: Likewise.
> > >   (offload_targets_s, offload_targets_s_openacc): Remove 
> > > variables.
> > >   (offload_target_to_openacc_device_type): New proc.
> > >   (check_effective_target_openacc_nvidia_accel_selected)
> > >   (check_effective_target_openacc_host_selected): Examine
> > >   $openacc_device_type instead of $offload_target_openacc.
> > >   * Makefile.in: Regenerate.
> > >   * config.h.in: Likewise.
> > >   * configure: Likewise.
> > >   * testsuite/Makefile.in: Likewise.
> > >   * testsuite/libgomp.oacc-c++/c++.exp: Cycle through
> > >   $offload_targets (plus "disable") instead of
> > >   $offload_targets_s_openacc, and add "-foffload=$offload_target" 
> > > to
> > >   tagopt.
> > >   * testsuite/libgomp.oacc-c/c.exp: Likewise.
> > >   * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> > > ---
> > >  libgomp/Makefile.in|  1 +
> > >  libgomp/config.h.in|  4 +-
> > >  libgomp/configure  | 44 +++--
> > >  libgomp/plugin/configfrag.ac   | 39 +++-
> > >  libgomp/target.c   |  8 +--
> > >  libgomp/testsuite/Makefile.in  |  1 +
> > >  libgomp/testsuite/lib/libgomp.exp  | 72 
> > > ++
> > >  libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
> > >  libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
> > >  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
> > >  10 files changed, 142 insertions(+), 109 deletions(-)
> > > 
> > > diff --git libgomp/Makefile.in libgomp/Makefile.in
> > > [snipped]
> > > diff --git libgomp/config.h.in libgomp/config.h.in
> > > [snipped]
> > > diff --git libgomp/configure libgomp/configure
> > > [snipped]
> > > diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
> > > index 88b4156..de0a6f6 100644
> > > --- libgomp/plugin/configfrag.ac
> > > +++ libgomp/plugin/configfrag.ac
> > > @@ -26,8 +26,6 @@
> > >  # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > >  # .
> > >  
> > > -offload_targets=
> > > -AC_SUBST(offload_targets)
> > >  plugin_support=yes
> > >  AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
> > >  if test x"$plugin_support" = xyes; then
> > > @@ -142,7 +140,13 @@ AC_SUBST(PLUGIN_HSA_LIBS)
> > >  
> > >  
> > >  
> > > -# Get offload targets and path to install tree of offloading compiler.
> > > +# Parse offload targets, and figure out libgomp plugin, and configure the
> > > > +# corresponding offload compiler.  offload_plugins and offload_targets will be
> > > > +# populated in the same order.
> > > +offload_plugins=
> > > +offload_targets=
> > > +AC_SUBST(offload_plugins)
> > > +AC_SUBST(offload_targets)
> > >  offload_additional_options=
> > >  offload_additional_lib_paths=
> > >  AC_SUBST(offload_addit

Re: [Patch] Implement is_[nothrow_]swappable (p0185r1)

2016-05-24 Thread Daniel Krügler
2016-05-23 13:50 GMT+02:00 Jonathan Wakely :
> On 17/05/16 20:39 +0200, Daniel Krügler wrote:
>>
>> This is an implementation of the Standard is_swappable traits according to
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html
>>
>> During that work it has been found that std::array's member swap's
>> exception
>> specification for zero-size arrays was incorrectly depending on the
>> value_type
>> and that was fixed as well.
>
> This looks good to me, I'll get it committed (with some adjustment to
> the ChangeLog format) - thanks.

Unfortunately I need to withdraw the suggested patch. Besides some
obvious errors there are issues that require me to get the testsuite
run on my Windows system, which has not yet succeeded.

I would appreciate it if anyone who has succeeded in running the test suite
on a Windows system (preferably mingw) could contact me off-list.

Thanks,

- Daniel





RE: [Patch V2] Fix SLP PR58135.

2016-05-24 Thread Kumar, Venkataramanan
Hi Christophe, 

> -----Original Message-----
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Tuesday, May 24, 2016 8:45 PM
> To: Kumar, Venkataramanan 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch V2] Fix SLP PR58135.
> 
> Hi Venkat,
> 
> 
> On 23 May 2016 at 11:54, Kumar, Venkataramanan wrote:
> > Hi Richard,
> >
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: Thursday, May 19, 2016 4:08 PM
> >> To: Kumar, Venkataramanan 
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >>
> >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan wrote:
> >> > Hi Richard,
> >> >
> >> >> -----Original Message-----
> >> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> >> Sent: Tuesday, May 17, 2016 5:40 PM
> >> >> To: Kumar, Venkataramanan 
> >> >> Cc: gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >> >>
> >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan wrote:
> >> >> > Hi Richard,
> >> >> >
> >> >> > I created the patch by passing -b option to git. Now the patch is more
> >> >> > readable.
> >> >> >
> >> >> > As per your suggestion I tried to fix the PR by splitting the SLP store
> >> >> > group at vector boundary after the SLP tree is built.
> >> >> >
> >> >> > Bootstrap PASSED on x86_64.
> >> >> > Checked the patch with check_GNU_style.sh.
> >> >> >
> >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence it
> >> >> > generated 2 more vzeroupper.
> >> >> > As recommended I adjusted the test case by adding -fno-tree-slp-vectorize
> >> >> > to make it as expected after loop vectorization.
> >> >> >
> >> >> > The following tests are now passing.
> >> >> >
> >> >> > -- Snip-
> >> >> > Tests that now work, but didn't before:
> >> >> >
> >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "basic block vectorized" 1
> >> >> >
> >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block vectorized" 1
> >> >> >
> >> >> > New tests that PASS:
> >> >> >
> >> >> > gcc.dg/vect/pr58135.c (test for excess errors)
> >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess errors)
> >> >> >
> >> >> > -- Snip-
> >> >> >
> >> >> > ChangeLog
> >> >> >
> >> >> > 2016-05-14  Venkataramanan Kumar
> >> >> >  PR tree-optimization/58135
> >> >> > * tree-vect-slp.c:  When group size is not multiple of vector size,
> >> >> >  allow splitting of store group at vector boundary.
> >> >> >
> >> >> > Test suite  ChangeLog
> >> >> > 2016-05-14  Venkataramanan Kumar
> >> >> > * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
> >> >> > * gcc.dg/vect/pr58135.c:  Add new.
> >> >> > * gfortran.dg/pr46519-1.f: Adjust test case.
> >> >> >
> >> >> > The attached patch Ok for trunk?
> >> >>
> >> >>
> >> >> Please avoid the excessive vertical space around the
> >> >> vect_build_slp_tree call.
> >> > Yes fixed in the attached patch.
> >> >>
> >> >> +  /* Calculate the unrolling factor.  */
> >> >> +  unrolling_factor = least_common_multiple
> >> >> + (nunits, group_size) / group_size;
> >> >> ...
> >> >> +  else
> >> >> {
> >> >>   /* Calculate the unrolling factor based on the smallest type.  */
> >> >>   if (max_nunits > nunits)
> >> >> -unrolling_factor = least_common_multiple (max_nunits, group_size)
> >> >> -   / group_size;
> >> >> +   unrolling_factor
> >> >> +   = least_common_multiple (max_nunits,
> >> >> + group_size)/group_size;
> >> >>
> >> >> please compute the "correct" unroll factor immediately and move
> >> >> the "unrolling of BB required" error into the if() case by
> >> >> post-poning the nunits < group_size check (and use max_nunits here).
> >> >>
> >> > Yes fixed in the attached patch.
> >> >
> >> >> +  if (is_a  (vinfo)
> >> >> + && nunits < group_size
> >> >> + && unrolling_factor != 1
> >> >> + && is_a  (vinfo))
> >> >> +   {
> >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> >> +  "Build SLP failed: store group "
> >> >> +  "size not a multiple of the vector size "
> >> >> +  "in basic block SLP\n");
> >> >> + /* Fatal mismatch.  */
> >> >> + matches[nunits] = false;
> >> >>
> >> >> this is too pessimistic - you want to add the extra 'false' at
> >> >> group_size / max_nunits * max_nunits.
> >> > Yes fixed in attached patch.
> >> >
> >> >>
> >> >> It looks like you leak 'node' in the if () path as well.  You need
> >> >>
> >> >>   vect_free_slp_tree (node);
> >> >>   loads.release ();
> >> >>
> >> >> thus treat it as a failure case.
> >> >

Re: More backwards/FSM jump thread refactoring and extension

2016-05-24 Thread Jeff Law

On 05/24/2016 06:03 PM, Trevor Saunders wrote:
> On Tue, May 24, 2016 at 10:58:18AM -0600, Jeff Law wrote:
> > --- a/gcc/tree-ssa-threadbackward.c
> > +++ b/gcc/tree-ssa-threadbackward.c
> > @@ -356,6 +356,44 @@ profitable_jump_thread_path (vec *&path,
> >    return taken_edge;
> >  }
> >
> > +/* PATH is vector of blocks forming a jump threading path in reverse
> > +   order.  TAKEN_EDGE is the edge taken from path[0].
> > +
> > +   Convert that path into the form used by register_jump_thread and
> > +   register the path.   */
> > +
> > +static void
> > +convert_and_register_jump_thread_path (vec *&path,
>
> is there a reason that isn't vec * instead of
> vec *&? It seems like that's just useless indirection, and
> allowing this function to be able to change more than it needs.

I didn't try to clean up anything of that nature.  It's a good follow-up
item though.  Thanks for pointing it out.

> > +  edge taken_edge)
> > +{
> > +  vec *jump_thread_path = new vec ();
>
> It's not new, but I'm always a little sad to see something that's only
> sizeof(void *) big be malloced on its own.

I wouldn't be terribly surprised if the backwards/FSM threader drops the
jump_thread_edge representation after I pull it out of the main threader
into its own pass.

jeff


Re: [PATCH][MIPS] Add support for code_readable function attribute

2016-05-24 Thread Sandra Loosemore

On 05/24/2016 08:25 AM, Robert Suchanek wrote:
> [snip]
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e4d6c1c..dd23c70 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -4441,6 +4441,23 @@ On MIPS targets, you can use the @code{nocompression} function attribute
>   to locally turn off MIPS16 and microMIPS code generation.  This attribute
>   overrides the @option{-mips16} and @option{-mmicromips} options on the
>   command line (@pxref{MIPS Options}).
> +
> +@item code_readable
> +@cindex @code{code_readable} function attribute, MIPS
> +For MIPS targets that support PC-relative addressing modes, this attribute
> +can be used to control how an object is addressed.  The attribute takes
> +a single optional argument:

The problem here is that we don't tell users that the argument has to be
a string constant in quotes, and not just a token.

How about changing the above text to end with:

"...a single optional argument, which must be one of the following
string constants:"

and then changing this to be @table @code and quoting the @item strings:

> +
> +@table @samp
> +@item no
> +The function should not read the instruction stream as data.
> +@item yes
> +The function can read the instruction stream as data.
> +@item pcrel
> +The function can read the instruction stream in a pc-relative mode.
> +@end table
> +

Then it'll be consistent with this:

> +If there is no argument supplied, the default of @code{"yes"} applies.
>   @end table
>
>   @node MSP430 Function Attributes


-Sandra
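
As an illustration of the point about quoting, here is a minimal sketch of how the attribute would be written with a string constant, assuming the spelling proposed in the patch under review.  The CODE_READABLE_PCREL macro and the table_lookup function are hypothetical names used only for this example; the guard makes the attribute disappear on non-MIPS compilers so the snippet still builds there.

```cpp
// Assumption: the attribute is spelled code_readable and takes one of the
// string constants "no", "yes" or "pcrel", as in the patch under review.
#if defined(__mips__) && defined(__GNUC__)
#define CODE_READABLE_PCREL __attribute__((code_readable("pcrel")))
#else
#define CODE_READABLE_PCREL /* not a MIPS GCC target: no attribute */
#endif

// Hypothetical function whose constant table may be addressed PC-relatively.
CODE_READABLE_PCREL
int table_lookup(unsigned index) {
  static const int table[4] = {10, 20, 30, 40};
  return table[index % 4];
}
```

Note that the argument is written as "pcrel" with quotes; the documentation wording suggested above is meant to make that requirement explicit.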



Re: [PATCH][MIPS] Add -minline-intermix to ignore compression flags when inlining

2016-05-24 Thread Sandra Loosemore

On 05/24/2016 08:23 AM, Robert Suchanek wrote:
> [snip]
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 73f1cb6..2f6195e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -837,6 +837,7 @@ Objective-C and Objective-C++ Dialects}.
>   -mips16  -mno-mips16  -mflip-mips16 @gol
>   -minterlink-compressed -mno-interlink-compressed @gol
>   -minterlink-mips16  -mno-interlink-mips16 @gol
> +-minline-intermix -mno-inline-intermix @gol

Funky indentation here

>   -mabi=@var{abi}  -mabicalls  -mno-abicalls @gol
>   -mshared  -mno-shared  -mplt  -mno-plt  -mxgot  -mno-xgot @gol
>   -mgp32  -mgp64  -mfp32  -mfpxx  -mfp64  -mhard-float  -msoft-float @gol
> @@ -17916,6 +17917,18 @@ Aliases of @option{-minterlink-compressed} and
>   @option{-mno-interlink-compressed}.  These options predate the microMIPS ASE
>   and are retained for backwards compatibility.
>
> +@item -minline-intermix
> +@itemx -mno-inline-intermix
> +@opindex minline-intermix
> +@opindex mno-inline-intermix
> +Enable inlining of functions which have opposing compression flags e.g.
> +@code{mips16}/@code{nomips16} attributes.
> +This is useful when using the @code{mips16} attribute to balance code size
> +and performance so that a function will be compressed when not inlined or
> +vice-versa.  When using this option it is necessary to protect functions
> +that cannot be compiled as MIPS16 with a @code{noinline} attribute to ensure
> +they are not inlined into a MIPS16 function.

This flag applies to microMIPS inlining, too, right?  It's confusing to
only mention MIPS16.

Maybe you could say something like this instead:

Allow inlining even if the compression flags differ between caller and
callee.  This is useful in conjunction with the @code{mips16},
@code{micromips}, or @code{nocompression} function attributes.  The code
for the inlined function is compiled using the compression flags for the
callee, so you may need to use the @code{noinline} attribute on
functions that must be compiled with particular compression settings.


-Sandra



Re: More backwards/FSM jump thread refactoring and extension

2016-05-24 Thread Trevor Saunders
On Tue, May 24, 2016 at 10:58:18AM -0600, Jeff Law wrote:
> --- a/gcc/tree-ssa-threadbackward.c
> +++ b/gcc/tree-ssa-threadbackward.c
> @@ -356,6 +356,44 @@ profitable_jump_thread_path (vec *&path,
>return taken_edge;
>  }
>  
> +/* PATH is vector of blocks forming a jump threading path in reverse
> +   order.  TAKEN_EDGE is the edge taken from path[0].
> +
> +   Convert that path into the form used by register_jump_thread and
> +   register the path.   */
> +
> +static void
> +convert_and_register_jump_thread_path (vec *&path,

is there a reason that isn't vec * instead of
vec *&? It seems like that's just useless indirection, and
allowing this function to be able to change more than it needs.

> +edge taken_edge)
> +{
> +  vec *jump_thread_path = new vec ();

It's not new, but I'm always a little sad to see something that's only
sizeof(void *) big be malloced on its own.

Trev
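
For readers of the archive, the difference between passing a pointer by value and by reference can be sketched in isolation.  This is a standalone illustration, not GCC code: block_path is a hypothetical stand-in built on std::vector rather than GCC's vec, and both function names are made up for the example.

```cpp
#include <vector>

// Hypothetical stand-in for the path vector discussed above.
using block_path = std::vector<int>;

// Pointer passed by reference: the callee can reseat the caller's pointer,
// which is more power than a function that only reads the path needs.
inline void takes_pointer_ref(block_path *&path, block_path *other) {
  path = other;  // visible to the caller
}

// Pointer passed by value: the callee can still read or modify *path,
// but reseating the pointer only affects its local copy.
inline void takes_pointer(block_path *path, block_path *other) {
  path = other;  // local effect only
  (void)path;
}
```

A caller holding `block_path *p` sees `p` unchanged after takes_pointer, but pointing somewhere else after takes_pointer_ref; that extra freedom is the indirection being questioned.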



Re: C++ PATCH for c++/70735 (static locals and generic lambdas)

2016-05-24 Thread Mike Stump
On May 24, 2016, at 3:35 PM, Paolo Carlini  wrote:
> On 23/05/2016 21:01, Jason Merrill wrote:
>> +// PR c++/70735
>> +// { dg-do run { target c++1y } }
>> +
> [...]
>> @@ -0,0 +1,19 @@
>> +// PR c++/70735
>> +// { dg-do run { target c++1y } }
> I'm changing these c++1y to c++14.

Thanks.  :-)  

I think:

  g++.dg/pr65295.C

can be updated to use c++14 as well.  It is the last one that needs updating.


Re: C++ PATCH for c++/70735 (static locals and generic lambdas)

2016-05-24 Thread Paolo Carlini

Hi,

On 23/05/2016 21:01, Jason Merrill wrote:
> +// PR c++/70735
> +// { dg-do run { target c++1y } }
> +

[...]

> @@ -0,0 +1,19 @@
> +// PR c++/70735
> +// { dg-do run { target c++1y } }

I'm changing these c++1y to c++14.

Paolo.


Re: [PATCH, rs6000 testsuite] PR71050, Fix lhs-1.c testcase

2016-05-24 Thread Segher Boessenkool
On Tue, May 24, 2016 at 04:55:45PM -0500, Pat Haugen wrote:
> The following simplifies the given testcase so it is no longer sensitive to
> subreg (and hopefully other) codegen changes. Tested on powerpc64, ok for
> trunk?
> 
> -Pat
> 
> 
> testsuite/ChangeLog:
> 2016-05-24  Pat Haugen  
> 
> PR target/71050
> * gcc.target/powerpc/lhs-1.c: Fix testcase to avoid subreg changes.

It is okay for trunk.  One thing (well, two)...

> Index: gcc/testsuite/gcc.target/powerpc/lhs-1.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/lhs-1.c  (revision 236325)
> +++ gcc/testsuite/gcc.target/powerpc/lhs-1.c  (working copy)
> @@ -4,19 +4,12 @@
>  /* { dg-options "-O2 -mcpu=power5" } */
>  /* { dg-final { scan-assembler-times "nop" 3 } } */
>  
> -/* Test generation of nops in load hit store situation.  */
> +/* Test generation of nops in load hit store situation. Make sure enough nop 
> insns are
> +   generated to move the load to a new dispatch group. With the simple 
> stw/lwz pair below,
> +   that would be 3 nop insns for Power5.  */

Long lines; dot space space.

Thanks,


Segher


[PATCH, rs6000 testsuite] PR71050, Fix lhs-1.c testcase

2016-05-24 Thread Pat Haugen
The following simplifies the given testcase so it is no longer sensitive to 
subreg (and hopefully other) codegen changes. Tested on powerpc64, ok for trunk?

-Pat


testsuite/ChangeLog:
2016-05-24  Pat Haugen  

PR target/71050
* gcc.target/powerpc/lhs-1.c: Fix testcase to avoid subreg changes.


Index: gcc/testsuite/gcc.target/powerpc/lhs-1.c
===
--- gcc/testsuite/gcc.target/powerpc/lhs-1.c(revision 236325)
+++ gcc/testsuite/gcc.target/powerpc/lhs-1.c(working copy)
@@ -4,19 +4,12 @@
 /* { dg-options "-O2 -mcpu=power5" } */
 /* { dg-final { scan-assembler-times "nop" 3 } } */
 
-/* Test generation of nops in load hit store situation.  */
+/* Test generation of nops in load hit store situation. Make sure enough nop 
insns are
+   generated to move the load to a new dispatch group. With the simple stw/lwz 
pair below,
+   that would be 3 nop insns for Power5.  */
 
-typedef union {
-  double val;
-  struct {
-unsigned int w1;
-unsigned int w2;
-  };
-} words;
-
-unsigned int f (double d, words *u)
+unsigned int f (volatile unsigned int *u, unsigned int u2)
 {
-  u->val = d;
-  return u->w2;
+  *u = u2;
+  return *u;
 }
-



Re: [PATCH], Add support for PowerPC ISA 3.0 VNEGD/VNEGW instructions

2016-05-24 Thread Segher Boessenkool
On Wed, May 18, 2016 at 02:30:31PM -0400, Michael Meissner wrote:
> Unlike some of my patches, this is a fairly simple patch to add support for
> the VNEGW and VNEGD instructions that were added in ISA 3.0.  Note, ISA 3.0
> does not provide negation for V16QImode/V8HImode, just V4SImode/V2DImode.
> 
> I discovered that when we added ISA 2.07 support for V2DImode, we didn't
> provide an expander for negv2di2, which I added with this patch.
> 
> [gcc]
> 2016-05-18  Michael Meissner  
> 
>   * config/rs6000/altivec.md (VNEG iterator): New iterator for
>   VNEGW/VNEGD instructions.
>   (p9_neg2): New insns for ISA 3.0 VNEGW/VNEGD.
>   (neg2): Add expander for V2DImode added in ISA 2.06, and
>   support for ISA 3.0 VNEGW/VNEGD instructions.
> 
> [gcc/testsuite]
> 2016-05-18  Michael Meissner  
> 
>   * gcc.target/powerpc/p9-vneg.c: New test for ISA 3.0 VNEGW/VNEGD
>   instructions.

I forgot to review this patch, sorry.

> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md  
> (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
> (revision 236398)
> +++ gcc/config/rs6000/altivec.md  (.../gcc/config/rs6000) (working copy)
> @@ -203,6 +203,9 @@ (define_mode_attr VP_small [(V2DI "V4SI"
>  (define_mode_attr VP_small_lc [(V2DI "v4si") (V4SI "v8hi") (V8HI "v16qi")])
>  (define_mode_attr VU_char [(V2DI "w") (V4SI "h") (V8HI "b")])
>  
> n+;; Vector negate
> +(define_mode_iterator VNEG [V4SI V2DI])

Your patch is mangled here, but you'll find out (it won't apply this way).

>  (define_expand "neg2"
> -  [(use (match_operand:VI 0 "register_operand" ""))
> -   (use (match_operand:VI 1 "register_operand" ""))]
> -  "TARGET_ALTIVEC"
> +  [(set (match_operand:VI2 0 "register_operand" "")
> + (neg:VI2 (match_operand:VI2 1 "register_operand" "")))]
> +  ""
>"
>  {
> -  rtx vzero;
> +  if (!TARGET_P9_VECTOR || (mode != V4SImode && mode != V2DImode))
> +{
> +  rtx vzero;
>  
> -  vzero = gen_reg_rtx (GET_MODE (operands[0]));
> -  emit_insn (gen_altivec_vspltis (vzero, const0_rtx));
> -  emit_insn (gen_sub3 (operands[0], vzero, operands[1])); 
> -  
> -  DONE;
> +  vzero = gen_reg_rtx (GET_MODE (operands[0]));
> +  emit_move_insn (vzero, CONST0_RTX (mode));
> +  emit_insn (gen_sub3 (operands[0], vzero, operands[1])); 
> +  DONE;
> +}
>  }")

Please remove the quotes around the C block as well, while you're here?
And a trailing space.

Okay for trunk, okay for 6 after a week or so.

Thanks,


Segher


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Joseph Myers
On Tue, 24 May 2016, Uros Bizjak wrote:

> > I have thrown together a quick patch that defines target_flags as
> > HOST_WIDE_INT.
> >
> > (Patch still needs a small correction, so opth-gen.awk will emit
> > HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
> > compile functional x86_64-apple-darwin15.5.0 crosscompiler.)
> 
> And here is attached complete (but untested!!) patch that should "just
> work"(TM).

Have you made sure that cl_host_wide_int gets set for options in 
target_flags, so that get_option_state, option_enabled etc. work correctly 
with such options?

-- 
Joseph S. Myers
jos...@codesourcery.com
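
The reason the MASK_* defines need a wide "1" once target_flags becomes a wider type can be sketched as follows.  This is an illustration under assumptions, not the generated options code: wide_flags and WIDE_ONE are hypothetical stand-ins for HOST_WIDE_INT and HOST_WIDE_INT_1, and mask_for_bit and flag_enabled are made-up helpers.

```cpp
#include <cstdint>

// Stand-ins (assumptions) for HOST_WIDE_INT and HOST_WIDE_INT_1.
using wide_flags = std::int64_t;
constexpr wide_flags WIDE_ONE = 1;

// Build a mask for a flag bit.  Shifting a plain int 1 by 32 or more would
// be undefined, so the shifted constant must already have the wide type.
constexpr wide_flags mask_for_bit(int bit) { return WIDE_ONE << bit; }

// Test whether a flag is set; the comparison is done in the wide type so
// that high bits are not truncated away.
constexpr bool flag_enabled(wide_flags flags, wide_flags mask) {
  return (flags & mask) != 0;
}
```

Any code that queries option state has to treat such options as wide in the same way, which is the concern raised about cl_host_wide_int above.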


Re: [PATCH], Add PowerPC ISA 3.0 vector count trailing zeros and vector parity support

2016-05-24 Thread Segher Boessenkool
On Tue, May 24, 2016 at 05:05:14PM -0400, Michael Meissner wrote:
> This patch adds support for two sets of new instructions in ISA 3.0, vector
> count trailing zeros, and vector parity.  In addition, it defines many of the
> support macros that will be used by other built-in functions that will be
> added shortly.
> 
> I have bootstrapped this and there were no regressions.  Is it ok to apply to
> the trunk?  Assuming it is ok to apply to the trunk, is it ok to back port to
> the GCC 6.2 branch?

Okay for trunk.  Okay for 6 after a week or so.  A few typos...

> [gcc/testsuite]
> 2016-05-24  Michael Meissner  
> 
>   * gcc.target/powerpc/p9-vparity.c: New file to check SIA 3.0
>   vector parity built-in functions.

Typo (ISA).

> +/* Miscellaneous builtins for instructions added in ISA 3.0.  These
> +   instructions don't require either the DFP or VSX options, just the basic 

Trailing space (multiple times).

> +If the ISA 3.00 additions to the vector/scalar (power9-vector)
> +instruction set are available:

3.0 (multiple times).

Thanks,


Segher


Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-05-24 Thread Alexander Monakov
On Fri, 20 May 2016, Nathan Sidwell wrote:
> ah, that's much more understandable, thanks.  Presumably this doesn't
> support worker-single mode (in OpenACC parlance, I don't know what the OpenMP
> version of that is?)

I don't see why you have concerns.  In OpenMP, what OpenACC calls
'worker-single mode' should correspond to execution of a sequential region
(outside of any 'parallel' region). The region is executed by the initial
thread (warp 0), while other warps, having formed a thread pool, are suspended
on that thread pool's barrier.  When the initial thread reaches the parallel
region, it unblocks the warps in the pool.  The other warps may need data that
is allocated on warp 0 stack, so here it's essential that soft-stacks can
exist on global memory and thus be world-readable.

> And neither would it support calls from vector-partitioned code (I think
> that's SIMD in OpenMP-land?).

Actually it would: the plan is to switch soft-stack pointer to a region of
.local memory when entering OpenMP SIMD region.  This makes soft-stacks use
lane-private storage inside of SIMD regions (but then it's, of course, no
longer world-readable and not modifiable by atomics).

> It seems like we should reject the combination of -msoft-stack -fopenacc?

Possibly; the doc text makes it explicit that the option is exposed only for
the purpose of testing the compiler, anyway.

> why so many changelogs?  The on-branch development history is irrelevant for
> trunk -- the usual single changelog style should be followed.

OK, if branch history is not interesting for review, I can squash it; I'll
have to do that for the final commit anyway.

> > +  else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca)
> > +{
> > +  /* Maintain 64-bit stack alignment.  */
> 
> This block needs a more descriptive comment -- it appears to be doing a great
> deal more than maintaining 64-bit stack alignment!

The comment is just for the line that follows, not the whole block.
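
To make that concrete: the line the comment refers to rounds the frame size
up to a multiple of the alignment in bytes.  Below is a minimal standalone
sketch, not code from the patch.  The helper names are hypothetical, and the
64-bit alignment with 8-bit bytes is an assumption taken from the comment
text, not from the target headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the GCC ROUND_UP macro: round sz up to the next
// multiple of align (align must be non-zero).
inline std::size_t round_up_sketch(std::size_t sz, std::size_t align) {
  return (sz + align - 1) / align * align;
}

// Assumed values for illustration only: a 64-bit biggest alignment and
// 8-bit bytes give keep_align == 8 bytes.
inline std::size_t aligned_frame_size(std::size_t sz) {
  const std::size_t keep_align = 64 / 8;
  return round_up_sketch(sz, keep_align);
}
```

A size that is already a multiple of the alignment is left unchanged, so the
rounding only ever adds padding.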

> > +  int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
> > +  sz = ROUND_UP (sz, keep_align);
> > +  int bits = POINTER_SIZE;
> > +  fprintf (file, "\t.reg.u%d %%frame;\n", bits);
> > +  fprintf (file, "\t.reg.u32 %%fstmp0;\n");
> > +  fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
> > +  fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);
> 
> Some of these register names appear to be long lived -- and referenced in
> other functions.  It would be better to give those more descriptive names, or
> even give them hard-regs.

That's just %fstmp2 (pointer into __nvptx_stacks) and %fstmp1 (previous stack
pointer that we need to restore). I can rename them to %ssloc and %ssold
(better names welcome), but I don't see a value in making them hard-regs --
there's no interface with the middle-end that would be interested in those.

> You should  certainly  do so for those that are already hard regs (%frame &
> %stack)

Sorry, do what? They are already hard regs, and have descriptive names.

> -- is it more feasible to augment init_frame to initialize them?

I don't think so. The whole block could be moved to a separate function though.

>   Since ptx is a virtual target, we just define a few
> > hard registers for special purposes and leave pseudos unallocated.
> > @@ -200,6 +205,7 @@ struct GTY(()) machine_function
> >bool is_varadic;  /* This call is varadic  */
> >bool has_varadic;  /* Current function has a varadic call.  */
> >bool has_chain; /* Current function has outgoing static chain.  */
> > +  bool using_softstack; /* Need to restore __nvptx_stacks[tid.y].  */
> 
> Comment should describe what the attribute is, not what it implies.  In this
> case I think it's /*  Current function has   a soft stack frame.  */

Yes; note it's false when current function is leaf, so the description should
be more like "Current function has a soft stack frame that needs restoring".

> > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> > index 33a4862..e5650b6 100644
> > --- a/gcc/config/nvptx/nvptx.md
> > +++ b/gcc/config/nvptx/nvptx.md
> 
> 
> > +(define_insn "set_softstack_insn"
> > +  [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)]
> > +  "TARGET_SOFT_STACK"
> > +{
> > +  return (cfun->machine->using_softstack
> > + ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;"
> > + : "");
> > +})
> 
> Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in
> restore_stack_block), or stack setting (as insn name suggests).  Things seem
> inconsistently named.  Comments would be good.

OK, I'll add some in a respin. This is related to stack setting. I can add a
new UNSPEC for that (UNSPEC_SET_SOFTSTACK).

> >
> >  (define_expand "restore_stack_block"
> >[(match_operand 0 "register_operand" "")
> >(match_operand 1 "register_operand" "")]
> >""
> > {
> > +  if (TARGET_SOFT_STACK)
> > +{
> > +  emit_move_insn (operands[0], operands[1]);
> > +  emi

Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Jason Merrill

On 05/24/2016 04:43 PM, Martin Sebor wrote:

On 05/24/2016 12:51 PM, Jason Merrill wrote:

On 05/24/2016 12:15 PM, Martin Sebor wrote:

+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)


Why is this commented out rather than removed in this version of the
patch?  Let's remove it, as before.  OK with that change.


It was commented out by accident.

Since c++/71147 is a regression, should I also backport the patch
to the 6.x branch?


OK.

Jason




[PATCH], Add PowerPC ISA 3.0 vector count trailing zeros and vector parity support

2016-05-24 Thread Michael Meissner
This patch adds support for two sets of new instructions in ISA 3.0, vector
count trailing zeros, and vector parity.  In addition, it defines many of the
support macros that will be used by other built-in functions that will be added
shortly.

I have bootstrapped this and there were no regressions.  Is it ok to apply to
the trunk?  Assuming it is ok to apply to the trunk, is it ok to back port to
the GCC 6.2 branch?

[gcc]
2016-05-24  Michael Meissner  

* config/rs6000/altivec.md (VParity): New mode iterator for vector
parity built-in functions.
(p9v_ctz2): Add support for ISA 3.0 vector count trailing
zeros.
(p9v_parity2): Likewise.
* config/rs6000/vector.md (VEC_IP): New mode iterator for vector
parity.
(ctz2): ISA 3.0 expander for vector count trailing zeros.
(parity2): ISA 3.0 expander for vector parity.
* config/rs6000/rs6000-builtin.def (BU_P9_MISC_1): New macros for
power9 built-ins.
(BU_P9_64BIT_MISC_0): Likewise.
(BU_P9_MISC_0): Likewise.
(BU_P9V_AV_1): Likewise.
(BU_P9V_AV_2): Likewise.
(BU_P9V_AV_3): Likewise.
(BU_P9V_AV_P): Likewise.
(BU_P9V_VSX_1): Likewise.
(BU_P9V_OVERLOAD_1): Likewise.
(BU_P9V_OVERLOAD_2): Likewise.
(BU_P9V_OVERLOAD_3): Likewise.
(VCTZB): Add vector count trailing zeros support.
(VCTZH): Likewise.
(VCTZW): Likewise.
(VCTZD): Likewise.
(VPRTYBD): Add vector parity support.
(VPRTYBQ): Likewise.
(VPRTYBW): Likewise.
(VCTZ): Add overloaded vector count trailing zeros support.
(VPRTYB): Add overloaded vector parity support.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
overloaded vector count trailing zeros and parity instructions.
* config/rs6000/rs6000.md (wd mode attribute): Add V1TI and TI for
vector parity support.
* config/rs6000/altivec.h (vec_vctz): Add ISA 3.0 vector count
trailing zeros support.
(vec_cntlz): Likewise.
(vec_vctzb): Likewise.
(vec_vctzd): Likewise.
(vec_vctzh): Likewise.
(vec_vctzw): Likewise.
(vec_vprtyb): Add ISA 3.0 vector parity support.
(vec_vprtybd): Likewise.
(vec_vprtybw): Likewise.
(vec_vprtybq): Likewise.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Document
the ISA 3.0 vector count trailing zeros and vector parity built-in
functions.

[gcc/testsuite]
2016-05-24  Michael Meissner  

* gcc.target/powerpc/p9-vparity.c: New file to check ISA 3.0
vector parity built-in functions.
* gcc.target/powerpc/ctz-3.c: New file to check ISA 3.0 vector
count trailing zeros automatic vectorization.
* gcc.target/powerpc/ctz-4.c: New file to check ISA 3.0 vector
count trailing zeros built-in functions.



-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236663)
+++ gcc/config/rs6000/altivec.md(.../gcc/config/rs6000) (working copy)
@@ -193,6 +193,13 @@ (define_mode_iterator VM2 [V4SI
   (KF "FLOAT128_VECTOR_P (KFmode)")
   (TF "FLOAT128_VECTOR_P (TFmode)")])
 
+;; Specific iterator for parity which does not have a byte/half-word form, but
+;; does have a quad word form
+(define_mode_iterator VParity [V4SI
+  V2DI
+  V1TI
+  (TI "TARGET_VSX_TIMODE")])
+
 (define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")])
 (define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")])
 (define_mode_attr VI_unit [(V16QI "VECTOR_UNIT_ALTIVEC_P (V16QImode)")
@@ -3415,7 +3422,7 @@ (define_expand "vec_unpacku_float_lo_v8h
 }")
 
 
-;; Power8 vector instructions encoded as Altivec instructions
+;; Power8/power9 vector instructions encoded as Altivec instructions
 
 ;; Vector count leading zeros
 (define_insn "*p8v_clz2"
@@ -3426,6 +3433,15 @@ (define_insn "*p8v_clz2"
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 
+;; Vector count trailing zeros
+(define_insn "*p9v_ctz2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+   (ctz:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P9_VECTOR"
+  "vctz %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
 ;; Vector population count
 (define_insn "*p8v_popcount2"
   [(set (match_operand:VI2 0 "register_operand" "=v")
@@ -3435,6 +3451,15 @@ (define_insn "*p8v_popcount2"
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 

Re: New hashtable power 2 rehash policy

2016-05-24 Thread François Dumont

Attached patch applied then.

I had to reorganize things a little now that some pieces have been
integrated in the 71181 patch.


2016-05-24  François Dumont  

* include/bits/c++config (_GLIBCXX14_USE_CONSTEXPR): New.
* include/bits/hashtable_policy.h
(_Prime_rehash_policy::__has_load_factor): New. Mark rehash policy
having load factor management.
(_Mask_range_hashing): New.
(__clp2): New.
(_Power2_rehash_policy): New.
(_Inserts<>): Remove last template parameter, _Unique_keys, so that
partial specializations only depend on whether iterators are constant
or not.
* testsuite/23_containers/unordered_set/hash_policy/26132.cc: Adapt to
test new hash policy.
* testsuite/23_containers/unordered_set/hash_policy/load_factor.cc:
Likewise.
* testsuite/23_containers/unordered_set/hash_policy/rehash.cc:
Likewise.
* testsuite/23_containers/unordered_set/insert/hash_policy.cc:
Likewise.
* testsuite/23_containers/unordered_set/max_load_factor/robustness.cc:
Likewise.
* testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc:
New.
* testsuite/performance/23_containers/insert/54075.cc: Add benchmark
using the new hash policy.
* testsuite/performance/23_containers/insert_erase/41975.cc: Likewise.

François

On 23/05/2016 13:31, Jonathan Wakely wrote:

On 17/05/16 22:28 +0200, François Dumont wrote:

On 14/05/2016 19:06, Daniel Krügler wrote:

1) The function __clp2 is declared using _GLIBCXX14_CONSTEXPR, which
means that it is an inline function if and *only* if
_GLIBCXX14_CONSTEXPR really expands to constexpr, otherwise it is
*not* inline, which is probably not intended and could easily cause
ODR problems. I suggest to mark it unconditionally as inline,
regardless of _GLIBCXX14_CONSTEXPR.


Maybe _GLIBCXX14_CONSTEXPR should expand to inline prior to C++14
mode.


That's probably a good idea.


For the moment I simply added the inline as done in other situations.


OK, thanks.



2) Furthermore I suggest to declare __clp2 as noexcept - this is
(intentionally) *not* implied by constexpr.

3) Is there any reason, why _Power2_rehash_policy::_M_next_bkt
shouldn't be noexcept?

4) Similar to (3) for _Power2_rehash_policy's member functions
_M_bkt_for_elements, _M_need_rehash, _M_state, _M_reset

For noexcept I thought we were only adding it if necessary. We might
have to go through a lot of code to find all places where noexcept
could be added. Jonathan will give his feedback.


I'm in favour of adding it anywhere that definitely can't throw.
We don't *need* to do that everywhere, but it doesn't hurt.


For the moment I have added it on all those methods.


Great.


Thanks for feedback, updated and tested patch attached.


OK for trunk - thanks!
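
For readers following the thread, here is a minimal standalone sketch of the
two building blocks discussed above: rounding a bucket count up to a power of
two, and using a mask instead of a modulo for range hashing.  The names
clp2_sketch and mask_range_hash are hypothetical stand-ins, and the fixed
64-bit working type is a simplifying assumption.  This is not the libstdc++
code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round n (n >= 1) up to the nearest power of two by smearing the highest
// set bit of n - 1 into every lower position and then adding one.
inline std::size_t clp2_sketch(std::size_t n) noexcept {
  std::uint64_t x = n;
  x = x - 1;
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  return static_cast<std::size_t>(x + 1);
}

// When bkt_count is a power of two, masking with bkt_count - 1 selects the
// same bucket as hash % bkt_count, without a division.
inline std::size_t mask_range_hash(std::size_t hash,
                                   std::size_t bkt_count) noexcept {
  return hash & (bkt_count - 1);
}
```

The mask is only equivalent to the modulo when the bucket count is a power of
two, which is why the policy has to round every requested size up first.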




Index: include/bits/c++config
===
--- include/bits/c++config	(revision 236662)
+++ include/bits/c++config	(working copy)
@@ -106,8 +106,10 @@
 #ifndef _GLIBCXX14_CONSTEXPR
 # if __cplusplus >= 201402L
 #  define _GLIBCXX14_CONSTEXPR constexpr
+#  define _GLIBCXX14_USE_CONSTEXPR constexpr
 # else
 #  define _GLIBCXX14_CONSTEXPR
+#  define _GLIBCXX14_USE_CONSTEXPR const
 # endif
 #endif
 
Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 236662)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -31,6 +31,8 @@
 #ifndef _HASHTABLE_POLICY_H
 #define _HASHTABLE_POLICY_H 1
 
+#include  // for std::min.
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -457,6 +459,8 @@
   /// smallest prime that keeps the load factor small enough.
   struct _Prime_rehash_policy
   {
+using __has_load_factor = std::true_type;
+
 _Prime_rehash_policy(float __z = 1.0) noexcept
 : _M_max_load_factor(__z), _M_next_resize(0) { }
 
@@ -501,6 +505,135 @@
 mutable std::size_t	_M_next_resize;
   };
 
+  /// Range hashing function assuming that second arg is a power of 2.
+  struct _Mask_range_hashing
+  {
+typedef std::size_t first_argument_type;
+typedef std::size_t second_argument_type;
+typedef std::size_t result_type;
+
+result_type
+operator()(first_argument_type __num,
+	   second_argument_type __den) const noexcept
+{ return __num & (__den - 1); }
+  };
+
+  /// Compute closest power of 2.
+  _GLIBCXX14_CONSTEXPR
+  inline std::size_t
+  __clp2(std::size_t n) noexcept
+  {
+#if __SIZEOF_SIZE_T__ >= 8
+std::uint_fast64_t x = n;
+#else
+std::uint_fast32_t x = n;
+#endif
+// Algorithm from Hacker's Delight, Figure 3-3.
+x = x - 1;
+x = x | (x >> 1);
+x = x | (x >> 2);
+x = x | (x >> 4);
+x = x | (x >> 8);
+x = x | (x >>16);
+#if __SIZEOF_SIZE_T__ >= 8
+x = x | (x >>32);
+#endif
+return x + 1;
+  }
+
+  /// Rehash policy providing power of 2 bucket numbers. Avoids modulo
+  /// operations.
+  struct _Powe

Re: C++ PATCH for c++/70584 (parenthesized argument to x86 builtin)

2016-05-24 Thread Jason Merrill

On 05/23/2016 02:58 PM, Jason Merrill wrote:

The C++14 decltype(auto) obfuscation was confusing the x86 builtin; it's
a simple matter to undo it during delayed folding, thanks to the
maybe_undo_parenthesized_ref function that Patrick recently introduced.


But using cp_fold_maybe_rvalue here is wrong, as it will mean 
unconditionally replacing a variable with its initializer.  Better to 
use plain cp_fold and improve cp_fold_maybe_rvalue to handle getting a 
decl back from cp_fold.


Tested x86_64-pc-linux-gnu, applying to trunk.




Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 12:06 PM, H.J. Lu  wrote:
> On Tue, May 24, 2016 at 11:44 AM, Uros Bizjak  wrote:
>> On Tue, May 24, 2016 at 8:15 PM, Uros Bizjak  wrote:
>>> On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:
>>>
>>>>> Oh, target_flags is only a 32bit integer :(. Is there a reason it
>>>>> can't be extended to HOST_WIDE_INT, as is the case with
>>>>> ix86_isa_flags?
>>>>
>>>> target_flags is generic, not target specific.  I want to limit my
>>>> change to x86 backend and -mgeneral-regs-only doesn't need
>>>> to use target_flags .
>>>
>>> I have thrown together a quick patch that defines target_flags as 
>>> HOST_WIDE_INT.
>>>
>>> (Patch still needs a small correction, so opth-gen.awk will emit
>>> HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
>>> compile functional x86_64-apple-darwin15.5.0 crosscompiler.)
>>
>> And here is attached complete (but untested!!) patch that should "just
>> work"(TM).
>>
>
> -mgeneral-regs-only doesn't need to use target_flags and it shouldn't
> use target_flags.
>

Using target_flags won't hurt -mgeneral-regs-only.  I have no problem
with it.

-- 
H.J.


Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Martin Sebor

On 05/24/2016 12:51 PM, Jason Merrill wrote:

On 05/24/2016 12:15 PM, Martin Sebor wrote:

+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)


Why is this commented out rather than removed in this version of the
patch?  Let's remove it, as before.  OK with that change.


It was commented out by accident.

Since c++/71147 is a regression, should I also backport the patch
to the 6.x branch?

Martin



Re: [patch,openacc] use firstprivate pointers for subarrays in c and c++

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 12:16:35PM -0700, Cesar Philippidis wrote:
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -11939,8 +11939,7 @@ c_finish_omp_cancellation_point (location_t loc, tree 
> clauses)
>  
>  static tree
>  handle_omp_array_sections_1 (tree c, tree t, vec &types,
> -  bool &maybe_zero_len, unsigned int &first_non_one,
> -  bool is_omp)
> +  bool &maybe_zero_len, unsigned int &first_non_one)
>  {
>tree ret, low_bound, length, type;
>if (TREE_CODE (t) != TREE_LIST)
> @@ -11949,7 +11948,6 @@ handle_omp_array_sections_1 (tree c, tree t, 
> vec &types,
>   return error_mark_node;
>ret = t;
>if (TREE_CODE (t) == COMPONENT_REF
> -   && is_omp

Sorry, I've missed this one.  The patch is ok if you add on top of the
current patch ort argument to c-typeck.c (handle_omp_array_sections{,_1})
and use here && ort == C_ORT_OMP like in the C++ FE.
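
As an illustration of why the exact comparison matters when the region type
is a set of bit flags, here is a small sketch.  The enum and its values are
hypothetical and chosen only for the example; the real c_omp_region_type
values may differ.

```cpp
#include <cassert>

// Hypothetical bit values for illustration; not GCC's definitions.
enum region_type_sketch {
  ORT_OMP = 1 << 0,
  ORT_ACC = 1 << 1,
  ORT_DECLARE_SIMD = 1 << 2,
  ORT_OMP_DECLARE_SIMD = ORT_OMP | ORT_DECLARE_SIMD
};

// True only for a plain OpenMP region.
inline bool is_exactly_omp(region_type_sketch ort) {
  return ort == ORT_OMP;
}

// True for any region type that has the OpenMP bit set, including
// combined values.
inline bool has_omp_bit(region_type_sketch ort) {
  return (ort & ORT_OMP) != 0;
}
```

The two predicates agree for plain OpenMP and OpenACC regions and differ only
for combined values, so passing the full region type lets each use site pick
the test it needs.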

Jakub


Re: [patch,openacc] use firstprivate pointers for subarrays in c and c++

2016-05-24 Thread Cesar Philippidis
On 05/23/2016 11:09 PM, Jakub Jelinek wrote:
> On Mon, May 23, 2016 at 07:31:53PM -0700, Cesar Philippidis wrote:
>> @@ -12559,7 +12560,7 @@ c_finish_omp_clauses (tree clauses, enum 
>> c_omp_region_type ort)
>>t = OMP_CLAUSE_DECL (c);
>>if (TREE_CODE (t) == TREE_LIST)
>>  {
>> -  if (handle_omp_array_sections (c, ort & C_ORT_OMP))
>> +  if (handle_omp_array_sections (c, ort & (C_ORT_OMP | C_ORT_ACC)))
>>  {
>>remove = true;
>>break;
> 
> You haven't touched the /c/ handle_omp_array_sections{,_1}.  As I said, I 
> believe
> you can just drop the is_omp argument altogether (unlike C++), or, pass for
> consistency ort itself there as well.  But I bet the argument will be
> unused.

OK, I removed is_omp. I only had to guard one call to
handle_omp_array_sections from c_finish_omp_clauses because OpenACC
doesn't support array reductions.

Is this OK for trunk?

Cesar

2016-05-24  Cesar Philippidis  

	gcc/c
	* c-parser.c (c_parser_oacc_declare): Add support for
	GOMP_MAP_FIRSTPRIVATE_POINTER.
	* c-typeck.c (handle_omp_array_sections_1): Remove is_omp argument.
	(handle_omp_array_sections): Likewise.
	(c_finish_omp_clauses): Add specific errors and warning messages for
	OpenACC.  Use firstprivate pointers for OpenACC subarrays.  Update
	calls to handle_omp_array_sections.

	gcc/cp/
	* parser.c (cp_parser_oacc_declare): Add support for
	GOMP_MAP_FIRSTPRIVATE_POINTER.
	* semantics.c (handle_omp_array_sections_1): Replace bool is_omp
	argument with enum c_omp_region_type ort.  Don't privatize OpenACC
	non-static members.
	(handle_omp_array_sections): Replace bool is_omp argument with enum
	c_omp_region_type ort.  Update call to handle_omp_array_sections_1.
	(finish_omp_clauses): Add specific errors and warning messages for
	OpenACC.  Use firstprivate pointers for OpenACC subarrays.  Update
	call to handle_omp_array_sections.

	gcc/
	* gimplify.c (omp_notice_variable): Use zero-length arrays for data
	pointers inside OACC_DATA regions.
	(gimplify_scan_omp_clauses): Prune firstprivate clause associated
	with OACC_DATA, OACC_ENTER_DATA and OACC_EXIT data regions.
	(gimplify_adjust_omp_clauses): Fix typo in comment.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-5.c: Likewise.
	* c-c++-common/goacc/kernels-alias-8.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/pcopy.c: Likewise.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/pr70688.c: New test.
	* c-c++-common/goacc/present-1.c: Adjust test.
	* c-c++-common/goacc/reduction-5.c: Likewise.
	* g++.dg/goacc/data-1.C: New test.

	libgomp/
	* oacc-mem.c (acc_malloc): Update handling of shared-memory targets.
	(acc_free): Likewise.
	(acc_memcpy_to_device): Likewise.
	(acc_memcpy_from_device): Likewise.
	(acc_deviceptr): Likewise.
	(acc_hostptr): Likewise.
	(acc_is_present): Likewise.
	(acc_map_data): Likewise.
	(acc_unmap_data): Likewise.
	(present_create_copy): Likewise.
	(delete_copyout): Likewise.
	(update_dev_host): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/enter_exit-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Adjust test so that
	it only runs on nvptx targets.
	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-16.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-17.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-21.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-24.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-25.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-28.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-29.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-34.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-42.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-44.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-48.c: Likewise.
	* testsuite/libgomp.oacc-c-c+

Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 11:44 AM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 8:15 PM, Uros Bizjak  wrote:
>> On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:
>>
>>>> Oh, target_flags is only a 32bit integer :(. Is there a reason it
>>>> can't be extended to HOST_WIDE_INT, as is the case with
>>>> ix86_isa_flags?
>>>
>>> target_flags is generic, not target specific.  I want to limit my
>>> change to x86 backend and -mgeneral-regs-only doesn't need
>>> to use target_flags .
>>
>> I have thrown together a quick patch that defines target_flags as 
>> HOST_WIDE_INT.
>>
>> (Patch still needs a small correction, so opth-gen.awk will emit
>> HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
>> compile functional x86_64-apple-darwin15.5.0 crosscompiler.)
>
> And here is attached complete (but untested!!) patch that should "just
> work"(TM).
>

-mgeneral-regs-only doesn't need to use target_flags and it shouldn't
use target_flags.


-- 
H.J.


Re: [PATCH] Fix up Yr constraint

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 08:35:12PM +0200, Uros Bizjak wrote:
> On Tue, May 24, 2016 at 6:55 PM, Jakub Jelinek  wrote:
> > Hi!
> >
> > The Yr constraint contrary to what has been said when it has been submitted
> > actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
> > the RA restriction to only the first 8 regs is done no matter what we tune
> > for.
> >
> > This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
> > value (59), rather than actually checking the tune flag.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2016-05-24  Jakub Jelinek  
> >
> > * config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
> > * config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
> > rather than X86_TUNE_AVOID_4BYTE_PREFIXES.
> 
> Uh, another brown-paper bag bug...
> 
> OK everywhere.

I fear it might be too dangerous for -mavx512* for the branches; I went
through all the Yr uses on the trunk, but not on the branches.
Would you be ok with using 
"TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : SSE_REGS) : 
NO_REGS"
on the branches instead?
Or I guess we could use it on the trunk too, it should make no difference there
(because on the trunk it is only used when !TARGET_AVX).
Or maybe even
"TARGET_SSE ? ((TARGET_AVOID_4BYTE_PREFIXES && !TARGET_AVX) ? NO_REX_SSE_REGS : 
SSE_REGS) : NO_REGS"
(again, should make zero difference on the trunk, but might be better for
the branches).
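
For anyone skimming the thread, the underlying bug is easy to reproduce in
isolation: a non-zero enum index is always true when tested directly, whereas
the flag stored at that index may be false.  The sketch below uses
hypothetical stand-in names and a toy feature table, not the i386 headers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the tuning enum and the feature table.
enum tune_index_sketch {
  TUNE_FIRST = 0,
  TUNE_AVOID_4BYTE_PREFIXES = 59,
  TUNE_LAST = 60
};

static unsigned char tune_features_sketch[TUNE_LAST];  // zero-initialized

enum reg_class_sketch { NO_REX_SSE, ALL_SSE };

// Buggy form: tests the enum value itself (59), which is always non-zero.
inline reg_class_sketch yr_class_buggy() {
  return TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE : ALL_SSE;
}

// Fixed form: tests the flag stored at that index.
inline reg_class_sketch yr_class_fixed() {
  return tune_features_sketch[TUNE_AVOID_4BYTE_PREFIXES] ? NO_REX_SSE
                                                         : ALL_SSE;
}
```

With the table entry cleared, the buggy form still restricts the register
class while the fixed form does not, which matches the behaviour described in
the original submission.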

> > --- gcc/config/i386/i386.h.jj   2016-05-24 10:56:02.0 +0200
> > +++ gcc/config/i386/i386.h  2016-05-24 15:13:05.715906018 +0200
> > @@ -465,6 +465,8 @@ extern unsigned char ix86_tune_features[
> > ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
> >  #define TARGET_VECTOR_PARALLEL_EXECUTION \
> > ix86_tune_features[X86_TUNE_VECTOR_PARALLEL_EXECUTION]
> > +#define TARGET_AVOID_4BYTE_PREFIXES \
> > +   ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> > ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> > --- gcc/config/i386/constraints.md.jj   2016-05-12 10:29:41.0 +0200
> > +++ gcc/config/i386/constraints.md  2016-05-24 15:14:21.647914550 +0200
> > @@ -142,7 +142,7 @@ (define_register_constraint "Yf"
> >   "@internal Any x87 register when 80387 FP arithmetic is enabled.")
> >
> >  (define_register_constraint "Yr"
> > - "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> > ALL_SSE_REGS) : NO_REGS"
> > + "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> > ALL_SSE_REGS) : NO_REGS"
> >   "@internal Lower SSE register when avoiding REX prefix and all SSE 
> > registers otherwise.")
> >
> >  (define_register_constraint "Yv"

Jakub


Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Jason Merrill

On 05/24/2016 12:15 PM, Martin Sebor wrote:

+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)


Why is this commented out rather than removed in this version of the 
patch?  Let's remove it, as before.  OK with that change.


Jason



Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 8:15 PM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:
>
>>> Oh, target_flags is only a 32bit integer :(. Is there a reason it
>>> can't be extended to HOST_WIDE_INT, as is the case with
>>> ix86_isa_flags?
>>
>> target_flags is generic, not target specific.  I want to limit my
>> change to x86 backend and -mgeneral-regs-only doesn't need
>> to use target_flags .
>
> I have thrown together a quick patch that defines target_flags as 
> HOST_WIDE_INT.
>
> (Patch still needs a small correction, so opth-gen.awk will emit
> HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
> compile functional x86_64-apple-darwin15.5.0 crosscompiler.)

And here is attached complete (but untested!!) patch that should "just
work"(TM).

Uros.
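
A short sketch of why the mask defines have to be built in the wide type once
more than 32 option bits are in use.  The helper names are hypothetical, and
std::int64_t stands in for HOST_WIDE_INT as an assumption.  This is not what
opth-gen.awk emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build mask bit n in a 64-bit type, so that n >= 32
// is well defined (shifting a 32-bit int by 32 or more is not).
constexpr std::int64_t wide_mask_bit(unsigned n) {
  return std::int64_t{1} << n;
}

// Truncate a flags word to 32 bits, as a 32-bit flags variable would.
constexpr std::uint32_t as_32bit_flags(std::int64_t flags) {
  return static_cast<std::uint32_t>(flags);
}
```

Any mask at bit 32 or above vanishes when stored in a 32-bit variable, so both
the variable and the constants used to test it need the wider type.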
Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 236644)
+++ common/config/i386/i386-common.c(working copy)
@@ -223,6 +223,11 @@
 #define OPTION_MASK_ISA_RDRND_UNSET OPTION_MASK_ISA_RDRND
 #define OPTION_MASK_ISA_F16C_UNSET OPTION_MASK_ISA_F16C
 
+#define OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET \
+  (OPTION_MASK_ISA_MMX_UNSET \
+   | OPTION_MASK_ISA_SSE_UNSET \
+   | OPTION_MASK_ISA_MPX)
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -236,6 +241,21 @@
 
   switch (code)
 {
+case OPT_mgeneral_regs_only:
+  if (value)
+   {
+ /* Disable MPX, MMX, SSE and x87 instructions if only the
+general registers are allowed..  */
+ opts->x_ix86_isa_flags
+   &= ~OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_ix86_isa_flags_explicit
+   |= OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_target_flags &= ~MASK_80387;
+   }
+  else
+   gcc_unreachable ();
+  return true;
+
 case OPT_mmmx:
   if (value)
{
Index: common.opt
===
--- common.opt  (revision 236644)
+++ common.opt  (working copy)
@@ -23,7 +23,7 @@
 ; Please try to keep this file in ASCII collating order.
 
 Variable
-int target_flags
+HOST_WIDE_INT target_flags
 
 Variable
 int optimize
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 236645)
+++ config/i386/i386.c  (working copy)
@@ -5337,7 +5337,10 @@
&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_PKU))
  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PKU;
 
-   if (!(opts_set->x_target_flags & MASK_80387))
+   /* Don't enable x87 instructions if only the general registers
+  are allowed.  */
+   if (!(opts_set->x_target_flags & MASK_GENERAL_REGS_ONLY)
+   && !(opts_set->x_target_flags & MASK_80387))
  {
if (processor_alias_table[i].flags & PTA_NO_80387)
  opts->x_target_flags &= ~MASK_80387;
Index: config/i386/i386.opt
===
--- config/i386/i386.opt(revision 236644)
+++ config/i386/i386.opt(working copy)
@@ -74,7 +74,7 @@
 
 ;; which flags were passed by the user
 Variable
-int ix86_target_flags_explicit
+HOST_WIDE_INT ix86_target_flags_explicit
 
 ;; which flags were passed by the user
 TargetSave
@@ -897,3 +897,7 @@
 mmitigate-rop
 Target Var(flag_mitigate_rop) Init(0)
 Attempt to avoid generating instruction sequences containing ret bytes.
+
+mgeneral-regs-only
+Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
+Generate code which uses only the general registers.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236644)
+++ doc/invoke.texi (working copy)
@@ -1173,7 +1173,7 @@
 -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
 -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop}
+-mmitigate-rop -mgeneral-regs-only}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -24298,6 +24298,12 @@
 this option is limited in what it can do and should not be relied
 on to provide serious protection.
 
+@item -mgeneral-regs-only
+@opindex mgeneral-regs-only
+Generate code that uses only the general-purpose registers.  This
+prevents the compiler from using floating-point, vector, mask and bound
+registers.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
Index: doc/tm.texi
===
--- doc/tm.texi (revision 236644)
+++ doc/tm.texi (working copy)
@@ -652,7 +652,7 @@
 it yourself.
 @end defmac
 
-@deftypevar {extern int} target_flags
+@deftypevar {extern HOST_WIDE_INT} target_flags
 This variable is declared in @file{options.h}, which is included before
 any target-specific headers.
 @end deft

Re: [PATCH] Fix one more Yr use

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:52 PM, Jakub Jelinek  wrote:
> Hi!
>
> Another case (separate patch because I thought I should add an avx512f
> alternative here, but later on found out it is already handled by
> having the vrndscale* patterns defined before these ones
> and having the same RTL for them (except allowing 0 to 255 instead
> of just 0 to 15)).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-24  Jakub Jelinek  
>
> * config/i386/sse.md (_round):
> Limit 1st alternative to noavx isa, split 2nd alternative into one
> noavx and one avx alternative, use *x and Bm in the former and
> x and m in the latter.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
> @@ -14986,22 +14996,19 @@ (define_insn "_ptest"
> (set_attr "mode" "")])
>
>  (define_insn "_round"
> -  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x")
> +  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> (unspec:VF_128_256
> - [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm")
> -  (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
> + [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> +  (match_operand:SI 2 "const_0_to_15_operand" "n,n,n")]
>   UNSPEC_ROUND))]
>"TARGET_ROUND"
>"%vround\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "type" "ssecvt")
> -   (set (attr "prefix_data16")
> - (if_then_else
> -   (match_test "TARGET_AVX")
> - (const_string "*")
> - (const_string "1")))
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssecvt")
> +   (set_attr "prefix_data16" "1,1,*")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,vex")
> (set_attr "mode" "")])
>
>  (define_expand "_round_sfix"
>
> Jakub


Re: [PATCH] Fix Yr constraint uses in various insns

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:50 PM, Jakub Jelinek  wrote:
> Hi!
>
> Similarly to the last patch, this one fixes various misc patterns.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-24  Jakub Jelinek  
>
> * config/i386/sse.md (vec_set_0): Use sse4_noavx isa instead
> of sse4 for the first alternative, drop %v from the template
> and d operand modifier.  Split second alternative into one sse4_noavx
> and one avx alternative, use *x instead of *v in the former and v
> instead of *v in the latter.
> (*sse4_1_extractps): Use noavx isa instead of * for the first
> alternative, drop %v from the template.  Split second alternative into
> one noavx and one avx alternative, use *x instead of *v in the
> former and v instead of *v in the latter.
> (_movntdqa): Guard the first 2 alternatives
> with noavx and the last one with avx.
> (sse4_1_phminposuw): Guard first alternative with noavx isa,
> split the second one into one noavx and one avx alternative,
> use *x and Bm in the former and x and m in the latter one.
> (_ptest): Use noavx instead of * for the first two
> alternatives.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
> @@ -6623,18 +6623,19 @@ (define_expand "vec_init"
>  ;; see comment above inline_secondary_memory_needed function in i386.c
>  (define_insn "vec_set_0"
>[(set (match_operand:VI4F_128 0 "nonimmediate_operand"
> - "=Yr,*v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
> + "=Yr,*x,v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
> (vec_merge:VI4F_128
>   (vec_duplicate:VI4F_128
> (match_operand: 2 "general_operand"
> - " Yr,*v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
> + " Yr,*x,v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
>   (match_operand:VI4F_128 1 "vector_move_operand"
> - " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
> + " C , C,C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
>   (const_int 1)))]
>"TARGET_SSE"
>"@
> -   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
> -   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
> +   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
> +   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
> +   vinsertps\t{$0xe, %2, %2, %0|%0, %2, %2, 0xe}
> %vmov\t{%2, %0|%0, %2}
> %vmovd\t{%2, %0|%0, %2}
> movss\t{%2, %0|%0, %2}
> @@ -6646,20 +6647,20 @@ (define_insn "vec_set_0"
> #
> #
> #"
> -  [(set_attr "isa" 
> "sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
> +  [(set_attr "isa" 
> "sse4_noavx,sse4_noavx,avx,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
> (set (attr "type")
> - (cond [(eq_attr "alternative" "0,1,7,8,9")
> + (cond [(eq_attr "alternative" "0,1,2,8,9,10")
>   (const_string "sselog")
> -   (eq_attr "alternative" "11")
> - (const_string "imov")
> (eq_attr "alternative" "12")
> + (const_string "imov")
> +   (eq_attr "alternative" "13")
>   (const_string "fmov")
>]
>(const_string "ssemov")))
> -   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
> -   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
> -   (set_attr "prefix" 
> "maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
> -   (set_attr "mode" "SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
> +   (set_attr "prefix_extra" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
> +   (set_attr "length_immediate" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
> +   (set_attr "prefix" 
> "orig,orig,maybe_evex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
> +   (set_attr "mode" "SF,SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
>
>  ;; A subset is vec_setv4sf.
>  (define_insn "*vec_setv4sf_sse4_1"
> @@ -6761,14 +6762,15 @@ (define_insn_and_split "*vec_extractv4sf
>"operands[1] = gen_lowpart (SFmode, operands[1]);")
>
>  (define_insn_and_split "*sse4_1_extractps"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,v,v")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
> (vec_select:SF
> - (match_operand:V4SF 1 "register_operand" "Yr,*v,0,v")
> - (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
> "n,n,n,n")])))]
> + (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
> + (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
> "n,n,n,n,n")])))]
>"TARGET_SSE4_1"
>"@
> -   %vextractps\t{%2, %1, %0|%0, %1, %2}
> -   %vextractps\t{%2, %1, %0|%0, %1, %2}
> +   extractps\t{%2, %1, %0|%0, %1, %2}
> +   extractps\t{%2, %1, %0|%0, %1, %2}
> +   vextractps\t{%2, %1, %0|%0, %1, %2}
> #
> #"
>"&& reload_completed && SSE_REG_P (operands[0])"
> @@ -6793,13 +6795,13 @@ (define_insn_and_split "*sse4_1_extractp
>  }
>DONE;
>  }
> -  [(set_attr "isa"

Re: [PATCH] Fix Yr constraint uses in vpmov* insns

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:49 PM, Jakub Jelinek  wrote:
> Hi!
>
> Looking at the Yr constraint, it seems to me it is really meant to be used
> for noavx, only in that case whether we use xmm0-xmm7 or xmm8+ matters for
> the size of the instruction (number of prefixes).
> In most of the places where Yr is used, we typically have 2 noavx
> alternatives, one with Yr constraint, another one with *x, and then one
> avx alternative with x or v.
>
> But in a couple of spots we do the wrong thing, e.g. use the Yr constraint
> always (which ought to act, see a later patch, as the first half of x
> for -mtune={silvermont,intel} and otherwise as v), and otherwise
> use *, which means limiting RA unnecessarily.
>
> The following patch fixes the vpmov* insns.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Uros.

> 2016-05-24  Jakub Jelinek  
>
> * config/i386/sse.md (sse4_1_v8qiv8hi2): Limit
> first two alternatives to noavx, use *x instead of *v in the second
> one, add avx alternative without *.
> (sse4_1_v4qiv4si2, sse4_1_v4hiv4si2,
> sse4_1_v2qiv2di2, sse4_1_v2hiv2di2,
> sse4_1_v2siv2di2): Likewise.
>
> --- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
> @@ -14748,19 +14752,20 @@ (define_insn "avx512bw_v32qiv32hi2
> (set_attr "mode" "XI")])
>
>  (define_insn "sse4_1_v8qiv8hi2"
> -  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V8HI
>   (vec_select:V8QI
> -   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)
>(const_int 4) (const_int 5)
>(const_int 6) (const_int 7)]]
>"TARGET_SSE4_1 &&  && "
>"%vpmovbw\t{%1, %0|%0, %q1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> (set_attr "mode" "TI")])
>
>  (define_insn "avx512f_v16qiv16si2"
> @@ -14790,17 +14795,18 @@ (define_insn "avx2_v8qiv8si2
> (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_v4qiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V4SI
>   (vec_select:V4QI
> -   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)]]
>"TARGET_SSE4_1 && "
>"%vpmovbd\t{%1, %0|%0, %k1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> (set_attr "mode" "TI")])
>
>  (define_insn "avx512f_v16hiv16si2"
> @@ -14825,17 +14831,18 @@ (define_insn "avx2_v8hiv8si2
> (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_v4hiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V4SI
>   (vec_select:V4HI
> -   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)]]
>"TARGET_SSE4_1 && "
>"%vpmovwd\t{%1, %0|%0, %q1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> (set_attr "mode" "TI")])
>
>  (define_insn "avx512f_v8qiv8di2"
> @@ -14868,16 +14875,17 @@ (define_insn "avx2_v4qiv4di2
> (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_v2qiv2di2"
> -  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V2DI
>   (vec_select:V2QI
> -   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)]]
>"TARGET_SSE4_1 && "
>"%vpmovbq\t{%1, %0|%0, %w1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig

Re: [PATCH] Fix up Yr constraint

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:55 PM, Jakub Jelinek  wrote:
> Hi!
>
> The Yr constraint contrary to what has been said when it has been submitted
> actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
> the RA restriction to only the first 8 regs is done no matter what we tune
> for.
>
> This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
> value (59), rather than actually checking the tune flag.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-24  Jakub Jelinek  
>
> * config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
> * config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
> rather than X86_TUNE_AVOID_4BYTE_PREFIXES.

Uh, another brown-paper bag bug...

OK everywhere.

Thanks,
Uros.

> --- gcc/config/i386/i386.h.jj   2016-05-24 10:56:02.0 +0200
> +++ gcc/config/i386/i386.h  2016-05-24 15:13:05.715906018 +0200
> @@ -465,6 +465,8 @@ extern unsigned char ix86_tune_features[
> ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
>  #define TARGET_VECTOR_PARALLEL_EXECUTION \
> ix86_tune_features[X86_TUNE_VECTOR_PARALLEL_EXECUTION]
> +#define TARGET_AVOID_4BYTE_PREFIXES \
> +   ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
>  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
>  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> --- gcc/config/i386/constraints.md.jj   2016-05-12 10:29:41.0 +0200
> +++ gcc/config/i386/constraints.md  2016-05-24 15:14:21.647914550 +0200
> @@ -142,7 +142,7 @@ (define_register_constraint "Yf"
>   "@internal Any x87 register when 80387 FP arithmetic is enabled.")
>
>  (define_register_constraint "Yr"
> - "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> ALL_SSE_REGS) : NO_REGS"
> + "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> ALL_SSE_REGS) : NO_REGS"
>   "@internal Lower SSE register when avoiding REX prefix and all SSE 
> registers otherwise.")
>
>  (define_register_constraint "Yv"
>
> Jakub
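
The mistake fixed above is easy to reproduce in isolation. The following is a minimal sketch, not GCC code: the enum, the array and both functions are hypothetical stand-ins for X86_TUNE_AVOID_4BYTE_PREFIXES and ix86_tune_features. It shows why testing a nonzero enum index is always true, while indexing the feature table reflects the actual tuning.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tuning enum and the feature table.  */
enum tune_index
{
  TUNE_SLOW_PSHUFB,
  TUNE_AVOID_4BYTE_PREFIXES,   /* Nonzero index, like the real one (59).  */
  TUNE_LAST
};

static unsigned char tune_features[TUNE_LAST];

/* Buggy form: tests the enum value itself, which never changes.  */
static bool
avoid_4byte_prefixes_buggy (void)
{
  return TUNE_AVOID_4BYTE_PREFIXES ? true : false;
}

/* Fixed form: looks the flag up in the feature table.  */
static bool
avoid_4byte_prefixes_fixed (void)
{
  return tune_features[TUNE_AVOID_4BYTE_PREFIXES] != 0;
}
```

With the table entry cleared, the buggy form still reports true, which matches the observation that Yr restricted the register allocator regardless of -mtune.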


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:

>> Oh, target_flags is only a 32bit integer :(. Is there a reason it
>> can't be extended to HOST_WIDE_INT, as is the case with
>> ix86_isa_flags?
>
> target_flags is generic, not target specific.  I want to limit my
> change to x86 backend and -mgeneral-regs-only doesn't need
> to use target_flags .

I have thrown together a quick patch that defines target_flags as HOST_WIDE_INT.

(The patch still needs a small correction so that opth-gen.awk emits
HOST_WIDE_INT_1 for MASK_* defines.  I have to go now, but I was able to
compile a functional x86_64-apple-darwin15.5.0 crosscompiler.)

Uros.
Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 236644)
+++ common/config/i386/i386-common.c(working copy)
@@ -223,6 +223,11 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_RDRND_UNSET OPTION_MASK_ISA_RDRND
 #define OPTION_MASK_ISA_F16C_UNSET OPTION_MASK_ISA_F16C
 
+#define OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET \
+  (OPTION_MASK_ISA_MMX_UNSET \
+   | OPTION_MASK_ISA_SSE_UNSET \
+   | OPTION_MASK_ISA_MPX)
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -236,6 +241,21 @@ ix86_handle_option (struct gcc_options *opts,
 
   switch (code)
 {
+case OPT_mgeneral_regs_only:
+  if (value)
+   {
+ /* Disable MPX, MMX, SSE and x87 instructions if only the
+general registers are allowed.  */
+ opts->x_ix86_isa_flags
+   &= ~OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_ix86_isa_flags_explicit
+   |= OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_target_flags &= ~MASK_80387;
+   }
+  else
+   gcc_unreachable ();
+  return true;
+
 case OPT_mmmx:
   if (value)
{
Index: common.opt
===
--- common.opt  (revision 236644)
+++ common.opt  (working copy)
@@ -23,7 +23,7 @@
 ; Please try to keep this file in ASCII collating order.
 
 Variable
-int target_flags
+HOST_WIDE_INT target_flags
 
 Variable
 int optimize
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 236645)
+++ config/i386/i386.c  (working copy)
@@ -5337,7 +5337,10 @@ ix86_option_override_internal (bool main_args_p,
&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_PKU))
  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PKU;
 
-   if (!(opts_set->x_target_flags & MASK_80387))
+   /* Don't enable x87 instructions if only the general registers
+  are allowed.  */
+   if (!(opts_set->x_target_flags & MASK_GENERAL_REGS_ONLY)
+   && !(opts_set->x_target_flags & MASK_80387))
  {
if (processor_alias_table[i].flags & PTA_NO_80387)
  opts->x_target_flags &= ~MASK_80387;
Index: config/i386/i386.opt
===
--- config/i386/i386.opt(revision 236644)
+++ config/i386/i386.opt(working copy)
@@ -74,7 +74,7 @@ HOST_WIDE_INT x_ix86_isa_flags_explicit
 
 ;; which flags were passed by the user
 Variable
-int ix86_target_flags_explicit
+HOST_WIDE_INT ix86_target_flags_explicit
 
 ;; which flags were passed by the user
 TargetSave
@@ -897,3 +897,7 @@ Enum(stack_protector_guard) String(global) Value(S
 mmitigate-rop
 Target Var(flag_mitigate_rop) Init(0)
 Attempt to avoid generating instruction sequences containing ret bytes.
+
+mgeneral-regs-only
+Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
+Generate code which uses only the general registers.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236644)
+++ doc/invoke.texi (working copy)
@@ -1173,7 +1173,7 @@ See RS/6000 and PowerPC Options.
 -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
 -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop}
+-mmitigate-rop -mgeneral-regs-only}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -24298,6 +24298,12 @@ opcodes, to mitigate against certain forms of atta
 this option is limited in what it can do and should not be relied
 on to provide serious protection.
 
+@item -mgeneral-regs-only
+@opindex mgeneral-regs-only
+Generate code that uses only the general-purpose registers.  This
+prevents the compiler from using floating-point, vector, mask and bound
+registers.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
Index: doc/tm.texi
===
--- doc/tm.texi (revision 236644)
+++ doc/tm.texi (working copy)
@@ -652,7 +652,7 @@ macro to define @code{__ELF__}, so you probably do
 it yourself.
 @e

[gomp4.5] Linear clause modifiers

2016-05-24 Thread Jakub Jelinek
Hi!

This patch adds parsing/resolving/translation of linear clause
modifiers, adds support for linear-step that is a uniform dummy argument
and tweaks a couple of further linear clause related things.

Tested on x86_64-linux, committed to gomp-4_5-branch.

2016-05-24  Jakub Jelinek  

* gfortran.h (enum gfc_omp_linear_op): New.
(struct gfc_omp_namelist): Add u.linear_op field.
* openmp.c (gfc_match_omp_clauses): Add support for parsing
linear clause modifiers.
(resolve_omp_clauses): Diagnose linear clause modifiers when not
in declare simd.  Only check for integer type if ref modifier is not
used.  Remove diagnostics for required VALUE attribute.  Diagnose
VALUE attribute with ref or uval modifiers.  Allow non-constant
linear-step, if it is a dummy argument alone and is mentioned in
uniform clause.
* dump-parse-tree.c (show_omp_namelist): Print linear clause
modifiers.
* trans-openmp.c (gfc_trans_omp_clauses): Test declare_simd
instead of block == NULL_TREE.  Translate linear clause modifiers
and clause with uniform dummy argument linear-step.

* gfortran.dg/gomp/declare-simd-2.f90: New test.
* gfortran.dg/gomp/linear-1.f90: New test.

--- gcc/fortran/gfortran.h.jj   2016-05-13 12:37:21.0 +0200
+++ gcc/fortran/gfortran.h  2016-05-23 17:20:09.508803607 +0200
@@ -1134,6 +1134,14 @@ enum gfc_omp_map_op
   OMP_MAP_ALWAYS_TOFROM
 };
 
+enum gfc_omp_linear_op
+{
+  OMP_LINEAR_DEFAULT,
+  OMP_LINEAR_REF,
+  OMP_LINEAR_VAL,
+  OMP_LINEAR_UVAL
+};
+
 /* For use in OpenMP clauses in case we need extra information
(aligned clause alignment, linear clause step, etc.).  */
 
@@ -1146,6 +1154,7 @@ typedef struct gfc_omp_namelist
   gfc_omp_reduction_op reduction_op;
   gfc_omp_depend_op depend_op;
   gfc_omp_map_op map_op;
+  gfc_omp_linear_op linear_op;
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
--- gcc/fortran/openmp.c.jj 2016-05-16 17:56:25.0 +0200
+++ gcc/fortran/openmp.c2016-05-24 17:40:34.636152910 +0200
@@ -1092,13 +1092,50 @@ gfc_match_omp_clauses (gfc_omp_clauses *
  end_colon = false;
  head = NULL;
  if ((mask & OMP_CLAUSE_LINEAR)
- && gfc_match_omp_variable_list ("linear (",
- &c->lists[OMP_LIST_LINEAR],
- false, &end_colon,
- &head) == MATCH_YES)
+ && gfc_match ("linear (") == MATCH_YES)
{
+ gfc_omp_linear_op linear_op = OMP_LINEAR_DEFAULT;
  gfc_expr *step = NULL;
 
+ if (gfc_match_omp_variable_list (" ref (",
+  &c->lists[OMP_LIST_LINEAR],
+  false, NULL, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_REF;
+ else if (gfc_match_omp_variable_list (" val (",
+   &c->lists[OMP_LIST_LINEAR],
+   false, NULL, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_VAL;
+ else if (gfc_match_omp_variable_list (" uval (",
+   &c->lists[OMP_LIST_LINEAR],
+   false, NULL, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_UVAL;
+ else if (gfc_match_omp_variable_list ("",
+   &c->lists[OMP_LIST_LINEAR],
+   false, &end_colon, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_DEFAULT;
+ else
+   {
+ gfc_free_omp_namelist (*head);
+ gfc_current_locus = old_loc;
+ *head = NULL;
+ break;
+   }
+ if (linear_op != OMP_LINEAR_DEFAULT)
+   {
+ if (gfc_match (" :") == MATCH_YES)
+   end_colon = true;
+ else if (gfc_match (" )") != MATCH_YES)
+   {
+ gfc_free_omp_namelist (*head);
+ gfc_current_locus = old_loc;
+ *head = NULL;
+ break;
+   }
+   }
  if (end_colon && gfc_match (" %e )", &step) != MATCH_YES)
{
  gfc_free_omp_namelist (*head);
@@ -1114,6 +1151,9 @@ gfc_match_omp_clauses (gfc_omp_clauses *
  mpz_set_si (step->value.integer, 1);
}
  (*head)->expr = step;
+ if (linear_op != OMP_LINEAR_DEFAULT)
+   for (gfc_omp_namelist *n = *head; n; n = n->next)
+

Re: [PATCH #3], Add PowerPC ISA 3.0 vpermr/xxpermr support

2016-05-24 Thread Kelvin Nilsen

I have committed gcc.target/powerpc/p9-vpermr.c to trunk (separately
from the other files mentioned in this ChangeLog), revision 236655.
Approved offline.

On 05/23/2016 05:16 PM, Segher Boessenkool wrote:
> On Mon, May 23, 2016 at 06:22:22PM -0400, Michael Meissner wrote:
>> Here are the patches for xxpermr/vpermr support that are broken out from 
>> fixing
>> the xxperm fusion bug.  I have built a compiler with these patches (and the
>> xxperm patches) and it bootstraps and does not cause a regression.  Are they 
>> ok
>> to add to GCC 7 and eventually to GCC 6.2?
>>
>> [gcc]
>> 2016-05-23  Michael Meissner  
>>  Kelvin Nilsen  
>>
>>  * config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate
>>  vpermr/xxpermr on ISA 3.0.
>>  (altivec_expand_vec_perm_le): Likewise.
>>  * config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec.
>>  (altivec_vpermr__internal): Add VPERMR/XXPERMR support for
>>  ISA 3.0.
>>
>> [gcc/testsuite]
>> 2016-05-23  Michael Meissner  
>>  Kelvin Nilsen  
>>
>>  * gcc.target/powerpc/p9-vpermr.c: New test for ISA 3.0 vpermr
>>  support.
> 
> Okay for trunk.  Okay for 6 after a week or so.
> 
> Thanks,
> 
> 
> Segher
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH 2/2][GCC] Add one more pattern to RTL if-conversion

2016-05-24 Thread Mikhail Maltsev
On 05/23/2016 05:15 PM, Kyrill Tkachov wrote:
> 
> expand_simple_binop may fail. I think you should add a check that diff_rtx is
> non-NULL
> and bail out early if it is.
> 
Fixed.

-- 
Regards,
Mikhail Maltsev
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a9c146b..e1473eb 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1260,6 +1260,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   {
 ST_ADD_FLAG,
 ST_SHIFT_FLAG,
+ST_SHIFT_ADD_FLAG,
 ST_IOR_FLAG
   };
 
@@ -1384,6 +1385,12 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  normalize = -1;
 	  reversep = true;
 	}
+  else if (exact_log2 (abs_hwi (diff)) >= 0
+	   && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+	{
+	  strategy = ST_SHIFT_ADD_FLAG;
+	  normalize = 1;
+	}
   else
 	return FALSE;
 
@@ -1453,6 +1460,24 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	gen_int_mode (ifalse, mode), if_info->x,
 	0, OPTAB_WIDEN);
 	  break;
+	case ST_SHIFT_ADD_FLAG:
+	  {
+	/* if (test) x = 5; else x = 1;
+	   =>   x = (test != 0) << 2 + 1;  */
+	HOST_WIDE_INT diff_log = exact_log2 (abs_hwi (diff));
+	rtx diff_rtx
+	  = expand_simple_binop (mode, ASHIFT, target, GEN_INT (diff_log),
+ if_info->x, 0, OPTAB_WIDEN);
+	if (!diff_rtx)
+	  {
+		end_sequence ();
+		return false;
+	  }
+	target = expand_simple_binop (mode, (diff < 0) ? MINUS : PLUS,
+	  gen_int_mode (ifalse, mode), diff_rtx,
+	  if_info->x, 0, OPTAB_WIDEN);
+	break;
+	  }
 	}
 
   if (! target)
diff --git a/gcc/testsuite/gcc.dg/ifcvt-6.c b/gcc/testsuite/gcc.dg/ifcvt-6.c
new file mode 100644
index 000..c2cfb17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+test1 (int a)
+{
+  return a % 2 != 0 ? 7 : 3;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sbbl" } } */
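
The source-level effect of the new ST_SHIFT_ADD_FLAG strategy can be sketched in plain C. This is only an illustration of the arithmetic from the comment in the patch (if (test) x = 5; else x = 1), not the RTL that ifcvt emits, and the function names are made up. Note that the comment's "(test != 0) << 2 + 1" is informal: read as C it would parse as a shift by 3, so the sketch parenthesizes the shift explicitly.

```c
/* Branchy original: returns 5 when TEST is nonzero, else 1.  */
static int
select_branchy (int test)
{
  if (test)
    return 5;
  return 1;
}

/* Branch-free form: diff = 5 - 1 = 4 = 1 << 2, so shift the normalized
   flag left by 2 and add the "false" value.  */
static int
select_shift_add (int test)
{
  int flag = (test != 0);   /* Normalized to 0 or 1.  */
  return (flag << 2) + 1;
}
```

The two functions agree for every input because the flag is normalized to 0 or 1 before the shift, which is what normalize = 1 requests in the patch.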


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 9:53 AM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 6:22 PM, H.J. Lu  wrote:
>> On Tue, May 24, 2016 at 8:52 AM, Uros Bizjak  wrote:
>>> On Tue, May 24, 2016 at 5:40 PM, H.J. Lu  wrote:
>>>
> No, this is a flag, not a variable. Let's figure out how to extend
> target flags to more than 63 flags first.

 Extending target flags to more than 63 bits requires replacing
 HOST_WIDE_INT with a bit vector.  Since target flags is used in
 TARGET_SUBTARGET_DEFAULT, change it to a bit vector is a
 non-trivial change.  On the other hand, -mgeneral-regs-only is a
 command-line option which doesn't require support for
 TARGET_SUBTARGET_DEFAULT, similar to other -m options like
 -mmitigate-rop.  Using flag_general_regs_only is an option.
>>>
>>> I have been informed that Intel people are looking into how to extend
>>> target flags to accommodate additional ISA flags. There is no point in
>>> hurrying with a suboptimal solution. Perhaps you can coordinate your
>>> patch with their efforts?
>>
>> ISA flags use ix86_isa_flags, not target_flags.  -mgeneral-regs-only
>> shouldn't use ix86_isa_flags.  It was my oversight to use target_flags
>> with -mgeneral-regs-only to begin with.  I don't think using
>> flag_general_regs_only is a suboptimal solution; it is what I should have
>> used in the first place.  The x86 change for interrupt handler depends
>> on -mgeneral-regs-only.
>
> Oh, target_flags is only a 32bit integer :(. Is there a reason it
> can't be extended to HOST_WIDE_INT, as is the case with
> ix86_isa_flags?

target_flags is generic, not target specific.  I want to limit my
change to x86 backend and -mgeneral-regs-only doesn't need
to use target_flags .

-- 
H.J.
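
To make the size argument in this thread concrete: a single integer caps the number of flags at its bit width, and going past that needs some form of bit vector. The sketch below is a generic illustration only; flag_set, FLAG_WORDS and the helper functions are hypothetical names and are not how GCC stores target_flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_WORDS 2   /* Hypothetical: room for 128 flags.  */

/* A tiny fixed-size bit vector built from 64-bit words.  */
typedef struct
{
  uint64_t word[FLAG_WORDS];
} flag_set;

/* Set flag number BIT; BIT must be below FLAG_WORDS * 64.  */
static void
flag_set_bit (flag_set *s, unsigned bit)
{
  s->word[bit / 64] |= UINT64_C (1) << (bit % 64);
}

/* Clear flag number BIT.  */
static void
flag_clear_bit (flag_set *s, unsigned bit)
{
  s->word[bit / 64] &= ~(UINT64_C (1) << (bit % 64));
}

/* Return true if flag number BIT is set.  */
static bool
flag_test_bit (const flag_set *s, unsigned bit)
{
  return (s->word[bit / 64] >> (bit % 64)) & 1;
}
```

The cost H.J. points at is visible here: every user of the flags has to go through accessors instead of plain mask expressions, which is why a macro such as TARGET_SUBTARGET_DEFAULT that is written as an integer mask makes the change non-trivial.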


Re: [PATCH] Fix PR70434, change FE IL for vector indexing

2016-05-24 Thread Richard Biener
On May 24, 2016 6:17:19 PM GMT+02:00, Jakub Jelinek  wrote:
>On Mon, May 23, 2016 at 04:22:57PM +0200, Richard Biener wrote:
>> *** /dev/null1970-01-01 00:00:00.0 +
>> --- gcc/testsuite/c-c++-common/vector-subscript-5.c  2016-05-23
>16:17:41.148043066 +0200
>> ***
>> *** 0 
>> --- 1,13 
>> + /* { dg-do compile } */
>> + 
>> + typedef int U __attribute__ ((vector_size (16)));
>> + 
>> + int
>> + foo (int i)
>> + {
>> +   register U u
>> + #if __SSE2__
>> +   asm ("xmm0");
>> + #endif
>> +   return u[i];
>> + }
>
>This test fails on i?86 (and supposedly on all non-x86 arches too).

Oops, sorry.  And thanks for the fix.

Richard.

>I've tested following fix and committed as obvious to trunk:
>
>2016-05-24  Jakub Jelinek  
>
>   PR middle-end/70434
>   PR c/69504
>   * c-c++-common/vector-subscript-5.c (foo): Move ; out of the ifdef.
>
>--- gcc/testsuite/c-c++-common/vector-subscript-5.c.jj 2016-05-24
>10:56:00.0 +0200
>+++ gcc/testsuite/c-c++-common/vector-subscript-5.c2016-05-24
>18:11:51.778520055 +0200
>@@ -7,7 +7,8 @@ foo (int i)
> {
>   register U u
> #if __SSE2__
>-  asm ("xmm0");
>+  asm ("xmm0")
> #endif
>+  ;
>   return u[i];
> }
>
>   Jakub
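
The shape of this fix generalizes: when only part of a declaration is conditional, the terminating semicolon has to stay outside the #if, otherwise the declaration is unterminated on targets where the condition is false. A minimal sketch follows, with a hypothetical DEMO_HAVE_INIT macro standing in for __SSE2__ and an initializer standing in for the asm register specifier.

```c
/* DEMO_HAVE_INIT is a hypothetical feature macro for illustration.  */
static int
read_value (void)
{
  int value
#ifdef DEMO_HAVE_INIT
    = 7          /* Optional part of the declaration.  */
#endif
    ;            /* Terminator sits outside the conditional.  */
#ifndef DEMO_HAVE_INIT
  value = 3;
#endif
  return value;
}
```

Both configurations compile; had the semicolon been placed on the "= 7" line, the build without DEMO_HAVE_INIT would fail to parse, which is the failure seen on i?86.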




Re: [PATCH, ARM] Do not set ARM_ARCH_ISA_THUMB for armv5

2016-05-24 Thread Kyrill Tkachov

Hi Thomas,

On 10/05/16 14:26, Thomas Preudhomme wrote:

Hi,

ARM_ARCH_ISA_THUMB is currently set to 1 when compiling for armv5 despite
armv5 not supporting Thumb instructions (armv5t does):

arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1

The reason is TARGET_ARM_ARCH_ISA_THUMB being set to 1 if target does not
support Thumb-2 and is ARMv4T, ARMv5 or later. This patch replaces that logic
by checking whether the given architecture has the right feature bit
(FL_THUMB).

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-05-06  Thomas Preud'homme  

 * config/arm/arm-protos.h (arm_arch_thumb): Declare.
 * config/arm/arm.c (arm_arch_thumb): Define.
 (arm_option_override): Initialize arm_arch_thumb.
 * config/arm/arm.h (TARGET_ARM_ARCH_ISA_THUMB): Use arm_arch_thumb to
 determine if target supports Thumb-1 ISA.


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index
d8179c441bb53dced94d2ebf497aad093e4ac600..4d11c91133ff1b875afcbf58abc4491c2c93768e
100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -603,6 +603,9 @@ extern int arm_tune_cortex_a9;
 interworking clean.  */
  extern int arm_cpp_interwork;
  
+/* Nonzero if chip supports Thumb.  */

+extern int arm_arch_thumb;
+


Bit of bikeshedding really, but I think a better name would be
arm_arch_thumb1.
This is because we also have the macros TARGET_THUMB and TARGET_THUMB2
where TARGET_THUMB means either Thumb-1 or Thumb-2 and a casual reader
might think that arm_arch_thumb means that there is support for either.

Also, please add a simple test that compiles something with -march=armv5 (plus 
-marm)
and checks that __ARM_ARCH_ISA_THUMB is not defined.

Ok with that change and the test.

Thanks,
Kyrill

P.S. I think your mailer sometimes mangles long lines in the patches
(for example the git hash headers). Can you please send your patches as
attachments? That will also make it easier for me to extract and apply
them to my tree without having to manually select the inlined patch
from the message.


  /* Nonzero if chip supports Thumb 2.  */
  extern int arm_arch_thumb2;
  
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

index
ad123dde991a3e4c4b9563ee6ebb84981767988f..f64e8caa8bc08b7aff9fe385567de9936a964004
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2191,9 +2191,8 @@ extern int making_const_table;
  #define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
  
  /* The highest Thumb instruction set version supported by the chip.  */

-#define TARGET_ARM_ARCH_ISA_THUMB  \
-  (arm_arch_thumb2 ? 2 \
-  : ((TARGET_ARM_ARCH >= 5 || arm_arch4t) ? 1 : 0))
+#define TARGET_ARM_ARCH_ISA_THUMB  \
+  (arm_arch_thumb2 ? 2 : (arm_arch_thumb ? 1 : 0))
  
  /* Expands to an upper-case char of the target's architectural

 profile.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
71b51439dc7ba5be67671e9fb4c3f18040cce58f..de1c2d4600529518a92ed44815cff05308baa31c
100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -852,6 +852,9 @@ int arm_tune_cortex_a9 = 0;
 interworking clean.  */
  int arm_cpp_interwork = 0;
  
+/* Nonzero if chip supports Thumb.  */

+int arm_arch_thumb;
+
  /* Nonzero if chip supports Thumb 2.  */
  int arm_arch_thumb2;
  
@@ -3170,6 +3173,7 @@ arm_option_override (void)

arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
+  arm_arch_thumb = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
  



Before patch:

% arm-none-eabi-gcc -dM -march=armv4 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
% arm-none-eabi-gcc -dM -march=armv4t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1
% arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
#define __ARM_ARCH_ISA_THUMB 1
% arm-none-eabi-gcc -dM -march=armv5t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1

After patch:

% arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
% arm-none-eabi-gcc -dM -march=armv5t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1
% arm-none-eabi-gcc -dM -march=armv4 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
% arm-none-eabi-gcc -dM -march=armv4t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1
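
The difference between the old and new definitions of TARGET_ARM_ARCH_ISA_THUMB can be modeled in a few lines of C. This is a sketch under stated assumptions: struct arch_model and its fields are hypothetical stand-ins for TARGET_ARM_ARCH, arm_arch4t and the FL_THUMB/FL_THUMB2 feature bits, and the two functions simply transcribe the two macro bodies from the patch.

```c
#include <stdbool.h>

/* Hypothetical model of the per-architecture facts the macro consults.  */
struct arch_model
{
  int arch_version;   /* Stand-in for TARGET_ARM_ARCH, e.g. 4 or 5.  */
  bool arch4t;        /* Stand-in for arm_arch4t.  */
  bool has_thumb;     /* Stand-in for the FL_THUMB feature bit.  */
  bool has_thumb2;    /* Stand-in for the FL_THUMB2 feature bit.  */
};

/* Old logic: infers Thumb-1 from the architecture version.  */
static int
isa_thumb_old (const struct arch_model *a)
{
  return a->has_thumb2 ? 2
	 : ((a->arch_version >= 5 || a->arch4t) ? 1 : 0);
}

/* New logic: consults the Thumb feature bit directly.  */
static int
isa_thumb_new (const struct arch_model *a)
{
  return a->has_thumb2 ? 2 : (a->has_thumb ? 1 : 0);
}
```

For a model of armv5 (version 5, no Thumb bit) the old logic yields 1 and the new logic yields 0, while armv4, armv4t and armv5t are unchanged, which lines up with the before/after output shown above.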





More backwards/FSM jump thread refactoring and extension

2016-05-24 Thread Jeff Law
Here's the next patch which does a bit more refactoring in the backwards 
jump threader and extends the backwards jump threader to handle simple 
copies and constant initializations.


The extension isn't all that useful right now -- while it does fire 
often during bootstraps, it's doing so for cases that would be caught 
slightly later (within the same pass).  As a result there are no changes 
in the testsuite.


The extension becomes useful in an upcoming patch where the backwards 
threader is disentangled from DOM/VRP entirely.  In that mode the 
threader can't depend on cprop to have eliminated the copies and 
propagated as many constants as possible into PHI arguments.


Bootstrapped and regression tested on x86_64 linux.  Installing on the 
trunk.


Jeff
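
The core of the factored-out convert_and_register_jump_thread_path is the walk over a path that is stored in reverse order, turning adjacent blocks into edges and skipping repeated blocks. A standalone sketch of just that walk follows; it uses plain integers as block ids and a hypothetical edge_pair struct, and leaves out find_edge, the edge types and the registration step.

```c
#include <stddef.h>

/* Hypothetical edge representation: source and destination block ids.  */
struct edge_pair
{
  int src;
  int dest;
};

/* PATH holds N block ids in reverse order (last block first).  Write the
   edges in forward order to OUT, which must have room for N - 1 entries,
   and return how many were written.  Adjacent duplicates are skipped.  */
static size_t
path_to_edges (const int *path, size_t n, struct edge_pair *out)
{
  size_t count = 0;

  for (size_t j = 0; j + 1 < n; j++)
    {
      int bb1 = path[n - j - 1];
      int bb2 = path[n - j - 2];
      if (bb1 == bb2)
	continue;

      out[count].src = bb1;
      out[count].dest = bb2;
      count++;
    }
  return count;
}
```

The loop bound is written as j + 1 < n rather than j < n - 1 so that an empty path does not wrap around in unsigned arithmetic; that guard is a choice made for this sketch, not something taken from the patch.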



commit 913a4b1f209105a774789311094e90986db322fb
Author: Jeff Law 
Date:   Tue May 24 11:56:50 2016 -0400

* tree-ssa-threadbackwards.c (convert_and_register_jump_thread_path):
New function, extracted from...
(fsm_find_control_statement_thread_paths): Here.  Use the new function.
Allow simple copies and constant initializations in the SSA chain.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2b20cc8..9442109 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-05-24  Jeff Law  
+
+   * tree-ssa-threadbackwards.c (convert_and_register_jump_thread_path):
+   New function, extracted from...
+   (fsm_find_control_statement_thread_paths): Here.  Use the new function.
+   Allow simple copies and constant initializations in the SSA chain.
+
 2016-05-24  Marek Polacek  
 
PR c/71249
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 73ab4ea..4d0fd9c 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -356,6 +356,44 @@ profitable_jump_thread_path (vec 
*&path,
   return taken_edge;
 }
 
+/* PATH is vector of blocks forming a jump threading path in reverse
+   order.  TAKEN_EDGE is the edge taken from path[0].
+
+   Convert that path into the form used by register_jump_thread and
+   register the path.   */
+
+static void
+convert_and_register_jump_thread_path (vec *&path,
+  edge taken_edge)
+{
+  vec *jump_thread_path = new vec ();
+
+  /* Record the edges between the blocks in PATH.  */
+  for (unsigned int j = 0; j < path->length () - 1; j++)
+{
+  basic_block bb1 = (*path)[path->length () - j - 1];
+  basic_block bb2 = (*path)[path->length () - j - 2];
+  if (bb1 == bb2)
+   continue;
+
+  edge e = find_edge (bb1, bb2);
+  gcc_assert (e);
+  jump_thread_edge *x = new jump_thread_edge (e, EDGE_FSM_THREAD);
+  jump_thread_path->safe_push (x);
+}
+
+  /* Add the edge taken when the control variable has value ARG.  */
+  jump_thread_edge *x
+= new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
+  jump_thread_path->safe_push (x);
+
+  register_jump_thread (jump_thread_path);
+  --max_threaded_paths;
+
+  /* Remove BBI from the path.  */
+  path->pop ();
+}
+
 /* We trace the value of the SSA_NAME NAME back through any phi nodes looking
for places where it gets a constant value and save the path.  Stop after
having recorded MAX_PATHS jump threading paths.  */
@@ -377,24 +415,30 @@ fsm_find_control_statement_thread_paths (tree name,
   if (var_bb == NULL)
 return;
 
-  /* For the moment we assume that an SSA chain only contains phi nodes, and
- eventually one of the phi arguments will be an integer constant.  In the
- future, this could be extended to also handle simple assignments of
- arithmetic operations.  */
+  /* We allow the SSA chain to contains PHIs and simple copies and constant
+ initializations.  */
   if (gimple_code (def_stmt) != GIMPLE_PHI
-  || (gimple_phi_num_args (def_stmt)
+  && gimple_code (def_stmt) != GIMPLE_ASSIGN)
+return;
+
+  if (gimple_code (def_stmt) == GIMPLE_PHI
+  && (gimple_phi_num_args (def_stmt)
  >= (unsigned) PARAM_VALUE (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS)))
 return;
 
+  if (gimple_code (def_stmt) == GIMPLE_ASSIGN
+  && gimple_assign_rhs_code (def_stmt) != INTEGER_CST
+  && gimple_assign_rhs_code (def_stmt) != SSA_NAME)
+return;
+
   /* Avoid infinite recursion.  */
   if (visited_bbs->add (var_bb))
 return;
 
-  gphi *phi = as_a  (def_stmt);
   int next_path_length = 0;
   basic_block last_bb_in_path = path->last ();
 
-  if (loop_containing_stmt (phi)->header == gimple_bb (phi))
+  if (loop_containing_stmt (def_stmt)->header == gimple_bb (def_stmt))
 {
   /* Do not walk through more than one loop PHI node.  */
   if (seen_loop_phi)
@@ -469,9 +513,9 @@ fsm_find_control_statement_thread_paths (tree name,
 
   /* Iterate over the arguments of PHI.  */
   unsigned int i;
-  if (gimple_phi_num_args (phi)
-  < (unsigned) PARAM_VALUE (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS))
+  if (gimple_code (def_stmt) == GIMPLE_PHI)
 {
+  

[PATCH] Fix up Yr constraint

2016-05-24 Thread Jakub Jelinek
Hi!

The Yr constraint, contrary to what was said when it was submitted,
actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
the RA restriction to only the first 8 regs is done no matter what we tune
for.

This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
value (59), rather than actually checking the tune flag.
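
The difference between testing the enumerator and testing the flag it indexes can be shown with a small stand-alone sketch. Everything below is hypothetical apart from the value 59 mentioned above: the enum, the feature table and the two functions merely mimic the shape of the constraint condition.

```cpp
#include <cassert>

// Hypothetical stand-ins for the i386 tuning machinery.
enum tune_index { TUNE_AVOID_4BYTE_PREFIXES = 59, TUNE_LAST };
enum reg_class { NO_REGS, NO_REX_SSE_REGS, ALL_SSE_REGS };

// One entry per tuning flag; zero-initialized, so every flag is off.
static unsigned char tune_features[TUNE_LAST];

// Buggy form: the condition is the enumerator itself, i.e. the nonzero
// constant 59, so the ALL_SSE_REGS arm can never be chosen.
reg_class
yr_class_buggy (bool have_sse)
{
  const int cond = TUNE_AVOID_4BYTE_PREFIXES;
  return have_sse ? (cond ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS;
}

// Fixed form: the enumerator is used as an index into the feature table.
reg_class
yr_class_fixed (bool have_sse)
{
  return have_sse
         ? (tune_features[TUNE_AVOID_4BYTE_PREFIXES]
            ? NO_REX_SSE_REGS : ALL_SSE_REGS)
         : NO_REGS;
}

// Usage: with the flag off, only the fixed form allows all SSE registers.
void
check_yr_classes ()
{
  assert (yr_class_buggy (true) == NO_REX_SSE_REGS);
  assert (yr_class_fixed (true) == ALL_SSE_REGS);
}
```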

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
* config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
rather than X86_TUNE_AVOID_4BYTE_PREFIXES.

--- gcc/config/i386/i386.h.jj   2016-05-24 10:56:02.0 +0200
+++ gcc/config/i386/i386.h  2016-05-24 15:13:05.715906018 +0200
@@ -465,6 +465,8 @@ extern unsigned char ix86_tune_features[
ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
 #define TARGET_VECTOR_PARALLEL_EXECUTION \
ix86_tune_features[X86_TUNE_VECTOR_PARALLEL_EXECUTION]
+#define TARGET_AVOID_4BYTE_PREFIXES \
+   ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
--- gcc/config/i386/constraints.md.jj   2016-05-12 10:29:41.0 +0200
+++ gcc/config/i386/constraints.md  2016-05-24 15:14:21.647914550 +0200
@@ -142,7 +142,7 @@ (define_register_constraint "Yf"
  "@internal Any x87 register when 80387 FP arithmetic is enabled.")
 
 (define_register_constraint "Yr"
- "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
+ "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
  "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
 
 (define_register_constraint "Yv"

Jakub


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:22 PM, H.J. Lu  wrote:
> On Tue, May 24, 2016 at 8:52 AM, Uros Bizjak  wrote:
>> On Tue, May 24, 2016 at 5:40 PM, H.J. Lu  wrote:
>>
>>>> No, this is a flag, not a variable. Let's figure out how to extend
>>>> target flags to more than 63 flags first.
>>>
>>> Extending target flags to more than 63 bits requires replacing
>>> HOST_WIDE_INT with a bit vector.  Since target flags is used in
>>> TARGET_SUBTARGET_DEFAULT, changing it to a bit vector is a
>>> non-trivial change.  On the other hand, -mgeneral-regs-only is a
>>> command-line option which doesn't require support for
>>> TARGET_SUBTARGET_DEFAULT, similar to other -m options like
>>> -mmitigate-rop.  Using flag_general_regs_only is an option.
>>
>> I have been informed that Intel people are looking into how to extend
>> target flags to accommodate additional ISA flags. There is no point in
>> hurrying with a suboptimal solution. Perhaps you can coordinate your
>> patch with their efforts?
>
> ISA flags use x86_isa_flags, not target_flags.  -mgeneral-regs-only
> shouldn't use x86_isa_flags.  It was my oversight to use target_flags
> with -mgeneral-regs-only to begin with.   I don't think using
> flag_general_regs_only is a suboptimal solution; it is what I should have
> used in the first place.  The x86 change for interrupt handler depends
> on -mgeneral-regs-only.

Oh, target_flags is only a 32-bit integer :(. Is there a reason it
can't be extended to HOST_WIDE_INT, as is the case with
ix86_isa_flags?

Uros.
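
For readers following the "more than 63 flags" argument in this thread, a minimal sketch of what a bit-vector flag set looks like compared to a single integer mask. This is not a proposal for GCC's option machinery; N_FLAGS, flag_set, set_flag and has_flag are hypothetical names chosen for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical flag set wider than any single host integer.
const std::size_t N_FLAGS = 96;
typedef std::bitset<N_FLAGS> flag_set;

// Turn on flag number BIT.
inline void
set_flag (flag_set &flags, std::size_t bit)
{
  assert (bit < N_FLAGS);
  flags.set (bit);
}

// Query flag number BIT.
inline bool
has_flag (const flag_set &flags, std::size_t bit)
{
  assert (bit < N_FLAGS);
  return flags.test (bit);
}
```

A flag at position 70 works here, which a 32-bit or 64-bit integer cannot hold. The cost named earlier in the thread still applies: code that builds defaults by OR-ing integer masks together (as TARGET_SUBTARGET_DEFAULT does) would have to be rewritten for such a type.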


[PATCH] Fix one more Yr use

2016-05-24 Thread Jakub Jelinek
Hi!

Another case (a separate patch because I thought I should add an avx512f
alternative here, but later on found out it is already handled by
having the vrndscale* patterns defined before these ones
and having the same RTL for them, except allowing 0 to 255 instead
of just 0 to 15).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/sse.md (_round):
Limit 1st alternative to noavx isa, split 2nd alternative into one
noavx and one avx alternative, use *x and Bm in the former and
x and m in the latter.

--- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
+++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
@@ -14986,22 +14996,19 @@ (define_insn "_ptest"
(set_attr "mode" "")])
 
 (define_insn "_round"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
(unspec:VF_128_256
- [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm")
-  (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
+ [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
+  (match_operand:SI 2 "const_0_to_15_operand" "n,n,n")]
  UNSPEC_ROUND))]
   "TARGET_ROUND"
   "%vround\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssecvt")
-   (set (attr "prefix_data16")
- (if_then_else
-   (match_test "TARGET_AVX")
- (const_string "*")
- (const_string "1")))
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "")])
 
 (define_expand "_round_sfix"

Jakub


[PATCH] Fix Yr constraint uses in various insns

2016-05-24 Thread Jakub Jelinek
Hi!

Similarly to the last patch, this one fixes various misc patterns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/sse.md (vec_set_0): Use sse4_noavx isa instead
of sse4 for the first alternative, drop %v from the template
and d operand modifier.  Split second alternative into one sse4_noavx
and one avx alternative, use *x instead of *v in the former and v
instead of *v in the latter.
(*sse4_1_extractps): Use noavx isa instead of * for the first
alternative, drop %v from the template.  Split second alternative into
one noavx and one avx alternative, use *x instead of *v in the
former and v instead of *v in the latter.
(_movntdqa): Guard the first 2 alternatives
with noavx and the last one with avx.
(sse4_1_phminposuw): Guard first alternative with noavx isa,
split the second one into one noavx and one avx alternative,
use *x and Bm in the former and x and m in the latter one.
(_ptest): Use noavx instead of * for the first two
alternatives.

--- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
+++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
@@ -6623,18 +6623,19 @@ (define_expand "vec_init"
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "vec_set_0"
   [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
- "=Yr,*v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
+ "=Yr,*x,v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
(vec_merge:VI4F_128
  (vec_duplicate:VI4F_128
(match_operand: 2 "general_operand"
- " Yr,*v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
+ " Yr,*x,v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
  (match_operand:VI4F_128 1 "vector_move_operand"
- " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
+ " C , C,C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
  (const_int 1)))]
   "TARGET_SSE"
   "@
-   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
-   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
+   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
+   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
+   vinsertps\t{$0xe, %2, %2, %0|%0, %2, %2, 0xe}
%vmov\t{%2, %0|%0, %2}
%vmovd\t{%2, %0|%0, %2}
movss\t{%2, %0|%0, %2}
@@ -6646,20 +6647,20 @@ (define_insn "vec_set_0"
#
#
#"
-  [(set_attr "isa" "sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
+  [(set_attr "isa" "sse4_noavx,sse4_noavx,avx,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
(set (attr "type")
- (cond [(eq_attr "alternative" "0,1,7,8,9")
+ (cond [(eq_attr "alternative" "0,1,2,8,9,10")
  (const_string "sselog")
-   (eq_attr "alternative" "11")
- (const_string "imov")
(eq_attr "alternative" "12")
+ (const_string "imov")
+   (eq_attr "alternative" "13")
  (const_string "fmov")
   ]
   (const_string "ssemov")))
-   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
-   (set_attr "mode" "SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
+   (set_attr "prefix_extra" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" "orig,orig,maybe_evex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
+   (set_attr "mode" "SF,SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
 
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
@@ -6761,14 +6762,15 @@ (define_insn_and_split "*vec_extractv4sf
   "operands[1] = gen_lowpart (SFmode, operands[1]);")
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,v,v")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
(vec_select:SF
- (match_operand:V4SF 1 "register_operand" "Yr,*v,0,v")
- (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
+ (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
+ (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n,n")])))]
   "TARGET_SSE4_1"
   "@
-   %vextractps\t{%2, %1, %0|%0, %1, %2}
-   %vextractps\t{%2, %1, %0|%0, %1, %2}
+   extractps\t{%2, %1, %0|%0, %1, %2}
+   extractps\t{%2, %1, %0|%0, %1, %2}
+   vextractps\t{%2, %1, %0|%0, %1, %2}
#
#"
   "&& reload_completed && SSE_REG_P (operands[0])"
@@ -6793,13 +6795,13 @@ (define_insn_and_split "*sse4_1_extractp
 }
   DONE;
 }
-  [(set_attr "isa" "*,*,noavx,avx")
-   (set_attr "type" "sselog,sselog,*,*")
-   (set_attr "prefix_data16" "1,1,*,*")
-   (set_attr "prefix_extra" "1,1,*,*")
-   (set_attr "length_immediate" "1,1,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,*,*")
-   (set_attr "mode" "V4SF,V4SF,*,*")])
+  [(set_attr "is

[PATCH] Fix Yr constraint uses in vpmov* insns

2016-05-24 Thread Jakub Jelinek
Hi!

Looking at the Yr constraint, it seems to me it is really meant to be used
for noavx; only in that case does it matter for the size of the instruction
(number of prefixes) whether we use xmm0-xmm7 or xmm8+.
In most of the places where Yr is used, we typically have 2 noavx
alternatives, one with the Yr constraint, another one with *x, and then one
avx alternative with x or v.

But in a couple of spots we do the wrong thing, e.g. use the Yr constraint
always (which acts, or rather ought to act, see a later patch, as the first
half of x for -mtune={silvermont,intel} and otherwise as v), and otherwise
use *, which means limiting RA unnecessarily.
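
The size argument can be made concrete with a small sketch. It assumes 64-bit mode and the legacy (non-VEX) SSE encoding, where naming xmm8 through xmm15 requires a REX prefix byte; the function name is hypothetical and the model ignores every other source of prefixes.

```cpp
#include <cassert>

// Extra prefix bytes needed by a legacy-encoded SSE instruction for its
// two register operands: one REX byte as soon as either register is
// xmm8..xmm15, none if both are in xmm0..xmm7.
int
legacy_sse_rex_bytes (int dest_xmm, int src_xmm)
{
  assert (dest_xmm >= 0 && dest_xmm < 16);
  assert (src_xmm >= 0 && src_xmm < 16);
  return (dest_xmm >= 8 || src_xmm >= 8) ? 1 : 0;
}
```

Restricting an alternative to the first eight registers, as Yr does when the tuning asks for it, keeps this count at zero.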

The following patch fixes the vpmov* insns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/sse.md (sse4_1_v8qiv8hi2): Limit
first two alternatives to noavx, use *x instead of *v in the second
one, add avx alternative without *.
(sse4_1_v4qiv4si2, sse4_1_v4hiv4si2,
sse4_1_v2qiv2di2, sse4_1_v2hiv2di2,
sse4_1_v2siv2di2): Likewise.

--- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
+++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
@@ -14748,19 +14752,20 @@ (define_insn "avx512bw_v32qiv32hi2
(set_attr "mode" "XI")])
 
 (define_insn "sse4_1_v8qiv8hi2"
-  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*v")
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
(any_extend:V8HI
  (vec_select:V8QI
-   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
+   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
(parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)
   (const_int 4) (const_int 5)
   (const_int 6) (const_int 7)]]
   "TARGET_SSE4_1 &&  && "
   "%vpmovbw\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
 (define_insn "avx512f_v16qiv16si2"
@@ -14790,17 +14795,18 @@ (define_insn "avx2_v8qiv8si2v4qiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
(any_extend:V4SI
  (vec_select:V4QI
-   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
+   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
(parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)]]
   "TARGET_SSE4_1 && "
   "%vpmovbd\t{%1, %0|%0, %k1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
 (define_insn "avx512f_v16hiv16si2"
@@ -14825,17 +14831,18 @@ (define_insn "avx2_v8hiv8si2v4hiv4si2"
-  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
(any_extend:V4SI
  (vec_select:V4HI
-   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
+   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*xm,vm")
(parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)]]
   "TARGET_SSE4_1 && "
   "%vpmovwd\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
 (define_insn "avx512f_v8qiv8di2"
@@ -14868,16 +14875,17 @@ (define_insn "avx2_v4qiv4di2v2qiv2di2"
-  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
(any_extend:V2DI
  (vec_select:V2QI
-   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
+   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
(parallel [(const_int 0) (const_int 1)]]
   "TARGET_SSE4_1 && "
   "%vpmovbq\t{%1, %0|%0, %w1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
 (define_insn "avx512f_v8hiv8di2"
@@ -14905,16 +14913,17 @@ (define_insn "avx2_v4hiv4di2v2hiv2di2"
-  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
(any_extend:V2DI
  (vec_select:V2HI
-   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
+   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm

Re: [v3 PATCH] PR libstdc++/66338

2016-05-24 Thread Ville Voutilainen
On 24 May 2016 at 19:35, Ville Voutilainen  wrote:
> Slight tweak. The avoidance of _NotSameTuple wasn't quite correct for
> the templates that
> take const tuple<_UElements...>& or  tuple<_UElements...>&& instead of
> const _UElements&...
> or _UElements&&...
>
> This patch introduces a new helper alias to cover those cases and
> takes it into use where appropriate.
> All tests pass, but I don't have any sane tests to verify this tweak.


..and I don't need to be quite so round-about in the new helper, it
can just check !is_same
instead of doing a nested _TC call. Changelog the same as in the previous one.
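
A stand-alone sketch of the kind of check the new helper performs, written against std::tuple from the outside. It is an assumption-laden illustration, not the libstdc++ code: converting_ctor_allowed is a hypothetical trait, and it only models the two conditions visible in the patch (equal element counts, and source tuple type different from the target tuple type).

```cpp
#include <tuple>
#include <type_traits>

// Primary template, only specialized for pairs of tuples.
template<typename Target, typename Source>
struct converting_ctor_allowed;

// True when a converting constructor from tuple<UElements...> should be
// considered for tuple<Elements...>: same number of elements, and the
// source is not the very same tuple type (that case belongs to the
// ordinary copy/move constructors).
template<typename... Elements, typename... UElements>
struct converting_ctor_allowed<std::tuple<Elements...>,
                               std::tuple<UElements...>>
  : std::integral_constant<bool,
      sizeof...(Elements) == sizeof...(UElements)
      && !std::is_same<std::tuple<Elements...>,
                       std::tuple<UElements...>>::value>
{ };

static_assert (converting_ctor_allowed<std::tuple<int, long>,
                                       std::tuple<short, int>>::value,
               "same arity, different types: allowed");
static_assert (!converting_ctor_allowed<std::tuple<int>,
                                        std::tuple<int>>::value,
               "identical tuple type: not a converting construction");
static_assert (!converting_ctor_allowed<std::tuple<int>,
                                        std::tuple<int, int>>::value,
               "different arity: not allowed");
```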
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 7522e43..ea88793 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -620,14 +620,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Shortcut for the cases where constructors taking _UElements...
   // need to be constrained.
   template using _TMC =
-  _TC<(sizeof...(_Elements) == sizeof...(_UElements)),
+  _TC<(sizeof...(_Elements) == sizeof...(_UElements))
+ && (_TC<(sizeof...(_UElements)==1), _Elements...>::
+ template _NotSameTuple<_UElements...>()),
+  _Elements...>;
+
+  // Shortcut for the cases where constructors taking tuple<_UElements...>
+  // need to be constrained.
+  template using _TMCT =
+  _TC<(sizeof...(_Elements) == sizeof...(_UElements))
+ && !is_same,
+ tuple<_UElements...>>::value,
   _Elements...>;
 
   template::template
-   _NotSameTuple<_UElements...>()
- && _TMC<_UElements...>::template
+ _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && _TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
@@ -638,9 +646,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template::template
-   _NotSameTuple<_UElements...>()
- && _TMC<_UElements...>::template
+ _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && !_TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
@@ -660,9 +666,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Elements...>;
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _ConstructibleTuple<_UElements...>()
-  && _TMC<_UElements...>::template
+  && _TMCT<_UElements...>::template
 _ImplicitlyConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&>(),
@@ -672,9 +678,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _ConstructibleTuple<_UElements...>()
-  && !_TMC<_UElements...>::template
+  && !_TMCT<_UElements...>::template
 _ImplicitlyConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&>(),
@@ -684,9 +690,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
-  && _TMC<_UElements...>::template
+  && _TMCT<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&&>(),
@@ -695,9 +701,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _Inherited(static_cast<_Tuple_impl<0, _UElements...>&&>(__in)) { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
-  && !_TMC<_UElements...>::template
+  && !_TMCT<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&&>(),
@@ -764,9 +770,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _Inherited(__tag, __a, static_cast<_Inherited&&>(__in)) { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _ConstructibleTuple<_UElements...>()
-  && _TMC<_UElements...>::template
+  && _TMCT<_UElements...>::template
 _ImplicitlyConvertibleTuple<_UElements...>(),
 bool>::type=true>
tuple(allocator_arg_t __tag, const _Alloc& __a,
@@ -776,9 +782,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ }
 
   template::template
+   

Re: [PATCH, ARM, ping1] Do not set ARM_ARCH_ISA_THUMB for armv5

2016-05-24 Thread Thomas Preudhomme
Ping?

Best regards,

Thomas

On Tuesday 10 May 2016 14:26:04 Thomas Preudhomme wrote:
> Hi,
> 
> ARM_ARCH_ISA_THUMB is currently set to 1 when compiling for armv5 despite
> armv5 not supporting Thumb instructions (armv5t does):
> 
> arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
> #define __ARM_ARCH_ISA_THUMB 1
> 
> The reason is TARGET_ARM_ARCH_ISA_THUMB being set to 1 if the target does
> not support Thumb-2 and is ARMv4T, ARMv5 or later. This patch replaces that
> logic by checking whether the given architecture has the right feature bit
> (FL_THUMB).
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2016-05-06  Thomas Preud'homme  
> 
> * config/arm/arm-protos.h (arm_arch_thumb): Declare.
> * config/arm/arm.c (arm_arch_thumb): Define.
> (arm_option_override): Initialize arm_arch_thumb.
> * config/arm/arm.h (TARGET_ARM_ARCH_ISA_THUMB): Use arm_arch_thumb
> to determine if target supports Thumb-1 ISA.
> 
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index
> d8179c441bb53dced94d2ebf497aad093e4ac600..4d11c91133ff1b875afcbf58abc4491c2c
> 93768e 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -603,6 +603,9 @@ extern int arm_tune_cortex_a9;
> interworking clean.  */
>  extern int arm_cpp_interwork;
> 
> +/* Nonzero if chip supports Thumb.  */
> +extern int arm_arch_thumb;
> +
>  /* Nonzero if chip supports Thumb 2.  */
>  extern int arm_arch_thumb2;
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index
> ad123dde991a3e4c4b9563ee6ebb84981767988f..f64e8caa8bc08b7aff9fe385567de9936a
> 964004 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -2191,9 +2191,8 @@ extern int making_const_table;
>  #define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
> 
>  /* The highest Thumb instruction set version supported by the chip.  */
> -#define TARGET_ARM_ARCH_ISA_THUMB\
> -  (arm_arch_thumb2 ? 2   \
> -: ((TARGET_ARM_ARCH >= 5 || arm_arch4t) ? 1 : 0))
> +#define TARGET_ARM_ARCH_ISA_THUMB\
> +  (arm_arch_thumb2 ? 2 : (arm_arch_thumb ? 1 : 0))
> 
>  /* Expands to an upper-case char of the target's architectural
> profile.  */
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index
> 71b51439dc7ba5be67671e9fb4c3f18040cce58f..de1c2d4600529518a92ed44815cff05308
> baa31c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -852,6 +852,9 @@ int arm_tune_cortex_a9 = 0;
> interworking clean.  */
>  int arm_cpp_interwork = 0;
> 
> +/* Nonzero if chip supports Thumb.  */
> +int arm_arch_thumb;
> +
>  /* Nonzero if chip supports Thumb 2.  */
>  int arm_arch_thumb2;
> 
> @@ -3170,6 +3173,7 @@ arm_option_override (void)
>arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
>arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
>arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
> +  arm_arch_thumb = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
>arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
>arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
> 
> 
> 
> Before patch:
> 
> % arm-none-eabi-gcc -dM -march=armv4 -E - < /dev/null | grep ISA_THUMB
> cc1: warning: target CPU does not support THUMB instructions
> % arm-none-eabi-gcc -dM -march=armv4t -E - < /dev/null | grep ISA_THUMB
> #define __ARM_ARCH_ISA_THUMB 1
> % arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
> cc1: warning: target CPU does not support THUMB instructions
> #define __ARM_ARCH_ISA_THUMB 1
> % arm-none-eabi-gcc -dM -march=armv5t -E - < /dev/null | grep ISA_THUMB
> #define __ARM_ARCH_ISA_THUMB 1
> 
> After patch:
> 
> % arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
> cc1: warning: target CPU does not support THUMB instructions
> % arm-none-eabi-gcc -dM -march=armv5t -E - < /dev/null | grep ISA_THUMB
> #define __ARM_ARCH_ISA_THUMB 1
> % arm-none-eabi-gcc -dM -march=armv4 -E - < /dev/null | grep ISA_THUMB
> cc1: warning: target CPU does not support THUMB instructions
> % arm-none-eabi-gcc -dM -march=armv4t -E - < /dev/null | grep ISA_THUMB
> #define __ARM_ARCH_ISA_THUMB 1
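
The change to TARGET_ARM_ARCH_ISA_THUMB quoted above can be sketched outside of GCC as two small functions. The arch_info structure and the four sample architectures are hypothetical; the field values are my assumption of what each -march setting implies, chosen to reproduce the before/after output in the quoted mail.

```cpp
// Hypothetical per-architecture facts; comments name the GCC variable
// each field stands in for.
struct arch_info
{
  int arch_version;   // TARGET_ARM_ARCH
  bool arch4t;        // arm_arch4t
  bool thumb;         // arm_arch_thumb (the FL_THUMB feature bit)
  bool thumb2;        // arm_arch_thumb2
};

// Old logic: anything ARMv5 or later is assumed to have Thumb-1.
constexpr int
isa_thumb_old (arch_info a)
{
  return a.thumb2 ? 2 : ((a.arch_version >= 5 || a.arch4t) ? 1 : 0);
}

// New logic: trust the Thumb feature bit.
constexpr int
isa_thumb_new (arch_info a)
{
  return a.thumb2 ? 2 : (a.thumb ? 1 : 0);
}

constexpr arch_info armv4  = { 4, false, false, false };
constexpr arch_info armv4t = { 4, true,  true,  false };
constexpr arch_info armv5  = { 5, false, false, false };
constexpr arch_info armv5t = { 5, true,  true,  false };

static_assert (isa_thumb_old (armv5) == 1, "old logic misreports armv5");
static_assert (isa_thumb_new (armv5) == 0, "new logic: no Thumb on armv5");
static_assert (isa_thumb_new (armv5t) == 1, "armv5t keeps Thumb-1");
```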



Re: [v3 PATCH] PR libstdc++/66338

2016-05-24 Thread Ville Voutilainen
On 24 May 2016 at 17:59, Ville Voutilainen  wrote:


Slight tweak. The avoidance of _NotSameTuple wasn't quite correct for
the templates that
take const tuple<_UElements...>& or  tuple<_UElements...>&& instead of
const _UElements&...
or _UElements&&...

This patch introduces a new helper alias to cover those cases and
takes it into use where appropriate.
All tests pass, but I don't have any sane tests to verify this tweak.

2016-05-24  Ville Voutilainen  

PR libstdc++/66338
* include/std/tuple (_TMC): Add a check for _NotSameTuple.
* include/std/tuple (tuple(_UElements&&...)): Remove the separate
check for _NotSameTuple.
* include/std/tuple (_TMCT): New.
* include/std/tuple (tuple(const tuple<_UElements...>&)): Use it.
* include/std/tuple (tuple(tuple<_UElements...>&&)): Likewise.
* include/std/tuple (tuple(allocator_arg_t, const _Alloc&,
  const tuple<_UElements...>&)): Likewise.
* include/std/tuple (tuple(allocator_arg_t, const _Alloc&,
  tuple<_UElements...>&&)): Likewise.
* testsuite/20_util/tuple/cons/66338.cc: New.
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 7522e43..4de36e5 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -620,14 +620,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Shortcut for the cases where constructors taking _UElements...
   // need to be constrained.
   template using _TMC =
-  _TC<(sizeof...(_Elements) == sizeof...(_UElements)),
+  _TC<(sizeof...(_Elements) == sizeof...(_UElements))
+ && (_TC<(sizeof...(_UElements)==1), _Elements...>::
+ template _NotSameTuple<_UElements...>()),
+  _Elements...>;
+
+  // Shortcut for the cases where constructors taking tuple<_UElements...>
+  // need to be constrained.
+  template using _TMCT =
+  _TC<(sizeof...(_Elements) == sizeof...(_UElements))
+ && (_TC::
+ template _NotSameTuple>()),
   _Elements...>;
 
   template::template
-   _NotSameTuple<_UElements...>()
- && _TMC<_UElements...>::template
+ _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && _TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
@@ -638,9 +646,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template::template
-   _NotSameTuple<_UElements...>()
- && _TMC<_UElements...>::template
+ _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && !_TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
@@ -660,9 +666,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Elements...>;
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _ConstructibleTuple<_UElements...>()
-  && _TMC<_UElements...>::template
+  && _TMCT<_UElements...>::template
 _ImplicitlyConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&>(),
@@ -672,9 +678,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _ConstructibleTuple<_UElements...>()
-  && !_TMC<_UElements...>::template
+  && !_TMCT<_UElements...>::template
 _ImplicitlyConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&>(),
@@ -684,9 +690,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
-  && _TMC<_UElements...>::template
+  && _TMCT<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&&>(),
@@ -695,9 +701,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : _Inherited(static_cast<_Tuple_impl<0, _UElements...>&&>(__in)) { }
 
   template::template
+enable_if<_TMCT<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
-  && !_TMC<_UElements...>::template
+  && !_TMCT<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
   && _TNTC<_Dummy>::template
 _NonNestedTuple&&>(),
@@ -764,9 +770,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
: _Inherited(__tag, __a, static_cast<_Inherited&&>(__in)) { }
 
   template::tem

[committed] Allow non-integer/pointer references in linear(ref()) (PR c++/71257)

2016-05-24 Thread Jakub Jelinek
Hi!

Something I've discovered only when looking at the standard again while
working on Fortran changes - linear clause with ref modifier should
in C++ accept all arguments with reference type, not just those where
it references integer or pointer (that is the restriction of all the other
kinds).
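
As a rough picture of what linear with the ref modifier and step 1 means for a reference parameter, here is a plain C++ emulation with no OpenMP involved. It rests on my reading of the clause: on lane i the reference designates the element i steps after the one passed in. foo_simd_emulated and check_linear_ref_emulation are hypothetical helpers; a real simd clone is generated by the compiler and works on vectors.

```cpp
#include <cassert>
#include <cstddef>

struct S { int a; };

// Scalar function of the same shape as in the new test; in the OpenMP
// version both reference parameters are linear(ref(...) : 1).
int
foo (int a, S &b, int &c)
{
  return a + b.a + c;
}

// Emulate one call of a clone with LANES lanes: A is passed per lane,
// while B0 and C0 refer to the first element of arrays and each lane
// uses the element LANE positions further on.
template<std::size_t Lanes>
void
foo_simd_emulated (const int (&a)[Lanes], S &b0, int &c0, int (&out)[Lanes])
{
  for (std::size_t lane = 0; lane < Lanes; ++lane)
    out[lane] = foo (a[lane], (&b0)[lane], (&c0)[lane]);
}

// Usage with the same data pattern as the test (i, 2 * i, 3 * i).
void
check_linear_ref_emulation ()
{
  int a[4] = { 0, 1, 2, 3 };
  S b[4] = { { 0 }, { 2 }, { 4 }, { 6 } };
  int c[4] = { 0, 3, 6, 9 };
  int out[4] = { 0, 0, 0, 0 };
  foo_simd_emulated (a, b[0], c[0], out);
  for (int i = 0; i < 4; i++)
    assert (out[i] == 6 * i);
}
```

Nothing in this picture depends on the referenced type being an integer or a pointer, which matches the point that S& is acceptable for ref.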

The simd-clone-6.cc testcase ICEd also due to missing
vectorizable_simd_clone_call hunks, fixed that as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and 6.2.

2016-05-24  Jakub Jelinek  

PR c++/71257
* tree-vect-stmts.c (vectorizable_simd_clone_call): Handle
SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP like
SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP.  Add
SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP and
SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP cases explicitly.

* semantics.c (finish_omp_clauses) :
For OMP_CLAUSE_LINEAR_REF don't require type to be
integral or pointer.

* g++.dg/vect/simd-clone-6.cc: New test.
* g++.dg/gomp/declare-simd-6.C: New test.

--- gcc/tree-vect-stmts.c.jj    2016-05-20 09:05:08.0 +0200
+++ gcc/tree-vect-stmts.c   2016-05-24 12:49:49.257147827 +0200
@@ -3012,8 +3012,10 @@ vectorizable_simd_clone_call (gimple *st
 {
   STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_push (bestn->decl);
   for (i = 0; i < nargs; i++)
-   if (bestn->simdclone->args[i].arg_type
-   == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
+   if ((bestn->simdclone->args[i].arg_type
+== SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
+   || (bestn->simdclone->args[i].arg_type
+   == SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP))
  {
STMT_VINFO_SIMD_CLONE_INFO (stmt_info).safe_grow_cleared (i * 3
+ 1);
@@ -3148,6 +3150,7 @@ vectorizable_simd_clone_call (gimple *st
  vargs.safe_push (op);
  break;
case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+   case SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP:
  if (j == 0)
{
  gimple_seq stmts;
@@ -3211,6 +3214,8 @@ vectorizable_simd_clone_call (gimple *st
  vargs.safe_push (new_temp);
}
  break;
+   case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
+   case SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_REF_VARIABLE_STEP:
case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
--- gcc/cp/semantics.c.jj   2016-05-13 22:23:03.0 +0200
+++ gcc/cp/semantics.c  2016-05-24 11:04:15.160634109 +0200
@@ -5881,7 +5881,7 @@ finish_omp_clauses (tree clauses, enum c
  break;
}
}
- else
+ else if (OMP_CLAUSE_LINEAR_KIND (c) != OMP_CLAUSE_LINEAR_REF)
{
  if (!INTEGRAL_TYPE_P (type)
  && TREE_CODE (type) != POINTER_TYPE)
--- gcc/testsuite/g++.dg/vect/simd-clone-6.cc.jj2016-05-24 
12:58:46.321054736 +0200
+++ gcc/testsuite/g++.dg/vect/simd-clone-6.cc   2016-05-24 13:33:16.261767563 
+0200
@@ -0,0 +1,43 @@
+// PR c++/71257
+// { dg-require-effective-target vect_simd_clones }
+// { dg-additional-options "-fopenmp-simd -fno-inline" }
+// { dg-additional-options "-mavx" { target avx_runtime } }
+
+#include "../../gcc.dg/vect/tree-vect.h"
+
+#define N 1024
+struct S { int a; };
+int c[N], e[N], f[N];
+S d[N];
+
+#pragma omp declare simd linear(ref(b, c) : 1)
+int
+foo (int a, S &b, int &c)
+{
+  return a + b.a + c;
+}
+
+void
+do_main ()
+{
+  int i;
+  for (i = 0; i < N; i++)
+{
+  c[i] = i;
+  d[i].a = 2 * i;
+  f[i] = 3 * i;
+}
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+e[i] = foo (c[i], d[i], f[i]);
+  for (i = 0; i < N; i++)
+if (e[i] != 6 * i)
+  __builtin_abort ();
+}
+
+int
+main ()
+{
+  check_vect ();
+  return 0;
+}
--- gcc/testsuite/g++.dg/gomp/declare-simd-6.C.jj   2016-05-24 
11:21:39.591853711 +0200
+++ gcc/testsuite/g++.dg/gomp/declare-simd-6.C  2016-05-24 13:33:43.016416143 
+0200
@@ -0,0 +1,37 @@
+// PR c++/71257
+// { dg-do compile }
+// { dg-options "-fopenmp-simd" }
+
+struct S { int a; };
+#pragma omp declare simd linear(val(a):2)
+int f1 (int &a);
+#pragma omp declare simd linear(uval(a):2)
+unsigned short f2 (unsigned short &a);
+#pragma omp declare simd linear(ref(a):1)
+int f3 (long long int &a);
+#pragma omp declare simd linear(a:1)
+int f4 (int &a);
+#pragma omp declare simd linear(val(a))
+int f5 (int a);
+#pragma omp declare simd linear(uval(a):2) // { dg-error "modifier 
applied to non-reference variable" }
+int f6 (unsigned short a);
+#pragma omp declare simd linear(ref(a):1)  // { dg-error "modifier 
applied to non-reference variabl

C/C++ OpenACC routine directive, undeclared name error: try to help the user, once

2016-05-24 Thread Thomas Schwinge
Hi!

Some users of C/C++ OpenACC are surprised to see code such as:

#pragma acc routine (F)
[declaration or definition of F]

... run into "error: 'F' has not been declared".  If the routine
directive is meant to apply to the lexically following function
declaration or definition, either don't specify '(F)' here:

#pragma acc routine
[declaration or definition of F]

..., or place a function declaration before the directive:

[declaration of F]
#pragma acc routine (F)

[definition or use of F]
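
As a concrete illustration of the two accepted forms, here is a minimal
sketch.  The function names square and cube are made up for this example,
and the seq clause is only there to give each directive a level of
parallelism.  Without -fopenacc the pragmas are simply ignored, so this
also builds as plain C++:

```cpp
#include <cassert>

// Form 1: no name is given, so the directive applies to the
// lexically following function definition.
#pragma acc routine seq
int
square (int x)
{
  return x * x;
}

// Form 2: the name is given, so a declaration has to be visible
// before the directive; the definition follows afterwards.
int cube (int x);
#pragma acc routine (cube) seq
int
cube (int x)
{
  return x * x * x;
}

// Small self-check for the two hypothetical routines above.
void
check_routines ()
{
  assert (square (3) == 9);
  assert (cube (2) == 8);
}
```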

OK for trunk?

commit 83442d8baef0d0c09128368879b69873cbf9bf01
Author: Thomas Schwinge 
Date:   Tue May 24 17:21:54 2016 +0200

C/C++ OpenACC routine directive, undeclared name error: try to help the 
user, once

gcc/c/
* c-parser.c (c_parser_oacc_routine): If running into an
undeclared name error, try to help the user, once.
gcc/cp/
* parser.c (cp_parser_oacc_routine): If running into an undeclared
name error, try to help the user, once.
gcc/testsuite/
* c-c++-common/goacc/routine-5.c: Update.
---
 gcc/c/c-parser.c | 17 +++--
 gcc/cp/parser.c  | 17 +++--
 gcc/testsuite/c-c++-common/goacc/routine-5.c | 15 ++-
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index cbd4e4c..6b589a4 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -13989,8 +13989,21 @@ c_parser_oacc_routine (c_parser *parser, enum 
pragma_context context)
{
  decl = lookup_name (token->value);
  if (!decl)
-   error_at (token->location, "%qE has not been declared",
- token->value);
+   {
+ error_at (token->location, "%qE has not been declared",
+   token->value);
+ static bool informed_once = false;
+ if (!informed_once)
+   {
+ inform (token->location,
+ "if the routine directive is meant to apply to the "
+ "lexically following function declaration or "
+ "definition, either don't specify %<(%E)%> here, or "
+ "place a function declaration before the directive",
+ token->value);
+ informed_once = true;
+   }
+   }
  c_parser_consume_token (parser);
}
   else
diff --git gcc/cp/parser.c gcc/cp/parser.c
index 6485cbd..4d542a0 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -36504,8 +36504,21 @@ cp_parser_oacc_routine (cp_parser *parser, cp_token 
*pragma_tok,
 /*optional_p=*/false);
   decl = cp_parser_lookup_name_simple (parser, id, token->location);
   if (id != error_mark_node && decl == error_mark_node)
-   cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
-token->location);
+   {
+ cp_parser_name_lookup_error (parser, id, decl, NLE_NULL,
+  token->location);
+ static bool informed_once = false;
+ if (!informed_once)
+   {
+ inform (token->location,
+ "if the routine directive is meant to apply to the "
+ "lexically following function declaration or "
+ "definition, either don't specify %<(%E)%> here, or "
+ "place a function declaration before the directive",
+ id);
+ informed_once = true;
+   }
+   }
 
   if (decl == error_mark_node
  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c 
gcc/testsuite/c-c++-common/goacc/routine-5.c
index 1efd154..9c30e87 100644
--- gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -71,7 +71,20 @@ void Foo ()
 
 #pragma acc routine (Foo) gang // { dg-error "must be applied before 
definition" }
 
-#pragma acc routine (Baz) // { dg-error "not been declared" }
+#pragma acc routine (Baz) worker
+/* { dg-error ".Baz. has not been declared" "" { target *-*-* } 74 }
+   Try to help the user:
+   { dg-message "note: if the routine directive" "" { target *-*-* } 74 } */
+
+#pragma acc routine (Baz) vector
+/* { dg-error ".Baz. has not been declared" "" { target *-*-* } 79 }
+   Don't try to help the user again:
+   { dg-bogus "note: if the routine directive" "" { target *-*-* } 79 } */
+
+#pragma acc routine (Qux) seq
+/* { dg-error ".Qux. has not been declared" "" { target *-*-* } 84 }
+   Don't try to help the user again:
+   { dg-bogus "note: if the routine directive" "" { target *-*-* } 84 } */
 
 
 int vb1;   /* { dg-error "directive for use" } */


Grüße
 Thomas


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 8:52 AM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 5:40 PM, H.J. Lu  wrote:
>
>>> No, this is a flag, not a variable. Let's figure out how to extend
>>> target flags to more than 63 flags first.
>>
>> Extending target flags to more than 63 bits requires replacing
>> HOST_WIDE_INT with a bit vector.  Since target flags is used in
>> TARGET_SUBTARGET_DEFAULT, changing it to a bit vector is a
>> non-trivial change.  On the other hand, -mgeneral-regs-only is a
>> command-line option which doesn't require support for
>> TARGET_SUBTARGET_DEFAULT, similar to other -m options like
>> -mmitigate-rop.  Using flag_general_regs_only is an option.
>
> I have been informed that Intel people are looking into how to extend
> target flags to accommodate additional ISA flags. There is no point in
> hurrying with a suboptimal solution. Perhaps you can coordinate your
> patch with their efforts?

ISA flags use x86_isa_flags, not target_flags.  -mgeneral-regs-only
shouldn't use x86_isa_flags.  It was my oversight to use target_flags
with -mgeneral-regs-only to begin with.  I don't think using
flag_general_regs_only is a suboptimal solution; it is what I should
have used in the first place.  The x86 change for interrupt handler
depends on -mgeneral-regs-only.

-- 
H.J.


RE: [PATCH][MIPS] Add support for P6600

2016-05-24 Thread Matthew Fortune
Hi Robert,

A few comments inlined. I'd like to review again and/or let Catherine
comment before commit.

Thanks,
Matthew

> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 06acd30..cbe1007 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -18496,6 +18519,28 @@ mips_orphaned_high_part_p (mips_offset_table *htab, 
> rtx_insn
> *insn)
>return false;
>  }
> 
> +/* Subroutine of mips_avoid_hazard.  We classify unconditional branches
> +   of interest for the P6600 for performance reasons.  We are interested
> +   in differentiating BALC from JIC, JIALC and BC.  */
> +
> +static enum mips_ucbranch_type
> +mips_classify_branch_p6600 (rtx_insn *insn)
> +{
> +  if (!(insn
> + && USEFUL_INSN_P (insn)
> + && GET_CODE (PATTERN (insn)) != SEQUENCE))
> +return UC_UNDEFINED;

This might be easier to read if you move the not inside.
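
Something along these lines, shown as a stand-alone sketch rather than
as a change to mips.c.  The fake_insn type and its two fields are
stand-ins for the real insn and for the USEFUL_INSN_P and SEQUENCE
checks; they are assumptions made for illustration only:

```cpp
#include <cassert>

// Hypothetical stand-in for rtx_insn.  The fields replace
// USEFUL_INSN_P (insn) and GET_CODE (PATTERN (insn)) == SEQUENCE.
struct fake_insn
{
  bool useful;
  bool is_sequence;
};

// Condition as written in the patch: one negation around the
// whole conjunction.
bool
reject_outer_not (const fake_insn *insn)
{
  return !(insn && insn->useful && !insn->is_sequence);
}

// Same condition with the negation moved inside (De Morgan).  The
// null check still comes first, so INSN is never dereferenced
// when it is null.
bool
reject_inner_not (const fake_insn *insn)
{
  return !insn || !insn->useful || insn->is_sequence;
}

// Return true if both forms agree on every possible input.
bool
forms_agree ()
{
  if (reject_outer_not (nullptr) != reject_inner_not (nullptr))
    return false;
  for (int u = 0; u < 2; u++)
    for (int s = 0; s < 2; s++)
      {
        fake_insn i = { u != 0, s != 0 };
        if (reject_outer_not (&i) != reject_inner_not (&i))
          return false;
      }
  return true;
}
```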

> +  if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC.  */
> +  || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose jump.  */
> +return UC_OTHER;

I think I understand what 'loose jump' means here. It is saying that
this is a TYPE_JUMP instruction which is not in a sequence, which
means it must not have a delay slot. I would just say /* BC.  */ in the
comment though.

> +  if (CALL_P (insn) && get_attr_jal (insn) == JAL_DIRECT)
> +return UC_BALC;
> +
> +  return UC_UNDEFINED;
> +}
> +
>  /* Subroutine of mips_reorg_process_insns.  If there is a hazard between
> INSN and a previous instruction, avoid it by inserting nops after
> instruction AFTER.
> @@ -18548,14 +18593,36 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
> int
> *hilo_delay,
>  && GET_CODE (pattern) != ASM_INPUT
>  && asm_noperands (pattern) < 0)
>  nops = 1;
> +  /* The P6600's branch predictor does not handle certain static
> + sequences of back-to-back jumps well.  Inserting a no-op only
> + costs space as the dispatch unit will disregard the nop.
> + Here we handle the cases of back to back unconditional branches
> + that are inefficient.  */
> +  else if (TUNE_P6600 && TARGET_CB_MAYBE && !optimize_size
> +&& ((mips_classify_branch_p6600 (after) == UC_BALC
> + && mips_classify_branch_p6600 (insn) == UC_OTHER)
> +|| (mips_classify_branch_p6600 (insn) == UC_BALC
> +&& (mips_classify_branch_p6600 (after) == UC_OTHER
> +nops = 1;
>else
>  nops = 0;
> 
>/* Insert the nops between this instruction and the previous one.
>   Each new nop takes us further from the last hilo hazard.  */
>*hilo_delay += nops;
> +
> +  /* If we're tuning for the P6600, we come across an annoying GCC
> + assumption that debug information always follows a call.  Move
> + past any debug information in that case.  */
> +  rtx_insn *real_after = after;
> +  if (real_after && nops && CALL_P (real_after))
> +while (TUNE_P6600 && real_after
> +&& (NOTE_P (NEXT_INSN (real_after))
> +|| BARRIER_P (NEXT_INSN (real_after
> +  real_after = NEXT_INSN (real_after);
> +

As Bernhard pointed out this could be improved by hoisting the TUNE_P6600
up to the if.

The comment on this doesn't really say what is going on here.  We have
to move to the next real instruction because we are going to insert a
NOP and the current instruction is a call which will have debug
information.  We can't separate the call from the debug info so move
past it. Whether this should only happen for the P6600-specific
hazard is subjective. It should be safe all the time, so TUNE_P6600 could
be deleted entirely.

>while (nops-- > 0)
> -emit_insn_after (gen_hazard_nop (), after);
> +emit_insn_after (gen_hazard_nop (), real_after);
> 
>/* Set up the state for the next instruction.  */
>*hilo_delay += ninsns;
> @@ -18565,6 +18632,14 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
> int
> *hilo_delay,
>  switch (get_attr_hazard (insn))
>{
>case HAZARD_NONE:
> + /* For the P6600, flag some unconditional branches as having
> +a pseudo-forbidden slot.  This will cause additional nop insertion
> +or SEQUENCE breaking as required.  */

This should explain that the addition of nops is for performance reasons,
not correctness.

> + if (TUNE_P6600
> + && !optimize_size
> + && TARGET_CB_MAYBE
> + && mips_classify_branch_p6600 (insn) == UC_OTHER)
> +   *fs_delay = true;
>   break;
> 
>case HAZARD_FORBIDDEN_SLOT:
> @@ -18806,7 +18881,10 @@ mips_reorg_process_insns (void)
>sequence and replace it with the delay slot instruction
>then the jump to clear the forbidden slot hazard.  */
> 
> -   if (fs_delay)
> +   if (fs_delay || (TUNE_P6600
> +&& TARGET_CB_MAYBE
> +&& mips_classify_branch_p6600 (insn)
> +  

Re: [PATCH] Fix PR70434, change FE IL for vector indexing

2016-05-24 Thread Jakub Jelinek
On Mon, May 23, 2016 at 04:22:57PM +0200, Richard Biener wrote:
> *** /dev/null 1970-01-01 00:00:00.0 +
> --- gcc/testsuite/c-c++-common/vector-subscript-5.c   2016-05-23 
> 16:17:41.148043066 +0200
> ***
> *** 0 
> --- 1,13 
> + /* { dg-do compile } */
> + 
> + typedef int U __attribute__ ((vector_size (16)));
> + 
> + int
> + foo (int i)
> + {
> +   register U u
> + #if __SSE2__
> +   asm ("xmm0");
> + #endif
> +   return u[i];
> + }

This test fails on i?86 (and supposedly on all non-x86 arches too).

I've tested following fix and committed as obvious to trunk:

2016-05-24  Jakub Jelinek  

PR middle-end/70434
PR c/69504
* c-c++-common/vector-subscript-5.c (foo): Move ; out of the ifdef.

--- gcc/testsuite/c-c++-common/vector-subscript-5.c.jj  2016-05-24 
10:56:00.0 +0200
+++ gcc/testsuite/c-c++-common/vector-subscript-5.c 2016-05-24 
18:11:51.778520055 +0200
@@ -7,7 +7,8 @@ foo (int i)
 {
   register U u
 #if __SSE2__
-  asm ("xmm0");
+  asm ("xmm0")
 #endif
+  ;
   return u[i];
 }

Jakub


Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Martin Sebor

Thanks for the suggestions.  I implemented them in the attached
update to the patch.  The macro I added evaluates its argument
multiple times.  That normally isn't a problem unless it's invoked
with a non-trivial argument like a call to complete_type() that's
passed to COMPLETE_TYPE_P() in grokdeclarator.  One way to avoid
possible problems due to evaluating the macro argument more than
once is to introduce a helper inline function.  I haven't seen
it done in tree.h so I didn't introduce one in this patch either,
but it might be worth considering for the new macro and any other
non-trivial macros like it.


Yes, let's just make it an inline function (of which there are already
quite a few in tree.h).


Attached is an updated patch with this change.
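
To illustrate the double-evaluation hazard discussed above, here is a
minimal stand-alone sketch.  It is not the tree.h code: fake_type,
COMPLETE_OR_ARRAY_SKETCH_P, complete_or_array_sketch_p and counted are
hypothetical names invented for this example, with counted standing in
for a call with side effects such as complete_type ():

```cpp
#include <cassert>

// Hypothetical stand-in for a tree type node.
struct fake_type
{
  bool complete;
  bool is_array;
};

// Macro form: X is expanded, and may be evaluated, twice.
#define COMPLETE_OR_ARRAY_SKETCH_P(X) ((X)->complete || (X)->is_array)

// Inline-function form: the argument is evaluated exactly once.
inline bool
complete_or_array_sketch_p (const fake_type *t)
{
  return t->complete || t->is_array;
}

static int calls;

// An argument with a side effect; counts how often it is evaluated.
fake_type *
counted (fake_type *t)
{
  ++calls;
  return t;
}

// Number of times the argument is evaluated through the macro.
int
evaluations_with_macro ()
{
  fake_type t = { false, true };
  calls = 0;
  bool r = COMPLETE_OR_ARRAY_SKETCH_P (counted (&t));
  (void) r;
  return calls;
}

// Number of times the argument is evaluated through the function.
int
evaluations_with_function ()
{
  fake_type t = { false, true };
  calls = 0;
  bool r = complete_or_array_sketch_p (counted (&t));
  (void) r;
  return calls;
}
```

For an incomplete array type the macro evaluates its argument twice,
while the inline function evaluates it once.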

Martin
PR c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

gcc/ChangeLog:
2016-05-24  Martin Sebor  

	PR c++/71147
	* gcc/tree.h (complete_or_array_type_p): New inline function.

gcc/testsuite/ChangeLog:
2016-05-24  Martin Sebor  

	PR c++/71147
	* g++.dg/ext/flexary16.C: New test.

gcc/cp/ChangeLog:
2016-05-24  Martin Sebor  

	PR c++/71147
	* decl.c (layout_var_decl, grokdeclarator): Use complete_or_array_type_p.
	* pt.c (instantiate_class_template_1): Try to complete the element
	type of a flexible array member.
	(can_complete_type_without_circularity): Handle arrays of unknown bound.
	* typeck.c (complete_type): Also complete the type of the elements of
	arrays with an unspecified bound.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 7a69711..ef5fd66 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5305,10 +5305,7 @@ layout_var_decl (tree decl)
 complete_type (type);
   if (!DECL_SIZE (decl)
   && TREE_TYPE (decl) != error_mark_node
-  && (COMPLETE_TYPE_P (type)
-	  || (TREE_CODE (type) == ARRAY_TYPE
-	  && !TYPE_DOMAIN (type)
-	  && COMPLETE_TYPE_P (TREE_TYPE (type)
+  && complete_or_array_type_p (type))
 layout_decl (decl, 0);
 
   if (!DECL_EXTERNAL (decl) && DECL_SIZE (decl) == NULL_TREE)
@@ -11165,8 +11162,7 @@ grokdeclarator (const cp_declarator *declarator,
 	  }
 	else if (!staticp && !dependent_type_p (type)
 		 && !COMPLETE_TYPE_P (complete_type (type))
-		 && (TREE_CODE (type) != ARRAY_TYPE
-		 || !COMPLETE_TYPE_P (TREE_TYPE (type))
+		 && (!complete_or_array_type_p (type)
 		 || initialized == 0))
 	  {
 	if (TREE_CODE (type) != ARRAY_TYPE
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 2bba571..03dee66 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9554,7 +9554,7 @@ can_complete_type_without_circularity (tree type)
 return 0;
   else if (COMPLETE_TYPE_P (type))
 return 1;
-  else if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)
 return can_complete_type_without_circularity (TREE_TYPE (type));
   else if (CLASS_TYPE_P (type)
 	   && TYPE_BEING_DEFINED (TYPE_MAIN_VARIANT (type)))
@@ -10119,17 +10119,12 @@ instantiate_class_template_1 (tree type)
 			  if (can_complete_type_without_circularity (rtype))
 			complete_type (rtype);
 
-  if (TREE_CODE (r) == FIELD_DECL
-  && TREE_CODE (rtype) == ARRAY_TYPE
-  && COMPLETE_TYPE_P (TREE_TYPE (rtype))
-  && !COMPLETE_TYPE_P (rtype))
-{
-  /* Flexible array mmembers of elements
- of complete type have an incomplete type
- and that's okay.  */
-}
-  else if (!COMPLETE_TYPE_P (rtype))
+			  if (!complete_or_array_type_p (rtype))
 			{
+			  /* If R's type couldn't be completed and
+ it isn't a flexible array member (whose
+ type is incomplete by definition) give
+ an error.  */
 			  cxx_incomplete_type_error (r, rtype);
 			  TREE_TYPE (r) = error_mark_node;
 			}
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index cd058fa..2688ab4 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -112,7 +112,7 @@ complete_type (tree type)
 
   if (type == error_mark_node || COMPLETE_TYPE_P (type))
 ;
-  else if (TREE_CODE (type) == ARRAY_TYPE && TYPE_DOMAIN (type))
+  else if (TREE_CODE (type) == ARRAY_TYPE)
 {
   tree t = complete_type (TREE_TYPE (type));
   unsigned int needs_constructing, has_nontrivial_dtor;
diff --git a/gcc/testsuite/g++.dg/ext/flexary16.C b/gcc/testsuite/g++.dg/ext/flexary16.C
new file mode 100644
index 000..a3e040d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/flexary16.C
@@ -0,0 +1,37 @@
+// PR c++/71147 - [6 Regression] Flexible array member wrongly rejected
+//   in template
+// { dg-do compile }
+
+template 
+struct container
+{
+  struct elem {
+unsigned u;
+  };
+
+  struct incomplete {
+int x;
+elem array[];
+  };
+};
+
+unsigned f (container::incomplete* i)
+{
+  return i->a

Re: RFC [1/2] divmod transform

2016-05-24 Thread Prathamesh Kulkarni
On 24 May 2016 at 19:39, Richard Biener  wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>
>> On 24 May 2016 at 17:42, Richard Biener  wrote:
>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 23 May 2016 at 17:35, Richard Biener  
>> >> wrote:
>> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >> >  wrote:
>> >> >> Hi,
>> >> >> I have updated my patch for divmod (attached), which was originally
>> >> >> based on Kugan's patch.
>> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> >> having same operands to divmod representation, so we can cse 
>> >> >> computation of mod.
>> >> >>
>> >> >> t1 = a TRUNC_DIV_EXPR b;
>> >> >> t2 = a TRUNC_MOD_EXPR b
>> >> >> is transformed to:
>> >> >> complex_tmp = DIVMOD (a, b);
>> >> >> t1 = REALPART_EXPR (complex_tmp);
>> >> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >> >>
>> >> >> * New hook divmod_expand_libfunc
>> >> >> The rationale for introducing the hook is that different targets have
>> >> >> incompatible calling conventions for divmod libfunc.
>> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> >> return quotient and store remainder in argument passed as pointer,
>> >> >> while the arm version takes two arguments and returns both
>> >> >> quotient and remainder having mode double the size of the operand mode.
>> >> >> The port should hence override the hook expand_divmod_libfunc
>> >> >> to generate call to target-specific divmod.
>> >> >> Ports should define this hook if:
>> >> >> a) The port does not have divmod or div insn for the given mode.
>> >> >> b) The port defines divmod libfunc for the given mode.
>> >> >> The default hook default_expand_divmod_libfunc() generates call
>> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> >> are of DImode.
>> >> >>
>> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> >> cross-tested on arm*-*-*.
>> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> >> Does this patch look OK ?
>> >> >
>> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> >> > index 6b4601b..e4a021a 100644
>> >> > --- a/gcc/targhooks.c
>> >> > +++ b/gcc/targhooks.c
>> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> >> > machine_mode, optimization_type)
>> >> >return true;
>> >> >  }
>> >> >
>> >> > +void
>> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> >> > +  rtx op0, rtx op1,
>> >> > +  rtx *quot_p, rtx *rem_p)
>> >> >
>> >> > functions need a comment.
>> >> >
>> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  
>> >> > In that
>> >> > case we could avoid the target hook.
>> >> Well I would prefer adding the hook because that's more easier -;)
>> >> Would it be ok for now to go with the hook ?
>> >> >
>> >> > +  /* If target overrides expand_divmod_libfunc hook
>> >> > +then perform divmod by generating call to the target-specifc 
>> >> > divmod
>> >> > libfunc.  */
>> >> > +  if (targetm.expand_divmod_libfunc != 
>> >> > default_expand_divmod_libfunc)
>> >> > +   return true;
>> >> > +
>> >> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> >> > +  return (mode == DImode && unsignedp);
>> >> >
>> >> > I don't understand this - we know optab_libfunc returns non-NULL for 
>> >> > 'mode'
>> >> > but still restrict this to DImode && unsigned?  Also if
>> >> > targetm.expand_divmod_libfunc
>> >> > is not the default we expect the target to handle all modes?
>> >> Ah indeed, the check for DImode is unnecessary.
>> >> However I suppose the check for unsignedp should be there,
>> >> since we want to generate call to __udivmoddi4 only if operand is 
>> >> unsigned ?
>> >
>> > The optab libfunc for sdivmod should be NULL in that case.
>> Ah indeed, thanks.
>> >
>> >> >
>> >> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >> >
>> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the 
>> >> > target
>> >> > supports a specific operation (for example SImode divmod would use 
>> >> > DImode
>> >> > divmod by means of widening operands - for the unsigned case of course).
>> >> Thanks for pointing out. So if a target does not support divmod
>> >> libfunc for a mode
>> >> but for a wider mode, then we could zero-extend operands to the 
>> >> wider-mode,
>> >> perform divmod on the wider-mode, and then cast result back to the
>> >> original mode.
>> >> I haven't done that in this patch, would it be OK to do that as a follow 
>> >> up ?
>> >
>> > I think that you should conservatively handle the div_optab query, thus if
>> > the target has a HW division in a wider mode don't use the divmod IFN.
>> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
>> > if (optab_handler (div_optab, mode) != CODE_FOR_nothi

[PATCH] Add priority_queue::value_compare (LWG 2684)

2016-05-24 Thread Jonathan Wakely

* include/bits/stl_queue.h (priority_queue::value_compare): Define.

This is only Tentatively Ready but I don't think there's any harm in
making the change now. Libc++ have been shipping this for years,
without realising it wasn't actually in the standard :-)

Tested x86_64, committed to trunk.
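
For reference, a small usage sketch of what the new typedef allows.
The names min_heap, ordered_before and smallest_of_three are made up
for this example, and it assumes a library that already provides
priority_queue::value_compare:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <type_traits>
#include <vector>

typedef std::priority_queue<int, std::vector<int>, std::greater<int> > min_heap;

// With the typedef in place, the comparator type can be named
// through the adaptor itself.
static_assert (std::is_same<min_heap::value_compare,
                            std::greater<int> >::value,
               "value_compare names the comparator type");

// Generic code can construct a comparator without spelling out
// its concrete type.
template<typename Queue>
bool
ordered_before (const typename Queue::value_type &a,
                const typename Queue::value_type &b)
{
  typename Queue::value_compare cmp;
  return cmp (a, b);
}

// With std::greater as comparator, top () is the smallest element.
int
smallest_of_three ()
{
  min_heap q;
  q.push (3);
  q.push (1);
  q.push (2);
  return q.top ();
}
```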

commit 64c647342c0786ae01b4f0b4b4ce716da7faa757
Author: redi 
Date:   Tue May 24 15:59:05 2016 +

Add priority_queue::value_compare (LWG 2684)

* include/bits/stl_queue.h (priority_queue::value_compare): Define.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@236646 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/bits/stl_queue.h 
b/libstdc++-v3/include/bits/stl_queue.h
index 9caca03..a292309 100644
--- a/libstdc++-v3/include/bits/stl_queue.h
+++ b/libstdc++-v3/include/bits/stl_queue.h
@@ -417,6 +417,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef typename _Sequence::const_reference   const_reference;
   typedef typename _Sequence::size_type size_type;
   typedef  _Sequencecontainer_type;
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // DR 2684. priority_queue lacking comparator typedef
+  typedef _Compare value_compare;
 
 protected:
   //  See queue::c for notes on these names.


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 5:40 PM, H.J. Lu  wrote:

>> No, this is a flag, not a variable. Let's figure out how to extend
>> target flags to more than 63 flags first.
>
> Extending target flags to more than 63 bits requires replacing
> HOST_WIDE_INT with a bit vector.  Since target flags is used in
> TARGET_SUBTARGET_DEFAULT, changing it to a bit vector is a
> non-trivial change.  On the other hand, -mgeneral-regs-only is a
> command-line option which doesn't require support for
> TARGET_SUBTARGET_DEFAULT, similar to other -m options like
> -mmitigate-rop.  Using flag_general_regs_only is an option.

I have been informed that Intel people are looking into how to extend
target flags to accommodate additional ISA flags. There is no point in
hurrying with a suboptimal solution. Perhaps you can coordinate your
patch with their efforts?

Uros.


Re: Allow embedded timestamps by C/C++ macros to be set externally (3)

2016-05-24 Thread Jeff Law

On 05/23/2016 04:59 PM, Dhole wrote:

PING

Note Bernd is on PTO for another week or so.

jeff


Re: [C/C++ PATCH] Fix bogus warning with -Wswitch-unreachable (PR c/71249)

2016-05-24 Thread Jason Merrill
OK.

Jason

On Tue, May 24, 2016 at 7:41 AM, Marek Polacek  wrote:
> Martin S. noticed that cc1plus bogusly warns on the following test.  That's
> because I didn't realize that GIMPLE_BINDs might be nested in C++ so we need 
> to
> look through them, and only then get the first statement in the seq.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2016-05-24  Marek Polacek  
>
> PR c/71249
> * gimplify.c (gimplify_switch_expr): Look into the innermost lexical
> scope.
>
> * c-c++-common/Wswitch-unreachable-2.c: New test.
>
> diff --git gcc/gimplify.c gcc/gimplify.c
> index 6473544..5c5e9d6 100644
> --- gcc/gimplify.c
> +++ gcc/gimplify.c
> @@ -1605,8 +1605,9 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
>   && switch_body_seq != NULL)
> {
>   gimple_seq seq = switch_body_seq;
> - if (gimple_code (switch_body_seq) == GIMPLE_BIND)
> -   seq = gimple_bind_body (as_a <gbind *> (switch_body_seq));
> + /* Look into the innermost lexical scope.  */
> + while (gimple_code (seq) == GIMPLE_BIND)
> +   seq = gimple_bind_body (as_a <gbind *> (seq));
>   gimple *stmt = gimple_seq_first_stmt (seq);
>   enum gimple_code code = gimple_code (stmt);
>   if (code != GIMPLE_LABEL && code != GIMPLE_TRY)
> diff --git gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c 
> gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c
> index e69de29..8f57392 100644
> --- gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c
> +++ gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c
> @@ -0,0 +1,18 @@
> +/* PR c/71249 */
> +/* { dg-do compile } */
> +
> +int
> +f (int i)
> +{
> +  switch (i)
> +{
> +  {
> +   int j;
> +  foo:
> +   return i; /* { dg-bogus "statement will never be executed" } */
> +  };
> +case 3:
> +  goto foo;
> +}
> +  return i;
> +}
>
> Marek


Re: [C++ Patch] PR 69872 ("[6/7 Regression] -Wnarrowing note without warning/errror")

2016-05-24 Thread Jason Merrill
OK.

Jason


On Tue, May 24, 2016 at 8:32 AM, Paolo Carlini  wrote:
> Hi,
>
> in this small diagnostic regression we emit an inform without a preceding
> warning/error: checking the return value of the pedwarn, as we normally want
> to do, fixes the problem. Tested x86_64-linux.
>
> Thanks,
> Paolo.
>
> /


[PATCH] Fixes to must-tail-call tests

2016-05-24 Thread David Malcolm
The following fixes the known failures of the must-tail-call tests.

Tested with --target=
* aarch64-unknown-linux-gnu
* ia64-unknown-linux-gnu
* m68k-unknown-linux-gnu
* x86_64-pc-linux-gnu
 
OK for trunk?

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/must-tail-call-2.c (test_2_caller): Generalize
expected error message to allow "argument must be passed by
copying".
(test_3): Generalize expected error message to allow for failure
of targetm.function_ok_for_sibcall.
(test_4): Likewise.
(test_5): Likewise.
---
 gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c 
b/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
index c5504f8..ca81b35 100644
--- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
@@ -29,14 +29,14 @@ int __attribute__((noinline,noclone))
 test_2_caller (int i)
 {
   struct box b;
-  return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: callee 
required more stack slots than the caller" } */
+  return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: callee 
required more stack slots than the caller|argument must be passed by copying" } 
*/
 }
 
 extern void setjmp (void);
 void
 test_3 (void)
 {
-  setjmp (); /* { dg-error "cannot tail-call: callee returns twice" } */
+  setjmp (); /* { dg-error "cannot tail-call: callee returns twice|target is 
not able to optimize the call into a sibling call" } */
 }
 
 void
@@ -45,7 +45,7 @@ test_4 (void)
   void nested (void)
   {
   }
-  nested (); /* { dg-error "cannot tail-call: nested function" } */
+  nested (); /* { dg-error "cannot tail-call: nested function|target is not 
able to optimize the call into a sibling call" } */
 }
 
 typedef void (fn_ptr_t) (void);
@@ -54,5 +54,5 @@ volatile fn_ptr_t fn_ptr;
 void
 test_5 (void)
 {
-  fn_ptr (); /* { dg-error "cannot tail-call: callee does not return" } */
+  fn_ptr (); /* { dg-error "cannot tail-call: callee does not return|target is 
not able to optimize the call into a sibling call" } */
 }
-- 
1.8.5.3



Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Sat, May 21, 2016 at 12:48 AM, Uros Bizjak  wrote:
> On Fri, May 20, 2016 at 7:49 PM, H.J. Lu  wrote:
>> On Fri, May 20, 2016 at 10:15 AM, Rainer Orth
>>  wrote:
>>> "H.J. Lu"  writes:
>>>
>>>> On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
>>> Here is a patch to add
>>> -mgeneral-regs-only option to x86 backend.   We can update
>>> spec for interrupt handle to recommend compiling interrupt handler
>>> with -mgeneral-regs-only option and add a note for compiler
>>> implementers.
>>>
>>> OK for trunk if there is no regression?
>>
>>
>> I can't comment on the code patch, but for the documentation part:
>>
>>> @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
>>> attack. At the moment,
>>>  this option is limited in what it can do and should not be relied
>>>  on to provide serious protection.
>>>
>>> +@item -mgeneral-regs-only
>>> +@opindex mgeneral-regs-only
>>> +Generate code which uses only the general-purpose registers.  This will
>>
>>
>> s/which/that/
>>
>>> +prevent the compiler from using floating-point, vector, mask and bound
>>
>>
>> s/will prevent/prevents/
>>
>>> +registers, but will not impose any restrictions on the assembler.
>>
>>
>> Maybe you mean to say "does not restrict use of those registers in inline
>> assembly code"?  In any case, please get rid of the future tense here, 
>> too.
>
> I changed it to
>
> ---
> @item -mgeneral-regs-only
> @opindex mgeneral-regs-only
> Generate code that uses only the general-purpose registers.  This
> prevents the compiler from using floating-point, vector, mask and bound
> registers.
> ---
>

 Here is the updated patch.  Tested on x86-64.  OK for trunk?
>>>
>>> This patch broke {i386,x86_64}-apple-darwin15.5.0 bootstrap:
>>>
>>> In file included from ./tm.h:16:0,
>>>  from /vol/gcc/src/hg/trunk/local/gcc/genattrtab.c:108:
>>> ./options.h:5443:2: error: #error too many target masks
>>>  #error too many target masks
>>>   ^
>>> Makefile:2497: recipe for target 'build/genattrtab.o' failed
>>> make[3]: *** [build/genattrtab.o] Error 1
>>>
>>> options.h has
>>>
>>> #define OPTION_MASK_ISA_XSAVES (HOST_WIDE_INT_1 << 62)
>>> #error too many target masks
>>>
>>> The tree bootstraps just fine at the previous revision.
>>>
>>
>> Tested on x86-64.  OK for trunk?
>
> No, this is a flag, not a variable. Let's figure out how to extend
> target flags to more than 63 flags first.

Extending target flags to more than 63 bits requires replacing
HOST_WIDE_INT with a bit vector.  Since target flags is used in
TARGET_SUBTARGET_DEFAULT, changing it to a bit vector is a
non-trivial change.  On the other hand, -mgeneral-regs-only is a
command-line option which doesn't require support for
TARGET_SUBTARGET_DEFAULT, similar to other -m options like
-mmitigate-rop.  Using flag_general_regs_only is an option.
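
As a rough illustration of the trade-off (a sketch only, not GCC's actual
option machinery): mask-style flags share one 64-bit word, while a flag kept
in its own variable does not consume a bit of it.  The names mask_fits,
mask_for and options_model are hypothetical, and the limit of 63 usable bits
is an assumption taken from the wording above.

```cpp
#include <cstdint>

// Assumption: the shared flag word offers 63 usable mask bits (0..62),
// matching the "more than 63 flags" wording in this thread.
constexpr int kUsableMaskBits = 63;

// True if a mask with this bit index still fits in the shared word.
constexpr bool mask_fits(int bit_index) {
  return bit_index >= 0 && bit_index < kUsableMaskBits;
}

// Build the mask value for a given bit index (caller checks mask_fits first).
constexpr std::uint64_t mask_for(int bit_index) {
  return std::uint64_t{1} << bit_index;
}

// Hypothetical model of the two storage choices discussed above.
struct options_model {
  std::uint64_t isa_flags = 0;     // shared word: every mask costs one bit
  int flag_general_regs_only = 0;  // standalone variable: costs no mask bit
};
```

Under this model, bit 62 is the last mask that fits; a 64th mask has nowhere
to go, whereas adding another standalone variable is always possible.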

-- 
H.J.


Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Evandro Menezes

On 05/23/16 15:32, Evandro Menezes wrote:


I'm fine with this patch, as it achieves in part what I intended 
before: going beyond the  default_case_values_threshold, too 
conservative for Exynos M1.  My concern is particularly what happens 
to in-order targets, like the ubiquitous A53.


I'll make some figures available soon.



Here's what I noticed using our internal benchmark suite:

 * On A53, no noticeable regressions (< -1%) and a handful of minor
   improvements (< 3%) here and there.
 * On A57, again, no noticeable regressions, a handful of minor
   improvements and some significant improvements (< 5%) here and there.
 * On M1, it was more checkered, with some minor regressions, but with
   a few minor and significant improvements, resulting in an overall
   minor improvement.


I'm still comfortable with the idea of the patch.  I agree with Jim that 
the logic that it adds begs to be polished up along the lines that he 
suggested.


Cheers,

--
Evandro Menezes



Re: [Patch V2] Fix SLP PR58135.

2016-05-24 Thread Christophe Lyon
Hi Venkat,


On 23 May 2016 at 11:54, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Thursday, May 19, 2016 4:08 PM
>> To: Kumar, Venkataramanan 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch V2] Fix SLP PR58135.
>>
>> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
>>  wrote:
>> > Hi Richard,
>> >
>> >> -Original Message-
>> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> >> Sent: Tuesday, May 17, 2016 5:40 PM
>> >> To: Kumar, Venkataramanan 
>> >> Cc: gcc-patches@gcc.gnu.org
>> >> Subject: Re: [Patch V2] Fix SLP PR58135.
>> >>
>> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
>> >>  wrote:
>> >> > Hi Richard,
>> >> >
>> >> > I created the patch by passing -b option to git. Now the patch is
>> >> > more
>> >> readable.
>> >> >
>> >> > As per your suggestion I tried to fix the PR by splitting the SLP
>> >> > store group at
>> >> vector boundary after the SLP tree is built.
>> >> >
>> >> > Boot strap PASSED on x86_64.
>> >> > Checked the patch with check_GNU_style.sh.
>> >> >
>> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization. Hence
>> >> > it
>> >> generated 2 more vzeroupper.
>> >> > As recommended I adjusted the test case by adding
>> >> > -fno-tree-slp-vectorize
>> >> to make it as expected after loop vectorization.
>> >> >
>> >> > The following tests are now passing.
>> >> >
>> >> > -- Snip-
>> >> > Tests that now work, but didn't before:
>> >> >
>> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects
>> >> > scan-tree-dump-times
>> >> > slp2 "basic block vectorized" 1
>> >> >
>> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
>> >> > vectorized" 1
>> >> >
>> >> > New tests that PASS:
>> >> >
>> >> > gcc.dg/vect/pr58135.c (test for excess errors)
>> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess
>> >> > errors)
>> >> >
>> >> > -- Snip-
>> >> >
>> >> > ChangeLog
>> >> >
>> >> > 2016-05-14  Venkataramanan Kumar
>> >> 
>> >> >  PR tree-optimization/58135
>> >> > * tree-vect-slp.c:  When group size is not multiple of vector size,
>> >> >  allow splitting of store group at vector boundary.
>> >> >
>> >> > Test suite  ChangeLog
>> >> > 2016-05-14  Venkataramanan Kumar
>> >> 
>> >> > * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
>> >> > * gcc.dg/vect/pr58135.c:  Add new.
>> >> > * gfortran.dg/pr46519-1.f: Adjust test case.
>> >> >
>> >> > The attached patch Ok for trunk?
>> >>
>> >>
>> >> Please avoid the excessive vertical space around the vect_build_slp_tree
>> call.
>> > Yes fixed in the attached patch.
>> >>
>> >> +  /* Calculate the unrolling factor.  */
>> >> +  unrolling_factor = least_common_multiple
>> >> + (nunits, group_size) / group_size;
>> >> ...
>> >> +  else
>> >> {
>> >>   /* Calculate the unrolling factor based on the smallest type.  
>> >> */
>> >>   if (max_nunits > nunits)
>> >> -unrolling_factor = least_common_multiple (max_nunits, group_size)
>> >> -   / group_size;
>> >> +   unrolling_factor
>> >> +   = least_common_multiple (max_nunits,
>> >> + group_size)/group_size;
>> >>
>> >> please compute the "correct" unroll factor immediately and move the
>> >> "unrolling of BB required" error into the if() case by post-poning
>> >> the nunits < group_size check (and use max_nunits here).
>> >>
>> > Yes fixed in the attached patch.
>> >
>> >> +  if (is_a  (vinfo)
>> >> + && nunits < group_size
>> >> + && unrolling_factor != 1
>> >> + && is_a  (vinfo))
>> >> +   {
>> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >> +  "Build SLP failed: store group "
>> >> +  "size not a multiple of the vector size "
>> >> +  "in basic block SLP\n");
>> >> + /* Fatal mismatch.  */
>> >> + matches[nunits] = false;
>> >>
>> >> this is too pessimistic - you want to add the extra 'false' at
>> >> group_size / max_nunits * max_nunits.
>> > Yes fixed in attached patch.
>> >
>> >>
>> >> It looks like you leak 'node' in the if () path as well.  You need
>> >>
>> >>   vect_free_slp_tree (node);
>> >>   loads.release ();
>> >>
>> >> thus treat it as a failure case.
>> >
>> > Yes fixed. I added an else part before scalar_stmts.release call for the 
>> > case
>> when SLP tree is not built. This avoids double freeing.
>> > Bootstrapped and reg tested on X86_64.
>> >
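
For readers following the quoted review, the arithmetic it discusses can be
sketched as below.  This is only an illustration of the two formulas quoted
above (the unrolling factor and the "group_size / max_nunits * max_nunits"
position); the helper names are hypothetical and none of this is the
vectorizer's code.

```cpp
// Greatest common divisor (Euclid); inputs are assumed to be positive.
constexpr unsigned gcd_u(unsigned a, unsigned b) {
  return b == 0 ? a : gcd_u(b, a % b);
}

// Least common multiple, dividing first to keep intermediate values small.
constexpr unsigned lcm_u(unsigned a, unsigned b) {
  return a / gcd_u(a, b) * b;
}

// Unrolling factor as in the quoted hunk:
// least_common_multiple (nunits, group_size) / group_size.
constexpr unsigned unrolling_factor(unsigned nunits, unsigned group_size) {
  return lcm_u(nunits, group_size) / group_size;
}

// Position suggested in the review for the first mismatch: the largest
// multiple of max_nunits that does not exceed group_size.
constexpr unsigned split_index(unsigned group_size, unsigned max_nunits) {
  return group_size / max_nunits * max_nunits;
}
```

For example, with 4 lanes and a store group of 6, the factor is 2 and the
suggested position is 4, i.e. the group would be split after the first full
vector of 4 stores.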

This patch is causing regressions on armeb:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/236591/report-build-info.html
The following fortran tests now fail at runtime:

  gfortran.dg/c_f_pointer_logical.f03   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
  gfortran.dg/c_f_pointer_l

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Evandro Menezes

On 05/24/16 07:08, Wilco Dijkstra wrote:

Jim Wilson wrote:

It looks like a slight loss on qdf24xx on SPEC CPU2006 at -O3.  I see
about a 0.37% loss on the integer benchmarks, and no significant
change on the FP benchmarks.  The integer loss is mainly due to
458.sjeng which drops 2%.  We had tried various values for
max_case_values earlier, and didn't see any performance improvement
from setting it, so we are using the default value.

That's interesting as sjeng shows ~2% gain on Cortex-A72 due to the
hot switches being badly laid out... I wonder whether the loss you see is
due to code alignment or some other secondary effect.


I always thought that this patch, that lays out the branch tree more 
optimally, deserved to be revisited: 
https://gcc.gnu.org/ml/gcc-patches/2008-04/msg02197.html


Cheers,

--
Evandro Menezes



[v3 PATCH] PR libstdc++/66338

2016-05-24 Thread Ville Voutilainen
Tested on Linux-PPC64. The idea here is to get the constructor templates
to step out of the way if what they are dealing with will be handled by
a special member function, like the copy constructor. Doing so properly
fixes the bug at hand and likely other similar cases, too. The problem
in getting the templates out of the way is that none of their is_constructible
or is_convertible checks should be evaluated at all, but since we have
these handy pseudo-concepts in place, doing that is merely a matter
of adjusting the shortcut alias for one of those pseudo-concepts.
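
A minimal sketch of the idea, assuming a toy single-element class rather
than std::tuple: the "not the same type" test is folded into the bool
parameter of the helper, so when it is false the is_constructible trait is
never consulted.  The names checks, not_same and box are hypothetical and
only loosely mirror _TC, _NotSameTuple and tuple.

```cpp
#include <type_traits>

// Helper selected by a bool: the primary template does the real check.
template <bool Enabled, typename T>
struct checks {
  template <typename U>
  static constexpr bool constructible_from() {
    return std::is_constructible<T, U>::value;
  }
};

// Disabled variant: answers false without instantiating any trait on U.
template <typename T>
struct checks<false, T> {
  template <typename U>
  static constexpr bool constructible_from() { return false; }
};

// True unless U (after decay) is Self, i.e. unless a copy/move is requested.
template <typename Self, typename U>
using not_same = std::integral_constant<
    bool, !std::is_same<Self, typename std::decay<U>::type>::value>;

template <typename T>
struct box {
  T value;

  // Converting constructor; steps aside for box itself so that the
  // implicitly declared copy/move constructors handle those cases.
  template <typename U,
            typename std::enable_if<
                checks<not_same<box, U>::value, T>::template
                    constructible_from<U>(),
                bool>::type = true>
  constexpr box(U&& u) : value(static_cast<U&&>(u)) {}
};
```

With this arrangement, box<int> is still constructible from an int, while a
copy from another box<int> never reaches the constructor template.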

2016-05-24  Ville Voutilainen  

PR libstdc++/66338
* include/std/tuple (_TMC): Add a check for _NotSameTuple.
* include/std/tuple (tuple(_UElements&&...)): Remove the separate
check for _NotSameTuple.
* testsuite/20_util/tuple/cons/66338.cc: New.
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 7522e43..3cc6a2c 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -620,14 +620,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Shortcut for the cases where constructors taking _UElements...
   // need to be constrained.
   template using _TMC =
-  _TC<(sizeof...(_Elements) == sizeof...(_UElements)),
+  _TC<(sizeof...(_Elements) == sizeof...(_UElements))
+ && (_TC<(sizeof...(_UElements)==1), _Elements...>::
+ template _NotSameTuple<_UElements...>()),
   _Elements...>;
 
   template::template
-   _NotSameTuple<_UElements...>()
- && _TMC<_UElements...>::template
+ _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && _TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
@@ -638,9 +638,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template::template
-   _NotSameTuple<_UElements...>()
- && _TMC<_UElements...>::template
+ _TMC<_UElements...>::template
 _MoveConstructibleTuple<_UElements...>()
   && !_TMC<_UElements...>::template
 _ImplicitlyMoveConvertibleTuple<_UElements...>()
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/66338.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cons/66338.cc
new file mode 100644
index 000..f57eae9
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/66338.cc
@@ -0,0 +1,35 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <tuple>
+
+struct S {
+  int i_;
+
+  template<typename T>
+  S(T&& i)
+noexcept(noexcept(i_ = i))
+  { i_ = i; }
+
+  S() noexcept : i_{0} {};
+};
+
+int main()
+{
+  std::tuple(std::forward_as_tuple(S{}));
+  return 0;
+}


RE: [PATCH][MIPS] Remove "new" MIPS TLS access patterns

2016-05-24 Thread Matthew Fortune
Robert Suchanek  writes:
> The below finishes the revert of r137670 that was already partially reverted
> in r137734 as part of PR target/35802.
> 
> It would appear that the revert was not completed because of a spill failure
> at the time.  As LRA can handle the 'v' constraint just fine and MIPS is going
> to drop the support for the classic reload, there is no need for this 
> workaround
> that introduces spurious moves hurting the performance.
> 
> No regression. Ok to commit?

This is OK but should wait for a patch to remove the -mlra option and associated
non-lra code.

Thanks,
Matthew


RE: [PATCH][MIPS] P5600 scheduler fix

2016-05-24 Thread Matthew Fortune
Robert Suchanek  writes:
> gcc/
>   * config/mips/p5600.md (p5600_fpu_fadd): Remove checking for
>   `fabs' and `fneg' type attributes.
>   (p5600_fpu_fabs): Add `fmove' to the comment.

OK.

Thanks,
Matthew


[PATCH][MIPS] P5600 scheduler fix

2016-05-24 Thread Robert Suchanek
Hi,

The below is a fix for the P5600 scheduler.  Ok to commit?

Regards,
Robert

2016-05-24  Simon Dardis  
Prachi Godbole  

gcc/
* config/mips/p5600.md (p5600_fpu_fadd): Remove checking for
`fabs' and `fneg' type attributes.
(p5600_fpu_fabs): Add `fmove' to the comment.
---
 gcc/config/mips/p5600.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/p5600.md b/gcc/config/mips/p5600.md
index 694a745..4500ceb 100644
--- a/gcc/config/mips/p5600.md
+++ b/gcc/config/mips/p5600.md
@@ -163,10 +163,10 @@ (define_insn_reservation "msa_long_div" 10
 ;; fadd, fsub
 (define_insn_reservation "p5600_fpu_fadd" 4
   (and (eq_attr "cpu" "p5600")
-   (eq_attr "type" "fadd,fabs,fneg"))
+   (eq_attr "type" "fadd"))
   "p5600_fpu_long, p5600_fpu_apu")
 
-;; fabs, fneg, fcmp
+;; fabs, fneg, fcmp, fmove
 (define_insn_reservation "p5600_fpu_fabs" 2
   (and (eq_attr "cpu" "p5600")
(eq_attr "type" "fabs,fneg,fcmp,fmove"))
-- 
2.8.2.396.g5fe494c


[PATCH][MIPS] Remove "new" MIPS TLS access patterns

2016-05-24 Thread Robert Suchanek
Hi,

The below finishes the revert of r137670 that was already partially reverted
in r137734 as part of PR target/35802.

It would appear that the revert was not completed because of a spill failure
at the time.  As LRA can handle the 'v' constraint just fine and MIPS is going
to drop the support for the classic reload, there is no need for this workaround
that introduces spurious moves hurting the performance.

No regression. Ok to commit?

Regards,
Robert

2016-05-24  Simon Dardis  

gcc/
* config/mips/constraints.md (V1_REG): Update comment.
* config/mips/mips.md (get_tls_get_tp_): Remove.
(*get_tls_tp_): Rename.
* doc/md.texi: Update the MIPS "v" constraint.
---
 gcc/config/mips/constraints.md |  6 +++---
 gcc/config/mips/mips.md| 26 +++---
 gcc/doc/md.texi|  3 +--
 3 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/gcc/config/mips/constraints.md b/gcc/config/mips/constraints.md
index 56b363e..155b212 100644
--- a/gcc/config/mips/constraints.md
+++ b/gcc/config/mips/constraints.md
@@ -60,11 +60,11 @@ (define_register_constraint "e" "LEA_REGS"
 (define_register_constraint "j" "PIC_FN_ADDR_REG"
   "@internal")
 
+;; FIXME: Remove this comment and below once the MIPS backend can
+;; only be used with LRA.
 ;; Don't use this constraint in gcc code!  It runs the risk of
 ;; introducing a spill failure; see tls_get_tp_.
-(define_register_constraint "v" "V1_REG"
-  "Register @code{$3}.  Do not use this constraint in new code;
-   it is retained only for compatibility with glibc.")
+(define_register_constraint "v" "V1_REG" "@internal")
 
 (define_register_constraint "y" "GR_REGS"
   "Equivalent to @code{r}; retained for backwards compatibility.")
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 527f2e1..432ab1a 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -7386,29 +7386,9 @@ (define_insn "*mips16e_save_restore"
 ;; MIPS 32r2 specification, but we use it on any architecture because
 ;; we expect it to be emulated.  Use .set to force the assembler to
 ;; accept it.
-;;
-;; We do not use a constraint to force the destination to be $3
-;; because $3 can appear explicitly as a function return value.
-;; If we leave the use of $3 implicit in the constraints until
-;; reload, we may end up making a $3 return value live across
-;; the instruction, leading to a spill failure when reloading it.
-(define_insn_and_split "tls_get_tp_"
-  [(set (match_operand:P 0 "register_operand" "=d")
-   (unspec:P [(const_int 0)] UNSPEC_TLS_GET_TP))
-   (clobber (reg:P TLS_GET_TP_REGNUM))]
-  "HAVE_AS_TLS && !TARGET_MIPS16"
-  "#"
-  "&& reload_completed"
-  [(set (reg:P TLS_GET_TP_REGNUM)
-   (unspec:P [(const_int 0)] UNSPEC_TLS_GET_TP))
-   (set (match_dup 0) (reg:P TLS_GET_TP_REGNUM))]
-  ""
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "")
-   (set_attr "insn_count" "2")])
 
-(define_insn "*tls_get_tp__split"
-  [(set (reg:P TLS_GET_TP_REGNUM)
+(define_insn "tls_get_tp_"
+  [(set (match_operand:P 0 "register_operand" "=v")
(unspec:P [(const_int 0)] UNSPEC_TLS_GET_TP))]
   "HAVE_AS_TLS && !TARGET_MIPS16"
   {
@@ -7418,7 +7398,7 @@ (define_insn "*tls_get_tp__split"
 return ".set\tpush\;.set\tmips32r2\t\;rdhwr\t$3,$29\;.set\tpop";
   }
   [(set_attr "type" "unknown")
-   ; Since rdhwr always generates a trap for now, putting it in a delay
+   ; Since rdhwr may generate a trap, putting it in a delay
; slot would make the kernel's emulation of it much slower.
(set_attr "can_delay" "no")
(set_attr "mode" "")])
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f2360c8..d1c88d2 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -2705,8 +2705,7 @@ A register suitable for use in an indirect jump.  This 
will always be
 @code{$25} for @option{-mabicalls}.
 
 @item v
-Register @code{$3}.  Do not use this constraint in new code;
-it is retained only for compatibility with glibc.
+Register @code{$3}.  The register for acquiring the TLS pointer.
 
 @item y
 Equivalent to @code{r}; retained for backwards compatibility.
-- 
2.8.2.396.g5fe494


[PATCH][MIPS] Don't split shifts by default for MIPS16.

2016-05-24 Thread Robert Suchanek
Hi,

The following changes the default behaviour of shift splitting
for MIPS16, i.e. the shifts will be split only when used with the
undocumented -mno-debugd option; -mdebugd is now switched on by default.

This appears to enable better optimization in certain cases and hence
gives slightly better performance.
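
A rough model of the changed condition, covering only the clauses visible in
the hunk below (the real condition in mips.md continues past what is shown
there).  The function name and parameters are hypothetical.

```cpp
// Models the visible part of the expander's test: splitting is considered
// only for MIPS16, when -mdebugd is off, when optimizing, and for a
// constant shift amount greater than 8.
constexpr bool may_split_shift(bool mips16, bool debug_d_mode,
                               bool optimizing, bool amount_is_constant,
                               long amount) {
  return mips16
         && !debug_d_mode  // new clause added by the patch
         && optimizing
         && amount_is_constant
         && amount > 8;
}
```

Since the patch initialises the -mdebugd variable to 1, the second argument
is true by default and the function returns false unless -mno-debugd is
given.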

Ok to apply?

Regards,
Robert

gcc/
* config/mips/mips.md (3): Don't split shifts
when used with -mdebugd.
* config/mips/mips.opt (mdebugd): Init to 1 by default.
---
 gcc/config/mips/mips.md  | 1 +
 gcc/config/mips/mips.opt | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 22f4f0b..01d7edd 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5602,6 +5602,7 @@ (define_expand "3"
  be careful not to allocate a new register if we've reached the
  reload pass.  */
   if (TARGET_MIPS16
+  && !TARGET_DEBUG_D_MODE
   && optimize
   && CONST_INT_P (operands[2])
   && INTVAL (operands[2]) > 8
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 53feb23..b6c839d 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -127,7 +127,7 @@ mdebug
 Target Var(TARGET_DEBUG_MODE) Undocumented
 
 mdebugd
-Target Var(TARGET_DEBUG_D_MODE) Undocumented
+Target Var(TARGET_DEBUG_D_MODE) Undocumented Init(1)
 
 meb
 Target Report RejectNegative Mask(BIG_ENDIAN)
-- 
2.8.2.396.g5fe494


[PATCH][MIPS] Add support for code_readable function attribute

2016-05-24 Thread Robert Suchanek
Hi,

The patch adds support for __attribute__ ((code_readable)) with
optional argument that accepts `no', `yes' or `pcrel' just
like the -mcode-readable= command line switch.  If the argument
is not specified then the default `yes' is applied.

This of course only has an effect on targets supporting -mcode-readable=.
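
The accepted arguments described above can be sketched as a small parser.
This is an illustration only: the enum and function names are hypothetical
and are not the ones used in mips.c, and a missing argument is modelled as a
null pointer.

```cpp
// Hypothetical result type for the attribute's argument.
enum class code_readable { no, pcrel, yes, invalid };

// Compile-time C-string comparison (kept recursive so it is valid C++11).
constexpr bool str_eq(const char* a, const char* b) {
  return *a == *b && (*a == '\0' || str_eq(a + 1, b + 1));
}

// A missing argument means `yes'; anything other than the three
// documented strings is rejected.
constexpr code_readable parse_code_readable(const char* arg) {
  return arg == nullptr          ? code_readable::yes
         : str_eq(arg, "no")     ? code_readable::no
         : str_eq(arg, "pcrel")  ? code_readable::pcrel
         : str_eq(arg, "yes")    ? code_readable::yes
                                 : code_readable::invalid;
}
```

In the patch itself an invalid string leads to a -Wattributes warning and
the attribute being dropped; the sketch simply reports it as invalid.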

Regards,
Robert

2016-05-24  Matthew Fortune  
Simon Dardis  

gcc/
* config/mips/mips.c (mips_base_code_readable): New.
(mips_handle_code_readable_attr): New static function.
(mips_get_code_readable_attr): Likewise.
(mips_set_current_function): Add support for the code_readable
attribute.
(mips_option_override): Likewise.
* doc/extend.texi: Document the new attribute.

gcc/testsuite/
* gcc.target/mips/code-readable-attr-1.c: New test.
* gcc.target/mips/code-readable-attr-2.c: Ditto.
* gcc.target/mips/code-readable-attr-3.c: Ditto.
* gcc.target/mips/code-readable-attr-4.c: Ditto.
* gcc.target/mips/code-readable-attr-5.c: Ditto.
---
 gcc/config/mips/mips.c | 94 +-
 gcc/doc/extend.texi| 17 
 .../gcc.target/mips/code-readable-attr-1.c | 51 
 .../gcc.target/mips/code-readable-attr-2.c | 49 +++
 .../gcc.target/mips/code-readable-attr-3.c | 50 
 .../gcc.target/mips/code-readable-attr-4.c | 51 
 .../gcc.target/mips/code-readable-attr-5.c |  5 ++
 7 files changed, 316 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-5.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 24d98fe..6a469a2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -492,6 +492,9 @@ static int mips_base_target_flags;
 /* The default compression mode.  */
 unsigned int mips_base_compression_flags;
 
+/* The default code readable setting.  */
+enum mips_code_readable_setting mips_base_code_readable;
+
 /* The ambient values of other global variables.  */
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
@@ -596,6 +599,7 @@ const enum reg_class 
mips_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   ALL_REGS,ALL_REGS,   ALL_REGS,   ALL_REGS
 };
 
+static tree mips_handle_code_readable_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_interrupt_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_use_shadow_register_set_attr (tree *, tree, tree, int,
  bool *);
@@ -616,6 +620,8 @@ static const struct attribute_spec mips_attribute_table[] = 
{
   { "micromips",   0, 0, true,  false, false, NULL, false },
   { "nomicromips", 0, 0, true,  false, false, NULL, false },
   { "nocompression", 0, 0, true,  false, false, NULL, false },
+  { "code_readable", 0, 1, true,  false, false, mips_handle_code_readable_attr,
+false },
   /* Allow functions to be specified as interrupt handlers */
   { "interrupt",   0, 1, false, true,  true, mips_handle_interrupt_attr,
 false },
@@ -1296,6 +1302,77 @@ mips_use_debug_exception_return_p (tree type)
   TYPE_ATTRIBUTES (type)) != NULL;
 }
 
+/* Verify the arguments to a code_readable attribute.  */
+
+static tree
+mips_handle_code_readable_attr (tree *node, tree name, tree args,
+   int flags ATTRIBUTE_UNUSED, bool *no_add_attrs)
+{
+  const char * str;
+
+  if (!is_attribute_p ("code_readable", name) || args == NULL)
+return NULL_TREE;
+
+  if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute requires a string argument", name);
+  *no_add_attrs = true;
+}
+  else if (strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "no") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "pcrel") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "yes") != 0)
+{
+  warning (OPT_Wattributes,
+  "argument to %qE attribute is neither no, pcrel nor yes", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Return the code_readable setting for a function if it has one.  If there
+   is no argument or the function does not have the attribute, return GCC's
+   default.  */
+
+static enum mips_code_readable_setting
+mips_get_code_readable_attr (tree decl)
+{
+  tree attr;
+
+  if (decl == NULL)
+return mips_base_code_readable;
+
+  attr = lookup_attribute ("code_readable", DECL_ATTRIBUTES (decl));
+
+  if (att

[PATCH][MIPS] Add -minline-intermix to ignore compression flags when inlining

2016-05-24 Thread Robert Suchanek
Hi,

The below allows us to inline functions that have different compression flags
for better tuning of performance/code size balance.
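
The effect on the inlining decision can be sketched as follows.  This is a
simplified model of the mips_can_inline_p change in the diff below; the enum
and the default_ok parameter (standing in for the result of the default
hook) are hypothetical.

```cpp
// Hypothetical stand-in for the per-function compression mode.
enum class compress_mode { none, mips16, micromips };

// Differing compression modes block inlining unless the new option is
// enabled; otherwise the default hook's answer is used.
constexpr bool can_inline(compress_mode caller, compress_mode callee,
                          bool inline_intermix, bool default_ok) {
  return (callee != caller && !inline_intermix) ? false : default_ok;
}
```

Note that the option only removes the compression-mode veto; whatever the
default hook decides still applies.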

Ok to commit?

Regards,
Robert

2016-05-24  Matthew Fortune  

gcc/
* config/mips/mips.c (mips_can_inline_p): Allow inlining of
functions with opposing compression flags.
* config/mips/mips.opt (minline-intermix): New option.
* doc/invoke.texi: Document the new option.
---
 gcc/config/mips/mips.c   |  3 ++-
 gcc/config/mips/mips.opt |  4 
 gcc/doc/invoke.texi  | 13 +
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5ecde46..4312368 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -1476,7 +1476,8 @@ mips_merge_decl_attributes (tree olddecl, tree newdecl)
 static bool
 mips_can_inline_p (tree caller, tree callee)
 {
-  if (mips_get_compress_mode (callee) != mips_get_compress_mode (caller))
+  if (mips_get_compress_mode (callee) != mips_get_compress_mode (caller)
+  && !TARGET_INLINE_INTERMIX)
 return false;
   return default_target_can_inline_p (caller, callee);
 }
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 08dd83e..3b92ef5 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -443,3 +443,7 @@ Enum(mips_cb_setting) String(optimal) Value(MIPS_CB_OPTIMAL)
 
 EnumValue
 Enum(mips_cb_setting) String(always) Value(MIPS_CB_ALWAYS)
+
+minline-intermix
+Target Report Var(TARGET_INLINE_INTERMIX)
+Allow inlining even if the compression flags differ between caller and callee.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73f1cb6..2f6195e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -837,6 +837,7 @@ Objective-C and Objective-C++ Dialects}.
 -mips16  -mno-mips16  -mflip-mips16 @gol
 -minterlink-compressed -mno-interlink-compressed @gol
 -minterlink-mips16  -mno-interlink-mips16 @gol
+-minline-intermix -mno-inline-intermix @gol
 -mabi=@var{abi}  -mabicalls  -mno-abicalls @gol
 -mshared  -mno-shared  -mplt  -mno-plt  -mxgot  -mno-xgot @gol
 -mgp32  -mgp64  -mfp32  -mfpxx  -mfp64  -mhard-float  -msoft-float @gol
@@ -17916,6 +17917,18 @@ Aliases of @option{-minterlink-compressed} and
 @option{-mno-interlink-compressed}.  These options predate the microMIPS ASE
 and are retained for backwards compatibility.
 
+@item -minline-intermix
+@itemx -mno-inline-intermix
+@opindex minline-intermix
+@opindex mno-inline-intermix
+Enable inlining of functions which have opposing compression flags e.g.
+@code{mips16}/@code{nomips16} attributes.
+This is useful when using the @code{mips16} attribute to balance code size
+and performance so that a function will be compressed when not inlined or
+vice-versa.  When using this option it is necessary to protect functions
+that cannot be compiled as MIPS16 with a @code{noinline} attribute to ensure
+they are not inlined into a MIPS16 function.
+
 @item -mabi=32
 @itemx -mabi=o64
 @itemx -mabi=n32
-- 
2.8.2.396.g5fe494c


RE: [PATCH] Disable -mbranch-likely for -Os when targetting generic architecture

2016-05-24 Thread Robert Suchanek
Hi Catherine,

Apologies for the (very) late reply.
It appears that I never replied to the last message.

> > gcc/
> > * config/mips/mips-cpus.def: Replace PTF_AVOID_BRANCHLIKELY
> > with
> > PTF_AVOID_BRANCHLIKELY_ALWAYS for generic architecture and
> > with
> > PTF_AVOID_BRANCHLIKELY_SPEED for others.
> > (mips2, mips3, mips4): Add PTF_AVOID_BRANCHLIKELY_SIZE to tune
> > flags.
> > * config/mips/mips.c (mips_option_override): Enable the branch
> > likely
> > depending on the tune flags and optimization level.
> > * config/mips/mips.h (PTF_AVOID_BRANCHLIKELY): Remove.
> > (PTF_AVOID_BRANCHLIKELY_SPEED): Define.
> > (PTF_AVOID_BRANCHLIKELY_SIZE): Likewise.
> > (PTF_AVOID_BRANCHLIKELY_ALWAYS): Likewise.
> > ---
> >  gcc/config/mips/mips-cpus.def | 56 +---
> > ---
> >  gcc/config/mips/mips.c|  6 +++--
> >  gcc/config/mips/mips.h| 20 
> >  3 files changed, 47 insertions(+), 35 deletions(-)
> >
> > a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 0e0ecf2..f8775c4
> > 100644
> > --- a/gcc/config/mips/mips.c
> > +++ b/gcc/config/mips/mips.c
> > @@ -17916,8 +17916,10 @@ mips_option_override (void)
> >if ((target_flags_explicit & MASK_BRANCHLIKELY) == 0)
> >  {
> >if (ISA_HAS_BRANCHLIKELY
> > - && (optimize_size
> > - || (mips_tune_info->tune_flags & PTF_AVOID_BRANCHLIKELY)
> > == 0))
> > + && ((optimize_size && (mips_tune_info->tune_flags
> > +& PTF_AVOID_BRANCHLIKELY_SIZE) == 0)
> > +  || (!optimize_size && (mips_tune_info->tune_flags
> > + & PTF_AVOID_BRANCHLIKELY_SPEED) ==
> > 0)))
> > target_flags |= MASK_BRANCHLIKELY;
> >else
> > target_flags &= ~MASK_BRANCHLIKELY;
> 
> Should this check be:
> Index: mips.c
> ===
> --- mips.c  (revision 229138)
> +++ mips.c  (working copy)
> @@ -17758,8 +17758,15 @@
>if ((target_flags_explicit & MASK_BRANCHLIKELY) == 0)
>  {
>if (ISA_HAS_BRANCHLIKELY
> - && (optimize_size
> - || (mips_tune_info->tune_flags & PTF_AVOID_BRANCHLIKELY) == 0))
> + && ((optimize_size
> +  && (mips_tune_info->tune_flags
> +  & PTF_AVOID_BRANCHLIKELY_SIZE) == 0)
> + || (!optimize_size
> +&& optimize > 0
> +&& ((mips_tune_info->tune_flags
> + & PTF_AVOID_BRANCHLIKELY_SPEED) == 0))
> +|| (mips_tune_info->tune_flags
> + & PTF_AVOID_BRANCHLIKELY_ALWAYS) == 0))
> target_flags |= MASK_BRANCHLIKELY;
>else
> target_flags &= ~MASK_BRANCHLIKELY;
> 
> Instead?  I don't see a use of PTF_AVOID_BRANCHLIKELY_ALWAYS in your patch, but it
> seems like it should be checked.
> 

I did that on purpose at the time, as the check looked redundant: it will be
one or the other.  However, for easier reading, and in case *_ALWAYS is ever
redefined (e.g. to a unique value), the extra check is a must.
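
To make the "one or the other" point concrete, here is a small sketch of the
check as written in my original patch.  The bit values are hypothetical (the
real PTF_* definitions are not quoted in this mail), and the sketch assumes
that *_ALWAYS is simply the union of *_SPEED and *_SIZE.

```cpp
// Hypothetical bit assignments for the tune flags.
constexpr unsigned AVOID_SPEED = 1u << 0;
constexpr unsigned AVOID_SIZE = 1u << 1;
// Assumption: "always" is the union of the two other flags.
constexpr unsigned AVOID_ALWAYS = AVOID_SPEED | AVOID_SIZE;

// Branch-likely is enabled only if the ISA has it and the tuning does not
// ask to avoid it for the current optimization goal.
constexpr bool use_branch_likely(bool isa_has_branch_likely,
                                 bool optimize_size, unsigned tune_flags) {
  return isa_has_branch_likely
         && ((optimize_size && (tune_flags & AVOID_SIZE) == 0)
             || (!optimize_size && (tune_flags & AVOID_SPEED) == 0));
}
```

Under that assumption a CPU tuned with AVOID_ALWAYS never gets branch-likely,
whichever way optimize_size is set; an explicit *_ALWAYS test only starts to
matter if the flag is given a value of its own.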

I'm happy to include this.  Ok to commit with this change?

Regards,
Robert


Re: [PATCH] Fix PR71230

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Jakub Jelinek wrote:

> On Tue, May 24, 2016 at 03:28:42PM +0200, Richard Biener wrote:
> > The following fixes the ICEs in PR71230.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Wouldn't it be enough to use TYPE_SIZE_UNIT instead of TYPE_PRECISION
> for the non-INTEGRAL_TYPE_Ps and just deal with it at the bswap_replace
> point?  I mean, if we don't want to optimize those further (the tests I've
> posted in the PR), then it is just a matter of creating low part
> BIT_FIELD_REF instead of the one we have and use that.  This patch as is
> will I think not attempt to optimize the testcase with unsigned char l[8];
> array instead of two ints.

Yes, but TYPE_PRECISION is used in multiple places and I'd prefer to
fix the ICE quickly.  For vector types I think we want to try
replacement with a VEC_PERM_EXPR even.

Richard.


Re: [PATCH] Fix PR71230

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 03:28:42PM +0200, Richard Biener wrote:
> The following fixes the ICEs in PR71230.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Wouldn't it be enough to use TYPE_SIZE_UNIT instead of TYPE_PRECISION
for the non-INTEGRAL_TYPE_Ps and just deal with it at the bswap_replace
point?  I mean, if we don't want to optimize those further (the tests I've
posted in the PR), then it is just a matter of creating low part
BIT_FIELD_REF instead of the one we have and use that.  This patch as is
will I think not attempt to optimize the testcase with unsigned char l[8];
array instead of two ints.

> 2016-05-24  Richard Biener  
> 
>   PR tree-optimization/71240
>   * tree-ssa-math-opts.c (init_symbolic_number): Verify the source
>   has integral type.
> 
>   * gcc.dg/optimize-bswapsi-5.c: New testcase.

Jakub


Re: RFC [1/2] divmod transform

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

> On 24 May 2016 at 17:42, Richard Biener  wrote:
> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
> >
> >> On 23 May 2016 at 17:35, Richard Biener  wrote:
> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >> >  wrote:
> >> >> Hi,
> >> >> I have updated my patch for divmod (attached), which was originally
> >> >> based on Kugan's patch.
> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> >> having same operands to divmod representation, so we can cse
> >> >> computation of mod.
> >> >>
> >> >> t1 = a TRUNC_DIV_EXPR b;
> >> >> t2 = a TRUNC_MOD_EXPR b
> >> >> is transformed to:
> >> >> complex_tmp = DIVMOD (a, b);
> >> >> t1 = REALPART_EXPR (complex_tmp);
> >> >> t2 = IMAGPART_EXPR (complex_tmp);
> >> >>
> >> >> * New hook expand_divmod_libfunc
> >> >> The rationale for introducing the hook is that different targets have
> >> >> incompatible calling conventions for divmod libfunc.
> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> >> return quotient and store remainder in argument passed as pointer,
> >> >> while the arm version takes two arguments and returns both
> >> >> quotient and remainder having mode double the size of the operand mode.
> >> >> The port should hence override the hook expand_divmod_libfunc
> >> >> to generate call to target-specific divmod.
> >> >> Ports should define this hook if:
> >> >> a) The port does not have divmod or div insn for the given mode.
> >> >> b) The port defines divmod libfunc for the given mode.
> >> >> The default hook default_expand_divmod_libfunc() generates call
> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> >> are of DImode.
> >> >>
> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> >> cross-tested on arm*-*-*.
> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> >> Does this patch look OK ?
> >> >
> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> >> > index 6b4601b..e4a021a 100644
> >> > --- a/gcc/targhooks.c
> >> > +++ b/gcc/targhooks.c
> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> >> > machine_mode, optimization_type)
> >> >return true;
> >> >  }
> >> >
> >> > +void
> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> >> > +  rtx op0, rtx op1,
> >> > +  rtx *quot_p, rtx *rem_p)
> >> >
> >> > functions need a comment.
> >> >
> >> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?
> >> > In that case we could avoid the target hook.
> >> Well I would prefer adding the hook because that's easier -;)
> >> Would it be ok for now to go with the hook ?
> >> >
> >> > +  /* If target overrides expand_divmod_libfunc hook
> >> > +then perform divmod by generating call to the target-specific divmod libfunc.  */
> >> > +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> >> > +   return true;
> >> > +
> >> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> >> > +  return (mode == DImode && unsignedp);
> >> >
> >> > I don't understand this - we know optab_libfunc returns non-NULL for
> >> > 'mode' but still restrict this to DImode && unsigned?  Also if
> >> > targetm.expand_divmod_libfunc is not the default we expect the target
> >> > to handle all modes?
> >> Ah indeed, the check for DImode is unnecessary.
> >> However I suppose the check for unsignedp should be there,
> >> since we want to generate call to __udivmoddi4 only if operand is unsigned?
> >
> > The optab libfunc for sdivmod should be NULL in that case.
> Ah indeed, thanks.
> >
> >> >
> >> > That said - I expected the above piece to be simply a 'return true;' ;)
> >> >
> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the
> >> > target supports a specific operation (for example SImode divmod would
> >> > use DImode divmod by means of widening operands - for the unsigned case
> >> > of course).
> >> Thanks for pointing out. So if a target does not support divmod
> >> libfunc for a mode
> >> but for a wider mode, then we could zero-extend operands to the wider-mode,
> >> perform divmod on the wider-mode, and then cast result back to the
> >> original mode.
> >> I haven't done that in this patch; would it be OK to do that as a
> >> follow-up?
> >
> > I think that you should conservatively handle the div_optab query, thus if
> > the target has a HW division in a wider mode don't use the divmod IFN.
> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> > out if that is available.
> Done.
> >
> >> > +  /* Disable the transform if either is a constant, since
> >> > division-by-constant
> >> > + may have spec

Re: Tighten syntax checking for OpenACC routine construct in C

2016-05-24 Thread Thomas Schwinge
Hi!

On Tue, 24 May 2016 10:54:53 +0200, Jakub Jelinek  wrote:
> On Tue, May 24, 2016 at 10:51:15AM +0200, Thomas Schwinge wrote:
> > OK for trunk?

> Ok.

Committed without changes in r236639:

commit c9d624bd2672463771546e73bf3d6446d64e43c0
Author: tschwinge 
Date:   Tue May 24 14:00:39 2016 +

Tighten syntax checking for OpenACC routine construct in C

gcc/c/
* c-parser.c (c_parser_oacc_routine): Tighten syntax checks.
gcc/testsuite/
* c-c++-common/goacc/routine-5.c: Add tests.
* g++.dg/goacc/routine-2.C: Remove duplicate tests.
* gfortran.dg/goacc/routine-6.f90: Add tests.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@236639 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c/ChangeLog   |  4 
 gcc/c/c-parser.c  | 19 +--
 gcc/testsuite/ChangeLog   |  6 ++
 gcc/testsuite/c-c++-common/goacc/routine-5.c  | 21 +
 gcc/testsuite/g++.dg/goacc/routine-2.C|  6 --
 gcc/testsuite/gfortran.dg/goacc/routine-6.f90 |  7 +++
 6 files changed, 43 insertions(+), 20 deletions(-)

diff --git gcc/c/ChangeLog gcc/c/ChangeLog
index 9bb5ec1..3d69cd5 100644
--- gcc/c/ChangeLog
+++ gcc/c/ChangeLog
@@ -1,3 +1,7 @@
+2016-05-24  Thomas Schwinge  
+
+   * c-parser.c (c_parser_oacc_routine): Tighten syntax checks.
+
 2016-05-24  Richard Biener  
 
PR middle-end/70434
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index c2c8314..1bc5eed 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -13983,25 +13983,24 @@ c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
   c_parser_consume_token (parser);
 
   c_token *token = c_parser_peek_token (parser);
-
   if (token->type == CPP_NAME && (token->id_kind == C_ID_ID
  || token->id_kind == C_ID_TYPENAME))
{
  decl = lookup_name (token->value);
  if (!decl)
-   {
- error_at (token->location, "%qE has not been declared",
-   token->value);
- decl = error_mark_node;
-   }
+   error_at (token->location, "%qE has not been declared",
+ token->value);
+ c_parser_consume_token (parser);
}
   else
c_parser_error (parser, "expected function name");
 
-  if (token->type != CPP_CLOSE_PAREN)
-   c_parser_consume_token (parser);
-
-  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+  if (!decl
+ || !c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+   {
+ c_parser_skip_to_pragma_eol (parser, false);
+ return;
+   }
 }
 
   /* Build a chain of clauses.  */
diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 586202e..361fbbd 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2016-05-24  Thomas Schwinge  
+
+   * c-c++-common/goacc/routine-5.c: Add tests.
+   * g++.dg/goacc/routine-2.C: Remove duplicate tests.
+   * gfortran.dg/goacc/routine-6.f90: Add tests.
+
 2016-05-24  Richard Biener  
 
PR tree-optimization/71253
diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c gcc/testsuite/c-c++-common/goacc/routine-5.c
index 2a9db90..1efd154 100644
--- gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -38,13 +38,26 @@ namespace g {}
 #pragma acc routine /* { dg-error "not followed by" "" { target c++ } } */
 using namespace g;
 
-#pragma acc routine (g) /* { dg-error "does not refer to" "" { target c++ } } */
+#pragma acc routine (g) /* { dg-error "does not refer to a function" "" { target c++ } } */
 
-#endif
+#endif /* __cplusplus */
 
-#pragma acc routine (a) /* { dg-error "does not refer to" } */
+#pragma acc routine (a) /* { dg-error "does not refer to a function" } */
   
-#pragma acc routine (c) /* { dg-error "does not refer to" } */
+#pragma acc routine (c) /* { dg-error "does not refer to a function" } */
+
+
+#pragma acc routine () vector /* { dg-error "expected (function name|unqualified-id) before .\\). token" } */
+
+#pragma acc routine (+) /* { dg-error "expected (function name|unqualified-id) before .\\+. token" } */
+
+
+extern void R1(void);
+extern void R2(void);
+#pragma acc routine (R1, R2, R3) worker /* { dg-error "expected .\\). before .,. token" } */
+#pragma acc routine (R1 R2 R3) worker /* { dg-error "expected .\\). before .R2." } */
+#pragma acc routine (R1) worker
+#pragma acc routine (R2) worker
 
 
 void Bar ();
diff --git gcc/testsuite/g++.dg/goacc/routine-2.C gcc/testsuite/g++.dg/goacc/routine-2.C
index 2d16466..ea7c9bf 100644
--- gcc/testsuite/g++.dg/goacc/routine-2.C
+++ gcc/testsuite/g++.dg/goacc/routine-2.C
@@ -14,15 +14,9 @@ one()
 
 int incr (int);
 float incr (float);
-int inc;
 
 #pragma acc routine (incr) /* { dg-error "names a set of overloads" } */
 
-#pragma acc routi

Re: [PATCH] Introduce can_remove_lhs_p

2016-05-24 Thread Marek Polacek
On Tue, May 24, 2016 at 02:17:10PM +0200, Richard Biener wrote:
> On Mon, 23 May 2016, Marek Polacek wrote:
> 
> > On Mon, May 23, 2016 at 04:36:30PM +0200, Jakub Jelinek wrote:
> > > On Mon, May 23, 2016 at 04:28:33PM +0200, Marek Polacek wrote:
> > > > As promised in 
> > > > ,
> > > > this is a simple clean-up which makes use of a new predicate.  Richi 
> > > > suggested
> > > > adding maybe_drop_lhs_from_noreturn_call which would be nicer, but I 
> > > > didn't
> > > > know how to do that, given the handling if lhs is an SSA_NAME.
> > > 
> > > Shouldn't it be should_remove_lhs_p instead?
> > > I mean, it is not just an optimization, but part of how we define the IL.
> >  
> > Aha, ok.  Renamed.
> > 
> > > Shouldn't it be also used in tree-cfg.c (verify_gimple_call)?
> > 
> > I left that spot on purpose but now I don't quite see why, fixed.  Thanks,
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> Can you move should_remove_lhs_p to tree-cfg.h please?
 
Sure.

> Ok with that change.

Thus:

2016-05-24  Marek Polacek  

* tree-cfg.h (should_remove_lhs_p): New predicate.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Use it.
* gimplify.c (gimplify_modify_expr): Likewise.
* tree-cfg.c (verify_gimple_call): Likewise.
* tree-cfgcleanup.c (fixup_noreturn_call): Likewise.
* gimple-fold.c: Include "tree-cfg.h".
(gimple_fold_call): Use should_remove_lhs_p.
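
For readers skimming the diff below: judging only from the conditions this patch removes at each call site, the new predicate appears to combine three checks (an LHS exists, its type has a constant size, and its type is not addressable). A minimal sketch in plain C, under that assumption; `struct lhs_type_model` and `should_remove_lhs_model` are hypothetical stand-ins, not GCC's tree API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the type of a call's LHS.  Real GCC
   code queries trees via TYPE_SIZE_UNIT and TREE_ADDRESSABLE instead.  */
struct lhs_type_model
{
  bool size_is_constant;  /* models TYPE_SIZE_UNIT being an INTEGER_CST */
  bool addressable;       /* models TREE_ADDRESSABLE on the type */
};

/* Models the three conditions the patch removes from each call site:
   there is an LHS, its type has a constant size, and its type is not
   addressable.  This is an inference from the diff, not the real body.  */
bool
should_remove_lhs_model (const struct lhs_type_model *lhs)
{
  return lhs != NULL && lhs->size_is_constant && !lhs->addressable;
}
```

Callers in the patch still test gimple_call_noreturn_p separately, so the sketch deliberately leaves that check out.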

diff --git gcc/cgraph.c gcc/cgraph.c
index cf9192f..1a4f665 100644
--- gcc/cgraph.c
+++ gcc/cgraph.c
@@ -1513,10 +1513,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
 }
 
   /* If the call becomes noreturn, remove the LHS if possible.  */
-  if (lhs
-  && (gimple_call_flags (new_stmt) & ECF_NORETURN)
-  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
-  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
+  if (gimple_call_noreturn_p (new_stmt) && should_remove_lhs_p (lhs))
 {
   if (TREE_CODE (lhs) == SSA_NAME)
{
diff --git gcc/gimple-fold.c gcc/gimple-fold.c
index 858f484..adcc45f 100644
--- gcc/gimple-fold.c
+++ gcc/gimple-fold.c
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optabs-query.h"
 #include "omp-low.h"
 #include "ipa-chkp.h"
+#include "tree-cfg.h"
 
 
 /* Return true when DECL can be referenced from current unit.
@@ -3052,12 +3053,9 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
  == void_type_node))
gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
  /* If the call becomes noreturn, remove the lhs.  */
- if (lhs
- && (gimple_call_flags (stmt) & ECF_NORETURN)
+ if (gimple_call_noreturn_p (stmt)
  && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
- || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
-  == INTEGER_CST)
- && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
+ || should_remove_lhs_p (lhs)))
{
  if (TREE_CODE (lhs) == SSA_NAME)
{
diff --git gcc/gimplify.c gcc/gimplify.c
index 6473544..e702bc4 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -4873,9 +4873,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
}
}
   notice_special_calls (call_stmt);
-  if (!gimple_call_noreturn_p (call_stmt)
- || TREE_ADDRESSABLE (TREE_TYPE (*to_p))
- || TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (*to_p))) != INTEGER_CST)
+  if (!gimple_call_noreturn_p (call_stmt) || !should_remove_lhs_p (*to_p))
gimple_call_set_lhs (call_stmt, *to_p);
   else if (TREE_CODE (*to_p) == SSA_NAME)
/* The above is somewhat premature, avoid ICEing later for a
diff --git gcc/tree-cfg.c gcc/tree-cfg.c
index 7c2ee78..82f0da6c 100644
--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -3385,11 +3385,9 @@ verify_gimple_call (gcall *stmt)
   return true;
 }
 
-  if (lhs
-  && gimple_call_ctrl_altering_p (stmt)
+  if (gimple_call_ctrl_altering_p (stmt)
   && gimple_call_noreturn_p (stmt)
-  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
-  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
+  && should_remove_lhs_p (lhs))
 {
   error ("LHS in noreturn call");
   return true;
diff --git gcc/tree-cfg.h gcc/tree-cfg.h
index 802e292..3e2a1ee 100644
--- gcc/tree-cfg.h
+++ gcc/tree-cfg.h
@@ -108,4 +108,14 @@ extern bool gimple_find_sub_bbs (gimple_seq, gimple_stmt_iterator *);
 extern bool extract_true_false_controlled_edges (basic_block, basic_block,
 edge *, edge *);
 
+/* Return true if the LHS of a call should be removed.  */
+
+inline bool
+should_remove_lhs_p (tree lhs)
+{
+  return (lhs
+   

Re: RFC [1/2] divmod transform

2016-05-24 Thread Prathamesh Kulkarni
On 24 May 2016 at 17:42, Richard Biener  wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>
>> On 23 May 2016 at 17:35, Richard Biener  wrote:
>> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >  wrote:
>> >> Hi,
>> >> I have updated my patch for divmod (attached), which was originally
>> >> based on Kugan's patch.
>> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> having same operands to divmod representation, so we can cse
>> >> computation of mod.
>> >>
>> >> t1 = a TRUNC_DIV_EXPR b;
>> >> t2 = a TRUNC_MOD_EXPR b
>> >> is transformed to:
>> >> complex_tmp = DIVMOD (a, b);
>> >> t1 = REALPART_EXPR (complex_tmp);
>> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >>
>> >> * New hook expand_divmod_libfunc
>> >> The rationale for introducing the hook is that different targets have
>> >> incompatible calling conventions for divmod libfunc.
>> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> return quotient and store remainder in argument passed as pointer,
>> >> while the arm version takes two arguments and returns both
>> >> quotient and remainder having mode double the size of the operand mode.
>> >> The port should hence override the hook expand_divmod_libfunc
>> >> to generate call to target-specific divmod.
>> >> Ports should define this hook if:
>> >> a) The port does not have divmod or div insn for the given mode.
>> >> b) The port defines divmod libfunc for the given mode.
>> >> The default hook default_expand_divmod_libfunc() generates call
>> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> are of DImode.
>> >>
>> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> cross-tested on arm*-*-*.
>> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> Does this patch look OK ?
>> >
>> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> > index 6b4601b..e4a021a 100644
>> > --- a/gcc/targhooks.c
>> > +++ b/gcc/targhooks.c
>> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> > machine_mode, optimization_type)
>> >return true;
>> >  }
>> >
>> > +void
>> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> > +  rtx op0, rtx op1,
>> > +  rtx *quot_p, rtx *rem_p)
>> >
>> > functions need a comment.
>> >
>> > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?
>> > In that case we could avoid the target hook.
>> Well I would prefer adding the hook because that's easier -;)
>> Would it be ok for now to go with the hook ?
>> >
>> > +  /* If target overrides expand_divmod_libfunc hook
>> > +then perform divmod by generating call to the target-specific divmod libfunc.  */
>> > +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> > +   return true;
>> > +
>> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> > +  return (mode == DImode && unsignedp);
>> >
>> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> > but still restrict this to DImode && unsigned?  Also if
>> > targetm.expand_divmod_libfunc
>> > is not the default we expect the target to handle all modes?
>> Ah indeed, the check for DImode is unnecessary.
>> However I suppose the check for unsignedp should be there,
>> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>
> The optab libfunc for sdivmod should be NULL in that case.
Ah indeed, thanks.
>
>> >
>> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >
>> > Usually we use some can_expand_XXX helper in optabs.c to query if the
>> > target supports a specific operation (for example SImode divmod would
>> > use DImode divmod by means of widening operands - for the unsigned case
>> > of course).
>> Thanks for pointing out. So if a target does not support divmod
>> libfunc for a mode
>> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> perform divmod on the wider-mode, and then cast result back to the
>> original mode.
>> I haven't done that in this patch, would it be OK to do that as a follow up ?
>
> I think that you should conservatively handle the div_optab query, thus if
> the target has a HW division in a wider mode don't use the divmod IFN.
> You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> out if that is available.
Done.
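
As an aside on the calling-convention discussion quoted above, here is a minimal sketch in plain C of the libgcc2.c:__udivmoddi4 style (return the quotient, store the remainder through a pointer). The name `udivmod_sketch` is hypothetical and is not the libgcc routine; it only illustrates how a single call can supply both values that TRUNC_DIV_EXPR and TRUNC_MOD_EXPR would otherwise compute separately.

```c
#include <stdint.h>

/* Hypothetical stand-in for a __udivmoddi4-style helper: returns the
   quotient and stores the remainder through REM_P when it is non-null.
   The caller must ensure B != 0.  */
uint64_t
udivmod_sketch (uint64_t a, uint64_t b, uint64_t *rem_p)
{
  uint64_t quot = a / b;
  if (rem_p)
    *rem_p = a - quot * b;  /* remainder derived from the quotient */
  return quot;
}
```

With this shape, `t1 = a / b; t2 = a % b;` becomes one call: `t1 = udivmod_sketch (a, b, &t2);`.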
>
>> > +  /* Disable the transform if either is a constant, since
>> > division-by-constant
>> > + may have specialized expansion.  */
>> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> > +return false;
>> >
>> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >
>> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> > +return false;
>> >
>> > why's that?  Generally please first test ch

Re: [PATCH] Vectorize inductions that are live after the loop.

2016-05-24 Thread Richard Biener
On Mon, May 23, 2016 at 2:53 PM, Alan Hayward  wrote:
>
> Thanks for the review.
>
> On 23/05/2016 11:35, "Richard Biener"  wrote:
>
>>
>>@@ -6332,79 +6324,81 @@ vectorizable_live_operation (gimple *stmt,
>>   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>>   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>>-  tree op;
>>-  gimple *def_stmt;
>>-  ssa_op_iter iter;
>>+  imm_use_iterator imm_iter;
>>+  tree lhs, lhs_type, vec_lhs;
>>+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>+  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
>>+  int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
>>+  gimple *use_stmt;
>>
>>   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
>>
>>+  if (STMT_VINFO_TYPE (stmt_info) == reduc_vec_info_type)
>>+return true;
>>+
>>
>>This is an odd check - it says the stmt is handled by
>>vectorizable_reduction.  And your
>>return claims it is handled by vectorizable_live_operation ...
>
> Previously this check was made to decide whether to call
> vectorizable_live_operation,
> So it made sense to put this check inside the function.
>
> But, yes, I agree that the return value of the function no longer makes
> sense.
> I can revert this.

Please.

>>
>>You removed the SIMD lane handling?
>
> The SIMD lane handling effectively checked for a special case, then added
> code which would extract the final value of the vector.
> The new code I’ve added does the exact same thing for more generic cases,
> so the SIMD check can be removed and it’ll still be vectorized correctly.

Ah, that's nice then.

>>
>>@@ -303,6 +335,16 @@ vect_stmt_relevant_p (gimple *stmt, loop_vec_info
>>loop_vinfo,
>>}
>> }
>>
>>+  if (*live_p && *relevant == vect_unused_in_scope
>>+  && !is_simple_and_all_uses_invariant (stmt, loop_vinfo))
>>+{
>>+  if (dump_enabled_p ())
>>+   dump_printf_loc (MSG_NOTE, vect_location,
>>+"vec_stmt_relevant_p: live and not all uses "
>>+"invariant.\n");
>>+  *relevant = vect_used_only_live;
>>+}
>>
>>But that's a missed invariant motion / code sinking opportunity then.
>>Did you have a
>>testcase for this?
>
> I don’t have a test case :(
> It made sense that this was the correct action to do on the failure
> (rather than assert).

I meant a testcase that has is_simple_and_all_uses_invariant == true.

>>
>>@@ -618,57 +660,31 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
>>loop_vinfo)
>>}
>>
>>   /* Examine the USEs of STMT. For each USE, mark the stmt that
>>defines it
>>-(DEF_STMT) as relevant/irrelevant and live/dead according to the
>>-liveness and relevance properties of STMT.  */
>>+(DEF_STMT) as relevant/irrelevant according to the relevance
>>property
>>+of STMT.  */
>>   stmt_vinfo = vinfo_for_stmt (stmt);
>>   relevant = STMT_VINFO_RELEVANT (stmt_vinfo);
>>-  live_p = STMT_VINFO_LIVE_P (stmt_vinfo);
>>-
>>-  /* Generally, the liveness and relevance properties of STMT are
>>-propagated as is to the DEF_STMTs of its USEs:
>>- live_p <-- STMT_VINFO_LIVE_P (STMT_VINFO)
>>- relevant <-- STMT_VINFO_RELEVANT (STMT_VINFO)
>>-
>>-One exception is when STMT has been identified as defining a
>>reduction
>>-variable; in this case we set the liveness/relevance as follows:
>>-  live_p = false
>>-  relevant = vect_used_by_reduction
>>-This is because we distinguish between two kinds of relevant
>>stmts -
>>-those that are used by a reduction computation, and those that
>>are
>>-(also) used by a regular computation.  This allows us later on to
>>-identify stmts that are used solely by a reduction, and
>>therefore the
>>-order of the results that they produce does not have to be kept.
>> */
>>-
>>-  def_type = STMT_VINFO_DEF_TYPE (stmt_vinfo);
>>-  tmp_relevant = relevant;
>>-  switch (def_type)
>>+
>>+  switch (STMT_VINFO_DEF_TYPE (stmt_vinfo))
>> {
>>
>>you removed this comment.  Is it no longer valid?  Can you please
>>instead update it?
>>This is a tricky area.
>
> I’ll replace with a new comment.
>
>>
>>
>>@@ -1310,17 +1325,14 @@ vect_init_vector (gimple *stmt, tree val, tree
>>type, gimple_stmt_iterator *gsi)
>>In case OP is an invariant or constant, a new stmt that creates a
>>vector def
>>needs to be introduced.  VECTYPE may be used to specify a required
>>type for
>>vector invariant.  */
>>-
>>-tree
>>-vect_get_vec_def_for_operand (tree op, gimple *stmt, tree vectype)
>>+static tree
>>+vect_get_vec_def_for_operand_internal (tree op, gimple *stmt,
>>+  loop_vec_info loop_vinfo, tree
>>vectype)
>> {
>>   tree vec_oprnd;
>>...
>>
>>+tree
>>+vect_get_vec_def_for_operand (tree op, gimple *stmt, tree vectype)
>>+{
>>+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
>>+  loop_vec_info loop_vinfo = STMT_

Re: [PATCH v3] gcov: Runtime configurable destination output

2016-05-24 Thread Nathan Sidwell

On 05/23/16 16:03, Aaron Conole wrote:

The previous gcov behavior was to always output errors on the stderr channel.
This is fine for most uses, but some programs will require stderr to be
untouched by libgcov for certain tests. This change allows configuring
the gcov output via an environment variable which will be used to open
the appropriate file.


This patch is nearly there, but a couple of nits and an error on my part.



+/* Configured via the GCOV_ERROR_FILE environment variable;
+   it will either be stderr, or a file of the user's choosing. */
+static FILE *gcov_error_file;


I was wrong about making this static.  Your original externally visible
definition (with leading __) was right.  The reason is that multiple
gcov-aware shared objects should use the same FILE for errors.  If you could
restore that part of your previous patch, along with a comment explaining why
gcov_error_file is externally visible, but get_gcov_error is static, that'd
be great.



+
+/* A utility function to populate the gcov_error_file pointer */
+
+static FILE *
+get_gcov_error_file(void)
+{
+#if IN_GCOV_TOOL
+  return stderr;
+#endif


Prefer #else ... #endif to encapsulate the remaining bit of the function.



 /* A utility function for outputing errors.  */


May as well fix the spelling error
  outputing -> outputting



+#if !IN_GCOV_TOOL
+static void
+gcov_error_exit(void)
+{
+  if (gcov_error_file && gcov_error_file != stderr)
+{
+  fclose(gcov_error_file);


needs space -- the habit'll grow



+#if !IN_GCOV_TOOL
+static void gcov_error_exit(void);


space before '('

nathan
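
To make the review points concrete, here is a minimal, self-contained sketch of the shape being asked for: an externally visible stream variable, a getter whose non-tool path sits inside #else ... #endif, and an exit helper. All `sketch_`-prefixed names and the `SKETCH_IN_GCOV_TOOL` macro are hypothetical, and opening the file in append mode is an assumption; this is not the libgcov code.

```c
#include <stdio.h>
#include <stdlib.h>

#ifndef SKETCH_IN_GCOV_TOOL
#define SKETCH_IN_GCOV_TOOL 0
#endif

/* Hypothetical name.  Not static, mirroring the review's point that all
   gcov-aware shared objects should share one error stream.  */
FILE *sketch_gcov_error_file;

/* Return the stream for error messages, choosing it on first use.  In the
   tool build always use stderr; otherwise consult the GCOV_ERROR_FILE
   environment variable and fall back to stderr.  */
FILE *
sketch_get_gcov_error_file (void)
{
#if SKETCH_IN_GCOV_TOOL
  return stderr;
#else
  if (!sketch_gcov_error_file)
    {
      const char *name = getenv ("GCOV_ERROR_FILE");
      if (name)
        sketch_gcov_error_file = fopen (name, "a");  /* append: assumption */
      if (!sketch_gcov_error_file)
        sketch_gcov_error_file = stderr;
    }
  return sketch_gcov_error_file;
#endif
}

/* Close the stream at exit unless it is stderr.  */
void
sketch_gcov_error_exit (void)
{
  if (sketch_gcov_error_file && sketch_gcov_error_file != stderr)
    {
      fclose (sketch_gcov_error_file);
      sketch_gcov_error_file = NULL;
    }
}
```

A caller would write `fprintf (sketch_get_gcov_error_file (), ...)` and register `sketch_gcov_error_exit` with atexit.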


[PATCH][ARM][3/4] Cleanup casts from INTVAL to [unsigned] HOST_WIDE_INT

2016-05-24 Thread Kyrill Tkachov

Hi all,

We have a few instances in the arm backend where we take the INTVAL of an RTX
and immediately cast it to an (unsigned HOST_WIDE_INT). This is exactly
equivalent to taking the UINTVAL of the RTX.

This patch fixes such uses. A couple of uses in arm.md take the INTVAL and
then compare it to the constant 1 which can be replaced by a comparison with
CONST1_RTX without extracting the INTVAL.
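
For readers less familiar with the idiom, a small sketch in plain C of why the unsigned view of the value is used in comparisons such as `< 256`: after the cast, negative values become very large, so a single unsigned comparison acts as a 0..255 range check. The typedefs and function names below are hypothetical stand-ins, not GCC's HOST_WIDE_INT or its macros.

```c
#include <stdbool.h>

/* Stand-ins for HOST_WIDE_INT; GCC chooses the real type itself.  */
typedef long long sketch_hwi;
typedef unsigned long long sketch_uhwi;

/* Old style: read the signed value, then cast it to unsigned.  */
bool
fits_in_byte_cast (sketch_hwi v)
{
  return (sketch_uhwi) v < 256;
}

/* The same test written as an explicit signed range check.  Negative values
   become huge after the cast above, so both forms reject them.  */
bool
fits_in_byte_range (sketch_hwi v)
{
  return v >= 0 && v < 256;
}
```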

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing as obvious.

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.md (ashldi3): Replace comparison of INTVAL of
operands[2] against 1 with comparison against CONST1_RTX.
(ashrdi3): Likewise.
(lshrdi3): Likewise.
(ashlsi3): Replace cast of INTVAL to unsigned HOST_WIDE_INT with
UINTVAL.
(ashrsi3): Likewise.
(lshrsi3): Likewise.
(rotrsi3): Likewise.
(define_split above *compareqi_eq0): Likewise.
(define_split above "prologue"): Likewise.
* config/arm/arm.c (thumb1_size_rtx_costs): Likewise.
* config/arm/predicates.md (shift_operator): Likewise.
(shift_nomul_operator): Likewise.
(sat_shift_operator): Likewise.
(thumb1_cmp_operand): Likewise.
(const_neon_scalar_shift_amount_operand): Replace manual range
check with IN_RANGE.
* config/arm/thumb1.md (define_peephole2 above *thumb_subdi3):
Replace cast of INTVAL to unsigned HOST_WIDE_INT with UINTVAL.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 78478303593522d186734c452c970fb013bf846e..55b3a82618ef4138573baad3f0654162a33e1032 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8986,7 +8986,7 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 case CONST_INT:
   if (outer == SET)
 {
-  if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
+  if (UINTVAL (x) < 256)
 return COSTS_N_INSNS (1);
 	  /* See split "TARGET_THUMB1 && satisfies_constraint_J".  */
 	  if (INTVAL (x) >= -255 && INTVAL (x) <= -1)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8c63bf7b75c4e84283ffee471375389f5a5b1a34..e78ede8945fb2d0c0ac5a5af7b96a64d061cf5c3 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3761,8 +3761,7 @@ (define_expand "ashldi3"
 {
   rtx scratch1, scratch2;
 
-  if (CONST_INT_P (operands[2])
-	  && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
+  if (operands[2] == CONST1_RTX (SImode))
 {
   emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
   DONE;
@@ -3807,7 +3806,7 @@ (define_expand "ashlsi3"
   "TARGET_EITHER"
   "
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && (UINTVAL (operands[2])) > 31)
 {
   emit_insn (gen_movsi (operands[0], const0_rtx));
   DONE;
@@ -3835,8 +3834,7 @@ (define_expand "ashrdi3"
 {
   rtx scratch1, scratch2;
 
-  if (CONST_INT_P (operands[2])
-	  && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
+  if (operands[2] == CONST1_RTX (SImode))
 {
   emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
   DONE;
@@ -3881,7 +3879,7 @@ (define_expand "ashrsi3"
   "TARGET_EITHER"
   "
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && UINTVAL (operands[2]) > 31)
 operands[2] = GEN_INT (31);
   "
 )
@@ -3906,8 +3904,7 @@ (define_expand "lshrdi3"
 {
   rtx scratch1, scratch2;
 
-  if (CONST_INT_P (operands[2])
-	  && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
+  if (operands[2] == CONST1_RTX (SImode))
 {
   emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
   DONE;
@@ -3952,7 +3949,7 @@ (define_expand "lshrsi3"
   "TARGET_EITHER"
   "
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && (UINTVAL (operands[2])) > 31)
 {
   emit_insn (gen_movsi (operands[0], const0_rtx));
   DONE;
@@ -3986,7 +3983,7 @@ (define_expand "rotrsi3"
   if (TARGET_32BIT)
 {
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && UINTVAL (operands[2]) > 31)
 operands[2] = GEN_INT (INTVAL (operands[2]) % 32);
 }
   else /* TARGET_THUMB1 */
@@ -5129,7 +5126,7 @@ (define_split
 		 (match_operator 5 "subreg_lowpart_operator"
 		  [(match_operand:SI 4 "s_register_operand" "")]]
   "TARGET_32BIT
-   && ((unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && (UINTVAL (operands[3])
== (GET_MODE_MASK (GET_MODE (operands[5]))
& (GET_MODE_MASK (GET_MODE (operands[5]))
 	  << (INTVAL (operands[2])"
@@ -10187,8 +10184,8 @@ (define_split
 	 (match_operand 1 "const_int_operand" "")))
(clobber (match_scratch:SI 2 ""))]
   "TARGET_ARM
-   && (((unsigned HOST_WIDE_INT) INTVAL (operands[1]))
-   == (((unsigned HOST_WIDE_INT) INTVAL (operands[1])) >> 24) << 24)"
+   && ((UINTVA

[PATCH] Fix PR71230 some more

2016-05-24 Thread Richard Biener

There were more omissions in how zero_one_operation works with the new
way of processing negates in the context of multiplication chains.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  CPU 2006
build in progress as well.

Richard.

2016-05-24  Richard Biener  

PR tree-optimization/71230
* tree-ssa-reassoc.c (zero_one_operation): Handle negate special ops.

* gcc.dg/torture/pr71230.c: New testcase.
* g++.dg/torture/pr71230.C: Likewise.

Index: gcc/tree-ssa-reassoc.c
===
*** gcc/tree-ssa-reassoc.c  (revision 236630)
--- gcc/tree-ssa-reassoc.c  (working copy)
*** zero_one_operation (tree *def, enum tree
*** 1189,1200 
  {
tree name;
  
!   if (opcode == MULT_EXPR
! && stmt_is_power_of_op (stmt, op))
{
! if (decrement_power (stmt) == 1)
!   propagate_op_to_single_use (op, stmt, def);
! return;
}
  
name = gimple_assign_rhs1 (stmt);
--- 1191,1210 
  {
tree name;
  
!   if (opcode == MULT_EXPR)
{
! if (stmt_is_power_of_op (stmt, op))
!   {
! if (decrement_power (stmt) == 1)
!   propagate_op_to_single_use (op, stmt, def);
! return;
!   }
! else if (gimple_assign_rhs_code (stmt) == NEGATE_EXPR
!  && gimple_assign_rhs1 (stmt) == op)
!   {
! propagate_op_to_single_use (op, stmt, def);
! return;
!   }
}
  
name = gimple_assign_rhs1 (stmt);
*** zero_one_operation (tree *def, enum tree
*** 1213,1219 
}
  
/* We might have a multiply of two __builtin_pow* calls, and
!the operand might be hiding in the rightmost one.  */
if (opcode == MULT_EXPR
  && gimple_assign_rhs_code (stmt) == opcode
  && TREE_CODE (gimple_assign_rhs2 (stmt)) == SSA_NAME
--- 1223,1230 
}
  
/* We might have a multiply of two __builtin_pow* calls, and
!the operand might be hiding in the rightmost one.  Likewise
!this can happen for a negate.  */
if (opcode == MULT_EXPR
  && gimple_assign_rhs_code (stmt) == opcode
  && TREE_CODE (gimple_assign_rhs2 (stmt)) == SSA_NAME
*** zero_one_operation (tree *def, enum tree
*** 1226,1231 
--- 1237,1249 
propagate_op_to_single_use (op, stmt2, def);
  return;
}
+ else if (is_gimple_assign (stmt2)
+  && gimple_assign_rhs_code (stmt2) == NEGATE_EXPR
+  && gimple_assign_rhs1 (stmt2) == op)
+   {
+ propagate_op_to_single_use (op, stmt2, def);
+ return;
+   }
}
  
/* Continue walking the chain.  */
Index: gcc/testsuite/gcc.dg/torture/pr71230.c
===
*** gcc/testsuite/gcc.dg/torture/pr71230.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71230.c  (working copy)
***
*** 0 
--- 1,25 
+ /* { dg-do compile } */
+ /* { dg-additional-options "-ffast-math" } */
+ 
+ void metric_carttosphere(int *cctk_lsh, double txz, double tyz, double txx,
+double tzz, double sint, double cosp, double cost,
+double tyy, double sinp, double txy, double *grp,
+double *grq, double *r)
+ {
+   int i;
+   for(i=0; i class Tensor;
+template  class Point {
+public:
+Point (const double x, const double y, const double z);
+double operator () (const unsigned int index) const;
+};
+template  class TriaObjectAccessor  {
+Point & vertex (const unsigned int i) const;
+Point barycenter (double, double, double, double, double) const;
+};
+template <> Point<3> TriaObjectAccessor<3, 3>::barycenter (double s6, double 
s7, double s1, double s2, double s3) const
+{
+const double x[8] = {
+   vertex(0)(0),vertex(1)(0),vertex(2)(0),vertex(3)(0),
vertex(4)(0),vertex(5)(0),vertex(6)(0),vertex(7)(0) };
+const double y[8] = {
+   vertex(0)(1),vertex(1)(1),vertex(2)(1),vertex(3)(1),
vertex(4)(1),vertex(5)(1),vertex(6)(1),vertex(7)(1) };
+const double z[8] = {
+   vertex(0)(2),vertex(1)(2),vertex(2)(2),vertex(3)(2),
vertex(4)(2),vertex(5)(2),vertex(6)(2),vertex(7)(2) };
+double s4, s5, s8;
+const double unknown0 = s1*s2;
+const double unknown1 = s1*s2;
+s8 = -z[2]*x[1]*y[2]*z[5]+z[2]*y[1]*x[2]*z[5]-z[2]*z[1]*x[2]*y[5]+z[2]*z   
 
[1]*x[5]*y[2]+2.0*y[5]*x[7]*z[4]*z[4]-y[1]*x[2]*z[0]*z[0]+x[0]*y[3]*z[7]*z[7]   
 
-2.0*z[5]*z[5]*x[4]*y[1]+2.0*z[5]*z[5]*x[1]*y[4]+z[5]*z[5]*x[0]*y[4]-2.0*z[2]*z 
   
[2]*x[1]*y[3]+2.0*z[2]*z[2]*x[3]*y[1]-x[0]*y[4]*z[7]*z[7]-y[0]*x[3]*z[7]*z[7]+x 
   [1]*y[0]*z[5]*z[5];
+s5 = s8

[PATCH][ARM][2/4] Replace casts of 1 to HOST_WIDE_INT by HOST_WIDE_INT_1 and HOST_WIDE_INT_1U

2016-05-24 Thread Kyrill Tkachov

Hi all,

hwint.h defines a number of useful macros to access the constants -1,0,1 cast to
HOST_WIDE_INT or unsigned HOST_WIDE_INT. We can use these to save some 
horizontal
space and parentheses when we need such constants.

This patch replaces such uses with these macros to slightly improve the 
readability
of some of the expressions in the arm backend.
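
For readers skimming the thread, here is a minimal standalone sketch of the
rewrite (not part of the patch).  The typedefs and the SKETCH_ macros are
stand-ins invented for illustration; the real hwint.h picks the underlying
type per host, so 'long long' is an assumption here.

```c
#include <assert.h>

/* Stand-ins for HOST_WIDE_INT and its unsigned variant (assumed 64-bit).  */
typedef long long sketch_hwi;
typedef unsigned long long sketch_uhwi;

/* Stand-ins for HOST_WIDE_INT_1 and HOST_WIDE_INT_1U.  */
#define SKETCH_HWI_1  ((sketch_hwi) 1)
#define SKETCH_HWI_1U ((sketch_uhwi) 1)

/* Old style: the literal 1 is cast explicitly at the point of use.  */
static sketch_uhwi
mask_old (int bits)
{
  assert (bits >= 0 && bits < 64);
  return (((sketch_uhwi) 1) << bits) - 1;
}

/* New style: the macro already carries the wide type.  */
static sketch_uhwi
mask_new (int bits)
{
  assert (bits >= 0 && bits < 64);
  return (SKETCH_HWI_1U << bits) - 1;
}

/* Same idea for the signed constant, as in the jump table rounding.  */
static sketch_hwi
round_up_to_even (sketch_hwi size)
{
  return (size + 1) & ~SKETCH_HWI_1;
}
```

Both spellings compute the same value; the macro form only drops a cast and
a pair of parentheses.  The wide type still matters: a plain 'int' 1 shifted
by 40 would not be representable.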

Bootstrapped and tested on arm-none-linux-gnueabihf.

Will commit as obvious.

Thanks,
Kyrill

P.S. One such usage remains in thumb1_rtx_costs since Thomas will be removing 
it as part
of his ARMv8-M patches, so I didn't want to introduce a dependency.

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.md (andsi3): Replace cast of 1 to HOST_WIDE_INT
with HOST_WIDE_INT_1.
(insv): Likewise.
* config/arm/arm.c (optimal_immediate_sequence): Replace cast of
1 to unsigned HOST_WIDE_INT with HOST_WIDE_INT_1U.
(arm_canonicalize_comparison): Likewise.
(thumb1_rtx_costs): Replace cast of 1 to HOST_WIDE_INT with
HOST_WIDE_INT_1.
(thumb1_size_rtx_costs): Likewise.
(vfp_const_double_index): Replace cast of 1 to unsigned
HOST_WIDE_INT with HOST_WIDE_INT_1U.
(get_jump_table_size): Replace cast of 1 to HOST_WIDE_INT with
HOST_WIDE_INT_1.
(arm_asan_shadow_offset): Replace cast of 1 to unsigned
HOST_WIDE_INT with HOST_WIDE_INT_1U.
* config/arm/neon.md (vec_set): Replace cast of 1 to
HOST_WIDE_INT with HOST_WIDE_INT_1.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3fe6eab46f3c18ace6899b5be45ad646992f43e4..78478303593522d186734c452c970fb013bf846e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -4053,7 +4053,7 @@ optimal_immediate_sequence (enum rtx_code code, unsigned HOST_WIDE_INT val,
  yield a shorter sequence, we may as well use zero.  */
   insns1 = optimal_immediate_sequence_1 (code, val, return_sequence, best_start);
   if (best_start != 0
-  && ((((unsigned HOST_WIDE_INT) 1) << best_start) < val))
+  && ((HOST_WIDE_INT_1U << best_start) < val))
 {
   insns2 = optimal_immediate_sequence_1 (code, val, &tmp_sequence, 0);
   if (insns2 <= insns1)
@@ -4884,7 +4884,7 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
   if (mode == VOIDmode)
 mode = GET_MODE (*op1);
 
-  maxval = (((unsigned HOST_WIDE_INT) 1) << (GET_MODE_BITSIZE(mode) - 1)) - 1;
+  maxval = (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (mode) - 1)) - 1;
 
   /* For DImode, we have GE/LT/GEU/LTU comparisons.  In ARM mode
  we can also use cmp/cmpeq for GTU/LEU.  GT/LE must be either
@@ -8254,8 +8254,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 	  int i;
 	  /* This duplicates the tests in the andsi3 expander.  */
 	  for (i = 9; i <= 31; i++)
-	if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (x)
-		|| (((HOST_WIDE_INT) 1) << i) - 1 == ~INTVAL (x))
+	if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (x)
+		|| (HOST_WIDE_INT_1 << i) - 1 == ~INTVAL (x))
 	  return COSTS_N_INSNS (2);
 	}
   else if (outer == ASHIFT || outer == ASHIFTRT
@@ -9007,8 +9007,8 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
   int i;
   /* This duplicates the tests in the andsi3 expander.  */
   for (i = 9; i <= 31; i++)
-if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (x)
-|| (((HOST_WIDE_INT) 1) << i) - 1 == ~INTVAL (x))
+if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (x)
+|| (HOST_WIDE_INT_1 << i) - 1 == ~INTVAL (x))
   return COSTS_N_INSNS (2);
 }
   else if (outer == ASHIFT || outer == ASHIFTRT
@@ -12122,7 +12122,7 @@ vfp3_const_double_index (rtx x)
 
   /* We can permit four significant bits of mantissa only, plus a high bit
  which is always 1.  */
-  mask = ((unsigned HOST_WIDE_INT)1 << (point_pos - 5)) - 1;
+  mask = (HOST_WIDE_INT_1U << (point_pos - 5)) - 1;
   if ((mantissa & mask) != 0)
 return -1;
 
@@ -16216,7 +16216,7 @@ get_jump_table_size (rtx_jump_table_data *insn)
 	{
 	case 1:
 	  /* Round up size  of TBB table to a halfword boundary.  */
-	  size = (size + 1) & ~(HOST_WIDE_INT)1;
+	  size = (size + 1) & ~HOST_WIDE_INT_1;
 	  break;
 	case 2:
 	  /* No padding necessary for TBH.  */
@@ -29694,7 +29694,7 @@ arm_fusion_enabled_p (unsigned int op)
 static unsigned HOST_WIDE_INT
 arm_asan_shadow_offset (void)
 {
-  return (unsigned HOST_WIDE_INT) 1 << 29;
+  return HOST_WIDE_INT_1U << 29;
 }
 
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2b190e23a11f23f6e076a84bd309260c8bc4b9da..8c63bf7b75c4e84283ffee471375389f5a5b1a34 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2154,13 +2154,13 @@ (define_expand "andsi3"
 
   for (i = 9; i <= 31; i++)
 	{
-	  if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (operands[2]))
+	  if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (operands[2]))
 	{
 	  emit_insn (gen_extzv (operands[0], operands[1], GEN_INT (i),
 			 	cons

[PATCH][ARM][1/4] Replace uses of int_log2 by exact_log2

2016-05-24 Thread Kyrill Tkachov

Hi all,

The int_log2 function in arm.c is not really useful since we already have a 
generic function for calculating
the log2 of HOST_WIDE_INTs. The only difference in functionality is that 
int_log2 also asserts that the result
is no greater than 31.

This patch removes int_log2 in favour of exact_log2 and adds an assert on the 
result to make sure the return
value was as expected.
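
To make the comparison concrete, here is a rough standalone sketch (not part
of the patch).  int_log2_sketch is transcribed from the function being
deleted; exact_log2_sketch is a simplified stand-in for the generic helper,
written under the assumption that it returns -1 for anything that is not an
exact power of two; sketch_hwi is an assumed 64-bit stand-in for
HOST_WIDE_INT.

```c
#include <assert.h>

typedef long long sketch_hwi;

/* The removed helper: index of the lowest set bit, asserting <= 31.  */
static sketch_hwi
int_log2_sketch (sketch_hwi power)
{
  sketch_hwi shift = 0;

  while ((((sketch_hwi) 1 << shift) & power) == 0)
    {
      assert (shift <= 31);
      shift++;
    }

  return shift;
}

/* Simplified stand-in for the generic helper: log2 of X if X is a
   power of two, otherwise -1.  */
static int
exact_log2_sketch (unsigned long long x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Shape of the patched call sites: generic helper plus a range check.  */
static int
shift_amount_sketch (sketch_hwi value)
{
  int ret = exact_log2_sketch ((unsigned long long) value);
  assert (ret >= 0 && ret <= 31);
  return ret;
}
```

For powers of two up to 1 << 31 the two agree.  They differ on other inputs
(the old loop returns the lowest set bit, the stand-in returns -1), and in
that case the added range check would fire.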

Bootstrapped and tested on arm-none-linux-gnueabihf.

Is this ok? Or is there something I'm missing about int_log2?

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.c (int_log2): Delete definition and prototype.
(shift_op): Use exact_log2 instead of int_log2.
(vfp3_const_double_for_fract_bits): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6cc0feb6f87157171c889e998e52b4e5d8683c66..3fe6eab46f3c18ace6899b5be45ad646992f43e4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -104,7 +104,6 @@ static void arm_print_operand_address (FILE *, machine_mode, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
-static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
 static const char *output_multi_immediate (rtx *, const char *, const char *,
 	   int, HOST_WIDE_INT);
 static const char *shift_op (rtx, HOST_WIDE_INT *);
@@ -18920,7 +18919,8 @@ shift_op (rtx op, HOST_WIDE_INT *amountp)
 	  return NULL;
 	}
 
-  *amountp = int_log2 (*amountp);
+  *amountp = exact_log2 (*amountp);
+  gcc_assert (IN_RANGE (*amountp, 0, 31));
   return ARM_LSL_NAME;
 
 default:
@@ -18952,22 +18952,6 @@ shift_op (rtx op, HOST_WIDE_INT *amountp)
   return mnem;
 }
 
-/* Obtain the shift from the POWER of two.  */
-
-static HOST_WIDE_INT
-int_log2 (HOST_WIDE_INT power)
-{
-  HOST_WIDE_INT shift = 0;
-
-  while ((((HOST_WIDE_INT) 1 << shift) & power) == 0)
-{
-  gcc_assert (shift <= 31);
-  shift++;
-}
-
-  return shift;
-}
-
 /* Output a .ascii pseudo-op, keeping track of lengths.  This is
because /bin/as is horribly restrictive.  The judgement about
whether or not each character is 'printable' (and can be output as
@@ -27691,7 +27675,11 @@ vfp3_const_double_for_fract_bits (rtx operand)
 	  HOST_WIDE_INT value = real_to_integer (&r0);
 	  value = value & 0xffffffff;
 	  if ((value != 0) && ( (value & (value - 1)) == 0))
-	return int_log2 (value);
+	{
+	  int ret = exact_log2 (value);
+	  gcc_assert (IN_RANGE (ret, 0, 31));
+	  return ret;
+	}
 	}
 }
   return 0;


[PATCH] Fix PR71230

2016-05-24 Thread Richard Biener

The following fixes the ICEs in PR71230.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-05-24  Richard Biener  

PR tree-optimization/71240
* tree-ssa-math-opts.c (init_symbolic_number): Verify the source
has integral type.

* gcc.dg/optimize-bswapsi-5.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 236630)
--- gcc/tree-ssa-math-opts.c(working copy)
*** init_symbolic_number (struct symbolic_nu
*** 2051,2056 
--- 2051,2059 
  {
int size;
  
+   if (! INTEGRAL_TYPE_P (TREE_TYPE (src)))
+ return false;
+ 
n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
  
/* Set up the symbolic number N by setting each byte to a value between 1 
and
Index: gcc/testsuite/gcc.dg/optimize-bswapsi-5.c
===
*** gcc/testsuite/gcc.dg/optimize-bswapsi-5.c   (revision 0)
--- gcc/testsuite/gcc.dg/optimize-bswapsi-5.c   (working copy)
***
*** 0 
--- 1,31 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target bswap32 } */
+ /* { dg-options "-O2 -fdump-tree-bswap" } */
+ /* { dg-additional-options "-march=z900" { target s390-*-* } } */
+ 
+ struct L { unsigned int l[2]; };
+ union U { double a; struct L l; } u;
+ 
+ void
+ foo (double a, struct L *p)
+ {
+   u.a = a;
+   struct L l = u.l, m;
+   m.l[0] = (((l.l[1] & 0xff000000) >> 24)
+   | ((l.l[1] & 0x00ff0000) >> 8)
+   | ((l.l[1] & 0x0000ff00) << 8)
+   | ((l.l[1] & 0x000000ff) << 24));
+   m.l[1] = (((l.l[0] & 0xff000000) >> 24)
+   | ((l.l[0] & 0x00ff0000) >> 8)
+   | ((l.l[0] & 0x0000ff00) << 8)
+   | ((l.l[0] & 0x000000ff) << 24));
+   *p = m;
+ }
+ 
+ void
+ bar (double a, struct L *p)
+ {
+   foo (a, p);
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 2 
"bswap" } } */


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> As said I'd simply use NULL TYPE_MAX_VALUE, not drop TYPE_DOMAIN 
> completely (yes, NULL TYPE_DOMAIN is equal to [0:] so we can as well
> print that - as you say, not sure what else breaks with that ;))

NULL TYPE_MAX_VALUE was used by my previous patch, because it used 
gfc_array_range_type that was built as such.  I am testing the patch
below + tree-pretty-print update:
Index: tree-pretty-print.c
===
--- tree-pretty-print.c (revision 236556)
+++ tree-pretty-print.c (working copy)
@@ -362,7 +362,7 @@ dump_array_domain (pretty_printer *pp, t
}
 }
   else
-pp_string (pp, "");
+pp_string (pp, "0:");
   pp_right_bracket (pp);
 }
 
I suppose this is slightly better because it will make things more regular
across frontends and will let LTO merge a bit more.

Honza


[PATCH][ARM][4/4] Simplify checks for CONST_INT_P and comparison against 1/0

2016-05-24 Thread Kyrill Tkachov

Hi all,

Following up from patch 3/4 there are a few more instances where we check that 
an RTX is CONST_INT_P and then
compare its INTVAL against 1 or 0. These can be replaced by just comparing the 
RTX directly against CONST1_RTX
or CONST0_RTX.

This patch does that.
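
The simplification relies on small integer constants being shared objects,
so identity implies both "is a constant" and "has this value".  A toy model
of that idea follows; it is not GCC's rtx, and every toy_ name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node: either an integer constant or a register.  */
enum toy_code { TOY_CONST_INT, TOY_REG };
struct toy_rtx { enum toy_code code; long value; };

/* 0 and 1 are allocated once and shared by everyone.  */
static struct toy_rtx toy_const0 = { TOY_CONST_INT, 0 };
static struct toy_rtx toy_const1 = { TOY_CONST_INT, 1 };

/* Small pool for all other constants in this sketch.  */
static struct toy_rtx toy_pool[16];
static size_t toy_pool_used;

/* Hypothetical constructor: always hands out the shared object for
   0 and 1, and a fresh node for anything else.  */
static struct toy_rtx *
toy_gen_int (long value)
{
  struct toy_rtx *x;

  if (value == 0)
    return &toy_const0;
  if (value == 1)
    return &toy_const1;
  assert (toy_pool_used < sizeof toy_pool / sizeof toy_pool[0]);
  x = &toy_pool[toy_pool_used++];
  x->code = TOY_CONST_INT;
  x->value = value;
  return x;
}

/* Old style: check the code, then the value.  */
static int
is_one_old (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT && x->value == 1;
}

/* New style: compare against the shared object directly.  */
static int
is_one_new (const struct toy_rtx *x)
{
  return x == &toy_const1;
}
```

In this model the two predicates agree only because toy_gen_int never
creates a second node for 1; the pointer comparison is valid under the same
kind of guarantee.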

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk as obvious.

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/neon.md (ashldi3_neon):  Replace comparison of INTVAL of
operands[2] against 1 with comparison against CONST1_RTX.
(di3_neon): Likewise.
* config/arm/predicates.md (const0_operand): Replace with comparison
against CONST0_RTX.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 21eed7bb99c48d508a1c8be9c8f992ae07f3d550..e2fdfbb04621ee6f8603849be089e8bce624214d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1082,7 +1082,7 @@ (define_insn_and_split "ashldi3_neon"
   }
 else
   {
-	if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1
+	if (operands[2] == CONST1_RTX (SImode)
 	&& (!reg_overlap_mentioned_p (operands[0], operands[1])
 		|| REGNO (operands[0]) == REGNO (operands[1])))
 	  /* This clobbers CC.  */
@@ -1184,7 +1184,7 @@ (define_insn_and_split "di3_neon"
   }
 else
   {
-	if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1
+	if (operands[2] == CONST1_RTX (SImode)
 	&& (!reg_overlap_mentioned_p (operands[0], operands[1])
 		|| REGNO (operands[0]) == REGNO (operands[1])))
 	  /* This clobbers CC.  */
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 86c1bb62ae9ba433afe3169e07055c1b818e26c8..762c828c98bdccebb773142f1202ec171e3438f7 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -149,8 +149,7 @@ (define_predicate "arm_not_immediate_operand"
(match_test "const_ok_for_arm (~INTVAL (op))")))
 
 (define_predicate "const0_operand"
-  (and (match_code "const_int")
-   (match_test "INTVAL (op) == 0")))
+  (match_test "op == CONST0_RTX (mode)"))
 
 ;; Something valid on the RHS of an ARM data-processing instruction
 (define_predicate "arm_rhs_operand"


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Jan Hubicka wrote:

> Hi,
> I tried the attached patch that gets rid of gfc_array_range_type because it
> seems pointless from middle-end POV. It however affects .original dumps in the
> following way:
> --- assumed_type_2.f90.003t.original2016-05-24 14:32:45.771503552 +0200
> +++ ../assumed_type_2.f90.003t.original 2016-05-24 14:34:07.637311579 +0200
> @@ -246,7 +246,7 @@
>  parm.20.offset = NON_LVALUE_EXPR ;
>  D.3504 = _gfortran_internal_pack (&parm.20);
>  sub_array_assumed (D.3504);
> -if ((void *[0:] *) parm.20.data != (void *[0:] *) D.3504)
> +if ((void *[] *) parm.20.data != (void *[] *) D.3504)
>{ 
>  _gfortran_internal_unpack (&parm.20, D.3504);
>  __builtin_free (D.3504);
> @@ -576,12 +576,12 @@
>  { 
>static logical(kind=4) C.3584 = 1;
> 
> -  sub_scalar (&(*(real(kind=4)[0:] * restrict) 
> array_real_alloc.data)[(array_real_alloc.offset + 
> array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
> +  sub_scalar (&(*(real(kind=4)[] * restrict) 
> array_real_alloc.data)[(array_real_alloc.offset + 
> array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
>  }
>  { 
>static logical(kind=4) C.3585 = 1;
> 
> -  sub_scalar (&(*(character(kind=1)[0:][1:1] *) 
> array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
> ], &C.3585, 1);
> +  sub_scalar (&(*(character(kind=1)[][1:1] *) 
> array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
> ], &C.3585, 1);
>  }
>  { 
>static logical(kind=4) C.3586 = 1;
> 
> Which breaks testsuite.  Perhaps just
>  can be printed as 0: (because that is what NULL domain means).  This
> is done by dump_array_domain in pretty-print.c and I am not quite sure who
> else relies on the format.
> Or we can just compensate in the testsuite given that the bounds are really
> unknown...

As said I'd simply use NULL TYPE_MAX_VALUE, not drop TYPE_DOMAIN 
completely (yes, NULL TYPE_DOMAIN is equal to [0:] so we can as well
print that - as you say, not sure what else breaks with that ;))

Richard.

> Honza
> 
> Index: trans-types.c
> ===
> --- trans-types.c (revision 236556)
> +++ trans-types.c (working copy)
> @@ -52,7 +52,6 @@ along with GCC; see the file COPYING3.
>  CInteropKind_t c_interop_kinds_table[ISOCBINDING_NUMBER];
>  
>  tree gfc_array_index_type;
> -tree gfc_array_range_type;
>  tree gfc_character1_type_node;
>  tree pvoid_type_node;
>  tree prvoid_type_node;
> @@ -945,12 +944,6 @@ gfc_init_types (void)
>  = build_pointer_type (build_function_type_list (void_type_node, 
> NULL_TREE));
>  
>gfc_array_index_type = gfc_get_int_type (gfc_index_integer_kind);
> -  /* We cannot use gfc_index_zero_node in definition of gfc_array_range_type,
> - since this function is called before gfc_init_constants.  */
> -  gfc_array_range_type
> -   = build_range_type (gfc_array_index_type,
> -   build_int_cst (gfc_array_index_type, 0),
> -   NULL_TREE);
>  
>/* The maximum array element size that can be handled is determined
>   by the number of bits available to store this field in the array
> @@ -1920,12 +1913,12 @@ gfc_get_array_type_bounds (tree etype, i
>  
>/* We define data as an array with the correct size if possible.
>   Much better than doing pointer arithmetic.  */
> -  if (stride)
> +  if (stride && akind >= GFC_ARRAY_UNKNOWN)
>  rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
> int_const_binop (MINUS_EXPR, stride,
>  build_int_cst (TREE_TYPE 
> (stride), 1)));
>else
> -rtype = gfc_array_range_type;
> +rtype = NULL;
>arraytype = build_array_type (etype, rtype);
>arraytype = build_pointer_type (arraytype);
>if (restricted)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][ARM] PR target/70830: Avoid POP-{reglist}^ when returning from interrupt handlers

2016-05-24 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01211.html

Thanks,
Kyrill

On 17/05/16 11:40, Kyrill Tkachov wrote:


On 13/05/16 12:05, Kyrill Tkachov wrote:

Hi Christophe,

On 12/05/16 20:57, Christophe Lyon wrote:

On 12 May 2016 at 11:48, Ramana Radhakrishnan  wrote:

On Thu, May 5, 2016 at 12:50 PM, Kyrill Tkachov
 wrote:

Hi all,

In this PR we deal with some fallout from the conversion to unified
assembly.
We now end up emitting instructions like:
   pop {r0,r1,r2,r3,pc}^
which is not legal. We have to use an LDM form.

There are bugs in two arm.c functions: output_return_instruction and
arm_output_multireg_pop.

In output_return_instruction the buggy hunk from the conversion was:
   else
-   if (TARGET_UNIFIED_ASM)
   sprintf (instr, "pop%s\t{", conditional);
-   else
- sprintf (instr, "ldm%sfd\t%%|sp!, {", conditional);

The code was already very obscurely structured and arguably the bug was
latent.  It emitted POP only when TARGET_UNIFIED_ASM was on, and since
TARGET_UNIFIED_ASM was on only for Thumb, we never went down this path in
interrupt-handling code, because the interrupt attribute is only available
for ARM code.  After the removal of TARGET_UNIFIED_ASM we ended up using POP
unconditionally.  So this patch adds a check for IS_INTERRUPT and outputs
the appropriate LDM form.

In arm_output_multireg_pop the buggy hunk was:
-  if ((regno_base == SP_REGNUM) && TARGET_THUMB)
+  if ((regno_base == SP_REGNUM) && update)
  {
-  /* Output pop (not stmfd) because it has a shorter encoding.  */
-  gcc_assert (update);
sprintf (pattern, "pop%s\t{", conditional);
  }

Again, the POP was guarded on TARGET_THUMB and so would never be taken on
interrupt handling
routines. This patch guards that with the appropriate check on interrupt
return.

Also, there are a couple of bugs in the 'else' branch of that 'if':
* The "ldmfd%s" was output without a '\t' at the end which meant that the
base register
name would be concatenated with the 'ldmfd', creating invalid assembly.

* The logic:

   if (regno_base == SP_REGNUM)
   /* update is never true here, hence there is no need to handle
  pop here.  */
 sprintf (pattern, "ldmfd%s", conditional);

   if (update)
 sprintf (pattern, "ldmia%s\t", conditional);
   else
 sprintf (pattern, "ldm%s\t", conditional);

Meant that for "regno == SP_REGNUM && !update" we'd end up printing
"ldmfd%sldm%s\t"
to pattern. I didn't manage to reproduce that condition though, so maybe it
can't ever occur.
This patch fixes both these issues nevertheless.
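
As a rough illustration of the intended selection (a sketch of my reading of
the description above together with the later "guard pop on update" fix, not
the code in arm.c): exactly one mnemonic is chosen, it is always followed by
a tab, and POP is used only for an SP base with writeback outside interrupt
returns.  The function name, its parameters and SKETCH_SP_REGNUM are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SP_REGNUM 13   /* assumed register number of SP */

/* Write the start of a multi-register load into PATTERN.  */
static void
sketch_multireg_pop_prefix (char *pattern, size_t len, int regno_base,
                            const char *base_name, int update,
                            int interrupt_return, const char *conditional)
{
  const char *mnemonic;

  assert (pattern != NULL && len > 0);

  /* POP implies SP writeback and must not be used for the
     exception-return form, so all three conditions are needed.  */
  if (regno_base == SKETCH_SP_REGNUM && update && !interrupt_return)
    {
      snprintf (pattern, len, "pop%s\t{", conditional);
      return;
    }

  /* Otherwise pick exactly one LDM spelling.  */
  if (regno_base == SKETCH_SP_REGNUM)
    mnemonic = "ldmfd";
  else if (update)
    mnemonic = "ldmia";
  else
    mnemonic = "ldm";

  snprintf (pattern, len, "%s%s\t%s%s, {", mnemonic, conditional,
            base_name, update ? "!" : "");
}
```

With an SP base, writeback and no interrupt return this yields "pop\t{";
with an interrupt return it yields "ldmfd\tsp!, {" instead, so the
mnemonic and the base register can no longer run together.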

I've added the testcase from the PR to catch the fix in
output_return_instruction.
The testcase doesn't catch the bugs in arm_output_multireg_pop, but the
existing tests
gcc.target/arm/interrupt-1.c and gcc.target/arm/interrupt-2.c would have
caught them
if only they were assemble tests rather than just compile. So this patch
makes them
assembly tests (and reverts the scan-assembler checks for the correct LDM
pattern).

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk and GCC 6?


Hi Kyrill,

Did you test --with-mode=thumb?
When using arm mode, I see regressions:

   gcc.target/arm/neon-nested-apcs.c (test for excess errors)
   gcc.target/arm/nested-apcs.c (test for excess errors)


It's because I have a local patch in my binutils that makes gas warn on the
deprecated sequences that these two tests generate (they use the deprecated 
-mapcs option),
so these tests were already showing the (test for excess errors) FAIL for me,
and they didn't appear in my tests diff for this patch. :(

I've reproduced the failure with a clean tree.
Where before we generated:
ldm sp, {fp, sp, pc}
now we generate:
pop {fp, sp, pc}

which are not equivalent (pop performs a write-back) and gas warns:
Warning: writeback of base register when in register list is UNPREDICTABLE

I'm testing a patch to fix this.
Sorry for the regression.


Here is the fix.
I had erroneously removed the update check from the condition for the "pop".
Of course, if we're not updating the SP we can't use POP, which has an
implicit writeback.

Bootstrapped on arm-none-linux-gnueabihf. Tested with -mthumb and -marm.

Ok for trunk and GCC 6?

Thanks,
Kyrill

2016-05-17  Kyrylo Tkachov  

PR target/70830
* config/arm/arm.c (arm_output_multireg_pop): Guard "pop" on update.





Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
Hi,
I tried the attached patch that gets rid of gfc_array_range_type because it
seems pointless from middle-end POV. It however affects .original dumps in the
following way:
--- assumed_type_2.f90.003t.original2016-05-24 14:32:45.771503552 +0200
+++ ../assumed_type_2.f90.003t.original 2016-05-24 14:34:07.637311579 +0200
@@ -246,7 +246,7 @@
 parm.20.offset = NON_LVALUE_EXPR ;
 D.3504 = _gfortran_internal_pack (&parm.20);
 sub_array_assumed (D.3504);
-if ((void *[0:] *) parm.20.data != (void *[0:] *) D.3504)
+if ((void *[] *) parm.20.data != (void *[] *) D.3504)
   { 
 _gfortran_internal_unpack (&parm.20, D.3504);
 __builtin_free (D.3504);
@@ -576,12 +576,12 @@
 { 
   static logical(kind=4) C.3584 = 1;

-  sub_scalar (&(*(real(kind=4)[0:] * restrict) 
array_real_alloc.data)[(array_real_alloc.offset + 
array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
+  sub_scalar (&(*(real(kind=4)[] * restrict) 
array_real_alloc.data)[(array_real_alloc.offset + 
array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
 }
 { 
   static logical(kind=4) C.3585 = 1;

-  sub_scalar (&(*(character(kind=1)[0:][1:1] *) 
array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
], &C.3585, 1);
+  sub_scalar (&(*(character(kind=1)[][1:1] *) 
array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
], &C.3585, 1);
 }
 { 
   static logical(kind=4) C.3586 = 1;

Which breaks testsuite.  Perhaps just
 can be printed as 0: (because that is what NULL domain means).  This
is done by dump_array_domain in pretty-print.c and I am not quite sure who
else relies on the format.
Or we can just compensate in the testsuite given that the bounds are really
unknown...

Honza

Index: trans-types.c
===
--- trans-types.c   (revision 236556)
+++ trans-types.c   (working copy)
@@ -52,7 +52,6 @@ along with GCC; see the file COPYING3.
 CInteropKind_t c_interop_kinds_table[ISOCBINDING_NUMBER];
 
 tree gfc_array_index_type;
-tree gfc_array_range_type;
 tree gfc_character1_type_node;
 tree pvoid_type_node;
 tree prvoid_type_node;
@@ -945,12 +944,6 @@ gfc_init_types (void)
 = build_pointer_type (build_function_type_list (void_type_node, 
NULL_TREE));
 
   gfc_array_index_type = gfc_get_int_type (gfc_index_integer_kind);
-  /* We cannot use gfc_index_zero_node in definition of gfc_array_range_type,
- since this function is called before gfc_init_constants.  */
-  gfc_array_range_type
- = build_range_type (gfc_array_index_type,
- build_int_cst (gfc_array_index_type, 0),
- NULL_TREE);
 
   /* The maximum array element size that can be handled is determined
  by the number of bits available to store this field in the array
@@ -1920,12 +1913,12 @@ gfc_get_array_type_bounds (tree etype, i
 
   /* We define data as an array with the correct size if possible.
  Much better than doing pointer arithmetic.  */
-  if (stride)
+  if (stride && akind >= GFC_ARRAY_UNKNOWN)
 rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
  int_const_binop (MINUS_EXPR, stride,
   build_int_cst (TREE_TYPE 
(stride), 1)));
   else
-rtype = gfc_array_range_type;
+rtype = NULL;
   arraytype = build_array_type (etype, rtype);
   arraytype = build_pointer_type (arraytype);
   if (restricted)


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> > Hmm, you are probably right. If we can have array with TYPE_DOMAIN != NULL
> > and sane bounds, but with TYPE_SIZE == NULL, we probably need to punt on 
> > NULL
> > TYPE_SIZE.  I can add it just to be sure.
> 
> As a MEM_REF embeds a VIEW_CONVERT you can placement-new
> 
> struct { int a[5]; char b[]; };

Yep. This is what I am trying to handle with the TYPE_SIZE condition.
> 
> ontop of char X[24]; and access MEM_REF[&x].a[3] (not at struct end)
> and MEM_REF[&x].b[4] but _both_ accesses would have TYPE_SIZE NULL.
> 
> So I'm not sure TYPE_SIZE tells you anything here...

Well, here when parsing MEM_REF[&x].a[3], array_at_struct_end_p should return
false because it walks the handled components and will see the FIELD_REF for .a,
which is not at the end:

  while (handled_component_p (ref)) 
{   
  /* If the reference chain contains a component reference to a 
 non-union type and there follows another field the reference   
 is not at the end of a structure.  */  
  if (TREE_CODE (ref) == COMPONENT_REF  
  && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)  
{   
  tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));  
  while (nextf && TREE_CODE (nextf) != FIELD_DECL)  
nextf = DECL_CHAIN (nextf); 
  if (nextf)
return false;   
}   

  ref = TREE_OPERAND (ref, 0);  
}   

The size compare is meant to distinguish between
struct a { int a[5]; char b[5]; };
placed in char buf[sizeof(struct a)]
or placed in char buf[sizeof(struct a)+5]

The REF seen at this place is the REF of the outer type after unwinding the
handled components, so it should have TYPE_SIZE defined in this case I think.
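
To spell the two cases out, a tiny standalone sketch (my reading of the
intent of the size compare, not the logic in tree.c; the predicate name is
made up):

```c
#include <assert.h>
#include <stddef.h>

struct a { int a[5]; char b[5]; };

/* Hypothetical predicate: can the trailing array b be accessed past
   its declared bound when 'struct a' is overlaid on a declared buffer
   of BUF_SIZE bytes?  Only if the buffer is larger than the type.  */
static int
trailing_array_may_extend (size_t buf_size)
{
  assert (buf_size >= sizeof (struct a));
  return buf_size > sizeof (struct a);
}
```

For char buf[sizeof(struct a)] this returns 0, so b can be treated as an
ordinary bounded array; for char buf[sizeof(struct a)+5] it returns 1 and
b has to be treated as a possible trailing array.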

Honza
> 
> Richard.
> 
> > I am testing
> > 
> > Index: tree.c
> > ===
> > --- tree.c  (revision 236557)
> > +++ tree.c  (working copy)
> > @@ -13079,7 +13079,8 @@ array_at_struct_end_p (tree ref)
> >tree size = NULL;
> >  
> >if (TREE_CODE (ref) == MEM_REF
> > -  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
> > +  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
> > +  && TYPE_SIZE (TREE_TYPE (ref)))
> >  {
> >size = TYPE_SIZE (TREE_TYPE (ref));
> >ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Jan Hubicka wrote:

> > 
> > Ah, yes.  Now I see.
> > 
> > >  The test I updated that looks for DECL simply assumes
> > > that declarations can not be accessed past their end.
> > > It would make more sense to use object size machinery here somehow.
> > > (i.e. even in fortran we have accesses to mallocated buffers of constant 
> > > size).
> > > But this probably could be better handled at niter side where we can also 
> > > deal with
> > > case of real trailing arrays of known size.
> > 
> > But then I'm not sure that TYPE_SIZE (TREE_TYPE (ref)) == NULL is
> > handled correctly.  I suppose you can hope for the array to be the
> > one forcing it NULL and thus its TYPE_DOMAIN max val being NULL ...
> 
> Hmm, you are probably right. If we can have array with TYPE_DOMAIN != NULL
> and sane bounds, but with TYPE_SIZE == NULL, we probably need to punt on NULL
> TYPE_SIZE.  I can add it just to be sure.

As a MEM_REF embeds a VIEW_CONVERT you can placement-new

struct { int a[5]; char b[]; };

ontop of char X[24]; and access MEM_REF[&x].a[3] (not at struct end)
and MEM_REF[&x].b[4] but _both_ accesses would have TYPE_SIZE NULL.

So I'm not sure TYPE_SIZE tells you anything here...

Richard.

> I am testing
> 
> Index: tree.c
> ===
> --- tree.c(revision 236557)
> +++ tree.c(working copy)
> @@ -13079,7 +13079,8 @@ array_at_struct_end_p (tree ref)
>tree size = NULL;
>  
>if (TREE_CODE (ref) == MEM_REF
> -  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
> +  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
> +  && TYPE_SIZE (TREE_TYPE (ref)))
>  {
>size = TYPE_SIZE (TREE_TYPE (ref));
>ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[C++ Patch] PR 69872 ("[6/7 Regression] -Wnarrowing note without warning/errror")

2016-05-24 Thread Paolo Carlini

Hi,

in this small diagnostic regression we emit an inform without a 
preceding warning/error: checking the return value of the pedwarn, as we 
normally want to do, fixes the problem. Tested x86_64-linux.
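
A minimal standalone sketch of the control-flow change (not the typeck2.c
code).  sketch_pedwarn is a hypothetical stand-in that returns true only
when it actually emitted a diagnostic, which is the property of pedwarn the
patch relies on; the counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>

static int warnings_emitted;
static int notes_emitted;

/* Stand-in for pedwarn: emits nothing and returns false when the
   warning is disabled (e.g. by -Wno-narrowing).  */
static bool
sketch_pedwarn (bool warning_enabled)
{
  if (!warning_enabled)
    return false;
  warnings_emitted++;
  return true;
}

/* Stand-in for inform.  */
static void
sketch_inform (void)
{
  notes_emitted++;
}

/* Old shape: the note does not depend on the warning being emitted.  */
static void
check_old (bool almost_ok, bool pedantic, bool warning_enabled)
{
  if (!almost_ok || pedantic)
    sketch_pedwarn (warning_enabled);
  if (pedantic && almost_ok)
    sketch_inform ();
}

/* New shape: the note is chained on the return value of the warning.  */
static void
check_new (bool almost_ok, bool pedantic, bool warning_enabled)
{
  if ((!almost_ok || pedantic)
      && sketch_pedwarn (warning_enabled)
      && almost_ok)
    sketch_inform ();
}
```

With the warning disabled, check_old still emits the orphan note while
check_new stays silent; with the warning enabled both emit the warning
followed by the note.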


Thanks,
Paolo.

/
/cp
2016-05-24  Paolo Carlini  

PR c++/69872
* typeck2.c (check_narrowing): Check pedwarn return value.

/testsuite
2016-05-24  Paolo Carlini  

PR c++/69872
* g++.dg/warn/Wno-narrowing1.C: New.
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 236630)
+++ cp/typeck2.c(working copy)
@@ -950,10 +950,12 @@ check_narrowing (tree type, tree init, tsubst_flag
{
  if (complain & tf_warning_or_error)
{
- if (!almost_ok || pedantic)
-   pedwarn (loc, OPT_Wnarrowing, "narrowing conversion of %qE "
-"from %qT to %qT inside { }", init, ftype, type);
- if (pedantic && almost_ok)
+ if ((!almost_ok || pedantic)
+ && pedwarn (loc, OPT_Wnarrowing,
+ "narrowing conversion of %qE "
+ "from %qT to %qT inside { }",
+ init, ftype, type)
+ && almost_ok)
inform (loc, " the expression has a constant value but is not "
"a C++ constant-expression");
  ok = true;
Index: testsuite/g++.dg/warn/Wno-narrowing1.C
===
--- testsuite/g++.dg/warn/Wno-narrowing1.C  (revision 0)
+++ testsuite/g++.dg/warn/Wno-narrowing1.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/69872
+// { dg-options "-Wall -Wextra -pedantic -Wno-narrowing" }
+
+struct s { int x, y; };
+short offsets[1] = {
+  ((char*) &(((struct s*)16)->y) - (char *)16),  // { dg-bogus "note" }
+};


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> 
> Ah, yes.  Now I see.
> 
> >  The test I updated that looks for DECL simply assumes
> > that declarations can not be accessed past their end.
> > It would make more sense to use object size machinery here somehow.
> > (i.e. even in fortran we have accesses to mallocated buffers of constant 
> > size).
> > But this probably could be better handled at niter side where we can also 
> > deal with
> > case of real trailing arrays of known size.
> 
> But then I'm not sure that TYPE_SIZE (TREE_TYPE (ref)) == NULL is
> handled correctly.  I suppose you can hope for the array to be the
> one forcing it NULL and thus its TYPE_DOMAIN max val being NULL ...

Hmm, you are probably right. If we can have an array with TYPE_DOMAIN != NULL
and sane bounds, but with TYPE_SIZE == NULL, we probably need to punt on a NULL
TYPE_SIZE.  I can add the check just to be sure.

I am testing

Index: tree.c
===
--- tree.c  (revision 236557)
+++ tree.c  (working copy)
@@ -13079,7 +13079,8 @@ array_at_struct_end_p (tree ref)
   tree size = NULL;
 
   if (TREE_CODE (ref) == MEM_REF
-  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
+  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
+  && TYPE_SIZE (TREE_TYPE (ref)))
 {
   size = TYPE_SIZE (TREE_TYPE (ref));
   ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);


Re: [PATCH v2] Ensure source_date_epoch is always initialised

2016-05-24 Thread Dhole
On 16-05-24 12:06:48, James Clarke wrote:
> Hi,
> > On 24 May 2016, at 11:59, Dhole  wrote:
> > 
> > Hey!
> > 
> > I'm the original author of the SOURCE_DATE_EPOCH patch.
> > 
> > I've just seen this.  I believe that this bug was fixed in the
> > rework of the patch I sent some days ago [1], although the latest
> > version of that patch has not been reviewed yet.  In [1] the
> > initialization of source_date_epoch is done at init.c
> > (cpp_create_reader), so now it should be initialized properly even when
> > just calling the preprocessor.  I tested your example and it gives the
> > expected output.
> > 
> > Although thinking further, maybe it would be wiser to use "0" as a
> > default value, to mean "not yet set", so that errors like this are
> > avoided.  So source_date_epoch could be:
> > 0: not yet set
> > negative: disabled
> > positive: use this value as SOURCE_DATE_EPOCH
> > 
> > In such case, SOURCE_DATE_EPOCH would need to be a positive number
> > instead of a non-negative number.
> 
> 0 *is* a valid SOURCE_DATE_EPOCH, ie Jan  1 1970 00:00:00, and should
> definitely be allowed.

You're right in the sense that 0 is a valid unix epoch.  In my
suggestion I was considering that SOURCE_DATE_EPOCH is used to set the
date the source code was last modified, and I guess no build process
nowadays has code that was last modified in 1970.  But it may be easier
to understand if 0 is left as a valid value.
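
To illustrate the variant where 0 stays a valid value, here is a minimal sketch of a parser that reserves a negative return value for "not set or invalid" instead.  The function name parse_source_date_epoch is hypothetical and this is not the code from either patch; it only shows that a sentinel outside the valid range avoids giving up the 1970 timestamp.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from either patch.  Returns the parsed
   epoch (0 is accepted), or -1 if TEXT is NULL, empty, malformed,
   negative or out of range.  */
static long long
parse_source_date_epoch (const char *text)
{
  if (text == NULL || *text == '\0')
    return -1;

  char *end;
  errno = 0;
  long long value = strtoll (text, &end, 10);

  /* Reject overflow, trailing garbage and negative timestamps.  */
  if (errno != 0 || *end != '\0' || value < 0)
    return -1;
  return value;
}
```

A caller could pass getenv ("SOURCE_DATE_EPOCH") straight in, since getenv returns NULL for an unset variable and the helper maps that to -1.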

> I see your patch continues to put some of the code inside c-family? Is
> there a reason for doing that instead of keeping it all inside libcpp
> like mine, given it’s inherently preprocessor-only? You’ve also merged
> all the error paths into one message which is not as helpful.

I merged the error paths into one as suggested in [1].  I'm not
knowledgeable enough about GCC to make a call on this, so I just followed the
suggestion from Martin.  But it could be reverted if needed.

Regarding having the code inside c-family, I'm following the suggestion
from Joseph [2]:

Joseph Myers wrote:
> Since cpplib is a library and doesn't have any existing getenv calls, I 
> wonder if it would be better for the cpplib client (i.e. something in the 
> gcc/ directory) to be what calls getenv and then informs cpplib of the 
> timestamp it should treat as being the time of compilation.

Jakub also found it reasonable [3]:

Jakub Jelinek wrote:
> Doing this on the gcc/ side is of course reasonable, but can be done through
> callbacks, libcpp already has lots of other callbacks into the gcc/ code,
> look for e.g. cpp_get_callbacks in gcc/c-family/* and in libcpp/ for
> corresponding code.


[1] https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01889.html
[2] https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02270.html
[3] https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01930.html


Cheers,
-- 
Dhole




Re: [PATCH] Introduce can_remove_lhs_p

2016-05-24 Thread Richard Biener
On Mon, 23 May 2016, Marek Polacek wrote:

> On Mon, May 23, 2016 at 04:36:30PM +0200, Jakub Jelinek wrote:
> > On Mon, May 23, 2016 at 04:28:33PM +0200, Marek Polacek wrote:
> > > As promised in ,
> > > this is a simple clean-up which makes use of a new predicate.  Richi 
> > > suggested
> > > adding maybe_drop_lhs_from_noreturn_call which would be nicer, but I 
> > > didn't
> > > know how to do that, given the handling if lhs is an SSA_NAME.
> > 
> > Shouldn't it be should_remove_lhs_p instead?
> > I mean, it is not just an optimization, but part of how we define the IL.
>  
> Aha, ok.  Renamed.
> 
> > Shouldn't it be also used in tree-cfg.c (verify_gimple_call)?
> 
> I left that spot on purpose but now I don't quite see why, fixed.  Thanks,
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Can you move should_remove_lhs_p to tree-cfg.h please?

Ok with that change.

Richard.
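
For reference, the condition being factored out can be sketched on a toy model.  The tree.h hunk at the end of the quoted patch is cut off in this archive, so the body below is only inferred from the lines the patch removes (non-null lhs, constant TYPE_SIZE_UNIT, not TREE_ADDRESSABLE).  The struct and function names are hypothetical and do not use GCC's tree API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two type properties the predicate reads.  */
struct toy_type
{
  bool size_is_constant;  /* models TREE_CODE (TYPE_SIZE_UNIT (t)) == INTEGER_CST */
  bool addressable;       /* models TREE_ADDRESSABLE (t) */
};

/* Toy stand-in for an LHS operand carrying a type.  */
struct toy_lhs
{
  const struct toy_type *type;
};

/* Assumed shape of the predicate: the LHS of a noreturn call may be
   dropped only if it exists, has constant size, and is not addressable.  */
static bool
toy_should_remove_lhs_p (const struct toy_lhs *lhs)
{
  return (lhs != NULL
          && lhs->type->size_is_constant
          && !lhs->type->addressable);
}
```

Folding the null check into the predicate is what lets the call sites in the patch drop their separate "lhs &&" tests.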

> 2016-05-23  Marek Polacek  
> 
>   * tree.h (should_remove_lhs_p): New predicate.
>   * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Use it.
>   * gimple-fold.c (gimple_fold_call): Likewise.
>   * gimplify.c (gimplify_modify_expr): Likewise.
>   * tree-cfg.c (verify_gimple_call): Likewise.
>   * tree-cfgcleanup.c (fixup_noreturn_call): Likewise.
> 
> diff --git gcc/cgraph.c gcc/cgraph.c
> index cf9192f..1a4f665 100644
> --- gcc/cgraph.c
> +++ gcc/cgraph.c
> @@ -1513,10 +1513,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>  }
>  
>/* If the call becomes noreturn, remove the LHS if possible.  */
> -  if (lhs
> -  && (gimple_call_flags (new_stmt) & ECF_NORETURN)
> -  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
> -  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
> +  if (gimple_call_noreturn_p (new_stmt) && should_remove_lhs_p (lhs))
>  {
>if (TREE_CODE (lhs) == SSA_NAME)
>   {
> diff --git gcc/gimple-fold.c gcc/gimple-fold.c
> index 858f484..6b50d43 100644
> --- gcc/gimple-fold.c
> +++ gcc/gimple-fold.c
> @@ -3052,12 +3052,9 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool 
> inplace)
> == void_type_node))
>   gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
> /* If the call becomes noreturn, remove the lhs.  */
> -   if (lhs
> -   && (gimple_call_flags (stmt) & ECF_NORETURN)
> +   if (gimple_call_noreturn_p (stmt)
> && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> -   || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> -== INTEGER_CST)
> -   && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
> +   || should_remove_lhs_p (lhs)))
>   {
> if (TREE_CODE (lhs) == SSA_NAME)
>   {
> diff --git gcc/gimplify.c gcc/gimplify.c
> index 4a544e3..c77eb51 100644
> --- gcc/gimplify.c
> +++ gcc/gimplify.c
> @@ -4847,9 +4847,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
>   }
>   }
>notice_special_calls (call_stmt);
> -  if (!gimple_call_noreturn_p (call_stmt)
> -   || TREE_ADDRESSABLE (TREE_TYPE (*to_p))
> -   || TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (*to_p))) != INTEGER_CST)
> +  if (!gimple_call_noreturn_p (call_stmt) || !should_remove_lhs_p 
> (*to_p))
>   gimple_call_set_lhs (call_stmt, *to_p);
>else if (TREE_CODE (*to_p) == SSA_NAME)
>   /* The above is somewhat premature, avoid ICEing later for a
> diff --git gcc/tree-cfg.c gcc/tree-cfg.c
> index 7c2ee78..82f0da6c 100644
> --- gcc/tree-cfg.c
> +++ gcc/tree-cfg.c
> @@ -3385,11 +3385,9 @@ verify_gimple_call (gcall *stmt)
>return true;
>  }
>  
> -  if (lhs
> -  && gimple_call_ctrl_altering_p (stmt)
> +  if (gimple_call_ctrl_altering_p (stmt)
>&& gimple_call_noreturn_p (stmt)
> -  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
> -  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
> +  && should_remove_lhs_p (lhs))
>  {
>error ("LHS in noreturn call");
>return true;
> diff --git gcc/tree-cfgcleanup.c gcc/tree-cfgcleanup.c
> index 46d0fa3..4134c38 100644
> --- gcc/tree-cfgcleanup.c
> +++ gcc/tree-cfgcleanup.c
> @@ -604,8 +604,7 @@ fixup_noreturn_call (gimple *stmt)
>   temporaries of variable-sized types is not supported.  Also don't
>   do this with TREE_ADDRESSABLE types, as assign_temp will abort.  */
>tree lhs = gimple_call_lhs (stmt);
> -  if (lhs && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
> -  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
> +  if (should_remove_lhs_p (lhs))
>  {
>gimple_call_set_lhs (stmt, NULL_TREE);
>  
> diff --git gcc/tree.h gcc/tree.h
> index 2510d16..1d72437 100644
> --- gcc/tree.h
> +++ gcc/tree.h
> @@ -5471,4 +5471,14 @@ desired_pro_or_demotion_p (const_tree 

Re: RFC [1/2] divmod transform

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

> On 23 May 2016 at 17:35, Richard Biener  wrote:
> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >  wrote:
> >> Hi,
> >> I have updated my patch for divmod (attached), which was originally
> >> based on Kugan's patch.
> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> having same operands to divmod representation, so we can cse computation 
> >> of mod.
> >>
> >> t1 = a TRUNC_DIV_EXPR b;
> >> t2 = a TRUNC_MOD_EXPR b
> >> is transformed to:
> >> complex_tmp = DIVMOD (a, b);
> >> t1 = REALPART_EXPR (complex_tmp);
> >> t2 = IMAGPART_EXPR (complex_tmp);
> >>
> >> * New hook divmod_expand_libfunc
> >> The rationale for introducing the hook is that different targets have
> >> incompatible calling conventions for divmod libfunc.
> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> return quotient and store remainder in argument passed as pointer,
> >> while the arm version takes two arguments and returns both
> >> quotient and remainder having mode double the size of the operand mode.
> >> The port should hence override the hook expand_divmod_libfunc
> >> to generate call to target-specific divmod.
> >> Ports should define this hook if:
> >> a) The port does not have divmod or div insn for the given mode.
> >> b) The port defines divmod libfunc for the given mode.
> >> The default hook default_expand_divmod_libfunc() generates call
> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> are of DImode.
> >>
> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> cross-tested on arm*-*-*.
> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> Does this patch look OK ?
> >
> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > index 6b4601b..e4a021a 100644
> > --- a/gcc/targhooks.c
> > +++ b/gcc/targhooks.c
> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
> > machine_mode, optimization_type)
> >return true;
> >  }
> >
> > +void
> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> > +  rtx op0, rtx op1,
> > +  rtx *quot_p, rtx *rem_p)
> >
> > functions need a comment.
> >
> > > ISTR it was suggested that ARM change to libgcc2.c:__udivmoddi4 style?  In 
> > that
> > case we could avoid the target hook.
> > Well I would prefer adding the hook because that's easier ;-)
> Would it be ok for now to go with the hook ?
> >
> > +  /* If target overrides expand_divmod_libfunc hook
> > > then perform divmod by generating call to the target-specific divmod
> > libfunc.  */
> > +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> > +   return true;
> > +
> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> > +  return (mode == DImode && unsignedp);
> >
> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> > but still restrict this to DImode && unsigned?  Also if
> > targetm.expand_divmod_libfunc
> > is not the default we expect the target to handle all modes?
> Ah indeed, the check for DImode is unnecessary.
> However I suppose the check for unsignedp should be there,
> since we want to generate call to __udivmoddi4 only if operand is unsigned ?

The optab libfunc for sdivmod should be NULL in that case.
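
As a reminder of the calling convention under discussion, here is a minimal sketch of a routine in the style described for libgcc2.c:__udivmoddi4 in the quoted text: it returns the quotient and stores the remainder through a pointer.  The name toy_udivmod64 is hypothetical and this is not libgcc's implementation, just the interface shape in plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper with the interface shape described above: the
   quotient is the return value, the remainder goes through REM.
   The caller must ensure D is non-zero.  */
static uint64_t
toy_udivmod64 (uint64_t n, uint64_t d, uint64_t *rem)
{
  uint64_t q = n / d;
  if (rem != NULL)
    *rem = n - q * d;   /* remainder derived from the one division */
  return q;
}
```

A pair like "t1 = a / b; t2 = a % b;" on unsigned 64-bit operands can then be served by a single call: t1 = toy_udivmod64 (a, b, &t2).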

> >
> > That said - I expected the above piece to be simply a 'return true;' ;)
> >
> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> > supports a specific operation (for example SImode divmod would use DImode
> > divmod by means of widening operands - for the unsigned case of course).
> Thanks for pointing out. So if a target does not support divmod
> libfunc for a mode
> but for a wider mode, then we could zero-extend operands to the wider-mode,
> perform divmod on the wider-mode, and then cast result back to the
> original mode.
> I haven't done that in this patch, would it be OK to do that as a follow up ?

I think that you should conservatively handle the div_optab query, thus if
the target has a HW division in a wider mode don't use the divmod IFN.
You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
out if that is available.
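
Roughly, the conservative check could look like the following toy model.  It does not use GET_MODE_WIDER_MODE or optab_handler; the mode table and the names toy_mode and toy_divmod_candidate_p are hypothetical, and the table is assumed to be ordered from narrow to wide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy mode table entry: a name plus whether the target is assumed to
   have a hardware division pattern for that mode.  */
struct toy_mode
{
  const char *name;
  bool has_hw_div;
};

/* Walk from the mode at index START towards wider modes and bail out as
   soon as a hardware division is found; only if none exists is the
   divmod transform considered worthwhile.  */
static bool
toy_divmod_candidate_p (const struct toy_mode *modes, size_t n_modes,
                        size_t start)
{
  for (size_t i = start; i < n_modes; i++)
    if (modes[i].has_hw_div)
      return false;
  return true;
}
```

The point of starting at the operand's own mode is that a division available there, or in any wider mode reachable by widening, is assumed to beat a libcall.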

> > +  /* Disable the transform if either is a constant, since
> > division-by-constant
> > + may have specialized expansion.  */
> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> > +return false;
> >
> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >
> > +  if (TYPE_OVERFLOW_TRAPS (type))
> > +return false;
> >
> > why's that?  Generally please first test cheap things (trapping, 
> > constant-ness)
> > before checking expensive stuff (target_supports_divmod_p).
> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> https://www.mail-archive.com/gc
