Re: Optimise CONCAT handling in emit_group_load

2016-11-15 Thread Eric Botcazou
> 2016-11-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * expr.c (emit_group_load_1): Tighten check for whether an
>   access involves only one operand of a CONCAT.  Use extract_bit_field
>   for constants if the bit range does span the whole operand.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH] fix PR68468

2016-11-15 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 07:31:59AM +0100, Waldemar Brodkorb wrote:
> > On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote:
> > > On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:
> > > >Hi,
> > > >
> > > >the following patch fixes PR68468.
> > > >The patch has been used for a while in Buildroot without issues.
> > > >
> > > >2016-11-05  Waldemar Brodkorb 
> > 
> > Two spaces before < instead of just one.
> > > >
> > > >   PR gcc/68468
> > 
> > PR libgcc/68468
> > instead.
> > 
> > > >   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.
> > 
> > Capital F in Fix.
> > No libgcc/ prefix for files in libgcc/ChangeLog.
> > 
> > > This is ok.
> > 
> > I think Waldemar does not have SVN write access; are you going to check it
> > in, or who will do that?
> 
> Should I resend the patch with the suggested fixes or will someone
> with write access fix it up for me?

As nobody committed it yet, I've made the changes and committed it for you.

Jakub


Re: [PATCH] fix PR68468

2016-11-15 Thread Waldemar Brodkorb
Hi,
Jakub Jelinek wrote,

> On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote:
> > On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:
> > >Hi,
> > >
> > >the following patch fixes PR68468.
> > >The patch has been used for a while in Buildroot without issues.
> > >
> > >2016-11-05  Waldemar Brodkorb 
> 
> Two spaces before < instead of just one.
> > >
> > >   PR gcc/68468
> 
>   PR libgcc/68468
> instead.
> 
> > >   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.
> 
> Capital F in Fix.
> No libgcc/ prefix for files in libgcc/ChangeLog.
> 
> > This is ok.
> 
> I think Waldemar does not have SVN write access; are you going to check it
> in, or who will do that?

Should I resend the patch with the suggested fixes or will someone
with write access fix it up for me?

Thanks
 Waldemar


Re: Rework subreg_get_info

2016-11-15 Thread Joseph Myers
On Tue, 15 Nov 2016, Richard Sandiford wrote:

> Richard Sandiford  writes:
> > This isn't intended to change the behaviour, just rewrite the
> > existing logic in a different (and hopefully clearer) way.
> > The new form -- particularly the part based on the "block"
> > concept -- is easier to convert to polynomial sizes.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Sorry, I should have said: this was also tested by compiling the
> testsuite before and after the change at -O2 -ftree-vectorize on:
> 
> aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
> arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
> epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
> hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
> i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
> m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
> nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
> powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf

e500 double (both DFmode and TFmode) was the case that motivated the 
original creation of subreg_get_info.  I think powerpc-linux-gnuspe 
--enable-e500-double would be a good case for verifying the patch doesn't 
change generated code.  (Though when I tried building it from mainline 
sources last week I got an ICE in LRA building __multc3, so testing it 
might be problematic at present.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C++ PATCH for C++17 selection statements with initializer

2016-11-15 Thread Marek Polacek
On Sat, Nov 05, 2016 at 10:03:37PM -0400, David Edelsohn wrote:
> The patch adds testcase init-statement6.C, which includes the declaration
> 
> extern void publish (int), raise (int);
> 
> POSIX defines
> 
> int raise (int);
> 
> in <sys/signal.h> which gets included by the C++ headers for the testcase on AIX.
> 
> This causes the error message:
> 
> /nasfarm/edelsohn/src/src/gcc/testsuite/g++.dg/cpp1z/init-statement6.C:10:28:
> error: ambiguating new declaration of 'void raise(int)'
> ...
> /tmp/GCC/gcc/include-fixed/sys/signal.h:103:12: note: old declaration
> 'int raise(int)'
> 
> Is there a reason for the conflicting / ambiguating declaration?

Oops, no reason at all.  I'm fixing this with:

Tested on x86_64-linux, applying to trunk.

2016-11-15  Marek Polacek  

* g++.dg/cpp1z/init-statement6.C: Rename a function.

diff --git gcc/testsuite/g++.dg/cpp1z/init-statement6.C 
gcc/testsuite/g++.dg/cpp1z/init-statement6.C
index 53b0d31..e8e24b5 100644
--- gcc/testsuite/g++.dg/cpp1z/init-statement6.C
+++ gcc/testsuite/g++.dg/cpp1z/init-statement6.C
@@ -7,14 +7,14 @@
 
 std::map<int, std::string> m;
 extern int xread (int *);
-extern void publish (int), raise (int);
+extern void publish (int), xraise (int);
 
 void
 foo ()
 {
   if (auto it = m.find (10); it != m.end ()) { std::string s = it->second; }
   if (char buf[10]; std::fgets(buf, 10, stdin)) { m[0] += buf; }
-  if (int s; int count = xread (&s)) { publish(count); raise(s); }
+  if (int s; int count = xread (&s)) { publish(count); xraise(s); }
 
   const char *s;
   if (auto keywords = {"if", "for", "while"};

Marek


Re: [patch] remove more GCJ references

2016-11-15 Thread Matthias Klose
On 15.11.2016 23:03, Eric Gallager wrote:
> On 11/15/16, Matthias Klose  wrote:
>> On 15.11.2016 21:41, Matthias Klose wrote:
>>> On 15.11.2016 16:52, Jeff Law wrote:
 On 11/15/2016 03:55 AM, Matthias Klose wrote:
> This patch removes some references to gcj in the top level and config
> directories and in the gcc documentation.  The change to the config
> directory requires regenerating aclocal.m4 and configure in each sub
> directory.
>
> Ok for the trunk?
>
> Matthias
>
> 
>
> 2016-11-14  Matthias Klose  
>
> * config-ml.in: Remove references to GCJ.
> * configure.ac: Likewise.
> * configure: Regenerate.
>
> config/
>
> 2016-11-14  Matthias Klose  
>
> multi.m4: Don't set GCJ.
>
> gcc/
>
> 2016-11-14  Matthias Klose  
>
> * doc/install.texi: Remove references to gcj/libjava.
> * doc/invoke.texi: Likewise.
>
 OK.
 jeff
>>>
>>> I was missing more references in the documentation, committing the
>>> remaining changes as obvious:
>>>
>>> gcc/
>>> 2016-11-15  Matthias Klose  
>>>
>>> * doc/install.texi: Remove references to java/libjava.
>>> * doc/sourcebuild.texi: Likewise.
>>
>> and here are the remaining java references in the user-oriented
>> documentation:
>>
>> 2016-11-15  Matthias Klose  
>>
>> * doc/install.texi: Remove references to java/libjava.
>> * doc/invoke.texi: Likewise.
>> * doc/standards.texi: Likewise.
>>
>> Index: gcc/doc/install.texi
>> ===
>> --- gcc/doc/install.texi (revision 242455)
>> +++ gcc/doc/install.texi (working copy)
>> @@ -4021,7 +4021,7 @@
>> it sorts relocations for REL targets (o32, o64, EABI).  This can cause
>> bad code to be generated for simple C++ programs.  Also the linker
>> from GNU binutils versions prior to 2.17 has a bug which causes the
>> -runtime linker stubs in very large programs, like @file{libgcj.so}, to
>> +runtime linker stubs in very large programs to
>> be incorrectly generated.  GNU Binutils 2.18 and later (and snapshots
>> made after Nov. 9, 2006) should be free from both of these problems.
>>
>> Index: gcc/doc/invoke.texi
>> ===
>> --- gcc/doc/invoke.texi  (revision 242455)
>> +++ gcc/doc/invoke.texi  (working copy)
>> @@ -1316,12 +1316,6 @@
>> @item @var{file}.go
>> Go source code.
>>
>> -@c FIXME: Descriptions of Java file types.
>> -@c @var{file}.java
>> -@c @var{file}.class
>> -@c @var{file}.zip
>> -@c @var{file}.jar
>> -
>> @item @var{file}.ads
>> Ada source code file that contains a library unit declaration (a
>> declaration of a package, subprogram, or generic, or a generic
>> @@ -1370,7 +1364,6 @@
>> ada
>> f77  f77-cpp-input f95  f95-cpp-input
>> go
>> -java
>> @end smallexample
>>
>> @item -x none
>> @@ -3174,7 +3167,7 @@
>> @item -fobjc-exceptions
>> @opindex fobjc-exceptions
>> Enable syntactic support for structured exception handling in
>> -Objective-C, similar to what is offered by C++ and Java.  This option
>> +Objective-C, similar to what is offered by C++.  This option
>> is required to use the Objective-C keywords @code{@@try},
>> @code{@@throw}, @code{@@catch}, @code{@@finally} and
>> @code{@@synchronized}.  This option is available with both the GNU
>> @@ -10800,7 +10793,7 @@
>> @opindex fbounds-check
>> For front ends that support it, generate additional code to check that
>> indices used to access arrays are within the declared range.  This is
>> -currently only supported by the Java and Fortran front ends, where
>> +currently only supported by the Fortran front end, where
>> this option defaults to true and false respectively.
> 
> 
> The "defaults to true and false respectively" part no longer makes sense.
> It should probably just be "defaults to false"

thanks, committed.



[PATCH, IRA] PR78325, R_MIPS_JALR failures

2016-11-15 Thread Alan Modra
This is a fix for my PR70890 patch, which incorrectly removed all
REG_EQUIV notes rather than just one regarding a reg that dies.
Bootstrapped and regression tested powerpc64le-linux,
x86_64-linux, and mips-linux.  OK to apply?

PR rtl-optimization/78325
PR rtl-optimization/70890
* ira.c (combine_and_move_insns): Only remove REG_EQUIV notes
for dead regno.

diff --git a/gcc/ira.c b/gcc/ira.c
index 315b847..4ee99d7 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3747,7 +3747,7 @@ combine_and_move_insns (void)
 use_insn, when regno was seen as non-local.  Now that
 regno is local to this block, and dies, such an
 equivalence is invalid.  */
- if (find_reg_note (use_insn, REG_EQUIV, NULL_RTX))
+ if (find_reg_note (use_insn, REG_EQUIV, regno_reg_rtx[regno]))
{
  rtx set = single_set (use_insn);
  if (set && REG_P (SET_DEST (set)))

-- 
Alan Modra
Australia Development Lab, IBM


Re: [patch] remove more GCJ references

2016-11-15 Thread Eric Gallager
On 11/15/16, Matthias Klose  wrote:
> On 15.11.2016 21:41, Matthias Klose wrote:
>> On 15.11.2016 16:52, Jeff Law wrote:
>>> On 11/15/2016 03:55 AM, Matthias Klose wrote:
 This patch removes some references to gcj in the top level and config
 directories and in the gcc documentation.  The change to the config
 directory requires regenerating aclocal.m4 and configure in each sub
 directory.

 Ok for the trunk?

 Matthias

 

 2016-11-14  Matthias Klose  

 * config-ml.in: Remove references to GCJ.
 * configure.ac: Likewise.
 * configure: Regenerate.

 config/

 2016-11-14  Matthias Klose  

 multi.m4: Don't set GCJ.

 gcc/

 2016-11-14  Matthias Klose  

 * doc/install.texi: Remove references to gcj/libjava.
 * doc/invoke.texi: Likewise.

>>> OK.
>>> jeff
>>
>> I was missing more references in the documentation, committing the
>> remaining changes as obvious:
>>
>> gcc/
>> 2016-11-15  Matthias Klose  
>>
>> * doc/install.texi: Remove references to java/libjava.
>> * doc/sourcebuild.texi: Likewise.
>
> and here are the remaining java references in the user-oriented
> documentation:
>
> 2016-11-15  Matthias Klose  
>
> * doc/install.texi: Remove references to java/libjava.
> * doc/invoke.texi: Likewise.
> * doc/standards.texi: Likewise.
>
> Index: gcc/doc/install.texi
> ===
> --- gcc/doc/install.texi  (revision 242455)
> +++ gcc/doc/install.texi  (working copy)
> @@ -4021,7 +4021,7 @@
> it sorts relocations for REL targets (o32, o64, EABI).  This can cause
> bad code to be generated for simple C++ programs.  Also the linker
> from GNU binutils versions prior to 2.17 has a bug which causes the
> -runtime linker stubs in very large programs, like @file{libgcj.so}, to
> +runtime linker stubs in very large programs to
> be incorrectly generated.  GNU Binutils 2.18 and later (and snapshots
> made after Nov. 9, 2006) should be free from both of these problems.
>
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi   (revision 242455)
> +++ gcc/doc/invoke.texi   (working copy)
> @@ -1316,12 +1316,6 @@
> @item @var{file}.go
> Go source code.
>
> -@c FIXME: Descriptions of Java file types.
> -@c @var{file}.java
> -@c @var{file}.class
> -@c @var{file}.zip
> -@c @var{file}.jar
> -
> @item @var{file}.ads
> Ada source code file that contains a library unit declaration (a
> declaration of a package, subprogram, or generic, or a generic
>@@ -1370,7 +1364,6 @@
> ada
> f77  f77-cpp-input f95  f95-cpp-input
> go
> -java
> @end smallexample
>
> @item -x none
> @@ -3174,7 +3167,7 @@
> @item -fobjc-exceptions
> @opindex fobjc-exceptions
> Enable syntactic support for structured exception handling in
> -Objective-C, similar to what is offered by C++ and Java.  This option
> +Objective-C, similar to what is offered by C++.  This option
> is required to use the Objective-C keywords @code{@@try},
> @code{@@throw}, @code{@@catch}, @code{@@finally} and
> @code{@@synchronized}.  This option is available with both the GNU
> @@ -10800,7 +10793,7 @@
> @opindex fbounds-check
> For front ends that support it, generate additional code to check that
> indices used to access arrays are within the declared range.  This is
> -currently only supported by the Java and Fortran front ends, where
> +currently only supported by the Fortran front end, where
> this option defaults to true and false respectively.


The "defaults to true and false respectively" part no longer makes sense.
It should probably just be "defaults to false"


> @item -fcheck-pointer-bounds
> @@ -11861,8 +11854,7 @@
> This option instructs the compiler to assume that signed arithmetic
> overflow of addition, subtraction and multiplication wraps around
> using twos-complement representation.  This flag enables some optimizations
> -and disables others.  This option is enabled by default for the Java
> -front end, as required by the Java language specification.
> +and disables others.
> The options @option{-ftrapv} and @option{-fwrapv} override each other, so 
> using
> @option{-ftrapv} @option{-fwrapv} on the command-line results in
> @option{-fwrapv} being effective.  Note that only active options override, so
> Index: gcc/doc/standards.texi
> ===
> --- gcc/doc/standards.texi(revision 242455)
> +++ gcc/doc/standards.texi(working copy)
> @@ -315,6 +315,3 @@
>
> @xref{Standards,,Standards, gfortran, The GNU Fortran Compiler}, for details
> of standards supported by GNU Fortran.
> -
> -@xref{Compatibility,,Compatibility with the Java Platform, gcj, GNU gcj},

Re: [PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Richard Sandiford
Bernd Edlinger  writes:
> On 11/15/16 13:21, Richard Sandiford wrote:
>> Bernd Edlinger  writes:
>>> Hi!
>>>
>>> The genattrtab build-tool uses way too much memory in general.
>>> I think there is no other build step that uses more memory.
>>>
>>> On the current trunk it takes around 700MB to build the
>>> ARM latency tab files.  I debugged that yesterday
>>> and found that this can be reduced to 8MB (!).  Yes, really.
>>>
>>> So the attached patch does try really hard to hash and re-use
>>> all ever created rtx objects.
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
>>> Is it OK for trunk?
>>
>> Just to check: does this produce the same output as before?
>> And did you notice any difference in the time genattrtab
>> takes to run?
>>
>
> The run time was in the range of 24-25s, with and without the patch.
>
> However the tables are a bit different, although that seems only to be
> a flaw with ATTR_CURR_SIMPLIFIED_P, which is now re-used when a
> matching rtx was found in the hash.  As I said, the generated functions
> really do work, probably because just a few simplifications are missing.
>
> So it looks like I need to clear the ATTR_CURR_SIMPLIFIED_P on re-used
> binary ops.  I found that out just by trial and error.  I can say that now
> the generated functions are exactly identical for i386, arm and mips.
> The memory and the run time did not change due to this re-hashing.

OK, thanks for checking.

>> ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists,
>> so that x != y when x or y is "permanent" implies that the attributes
>> must be different.  This lets attr_equal_p avoid a recursive walk:
>>
>> static int
>> attr_equal_p (rtx x, rtx y)
>> {
>>   return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y))
>>   && rtx_equal_p (x, y)));
>> }
>>
>> Does the patch still guarantee that?
>>
>
> Hmm, I see.  I expected that ATTR_PERMANENT_P means more or less,
> that it is in the hash table.  I believe that a long time ago, there
> was a kind of garbage collection of temporary rtx objects, that needed
> to be copied from the temporary space to the permanent space, after
> the simplification was done.  And then all temporary objects were
> just tossed away.  But that was long before my time.  Today
> everything is permanent, which is why the memory usage is unbounded.
>
> But I can fix that, by only setting ATTR_PERMANENT_P on the hashed
> rtx when both sub-rtx are also ATTR_PERMANENT_P.
>
>
> How does that new version look, is it OK?

OK.  Thanks for doing this, certainly an impressive headline number :-)

Richard


[PATCH] Follow-up patch on enabling new AVX512 instructions

2016-11-15 Thread Andrew Senkevich
Hi,

this is follow-up with tests for new __target__ attributes and
__builtin_cpu_supports update.

gcc/
* config/i386/i386.c (processor_features): Add
F_AVX5124VNNIW, F_AVX5124FMAPS.
(isa_names_table): Handle new features.
libgcc/
* config/i386/cpuinfo.c (processor_features): Add
FEATURE_AVX5124VNNIW, FEATURE_AVX5124FMAPS.
gcc/testsuite/
* gcc.target/i386/builtin_target.c: Handle new "avx5124vnniw",
"avx5124fmaps".
* gcc.target/i386/funcspec-56.inc: Test new attributes.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1da1abc..823930d
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -33205,6 +33205,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
 F_AVX512PF,
 F_AVX512VBMI,
 F_AVX512IFMA,
+F_AVX5124VNNIW,
+F_AVX5124FMAPS,
 F_MAX
   };

@@ -33317,6 +33319,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
   {"avx512pf",F_AVX512PF},
   {"avx512vbmi",F_AVX512VBMI},
   {"avx512ifma",F_AVX512IFMA},
+  {"avx5124vnniw",F_AVX5124VNNIW},
+  {"avx5124fmaps",F_AVX5124FMAPS},
 };

   tree __processor_model_type = build_processor_model_struct ();
diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c
b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 8d45d83..c620a74
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -213,6 +213,10 @@ check_features (unsigned int ecx, unsigned int edx,
assert (__builtin_cpu_supports ("avx512ifma"));
   if (ecx & bit_AVX512VBMI)
assert (__builtin_cpu_supports ("avx512vbmi"));
+  if (edx & bit_AVX5124VNNIW)
+   assert (__builtin_cpu_supports ("avx5124vnniw"));
+  if (edx & bit_AVX5124FMAPS)
+   assert (__builtin_cpu_supports ("avx5124fmaps"));
 }
 }

@@ -311,6 +315,10 @@ quick_check ()

   assert (__builtin_cpu_supports ("avx512f") >= 0);

+  assert (__builtin_cpu_supports ("avx5124vnniw") >= 0);
+
+  assert (__builtin_cpu_supports ("avx5124fmaps") >= 0);
+
   /* Check CPU type.  */
   assert (__builtin_cpu_is ("amd") >= 0);

diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index af203f2..4a0ad25
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -115,7 +115,9 @@ enum processor_features
   FEATURE_AVX512ER,
   FEATURE_AVX512PF,
   FEATURE_AVX512VBMI,
-  FEATURE_AVX512IFMA
+  FEATURE_AVX512IFMA,
+  FEATURE_AVX5124VNNIW,
+  FEATURE_AVX5124FMAPS
 };

 struct __processor_model
@@ -359,6 +361,10 @@ get_available_features (unsigned int ecx, unsigned int edx,
features |= (1 << FEATURE_AVX512IFMA);
   if (ecx & bit_AVX512VBMI)
features |= (1 << FEATURE_AVX512VBMI);
+  if (edx & bit_AVX5124VNNIW)
+   features |= (1 << FEATURE_AVX5124VNNIW);
+  if (edx & bit_AVX5124FMAPS)
+   features |= (1 << FEATURE_AVX5124FMAPS);
 }

   unsigned int ext_level;
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index 521ac8a..9334e9e 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -28,6 +28,8 @@ extern void test_avx512dq(void)
 __attribute__((__target__("avx512dq")));
 extern void test_avx512er(void)
__attribute__((__target__("avx512er")));
 extern void test_avx512pf(void)
__attribute__((__target__("avx512pf")));
 extern void test_avx512cd(void)
__attribute__((__target__("avx512cd")));
+extern void test_avx5124fmaps(void)
__attribute__((__target__("avx5124fmaps")));
+extern void test_avx5124vnniw(void)
__attribute__((__target__("avx5124vnniw")));
 extern void test_bmi (void)
__attribute__((__target__("bmi")));
 extern void test_bmi2 (void)
__attribute__((__target__("bmi2")));

@@ -59,6 +61,8 @@ extern void test_no_avx512dq(void)
__attribute__((__target__("no-avx512dq")));
 extern void test_no_avx512er(void)
__attribute__((__target__("no-avx512er")));
 extern void test_no_avx512pf(void)
__attribute__((__target__("no-avx512pf")));
 extern void test_no_avx512cd(void)
__attribute__((__target__("no-avx512cd")));
+extern void test_no_avx5124fmaps(void)
__attribute__((__target__("no-avx5124fmaps")));
+extern void test_no_avx5124vnniw(void)
__attribute__((__target__("no-avx5124vnniw")));
 extern void test_no_bmi (void)
__attribute__((__target__("no-bmi")));
 extern void test_no_bmi2 (void)
__attribute__((__target__("no-bmi2")));


--
WBR,
Andrew


followup_tests.patch
Description: Binary data


Re: Use df_read_modify_subreg_p in cprop.c

2016-11-15 Thread Jeff Law

On 11/15/2016 09:27 AM, Richard Sandiford wrote:

local_cprop_find_used_regs punted on all multiword registers,
with the comment:

  /* Setting a subreg of a register larger than word_mode leaves
 the non-written words unchanged.  */

But this only applies if the outer mode is smaller than the
inner mode.  If they're the same size then writes to the subreg
are a normal full update.

This patch uses df_read_modify_subreg_p instead.  A later patch
adds more uses of the same routine, but this part had a (positive)
effect on code generation for the testsuite whereas the others
seemed to be simple clean-ups.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* cprop.c (local_cprop_find_used_regs): Use df_read_modify_subreg_p.

OK.
jeff



Re: Add a mem_alias_size helper class

2016-11-15 Thread Richard Sandiford
Eric Botcazou  writes:
>> alias.c encodes memory sizes as follows:
>> 
>> size > 0: the exact size is known
>> size == 0: the size isn't known
>> size < 0: the exact size of the reference itself is known,
>>   but the address has been aligned via AND.  In this case
>>   "-size" includes the size of the reference and the worst-case
>>   number of bytes traversed by the AND.
>> 
>> This patch wraps this up in a helper class and associated
>> functions.  The new routines fix what seems to be a hole
>> in the old logic: if the size of a reference A was unknown,
>> offset_overlap_p would assume that it could conflict with any
>> other reference B, even if we could prove that B comes before A.
>> 
>> The fallback CONSTANT_P (x) && CONSTANT_P (y) case looked incorrect.
>> Either "c" is trustworthy as a distance between the two constants,
>> in which case the alignment handling should work as well there as
>> elsewhere, or "c" isn't trustworthy, in which case offset_overlap_p
>> is unsafe.  I think the latter's true; AFAICT we have no evidence
>> that "c" really is the distance between the two references, so using
>> it in the check doesn't make sense.
>> 
>> At this point we've excluded cases for which:
>> 
>> (a) the base addresses are the same
>> (b) x and y are SYMBOL_REFs, or SYMBOL_REF-based constants
>> wrapped in a CONST
>> (c) x and y are both constant integers
>> 
>> No useful cases should be left.  As things stood, we would
>> assume that:
>> 
>>   (mem:SI (const_int X))
>> 
>> could overlap:
>> 
>>   (mem:SI (symbol_ref Y))
>> 
>> but not:
>> 
>>   (mem:SI (const (plus (symbol_ref Y) (const_int 4
>
> Frankly this seems to be an example of counter-productive C++ization: the 
> class doesn't provide any useful abstraction and the code gets obfuscated by 
> all the wrapper methods.  Moreover it's mixed with real changes so very hard 
> to review.  Can't you just fix what needs to be fixed first?

Sorry, I should have said, but this wasn't C++-ification for its own sake.
It was part of the changes to make modes and MEM_OFFSETs be runtime
invariants of the form a+bX for runtime X.  Making that change to modes
and MEM_OFFSETs meant that these alias sizes also become runtime invariants.
The classification above is then:

  size may be greater than 0: the exact size is known
  size must be equal to 0: the size is unknown
  size may be less than 0: -size is the maximum size including alignment

with the assumption that a and b cannot be such that a+bX>0 for some X
and <0 for other X.  So we were faced with the prospect of having to change
every existing ==0, >0 and <0 test anyway.  The question was whether
to change them to "may be greater than 0?" etc. or change them to
something more mnemonic like "exact size known?".  The latter seemed
better.

That part in itself could be done using inline functions.  What the
class gives is that it also enforces statically that the restrictions
on a and b above hold, i.e. that a+bX is always ordered wrt 0.

Similarly, abs(a+bX) cannot be represented as a'+b'X for all a and b,
so abs() is not unconditionally computable on these runtime invariants.
The use of abs() on the encoded sizes would therefore also need either
a wrapper like "max_size" or a cut-&-paste change to cope with the fact
that abs() isn't always computable on these sizes.  Using the more mnemonic
"max_size" seemed better than preserving the use of "abs"; to me, abs
implies we have a negative displacement from an end point, whereas
really the sign bit is being used as a boolean to indicate inexactness.
Again, "max_size" could just be an inline function, but the class
enforces statically that abs() is computable.

We separated the patch out because it made the actual switch to support
runtime invariants almost mechanical.

Thanks,
Richard


Re: [PATCH] Significantly reduce memory usage of genattrtab

2016-11-15 Thread Bernd Edlinger
On 11/15/16 13:21, Richard Sandiford wrote:
> Bernd Edlinger  writes:
>> Hi!
>>
>> The genattrtab build-tool uses way too much memory in general.
>> I think there is no other build step that uses more memory.
>>
>> On the current trunk it takes around 700MB to build the
>> ARM latency tab files.  I debugged that yesterday
>> and found that this can be reduced to 8MB (!).  Yes, really.
>>
>> So the attached patch does try really hard to hash and re-use
>> all ever created rtx objects.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM.
>> Is it OK for trunk?
>
> Just to check: does this produce the same output as before?
> And did you notice any difference in the time genattrtab
> takes to run?
>

The run time was in the range of 24-25s, with and without the patch.

However the tables are a bit different, although that seems only to be
a flaw with ATTR_CURR_SIMPLIFIED_P, which is now re-used when a
matching rtx was found in the hash.  As I said, the generated functions
really do work, probably because just a few simplifications are missing.

So it looks like I need to clear the ATTR_CURR_SIMPLIFIED_P on re-used
binary ops.  I found that out just by trial and error.  I can say that now
the generated functions are exactly identical for i386, arm and mips.
The memory and the run time did not change due to this re-hashing.

>
> ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists,
> so that x != y when x or y is "permanent" implies that the attributes
> must be different.  This lets attr_equal_p avoid a recursive walk:
>
> static int
> attr_equal_p (rtx x, rtx y)
> {
>   return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y))
>&& rtx_equal_p (x, y)));
> }
>
> Does the patch still guarantee that?
>

Hmm, I see.  I expected that ATTR_PERMANENT_P means more or less,
that it is in the hash table.  I believe that a long time ago, there
was a kind of garbage collection of temporary rtx objects, that needed
to be copied from the temporary space to the permanent space, after
the simplification was done.  And then all temporary objects were
just tossed away.  But that was long before my time.  Today
everything is permanent, which is why the memory usage is unbounded.

But I can fix that, by only setting ATTR_PERMANENT_P on the hashed
rtx when both sub-rtx are also ATTR_PERMANENT_P.


How does that new version look, is it OK?


Thanks
Bernd.
2016-11-15  Bernd Edlinger  

* genattrtab.c (attr_rtx_1): Avoid allocating new rtx objects.
Clear ATTR_CURR_SIMPLIFIED_P for re-used binary rtx objects.
Use DEF_ATTR_STRING for string arguments.  Use RTL_HASH for
integer arguments.  Only set ATTR_PERMANENT_P on newly hashed
rtx when all sub-rtx are also permanent.
(attr_eq): Simplify.
(attr_copy_rtx): Remove.
(make_canonical, get_attr_value): Use attr_equal_p.
(copy_boolean): Rehash NOT.
(simplify_test_exp_in_temp,
optimize_attrs): Remove call to attr_copy_rtx.
(attr_alt_intersection, attr_alt_union,
attr_alt_complement, mk_attr_alt): Rehash EQ_ATTR_ALT.
(make_automaton_attrs): Use attr_eq.
Index: gcc/genattrtab.c
===
--- gcc/genattrtab.c	(revision 242335)
+++ gcc/genattrtab.c	(working copy)
@@ -386,6 +386,7 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   unsigned int hashcode;
   struct attr_hash *h;
   struct obstack *old_obstack = rtl_obstack;
+  int permanent_p = 1;
 
   /* For each of several cases, search the hash table for an existing entry.
  Use that entry if one is found; otherwise create a new RTL and add it
@@ -395,13 +396,8 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 {
   rtx arg0 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
   if (! ATTR_PERMANENT_P (arg0))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  return rt_val;
-	}
+	permanent_p = 0;
 
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
@@ -425,14 +421,8 @@ attr_rtx_1 (enum rtx_code code, va_list p)
   rtx arg0 = va_arg (p, rtx);
   rtx arg1 = va_arg (p, rtx);
 
-  /* A permanent object cannot point to impermanent ones.  */
   if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1))
-	{
-	  rt_val = rtx_alloc (code);
-	  XEXP (rt_val, 0) = arg0;
-	  XEXP (rt_val, 1) = arg1;
-	  return rt_val;
-	}
+	permanent_p = 0;
 
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
@@ -440,7 +430,10 @@ attr_rtx_1 (enum rtx_code code, va_list p)
 	&& GET_CODE (h->u.rtl) == code
 	&& XEXP (h->u.rtl, 0) == arg0
 	&& XEXP (h->u.rtl, 1) == arg1)
-	  return h->u.rtl;
+	  {
+	

Re: [PATCH, Fortran, pr78356, v1] [7 Regression] [OOP] segfault allocating polymorphic variable with polymorphic component with allocatable component

2016-11-15 Thread Janus Weil
Hi Andre,

> the attached patch fixes the issue raised.  The issue was that a copy of the
> base class was generated and its address passed to the _vptr->copy() method,
> which then accessed memory that was not present in the copy, it being an object
> of the base class.  The patch fixes this by making sure the temporary handle is
> a pointer to the data to copy.
>
> Sorry if that is not clear; I am not feeling so well today.  So here it is in
> terms of pseudo code. This code was formerly generated:
>
> struct ac {};
> struct a : struct ac { integer *i; };
>
> a src, dst;
> ac temp;
>
> temp = src; // temp is now only a copy of ac
>
> _vptr.copy(, ); // temp does not denote memory having a pointer to i
>
> After the patch, this code is generated:
>
> // types as above
> a src, dst;
> ac *temp; // !!! Now a pointer
>
> temp = 
> _vptr.copy(temp, ); // temp now points to memory that has a pointer to i
> // and is valid for copying.
>
> Bootstraps and regtests ok on x86_64-linux/F23. Ok for trunk?

ok with me. Thanks for the quick fix!

Cheers,
Janus


[PATCH] spellcheck bugfixes: don't offer the goal string as a suggestion

2016-11-15 Thread David Malcolm
This patch addresses various bugs in the spellcheck code in which
the goal string somehow makes it into the candidate list.
The goal string will always have an edit distance of 0 to itself, and
thus is the "closest" string to the goal, but offering it as a
suggestion will always be nonsensical e.g.
  'constexpr' does not name a type; did you mean 'constexpr'?

Ultimately such suggestions are due to bugs in constructing the
candidate list.

As a band-aid, the patch updates
best_match::get_best_meaningful_candidate so that we no longer
offer suggestions for the case where the edit distance == 0
(where candidate == goal).

Doing so fixes PR c++/72786, PR c++/77922, and PR c++/78313.

I looked at fixing the candidate list in each of these bugs.

PR c++/72786 (macro defined after use): this occurs because we're
using the set of macro names at the end of parsing, rather than
at the point of parsing the site of the would-be macro usage.
A better fix would be to indicate this, but would be somewhat
invasive, needing a new internal API (perhaps too much for
stage 3?) so hopefully the band-aid is good enough for GCC 7.

PR c++/77922: the patch updates C++'s lookup_name_fuzzy to only
suggest reserved words that are valid for the active dialect
("-std=" etc.), thus eliminating bogus words from the candidate list.

PR c++/78313: I attempted to prune the candidate list here, but it
led to a worse message (see the comment in that bug), hence I'd
prefer to rely on the best_match::get_best_meaningful_candidate
fix for this one.

Successfully bootstrapped on x86_64-pc-linux-gnu; adds
26 PASS results to g++.sum.

OK for trunk?

gcc/cp/ChangeLog:
PR c++/77922
* name-lookup.c (lookup_name_fuzzy): Filter out reserved words
that were filtered out by init_reswords.

gcc/ChangeLog:
PR c++/72774
PR c++/72786
PR c++/77922
PR c++/78313
* spellcheck.c (selftest::test_find_closest_string): Verify that
we don't offer the goal string as a suggestion.
* spellcheck.h (best_match::get_best_meaningful_candidate): Don't
offer the goal string as a suggestion.

gcc/testsuite/ChangeLog:
PR c++/72774
PR c++/72786
PR c++/77922
PR c++/78313
* g++.dg/spellcheck-c++-11-keyword.C: New test case.
* g++.dg/spellcheck-macro-ordering.C: New test case.
* g++.dg/spellcheck-pr78313.C: New test case.
---
 gcc/cp/name-lookup.c |  6 ++
 gcc/spellcheck.c |  5 +
 gcc/spellcheck.h | 10 ++
 gcc/testsuite/g++.dg/spellcheck-c++-11-keyword.C | 15 +++
 gcc/testsuite/g++.dg/spellcheck-macro-ordering.C | 15 +++
 gcc/testsuite/g++.dg/spellcheck-pr78313.C| 11 +++
 6 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-c++-11-keyword.C
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-macro-ordering.C
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-pr78313.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 7ad65b8..84e064d 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -4811,6 +4811,12 @@ lookup_name_fuzzy (tree name, enum 
lookup_name_fuzzy_kind kind)
   if (!resword_identifier)
continue;
   gcc_assert (TREE_CODE (resword_identifier) == IDENTIFIER_NODE);
+
+  /* Only consider reserved words that survived the
+filtering in init_reswords (e.g. for -std).  */
+  if (!C_IS_RESERVED_WORD (resword_identifier))
+   continue;
+
   bm.consider (resword_identifier);
 }
 
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index b37b1e4..86cdee1 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -210,6 +210,11 @@ test_find_closest_string ()
   ASSERT_STREQ ("banana", find_closest_string ("banyan", ));
   ASSERT_STREQ ("cherry", find_closest_string ("berry", ));
   ASSERT_EQ (NULL, find_closest_string ("not like the others", ));
+
+  /* If the goal string somehow makes it into the candidate list, offering
+ it as a suggestion will be nonsensical.  Verify that we don't offer such
+ suggestions.  */
+  ASSERT_EQ (NULL, find_closest_string ("banana", ));
 }
 
 /* Test data for test_metric_conditions.  */
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
index b48cfbc..41c9308 100644
--- a/gcc/spellcheck.h
+++ b/gcc/spellcheck.h
@@ -165,6 +165,16 @@ class best_match
if (m_best_distance > cutoff)
  return NULL;
 }
+
+/* If the goal string somehow makes it into the candidate list, offering
+   it as a suggestion will be nonsensical e.g.
+ 'constexpr' does not name a type; did you mean 'constexpr'?
+   Ultimately such suggestions are due to bugs in constructing the
+   candidate list, but as a band-aid, do not offer suggestions for
+   distance == 0 (where candidate == goal).  */
+if (m_best_distance == 0)
+  return NULL;
+
 return 

Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-15 Thread Andrew Stubbs

On 15/11/16 21:06, Michael Meissner wrote:

Now that I have a little time, I can look into this, to at least make the
predicate and peepholes match.  There is some other stuff (support for the new
load/store instructions that were added to the compiler afterwards) that we
should also tackle.


I've been investigating this today, and I've found that the insn does 
not match because the "fusion_addis_mem_combo_store" predicate requires 
TARGET_SF_FPR to be true, which in turn requires TARGET_HARD_FLOAT to be true.


So basically the fusion stuff is disabled in soft-float mode regardless 
of where the value is stored.


Anyway, I'm at end-of-day now, so let me know if you come up with anything.

Thanks

Andrew


Re: [PATCH 9/9] Add "__RTL" to cc1 (v4)

2016-11-15 Thread David Malcolm
On Mon, 2016-11-14 at 16:14 +0100, Richard Biener wrote:
> On Fri, Nov 11, 2016 at 10:15 PM, David Malcolm 
> wrote:
> > Changed in this version:
> > 
> > * Rather than running just one pass, run *all* passes, but start at
> >   the given pass; support for "dg-do run" tests that execute the
> >   resulting code.
> > * Updated test cases to new "compact" dump format; more test cases;
> >   use "dg-do run" in various places.
> > * Lots of bugfixing
> > 
> > Links to previous versions:
> >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00263.html
> >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00500.html

> Does running the RTL passes right from the parser work with -fsyntax
> -only?

It depends what you mean by "work".  If I run it with -fsyntax-only,
then pass_rest_of_compilation::gate returns false, and none of the RTL
passes are run.  Is this behavior correct?

> Doing it like __GIMPLE has the advantage of not exposing
> "rest_of_compilation", etc..

The gimple part of the compiler supports having multiple functions in
memory at once, and then compiling them in arbitrary order based on
decisions made by the callgraph.

By contrast, the RTL part of the compiler is full of singleton state:
things like crtl (aka x_rtl), the state of emit-rtl.c,
"reload_completed", etc etc.

To try to limit the scope of the project, for the RTL frontend work I'm
merely attempting to restore the singleton RTL state from a dump,
rather than to support having per function stashes of RTL state.

Hence the rest of compilation gets invoked directly from the frontend
for the __RTL case, since the singleton RTL state would get overwritten
if there's a second __RTL function in the source file (which sounds like
an idea for a test case; I'll attempt such a test case).

I hope this is a reasonable approach.  If not, I suppose I can attempt
to bundle it up into some kind of RTL function state, but that seems
like significant scope creep.

> I'm now handling __GIMPLE from within declspecs (the GIMPLE FE stuff
> has been committed), it would be nice to match the __RTL piece here.

(Congratulations on getting the GIMPLE FE stuff in)

I'm not sure I understand you here - do you want me to rewrite the
__RTL parsing to match the __GIMPLE piece, or the other way around?
If the former, presumably I should reuse (and rename)
c_parser_gimple_pass_list?


> > gcc/ChangeLog:
> > * Makefile.in (OBJS): Add run-rtl-passes.o.
> > 
> > gcc/c-family/ChangeLog:
> > * c-common.c (c_common_reswords): Add "__RTL".
> > * c-common.h (enum rid): Add RID_RTL.
> > 
> > gcc/c/ChangeLog:
> > * c-parser.c: Include "read-rtl-function.h" and
> > "run-rtl-passes.h".
> > (c_parser_declaration_or_fndef): In the "GNU extensions"
> > part of
> > the leading comment, add an alternate production for
> > "function-definition", along with new "rtl-body-specifier"
> > and
> > "rtl-body-pass-specifier" productions.  Handle "__RTL" by
> > calling
> > c_parser_parse_rtl_body.  Convert a timevar_push/pop pair
> > to an auto_timevar, to cope with early exit.
> > (c_parser_parse_rtl_body): New function.
> > 
> > gcc/ChangeLog:
> > * cfg.c (free_original_copy_tables): Remove assertion
> > on original_copy_bb_pool.
> 
> How can that trigger?

It happens when running pass_outof_cfg_layout_mode::execute; seen with
gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c.

The input file is a dump taken in cfg_layout mode (although in this
case it's a trivial 3-basic-block CFG, but ideally there would be cases
with non-trivial control flow); it has "fwprop1" as its starting pass.

Running without -quiet shows:

skipping pass: *rest_of_compilation
skipping pass: vregs
skipping pass: into_cfglayout
skipping pass: jump
skipping pass: subreg1
skipping pass: cse1
found starting pass: fwprop1

i.e. it skips the into_cfglayout (amongst others), to start with
fwprop1.

In theory skipping a pass ought to be a no-op, assuming that we're
faithfully reconstructing all RTL state.  However, RTL state management
is fiddly, so the patch introduces some logic in passes.c to do some
things when skipping a pass; in particular, it has:

  /* Update the cfg hooks as appropriate.  */
  if (strcmp (pass->name, "into_cfglayout") == 0)
{
  cfg_layout_rtl_register_cfg_hooks ();
  cfun->curr_properties |= PROP_cfglayout;
}
  if (strcmp (pass->name, "outof_cfglayout") == 0)
{
  rtl_register_cfg_hooks ();
  cfun->curr_properties &= ~PROP_cfglayout;
}

so that even when skipping "into_cfglayout", the CFG hooks are at least
correct.

The assertion fires when running outof_cfglayout later on (rather than
skipping it); the assertion:

  gcc_assert (original_copy_bb_pool);

assumes that into_cfglayout was actually run, rather than just the
simple 

Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-15 Thread Michael Meissner
On Mon, Nov 14, 2016 at 04:57:58PM +, Andrew Stubbs wrote:
> The testcase powerpc/fusion3.c causes an ICE when compiled with
> -msoft-float.
> 
> The key line in the testcase looks fairly harmless:
> 
>void fusion_float_write (float *p, float f){ p[LARGE] = f; }

LARGE is large enough that it won't fit in the offset field of a single
instruction.

> The error message look like this:
> 
> .../gcc.target/powerpc/fusion3.c: In function 'fusion_float_write':
> .../gcc.target/powerpc/fusion3.c:12:1: error: unrecognizable insn:
> (insn 18 4 14 2 (parallel [
> (set (mem:SF (plus:SI (plus:SI (reg:SI 3 3 [ p ])
> (const_int 327680 [0x5]))
> (const_int -29420 [0x8d14])) [1
> MEM[(float *)p_1(D) + 298260B]+0 S4 A32])
> (unspec:SF [
> (reg:SF 4 4 [ f ])
> ] UNSPEC_FUSION_P9))
> (clobber (reg/f:SI 3 3 [157]))
> ]) 
> "/scratch/astubbs/fsf/src/gcc-mainline/gcc/testsuite/gcc.target/powerpc/fusion3.c":12
> -1
>  (nil))

When I wrote the basic fusion stuff, I was assuming nobody would do
-msoft-float -mcpu=power7.  By the time the code had been written, the
soft-float libraries were no longer being built on the 64-bit Linux systems,
due to the Linux distributions dropping support for them.

However, while we can make this particular failure go away by making
powerpc_p9vector_ok (and probably some of the other targets needing VSX
features) false if -msoft-float, it is still a problem, since SFmode can go in
GPRs.

This is the same basic failure (PR 78101) that I saw in building the next
generation of the Spec benchmark suite, except that it is a DFmode instead of
SFmode.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78101

Both are trying to store a value from a GPR


> Basically, the problem is that the peephole optimization tries to
> create a Power9 Fusion instruction, but those do not support SF
> values in integer registers (AFAICT).
> 
> So, presumably, I need to adjust either the predicate or the
> condition of the peephole rules.

Now that I have a little time, I can look into this, to at least make the
predicate and peepholes match.  There is some other stuff (support for the new
load/store instructions that were added to the compiler afterwards) that we
should also tackle.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 5/9] Add patterns and predicates foutline-msabi-xlouges

2016-11-15 Thread Daniel Santos

On 11/15/2016 02:06 PM, Daniel Santos wrote:

+;; Save multiple registers out-of-line after realignment
+(define_insn "save_multiple_realign"
+  [(match_parallel 0 "save_multiple"
+[(use (match_operand:P 1 "symbol_operand"))
+ (set (reg:P SP_REG) (plus:P (reg:P AX_REG)
+ (match_operand:DI 2 "const_int_operand")))
+])]
+  "TARGET_SSE && TARGET_64BIT"
+  "leaq\t%c2(%%rax),%%rsp;\n\tcall\t%P1")


This pattern was included by mistake (it's incorrect and improperly 
documented). It is supposed to be the pattern that manages the enter 
and realignment in the special optimization case of all 17 registers 
being clobbered, since I can do the enter, stack realignment and allocation 
in savms64f.S just prior to the symbol __savms64f_17. Please ignore it 
for now.


Daniel


Re: [patch] remove more GCJ references

2016-11-15 Thread Matthias Klose
On 15.11.2016 21:41, Matthias Klose wrote:
> On 15.11.2016 16:52, Jeff Law wrote:
>> On 11/15/2016 03:55 AM, Matthias Klose wrote:
>>> This patch removes some references to gcj in the top level and config
>>> directories and in the gcc documentation.  The change to the config 
>>> directory
>>> requires regenerating aclocal.m4 and configure in each sub directory.
>>>
>>> Ok for the trunk?
>>>
>>> Matthias
>>>
>>> 
>>>
>>> 2016-11-14  Matthias Klose  
>>>
>>> * config-ml.in: Remove references to GCJ.
>>> * configure.ac: Likewise.
>>> * configure: Regenerate.
>>>
>>> config/
>>>
>>> 2016-11-14  Matthias Klose  
>>>
>>> multi.m4: Don't set GCJ.
>>>
>>> gcc/
>>>
>>> 2016-11-14  Matthias Klose  
>>>
>>> * doc/install.texi: Remove references to gcj/libjava.
>>> * doc/invoke.texi: Likewise.
>>>
>> OK.
>> jeff
> 
> I was missing more references in the documentation, committing the remaining
> changes as obvious:
> 
> gcc/
> 2016-11-15  Matthias Klose  
> 
> * doc/install.texi: Remove references to java/libjava.
> * doc/sourcebuild.texi: Likewise.

and here are the remaining java reference in the user oriented documentation:

2016-11-15  Matthias Klose  

* doc/install.texi: Remove references to java/libjava.
* doc/invoke.texi: Likewise.
* doc/standards.texi: Likewise.


gcc/
 
2016-11-15  Matthias Klose  

 	* doc/install.texi: Remove references to java/libjava.
	* doc/invoke.texi: Likewise.
	* doc/standards.texi: Likewise.
 
Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi	(revision 242455)
+++ gcc/doc/install.texi	(working copy)
@@ -4021,7 +4021,7 @@
 it sorts relocations for REL targets (o32, o64, EABI).  This can cause
 bad code to be generated for simple C++ programs.  Also the linker
 from GNU binutils versions prior to 2.17 has a bug which causes the
-runtime linker stubs in very large programs, like @file{libgcj.so}, to
+runtime linker stubs in very large programs to
 be incorrectly generated.  GNU Binutils 2.18 and later (and snapshots
 made after Nov. 9, 2006) should be free from both of these problems.
 
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 242455)
+++ gcc/doc/invoke.texi	(working copy)
@@ -1316,12 +1316,6 @@
 @item @var{file}.go
 Go source code.
 
-@c FIXME: Descriptions of Java file types.
-@c @var{file}.java
-@c @var{file}.class
-@c @var{file}.zip
-@c @var{file}.jar
-
 @item @var{file}.ads
 Ada source code file that contains a library unit declaration (a
 declaration of a package, subprogram, or generic, or a generic
@@ -1370,7 +1364,6 @@
 ada
 f77  f77-cpp-input f95  f95-cpp-input
 go
-java
 @end smallexample
 
 @item -x none
@@ -3174,7 +3167,7 @@
 @item -fobjc-exceptions
 @opindex fobjc-exceptions
 Enable syntactic support for structured exception handling in
-Objective-C, similar to what is offered by C++ and Java.  This option
+Objective-C, similar to what is offered by C++.  This option
 is required to use the Objective-C keywords @code{@@try},
 @code{@@throw}, @code{@@catch}, @code{@@finally} and
 @code{@@synchronized}.  This option is available with both the GNU
@@ -10800,7 +10793,7 @@
 @opindex fbounds-check
 For front ends that support it, generate additional code to check that
 indices used to access arrays are within the declared range.  This is
-currently only supported by the Java and Fortran front ends, where
+currently only supported by the Fortran front end, where
 this option defaults to true and false respectively.
 
 @item -fcheck-pointer-bounds
@@ -11861,8 +11854,7 @@
 This option instructs the compiler to assume that signed arithmetic
 overflow of addition, subtraction and multiplication wraps around
 using twos-complement representation.  This flag enables some optimizations
-and disables others.  This option is enabled by default for the Java
-front end, as required by the Java language specification.
+and disables others.
 The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
 @option{-ftrapv} @option{-fwrapv} on the command-line results in
 @option{-fwrapv} being effective.  Note that only active options override, so
Index: gcc/doc/standards.texi
===
--- gcc/doc/standards.texi	(revision 242455)
+++ gcc/doc/standards.texi	(working copy)
@@ -315,6 +315,3 @@
 
 @xref{Standards,,Standards, gfortran, The GNU Fortran Compiler}, for details
 of standards supported by GNU Fortran.
-
-@xref{Compatibility,,Compatibility with the Java Platform, gcj, GNU gcj},
-for details of compatibility between @command{gcj} and the Java Platform.


Re: Fix vec_cmp comparison mode

2016-11-15 Thread Jeff Law

On 11/15/2016 09:49 AM, Richard Sandiford wrote:

vec_cmps assign the result of a vector comparison to a mask.
The optab was called with the destination having mode mask_mode
but with the source (the comparison) having mode VOIDmode,
which led to invalid rtl if the source operand was used directly.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* optabs.c (vector_compare_rtx): Add a cmp_mode parameter
and use it in the final call to gen_rtx_fmt_ee.
(expand_vec_cond_expr): Update accordingly.
(expand_vec_cmp_expr): Likewise.

OK.
jeff



Re: [patch] remove more GCJ references

2016-11-15 Thread Matthias Klose
On 15.11.2016 16:52, Jeff Law wrote:
> On 11/15/2016 03:55 AM, Matthias Klose wrote:
>> This patch removes some references to gcj in the top level and config
>> directories and in the gcc documentation.  The change to the config directory
>> requires regenerating aclocal.m4 and configure in each sub directory.
>>
>> Ok for the trunk?
>>
>> Matthias
>>
>> 
>>
>> 2016-11-14  Matthias Klose  
>>
>> * config-ml.in: Remove references to GCJ.
>> * configure.ac: Likewise.
>> * configure: Regenerate.
>>
>> config/
>>
>> 2016-11-14  Matthias Klose  
>>
>> multi.m4: Don't set GCJ.
>>
>> gcc/
>>
>> 2016-11-14  Matthias Klose  
>>
>> * doc/install.texi: Remove references to gcj/libjava.
>> * doc/invoke.texi: Likewise.
>>
> OK.
> jeff

I was missing more references in the documentation, committing the remaining
changes as obvious:

gcc/
2016-11-15  Matthias Klose  

* doc/install.texi: Remove references to java/libjava.
* doc/sourcebuild.texi: Likewise.


gcc/
2016-11-15  Matthias Klose  

	* doc/install.texi: Remove references to java/libjava.
	* doc/sourcebuild.texi: Likewise.

Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi	(revision 242453)
+++ gcc/doc/install.texi	(working copy)
@@ -498,28 +498,6 @@
 Necessary when applying patches, created with @command{diff}, to one's
 own sources.
 
-@item ecj1
-@itemx gjavah
-
-If you wish to modify @file{.java} files in libjava, you will need to
-configure with @option{--enable-java-maintainer-mode}, and you will need
-to have executables named @command{ecj1} and @command{gjavah} in your path.
-The @command{ecj1} executable should run the Eclipse Java compiler via
-the GCC-specific entry point.  You can download a suitable jar from
-@uref{ftp://sourceware.org/pub/java/}, or by running the script
-@command{contrib/download_ecj}.
-
-@item antlr.jar version 2.7.1 (or later)
-@itemx antlr binary
-
-If you wish to build the @command{gjdoc} binary in libjava, you will
-need to have an @file{antlr.jar} library available. The library is
-searched for in system locations but can be specified with
-@option{--with-antlr-jar=} instead.  When configuring with
-@option{--enable-java-maintainer-mode}, you will need to have one of
-the executables named @command{cantlr}, @command{runantlr} or
-@command{antlr} in your path.
-
 @end table
 
 @html
@@ -550,9 +528,9 @@
 Please refer to the @uref{http://gcc.gnu.org/releases.html,,releases web page}
 for information on how to obtain GCC@.
 
-The source distribution includes the C, C++, Objective-C, Fortran, Java,
+The source distribution includes the C, C++, Objective-C, Fortran,
 and Ada (in the case of GCC 3.1 and later) compilers, as well as
-runtime libraries for C++, Objective-C, Fortran, and Java.
+runtime libraries for C++, Objective-C, and Fortran.
 For previous versions these were downloadable as separate components such
 as the core GCC distribution, which included the C language front end and
 shared components, and language-specific distributions including the
@@ -934,7 +912,7 @@
 will be built.  Package names currently recognized in the GCC tree are
 @samp{libgcc} (also known as @samp{gcc}), @samp{libstdc++} (not
 @samp{libstdc++-v3}), @samp{libffi}, @samp{zlib}, @samp{boehm-gc},
-@samp{ada}, @samp{libada}, @samp{libjava}, @samp{libgo}, and @samp{libobjc}.
+@samp{ada}, @samp{libada}, @samp{libgo}, and @samp{libobjc}.
 Note @samp{libiberty} does not support shared libraries at all.
 
 Use @option{--disable-shared} to build only static libraries.  Note that
@@ -1178,7 +1156,7 @@
 @item --enable-threads
 Specify that the target
 supports threads.  This affects the Objective-C compiler and runtime
-library, and exception handling for other languages like C++ and Java.
+library, and exception handling for other languages like C++.
 On some systems, this is the default.
 
 In general, the best (and, in many cases, the only known) threading
@@ -1195,7 +1173,7 @@
 Specify that
 @var{lib} is the thread support library.  This affects the Objective-C
 compiler and runtime library, and exception handling for other languages
-like C++ and Java.  The possibilities for @var{lib} are:
+like C++.  The possibilities for @var{lib} are:
 
 @table @code
 @item aix
@@ -1443,7 +1421,7 @@
 @option{--with-gxx-include-dir=@var{dirname}}.  Using this option is
 particularly useful if you intend to use several versions of GCC in
 parallel.  This is currently supported by @samp{libgfortran},
-@samp{libjava}, @samp{libstdc++}, and @samp{libobjc}.
+@samp{libstdc++}, and @samp{libobjc}.
 
 @item @anchor{WithAixSoname}--with-aix-soname=@samp{aix}, @samp{svr4} or @samp{both}
 Traditional AIX shared library versioning (versioned @code{Shared Object}
@@ -1563,7 +1541,7 @@
 @end smallexample
 Currently, you can use any of the following:
 @code{all}, 

[Patch, Fortran] PR 66227: [5/6/7 Regression] [OOP] EXTENDS_TYPE_OF n returns wrong result for polymorphic variable allocated to extended type

2016-11-15 Thread Janus Weil
Hi all,

the attached patch fixes a wrong-code problem with the intrinsic
function EXTENDS_TYPE_OF. The simplification function which tries to
reduce calls to EXTENDS_TYPE_OF to a compile-time constant (if
possible) was a bit over-zealous and simplified cases that were
actually not decidable at compile-time, thus causing wrong code.

The patch fixes the simplification function and also the corresponding
test case (which unfortunately was wrong as well) and regtests
cleanly. Ok for trunk and the release branches?

Cheers,
Janus



2016-11-15  Janus Weil  

PR fortran/66227
* simplify.c (gfc_simplify_extends_type_of): Prevent over-
simplification. Fix a comment. Add a comment.

2016-11-15  Janus Weil  

PR fortran/66227
* gfortran.dg/extends_type_of_3.f90: Fix and extend the test case.
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c  (Revision 242447)
+++ gcc/fortran/simplify.c  (Arbeitskopie)
@@ -2517,7 +2517,7 @@ gfc_simplify_extends_type_of (gfc_expr *a, gfc_exp
   if (UNLIMITED_POLY (a) || UNLIMITED_POLY (mold))
 return NULL;
 
-  /* Return .false. if the dynamic type can never be the same.  */
+  /* Return .false. if the dynamic type can never be an extension.  */
   if ((a->ts.type == BT_CLASS && mold->ts.type == BT_CLASS
&& !gfc_type_is_extension_of
(mold->ts.u.derived->components->ts.u.derived,
@@ -2535,10 +2535,14 @@ gfc_simplify_extends_type_of (gfc_expr *a, gfc_exp
   || (a->ts.type == BT_CLASS && mold->ts.type == BT_DERIVED
  && !gfc_type_is_extension_of
(mold->ts.u.derived,
-a->ts.u.derived->components->ts.u.derived)))
+a->ts.u.derived->components->ts.u.derived)
+ && !gfc_type_is_extension_of
+   (a->ts.u.derived->components->ts.u.derived,
+mold->ts.u.derived)))
 return gfc_get_logical_expr (gfc_default_logical_kind, >where, false);
 
-  if (mold->ts.type == BT_DERIVED
+  /* Return .true. if the dynamic type is guaranteed to be an extension.  */
+  if (a->ts.type == BT_CLASS && mold->ts.type == BT_DERIVED
   && gfc_type_is_extension_of (mold->ts.u.derived,
   a->ts.u.derived->components->ts.u.derived))
 return gfc_get_logical_expr (gfc_default_logical_kind, >where, true);
Index: gcc/testsuite/gfortran.dg/extends_type_of_3.f90
===
--- gcc/testsuite/gfortran.dg/extends_type_of_3.f90 (Revision 242447)
+++ gcc/testsuite/gfortran.dg/extends_type_of_3.f90 (Arbeitskopie)
@@ -3,9 +3,7 @@
 !
 ! PR fortran/41580
 !
-! Compile-time simplification of SAME_TYPE_AS
-! and EXTENDS_TYPE_OF.
-!
+! Compile-time simplification of SAME_TYPE_AS and EXTENDS_TYPE_OF.
 
 implicit none
 type t1
@@ -37,6 +35,8 @@ logical, parameter :: p6 = same_type_as(a1,a1)  !
 
 if (p1 .or. p2 .or. p3 .or. p4 .or. .not. p5 .or. .not. p6) call 
should_not_exist()
 
+if (same_type_as(b1,b1)   .neqv. .true.) call should_not_exist()
+
 ! Not (trivially) compile-time simplifiable:
 if (same_type_as(b1,a1)  .neqv. .true.) call abort()
 if (same_type_as(b1,a11) .neqv. .false.) call abort()
@@ -49,6 +49,7 @@ if (same_type_as(b1,a1)  .neqv. .false.) call abor
 if (same_type_as(b1,a11) .neqv. .true.) call abort()
 deallocate(b1)
 
+
 ! .true. -> same type
 if (extends_type_of(a1,a1)   .neqv. .true.) call should_not_exist()
 if (extends_type_of(a11,a11) .neqv. .true.) call should_not_exist()
@@ -83,8 +84,8 @@ if (extends_type_of(a1,a11) .neqv. .false.) call s
 if (extends_type_of(b1,a1)   .neqv. .true.) call should_not_exist()
 if (extends_type_of(b11,a1)  .neqv. .true.) call should_not_exist()
 if (extends_type_of(b11,a11) .neqv. .true.) call should_not_exist()
-if (extends_type_of(b1,a11)  .neqv. .false.) call should_not_exist()
 
+
 if (extends_type_of(a1,b11)  .neqv. .false.) call abort()
 
 ! Special case, simplified at tree folding:
@@ -92,19 +93,34 @@ if (extends_type_of(b1,b1)   .neqv. .true.) call a
 
 ! All other possibilities are not compile-time checkable
 if (extends_type_of(b11,b1)  .neqv. .true.) call abort()
-!if (extends_type_of(b1,b11)  .neqv. .false.) call abort() ! FAILS due to PR 
47189
+if (extends_type_of(b1,b11)  .neqv. .false.) call abort()
 if (extends_type_of(a11,b11) .neqv. .true.) call abort()
+
 allocate(t11 :: b11)
 if (extends_type_of(a11,b11) .neqv. .true.) call abort()
 deallocate(b11)
+
 allocate(t111 :: b11)
 if (extends_type_of(a11,b11) .neqv. .false.) call abort()
 deallocate(b11)
+
 allocate(t11 :: b1)
 if (extends_type_of(a11,b1) .neqv. .true.) call abort()
 deallocate(b1)
 
+allocate(t11::b1)
+if (extends_type_of(b1,a11) .neqv. .true.) call abort()
+deallocate(b1)
+
+allocate(b1,source=a11)
+if (extends_type_of(b1,a11) .neqv. .true.) call abort()
+deallocate(b1)
+
+allocate( 

Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Segher Boessenkool
On Tue, Nov 15, 2016 at 12:16:19PM -0700, Kelvin Nilsen wrote:
> The reason I am using SI mode is so that I don't have to disqualify the
> use of these functions on a 32-bit big-endian configuration.
> 
> Do you want me to switch to DI mode for all the operands?

SI is fine, and can give slightly better code in some cases (the machine
instructions work fine with garbage in the upper half of the regs, so GCC
can avoid a zero extend in some cases if you use SImode).  Marginal
advantage here, we have much bigger suboptimalities with extensions, don't
worry too much about it :-)

> > The code (in rs6000.c) expanding the builtin can create two insns directly,
> > so that you do not need to repeat this over and over in define_expands?
> 
> The pattern I'm familiar with is to allocate the temporary scratch
> register during expansion, and to use the allocated temporary at insn
> match time.  I'll have to teach myself a new pattern to do all of this
> at insn match time.  Feel free to point me to an example of define_insn
> code that does this.

I meant not the define_insn, but the actual builtin expander code, like
for example how altivec_expand_predicate_builtin is hooked up.


Segher


[PATCH] PR 59406 note that FNV hash functions are incorrect

2016-11-15 Thread Jonathan Wakely

The PR points out that our FNV hash functions don't correctly
implement the FNV-1a function. Since the code is only kept for
backwards compatibility we probably don't want to change the results,
so this just adds comments to point out the issue.

PR libstdc++/59406
* include/bits/functional_hash.h: Add comment noting difference from
FNV-1a.
* include/tr1/functional_hash.h: Likewise.
* libsupc++/hash_bytes.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.

commit 138a2ac8bc4bcbef28815ef89758f13e93fa433b
Author: Jonathan Wakely 
Date:   Tue Nov 15 20:00:54 2016 +

PR 59406 note that FNV hash functions are incorrect

PR libstdc++/59406
* include/bits/functional_hash.h: Add comment noting difference from
FNV-1a.
* include/tr1/functional_hash.h: Likewise.
* libsupc++/hash_bytes.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/functional_hash.h 
b/libstdc++-v3/include/bits/functional_hash.h
index dc09683..cee1ea8 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -200,6 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return hash(&__val, sizeof(__val), __hash); }
   };
 
+  // A hash function similar to FNV-1a (see PR59406 for how it differs).
   struct _Fnv_hash_impl
   {
 static size_t
diff --git a/libstdc++-v3/include/tr1/functional_hash.h b/libstdc++-v3/include/tr1/functional_hash.h
index 4edc49a..8148e4d 100644
--- a/libstdc++-v3/include/tr1/functional_hash.h
+++ b/libstdc++-v3/include/tr1/functional_hash.h
@@ -83,6 +83,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Fowler / Noll / Vo (FNV) Hash (type FNV-1a)
   // (Used by the next specializations of std::tr1::hash.)
 
+  // N.B. These functions should work on unsigned char, otherwise they do not
+  // correctly implement the FNV-1a algorithm (see PR59406).
+  // The existing behaviour is retained for backwards compatibility.
+
   /// Dummy generic implementation (for sizeof(size_t) != 4, 8).
   template
 struct _Fnv_hash_base
diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc b/libstdc++-v3/libsupc++/hash_bytes.cc
index 1042de6..7d76c34 100644
--- a/libstdc++-v3/libsupc++/hash_bytes.cc
+++ b/libstdc++-v3/libsupc++/hash_bytes.cc
@@ -112,6 +112,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   // Implementation of FNV hash for 32-bit size_t.
+  // N.B. This function should work on unsigned char, otherwise it does not
+  // correctly implement the FNV-1a algorithm (see PR59406).
+  // The existing behaviour is retained for backwards compatibility.
   size_t
   _Fnv_hash_bytes(const void* ptr, size_t len, size_t hash)
   {
@@ -157,6 +160,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   // Implementation of FNV hash for 64-bit size_t.
+  // N.B. This function should work on unsigned char, otherwise it does not
+  // correctly implement the FNV-1a algorithm (see PR59406).
+  // The existing behaviour is retained for backwards compatibility.
   size_t
   _Fnv_hash_bytes(const void* ptr, size_t len, size_t hash)
   {


[PATCH] Fix PR77848

2016-11-15 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848 identifies a situation
where if-conversion causes degradation when the if-converted loop is not
subsequently vectorized.  The if-conversion pass does not have a cost
model to avoid such degradations.  However, it does have a capability to
version the if-converted loop, so that the vectorizer can choose the
if-converted version if vectorization occurs, or the unmodified version
if vectorization does not occur.  Currently versioning is only done under
special circumstances.
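To make the trade-off concrete, here is a sketch (names illustrative, not from the testcase) of a branchy reduction and a hand-written equivalent of its if-converted body. The second form is what the vectorizer can handle, but executed scalar it can be slower than the first, which is why keeping both versions and choosing at run time pays off:

```cpp
#include <cmath>

// Branchy form, as written in the source: if-conversion targets this.
static int argmax_abs_branchy (const double *a, int n)
{
  double best = -1.0;
  int k = 0;
  for (int i = 0; i < n; i++)
    if (std::fabs (a[i]) > best)
      {
	best = std::fabs (a[i]);
	k = i;
      }
  return k;
}

// Hand-written equivalent of the if-converted body: straight-line code
// with conditional selects only, the shape the vectorizer accepts.
static int argmax_abs_ifcvt (const double *a, int n)
{
  double best = -1.0;
  int k = 0;
  for (int i = 0; i < n; i++)
    {
      double v = std::fabs (a[i]);
      int take = v > best;
      best = take ? v : best;
      k = take ? i : k;
    }
  return k;
}
```

Both functions compute the same result; only the control-flow shape differs.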

This patch does two things:  It requires loop versioning whenever loop
vectorization is enabled so that such degradations can't occur; and it
extends loop versioning to outer loops when such loops are of the right
form for outer loop vectorization.  The latter is needed to avoid
introducing degradations with versioning of inner loops, which disturbs
the pattern that outer loop vectorization expects.

This is an embarrassingly simple patch, given how much time I spent going
down other paths.  The most surprising thing is that versioning the outer
loop doesn't require any additional handshaking with the vectorizer.  It
just works.  I've verified this on some examples, and we end up with the
correct vectorization and with the unused loop nest discarded.

The one remaining problem with this bug is that it precludes SLP from
seeing if-converted loops to work on.  With this patch, if the vectorizer
can't vectorize an if-converted loop, the original version survives.  We
have one test case that fails when that happens, because it expected to
do SLP vectorization on the if-converted statements:

> FAIL: gcc.dg/vect/bb-slp-cond-1.c -flto -ffat-lto-objects  
> scan-tree-dump-times slp1 "basic block vectorized" 1
> FAIL: gcc.dg/vect/bb-slp-cond-1.c scan-tree-dump-times slp1 "basic block 
> vectorized" 1

Arguably, this shows a deficiency in SLP vectorization, since it won't
see if-converted statements in non-loop code in any event.  Eventually
SLP should learn to handle these kinds of PHI statements itself.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu, with only the
specified regression.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2016-11-15  Bill Schmidt  

PR tree-optimization/77848
* tree-if-conv.c (version_loop_for_if_conversion): When versioning
an outer loop, only save basic block aux information for the inner
loop.
(versionable_outer_loop_p): New function.
(tree_if_conversion): Always version a loop when vectorization
is enabled; version the outer loop instead of the inner one
if the pattern will be recognized for outer-loop vectorization.

[gcc/testsuite]

2016-11-15  Bill Schmidt  

PR tree-optimization/77848
* gfortran.dg/vect/pr77848.f: New test.


Index: gcc/testsuite/gfortran.dg/vect/pr77848.f
===
--- gcc/testsuite/gfortran.dg/vect/pr77848.f	(revision 0)
+++ gcc/testsuite/gfortran.dg/vect/pr77848.f	(working copy)
@@ -0,0 +1,24 @@
+! PR 77848: Verify versioning is on when vectorization fails
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt -fdump-tree-vect-details" }
+
+  subroutine sub(x,a,n,m)
+  implicit none
+  real*8 x(*),a(*),atemp
+  integer i,j,k,m,n
+  real*8 s,t,u,v
+  do j=1,m
+ atemp=0.d0
+ do i=1,n
+if (abs(a(i)).gt.atemp) then
+   atemp=a(i)
+   k = i
+end if
+ enddo
+ call dummy(atemp,k)
+  enddo
+  return
+  end
+
+! { dg-final { scan-tree-dump "LOOP_VECTORIZED" "ifcvt" } }
+! { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } }
Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c	(revision 242412)
+++ gcc/tree-if-conv.c	(working copy)
@@ -2533,6 +2533,7 @@ version_loop_for_if_conversion (struct loop *loop)
   struct loop *new_loop;
   gimple *g;
   gimple_stmt_iterator gsi;
+  unsigned int save_length;
 
   g = gimple_build_call_internal (IFN_LOOP_VECTORIZED, 2,
  build_int_cst (integer_type_node, loop->num),
@@ -2540,8 +2541,9 @@ version_loop_for_if_conversion (struct loop *loop)
   gimple_call_set_lhs (g, cond);
 
   /* Save BB->aux around loop_version as that uses the same field.  */
-  void **saved_preds = XALLOCAVEC (void *, loop->num_nodes);
-  for (unsigned i = 0; i < loop->num_nodes; i++)
+  save_length = loop->inner ? loop->inner->num_nodes : loop->num_nodes;
+  void **saved_preds = XALLOCAVEC (void *, save_length);
+  for (unsigned i = 0; i < save_length; i++)
 saved_preds[i] = ifc_bbs[i]->aux;
 
   initialize_original_copy_tables ();
@@ -2550,7 +2552,7 @@ version_loop_for_if_conversion (struct loop *loop)
   REG_BR_PROB_BASE, true);
   

[PATCH 9/9] Add remainder of foutline-msabi-xlogues implementation

2016-11-15 Thread Daniel Santos
Adds functions emit_msabi_outlined_save and emit_msabi_outlined_restore,
which are called from ix86_expand_prologue and ix86_expand_epilogue,
respectively.
---
 gcc/config/i386/i386.c | 307 ++---
 1 file changed, 288 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f3149ef..42ce9c1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13900,6 +13900,114 @@ ix86_elim_entry_set_got (rtx reg)
 }
 }
 
+static rtx
+gen_frame_set (rtx reg, rtx frame_reg, int offset, bool store)
+{
+  rtx addr, mem;
+
+  if (offset)
+addr = gen_rtx_PLUS (Pmode, frame_reg, GEN_INT (offset));
+  mem = gen_frame_mem (GET_MODE (reg), offset ? addr : frame_reg);
+  return gen_rtx_SET (store ? mem : reg, store ? reg : mem);
+}
+
+static inline rtx
+gen_frame_load (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, false);
+}
+
+static inline rtx
+gen_frame_store (rtx reg, rtx frame_reg, int offset)
+{
+  return gen_frame_set (reg, frame_reg, offset, true);
+}
+
+static void
+emit_msabi_outlined_save (const struct ix86_frame &frame)
+{
+  struct machine_function *m = cfun->machine;
+  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
+ + m->outline_ms_sysv_extra_regs;
+  rtvec v = rtvec_alloc (ncregs - 1 + 3);
+  rtx insn, sym, tmp;
+  rtx rax = gen_rtx_REG (word_mode, AX_REG);
+  unsigned i = 0;
+  unsigned j;
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+  HOST_WIDE_INT stack_used = xlogue.get_stack_space_used ();
+  HOST_WIDE_INT stack_alloc_size = stack_used;
+  HOST_WIDE_INT rax_offset = xlogue.get_stub_ptr_offset ();
+  bool realign = crtl->stack_realign_needed;
+
+  gcc_assert (TARGET_64BIT);
+  gcc_assert (!crtl->need_drap);
+
+  /* Verify that the incoming stack 16-byte alignment offset matches the
+ layout we're using.  */
+  gcc_assert ((m->fs.sp_offset & 15) == xlogue.get_stack_align_off_in ());
+
+  tmp = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (-rax_offset));
+  insn = emit_insn (gen_rtx_SET (rax, tmp));
+
+  /* Combine as many other allocations as possible.  */
+  if (frame.nregs == 0)
+{
+  if (frame.nsseregs != 0)
+   stack_alloc_size = frame.sse_reg_save_offset - m->fs.sp_offset;
+  else
+   stack_alloc_size = frame.reg_save_offset - m->fs.sp_offset;
+
+  gcc_assert (stack_alloc_size >= stack_used);
+}
+
+  sym = xlogue.get_stub_rtx (realign ? XLOGUE_STUB_SAVE_HFP
+: XLOGUE_STUB_SAVE);
+  RTVEC_ELT (v, i++) = gen_rtx_USE (VOIDmode, sym);
+
+  /* Take care of any stack realignment here.  */
+  if (realign)
+{
+  int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
+  rtx rax_sp_offset = GEN_INT (-(stack_alloc_size - rax_offset));
+
+  gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
+
+  /* Align rax.  */
+  insn = emit_insn (ix86_gen_andsp (rax, rax, GEN_INT (-align_bytes)));
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  tmp = gen_rtx_PLUS (Pmode, rax, rax_sp_offset);
+  tmp = gen_rtx_SET (stack_pointer_rtx, tmp);
+  RTVEC_ELT (v, i++) = tmp;
+  m->fs.sp_offset += stack_alloc_size;
+}
+  else
+{
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (-stack_alloc_size), -1,
+   m->fs.cfa_reg == stack_pointer_rtx);
+  RTVEC_ELT (v, i++) = const0_rtx;
+}
+
+  for (j = 0; j < ncregs; ++j)
+{
+  const xlogue_layout::reginfo &r = xlogue.get_reginfo (j);
+  rtx store;
+  rtx reg;
+
+  reg = gen_rtx_REG (SSE_REGNO_P (r.regno) ? V4SFmode : word_mode,
+r.regno);
+  store = gen_frame_store (reg, rax, -r.offset);
+  RTVEC_ELT (v, i++) = store;
+}
+
+  gcc_assert (i == (unsigned)GET_NUM_ELEM (v));
+
+  insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+  RTX_FRAME_RELATED_P (insn) = true;
+}
+
 /* Expand the prologue into a bunch of separate insns.  */
 
 void
@@ -14113,6 +14221,11 @@ ix86_expand_prologue (void)
}
 }
 
+  /* Call to outlining stub occurs after pushing frame pointer (if it was
+ needed).  */
+  if (m->outline_ms_sysv)
+  emit_msabi_outlined_save (frame);
+
   if (!int_registers_saved)
 {
   /* If saving registers via PUSH, do so now.  */
@@ -14141,20 +14254,24 @@ ix86_expand_prologue (void)
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
   gcc_assert (align_bytes > MIN_STACK_BOUNDARY / BITS_PER_UNIT);
 
-  /* The computation of the size of the re-aligned stack frame means
-that we must allocate the size of the register save area before
-performing the actual alignment.  Otherwise we cannot guarantee
-that there's enough storage above the realignment point.  */
-  if (m->fs.sp_offset != frame.sse_reg_save_offset)
-pro_epilogue_adjust_stack 

[PATCH 6/9] Adds class xlogue_layout to i386.c

2016-11-15 Thread Daniel Santos
This C++ class adds the basic support for foutline-msabi-xlogues by
managing the layout (where registers are stored, among other facets of
the optimization) and providing the proper symbol rtx for the required
stub.

xlogue_layout should not be used until after a call to
ix86_compute_frame_layout, as its behavior depends upon data in crtl
and cfun->machine.  Once ix86_compute_frame_layout has been called, the
static member function xlogue_layout::get_instance can be used to
retrieve the appropriate (constant) instance of xlogue_layout.
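The get_instance pattern described above amounts to selecting one of a few immutable, pre-built layout objects; a minimal sketch of that shape (names illustrative, not GCC's):

```cpp
// One immutable instance per stub set, chosen by frame state computed
// earlier (here just an enum; in the patch it derives from crtl and
// cfun->machine after ix86_compute_frame_layout has run).
enum layout_set { SET_ALIGNED, SET_ALIGNED_PLUS_8, SET_UNALIGNED, SET_COUNT };

class layout
{
public:
  static const layout &get_instance (layout_set s)
  {
    static const layout instances[SET_COUNT]
      = { layout (0), layout (8), layout (0) };
    return instances[s];
  }
  int stack_align_offset () const { return m_off; }

private:
  explicit layout (int off) : m_off (off) {}   // only get_instance constructs
  int m_off;
};
```

Keeping the constructor private forces all callers through get_instance, matching the patch's "not before ix86_compute_frame_layout" contract.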
---
 gcc/config/i386/i386.c | 218 +
 1 file changed, 218 insertions(+)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4cc3c8f..f39b847 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2429,6 +2429,224 @@ unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
   XMM12_REG, XMM13_REG, XMM14_REG, XMM15_REG
 };
 
+enum xlogue_stub {
+  XLOGUE_STUB_SAVE,
+  XLOGUE_STUB_RESTORE,
+  XLOGUE_STUB_RESTORE_TAIL,
+  XLOGUE_STUB_SAVE_HFP,
+  XLOGUE_STUB_RESTORE_HFP,
+  XLOGUE_STUB_RESTORE_HFP_TAIL,
+
+  XLOGUE_STUB_COUNT
+};
+
+enum xlogue_stub_sets {
+  XLOGUE_SET_ALIGNED,
+  XLOGUE_SET_ALIGNED_PLUS_8,
+  XLOGUE_SET_UNALIGNED,
+
+  XLOGUE_SET_COUNT
+};
+
+/* Register save/restore layout used by out-of-line stubs.  */
+class xlogue_layout {
+public:
+  struct reginfo {
+unsigned regno;
+HOST_WIDE_INT offset;  /* Offset used by stub base pointer (rax or
+  rsi) to where each register is stored.  */
+  };
+
+  unsigned get_nregs () const  {return m_nregs;}
+  HOST_WIDE_INT get_stack_align_off_in () const {return m_stack_align_off_in;}
+
+  const reginfo &get_reginfo (unsigned reg) const
+{
+  gcc_assert (reg < m_nregs);
+  return m_regs[reg];
+}
+
+  /* Returns an rtx for the stub's symbol based upon
+   1.) the specified stub (save, restore or restore_ret) and
+   2.) the value of cfun->machine->outline_ms_sysv_extra_regs and
+   3.) whether or not stack alignment is being performed.  */
+  rtx get_stub_rtx (enum xlogue_stub stub) const;
+
+  /* Returns the amount of stack space (including padding) that the stub
+ needs to store registers based upon data in the machine_function.  */
+  HOST_WIDE_INT get_stack_space_used () const
+{
+  const struct machine_function &m = *cfun->machine;
+  unsigned last_reg = m.outline_ms_sysv_extra_regs + MIN_REGS;
+
+  gcc_assert (m.outline_ms_sysv_extra_regs <= MAX_EXTRA_REGS);
+  return m_regs[last_reg - 1].offset
++ (m.outline_ms_sysv_pad_out ? 8 : 0)
++ STUB_INDEX_OFFSET;
+}
+
+  /* Returns the offset for the base pointer used by the stub.  */
+  HOST_WIDE_INT get_stub_ptr_offset () const
+{
+  return STUB_INDEX_OFFSET + m_stack_align_off_in;
+}
+
+  static const struct xlogue_layout &get_instance ();
+
+  static const HOST_WIDE_INT STUB_INDEX_OFFSET = 0x70;
+  static const unsigned MIN_REGS = 12;
+  static const unsigned MAX_REGS = 18;
+  static const unsigned MAX_EXTRA_REGS = MAX_REGS - MIN_REGS;
+  static const unsigned VARIANT_COUNT = MAX_EXTRA_REGS + 1;
+  static const unsigned STUB_NAME_MAX_LEN = 16;
+  static const char * const STUB_BASE_NAMES[XLOGUE_STUB_COUNT];
+  static const unsigned REG_ORDER[MAX_REGS];
+  static const unsigned REG_ORDER_REALIGN[MAX_REGS];
+
+private:
+  xlogue_layout ();
+  xlogue_layout (HOST_WIDE_INT stack_align_off_in, bool hfp);
+  xlogue_layout (const xlogue_layout &);
+  ~xlogue_layout ();
+
+  /* True if hard frame pointer is used.  */
+  bool m_hfp;
+
+  /* Max number of register this layout manages.  */
+  unsigned m_nregs;
+
+  /* Incoming offset from 16-byte alignment.  */
+  HOST_WIDE_INT m_stack_align_off_in;
+  struct reginfo m_regs[MAX_REGS];
+  rtx m_syms[XLOGUE_STUB_COUNT][VARIANT_COUNT];
+  char m_stub_names[XLOGUE_STUB_COUNT][VARIANT_COUNT][STUB_NAME_MAX_LEN];
+
+  static const struct xlogue_layout GTY(()) s_instances[XLOGUE_SET_COUNT];
+};
+
+const char * const xlogue_layout::STUB_BASE_NAMES[XLOGUE_STUB_COUNT] = {
+  "savms64",
+  "resms64",
+  "resms64x",
+  "savms64f",
+  "resms64f",
+  "resms64fx"
+};
+
+const unsigned xlogue_layout::REG_ORDER[xlogue_layout::MAX_REGS] = {
+/* The below offset values are where each register is stored for the layout
+   relative to incoming stack pointer.  The value of each m_regs[].offset will
+   be relative to the incoming base pointer (rax or rsi) used by the stub.
+
+                 FP offset   FP offset
+    Register     aligned     aligned + 8   realigned  */
+    XMM15_REG,  /* 0x10      0x18          0x10  */
+    XMM14_REG,  /* 0x20      0x28          0x20  */
+    XMM13_REG,  /* 0x30      0x38          0x30  */
+    XMM12_REG,  /* 0x40      0x48          0x40  */
+    XMM11_REG,  /* 0x50      0x58          0x50  */
+  

[PATCH 8/9] Modify ix86_compute_frame_layout for foutline-msabi-xlogues

2016-11-15 Thread Daniel Santos
ix86_compute_frame_layout will now populate fields added to structs
machine_function and ix86_frame and modify the frame layout specific to
facilitate the use of save & restore stubs.
---
 gcc/config/i386/i386.c | 117 -
 1 file changed, 116 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cb4e688..f3149ef 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12516,6 +12516,8 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  m->outline_ms_sysv_pad_in = 0;
+  m->outline_ms_sysv_pad_out = 0;
   CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
@@ -12531,6 +12533,61 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   crtl->stack_alignment_needed = 128;
 }
 
+  /* m->outline_ms_sysv is initially enabled in ix86_expand_call for all
+ 64-bit ms_abi functions that call a sysv function.  So this is where
+ we prune away cases where we actually don't want to out-of-line the
+ pro/epilogues.  */
+  if (m->outline_ms_sysv)
+  {
+gcc_assert (TARGET_64BIT_MS_ABI);
+gcc_assert (flag_outline_msabi_xlogues);
+
+/* Do we need to handle SEH and disable the optimization? */
+gcc_assert (!TARGET_SEH);
+
+if (!TARGET_SSE)
+  m->outline_ms_sysv = false;
+
+/* Don't break hot-patched functions.  */
+else if (ix86_function_ms_hook_prologue (current_function_decl))
+  m->outline_ms_sysv = false;
+
+/* TODO: Still need to add support for hard frame pointers when stack
+   realignment is not needed.  */
+else if (crtl->stack_realign_finalized
+&& (frame_pointer_needed && !crtl->stack_realign_needed))
+  {
+   static bool warned = false;
+   if (!warned)
+ {
+   warned = true;
+   warning (OPT_foutline_msabi_xlogues,
+"not currently supported with hard frame pointers when "
+"not realigning stack.");
+ }
+   m->outline_ms_sysv = false;
+  }
+
+/* TODO: Cases that have not yet been examined.  */
+else if (crtl->calls_eh_return
+|| crtl->need_drap
+|| m->static_chain_on_stack
+|| ix86_using_red_zone ()
+|| flag_split_stack)
+  {
+   static bool warned = false;
+   if (!warned)
+ {
+   warned = true;
+   warning (OPT_foutline_msabi_xlogues,
+"not currently supported with the following: SEH, "
+"DRAP, static call chains on the stack, red zones or "
+"split stack.");
+ }
+   m->outline_ms_sysv = false;
+  }
+  }
+
   stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
   preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
 
@@ -12599,6 +12656,60 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   /* The traditional frame pointer location is at the top of the frame.  */
   frame->hard_frame_pointer_offset = offset;
 
+  if (m->outline_ms_sysv)
+{
+  unsigned i;
+  HOST_WIDE_INT offset_after_int_regs;
+
+  gcc_assert (!(offset & 7));
+
+  /* Select an appropriate layout for incoming stack offset.  */
+  m->outline_ms_sysv_pad_in = (!crtl->stack_realign_needed && (offset & 8));
+  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
+
+  gcc_assert (frame->nregs >= 2);
+  gcc_assert (frame->nsseregs >= 10);
+
+  for (i = 0; i < xlogue.get_nregs (); ++i)
+   {
+ unsigned regno = xlogue.get_reginfo (i).regno;
+
+ if (ix86_save_reg (regno, false, false))
+   {
+ add_to_hard_reg_set (&stub_managed_regs, DImode, regno);
+ /* For the purposes of pro/epilogue generation, we'll only count
+regs that aren't saved/restored by out-of-line stubs.  */
+ if (SSE_REGNO_P (regno))
+   --frame->nsseregs;
+ else
+   --frame->nregs;
+   }
+ else
+   break;
+   }
+
+  gcc_assert (i >= xlogue_layout::MIN_REGS);
+  gcc_assert (i <= xlogue_layout::MAX_REGS);
+  gcc_assert (frame->nregs >= 0);
+  gcc_assert (frame->nsseregs >= 0);
+  m->outline_ms_sysv_extra_regs = i - xlogue_layout::MIN_REGS;
+
+  /* If, after saving any remaining int regs we need padding for
+16-byte alignment, we insert that padding prior to remaining int
+reg saves.  */
+  offset_after_int_regs = xlogue.get_stack_space_used ()
+ + frame->nregs * UNITS_PER_WORD;
+  if (offset_after_int_regs & 8)
+  {
+   m->outline_ms_sysv_pad_out = 1;
+   offset_after_int_regs += UNITS_PER_WORD;
+  }
+
+  gcc_assert (!(offset_after_int_regs & 15));
+  offset += 

[PATCH 5/9] Add patterns and predicates for foutline-msabi-xlogues

2016-11-15 Thread Daniel Santos
Adds the predicates save_multiple and restore_multiple to predicates.md,
which are used by the following patterns in sse.md:

* save_multiple - insn that calls a save stub
* save_multiple_realign - insn that calls a save stub and also manages
  a realign and hard frame pointer
* restore_multiple - call_insn that calls a restore stub and returns to the
  function to allow a sibling call (which should typically offer better
  optimization than using the restore stub as the tail call)
* restore_multiple_and_return - a jump_insn that is the return from
  a function (tail call)
---
 gcc/config/i386/predicates.md | 148 ++
 gcc/config/i386/sse.md|  56 
 2 files changed, 204 insertions(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 219674e..f50bba9a 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1663,3 +1663,151 @@
   (ior (match_operand 0 "register_operand")
(and (match_code "const_int")
(match_test "op == constm1_rtx"))))
+
+;; Return true if:
+;; * first op is a symbol reference,
+;; * >= 14 operands, and
+;; * operands 2 to end save a register to a memory location that's an
+;;   offset of RAX.
+(define_predicate "save_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  if (GET_CODE (head) != USE)
+return false;
+  else
+{
+  rtx op0 = XEXP (head, 0);
+  if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+   return false;
+}
+
+  if (nregs < 14)
+return false;
+
+  for (i = 2; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* storing a register to memory.  */
+   if (GET_CODE (src) == REG && GET_CODE (dest) == MEM)
+ {
+   rtx addr = XEXP (dest, 0);
+
+   /* Good if dest address is in RAX.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == AX_REG)
+ continue;
+
+   /* Good if dest address is offset of RAX.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == AX_REG)
+ continue;
+ }
+   break;
+
+ default:
+   break;
+   }
+   return false;
+}
+  return true;
+})
+
+;; Return true if:
+;; * first op is (return) or a use (symbol reference),
+;; * >= 14 operands, and
+;; * operands 2 to end are one of:
+;;   - restoring a register from a memory location that's an offset of RSI.
+;;   - clobbering a reg
+;;   - adjusting SP
+(define_predicate "restore_multiple"
+  (match_code "parallel")
+{
+  const unsigned nregs = XVECLEN (op, 0);
+  rtx head = XVECEXP (op, 0, 0);
+  unsigned i;
+
+  switch (GET_CODE (head))
+{
+  case RETURN:
+   i = 3;
+   break;
+
+  case USE:
+  {
+   rtx op0 = XEXP (head, 0);
+
+   if (op0 == NULL_RTX || GET_CODE (op0) != SYMBOL_REF)
+ return false;
+
+   i = 1;
+   break;
+  }
+
+  default:
+   return false;
+}
+
+  if (nregs < i + 12)
+return false;
+
+  for (; i < nregs; i++)
+{
+  rtx e, src, dest;
+
+  e = XVECEXP (op, 0, i);
+
+  switch (GET_CODE (e))
+   {
+ case CLOBBER:
+   continue;
+
+ case SET:
+   src  = SET_SRC (e);
+   dest = SET_DEST (e);
+
+   /* restoring a register from memory.  */
+   if (GET_CODE (src) == MEM && GET_CODE (dest) == REG)
+ {
+   rtx addr = XEXP (src, 0);
+
+   /* Good if src address is in RSI.  */
+   if (GET_CODE (addr) == REG
+   && REGNO (addr) == SI_REG)
+ continue;
+
+   /* Good if src address is offset of RSI.  */
+   if (GET_CODE (addr) == PLUS
+   && GET_CODE (XEXP (addr, 0)) == REG
+   && REGNO (XEXP (addr, 0)) == SI_REG)
+ continue;
+
+   /* Good if adjusting stack pointer.  */
+   if (GET_CODE (dest) == REG
+   && REGNO (dest) == SP_REG
+   && GET_CODE (src) == PLUS
+   && GET_CODE (XEXP (src, 0)) == REG
+   && REGNO (XEXP (src, 0)) == SP_REG)
+ continue;
+ }
+   break;
+
+ default:
+   break;
+   }
+   return false;
+}
+  return true;
+})
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14fcd67..b9dac15 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -19397,3 +19397,59 @@
   [(set_attr "type" "sselog")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
+
+;; Save multiple registers 

[PATCH 2/9] Minor refactor in ix86_compute_frame_layout

2016-11-15 Thread Daniel Santos
This refactor is separated from a future patch that actually alters
ix86_compute_frame_layout.
---
 gcc/config/i386/i386.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 56cc67d..5ed8fb6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12256,6 +12256,7 @@ ix86_builtin_setjmp_frame_value (void)
 static void
 ix86_compute_frame_layout (struct ix86_frame *frame)
 {
+  struct machine_function *m = cfun->machine;
   unsigned HOST_WIDE_INT stack_alignment_needed;
   HOST_WIDE_INT offset;
   unsigned HOST_WIDE_INT preferred_alignment;
@@ -12290,19 +12291,19 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
  scheduling that can be done, which means that there's very little point
  in doing anything except PUSHs.  */
   if (TARGET_SEH)
-cfun->machine->use_fast_prologue_epilogue = false;
+m->use_fast_prologue_epilogue = false;
 
   /* During reload iteration the amount of registers saved can change.
  Recompute the value as needed.  Do not recompute when amount of registers
  didn't change as reload does multiple calls to the function and does not
  expect the decision to change within single iteration.  */
   else if (!optimize_bb_for_size_p (ENTRY_BLOCK_PTR_FOR_FN (cfun))
-   && cfun->machine->use_fast_prologue_epilogue_nregs != frame->nregs)
+  && m->use_fast_prologue_epilogue_nregs != frame->nregs)
 {
   int count = frame->nregs;
   struct cgraph_node *node = cgraph_node::get (current_function_decl);
 
-  cfun->machine->use_fast_prologue_epilogue_nregs = count;
+  m->use_fast_prologue_epilogue_nregs = count;
 
   /* The fast prologue uses move instead of push to save registers.  This
  is significantly longer, but also executes faster as modern hardware
@@ -12319,14 +12320,14 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   if (node->frequency < NODE_FREQUENCY_NORMAL
  || (flag_branch_probabilities
  && node->frequency < NODE_FREQUENCY_HOT))
-cfun->machine->use_fast_prologue_epilogue = false;
+   m->use_fast_prologue_epilogue = false;
   else
-cfun->machine->use_fast_prologue_epilogue
+   m->use_fast_prologue_epilogue
   = !expensive_function_p (count);
 }
 
   frame->save_regs_using_mov
-= (TARGET_PROLOGUE_USING_MOVE && cfun->machine->use_fast_prologue_epilogue
+= (TARGET_PROLOGUE_USING_MOVE && m->use_fast_prologue_epilogue
/* If static stack checking is enabled and done with probes,
  the registers need to be saved before allocating the frame.  */
&& flag_stack_check != STATIC_BUILTIN_STACK_CHECK);
-- 
2.9.0



[PATCH 7/9] Modify ix86_save_reg to optionally omit stub-managed registers

2016-11-15 Thread Daniel Santos
Adds static HARD_REG_SET stub_managed_regs to track registers that will
be managed by the pro/epilogue stubs for the function.

Adds a third parameter bool ignore_outlined to ix86_save_reg to specify
whether or not the count should include registers marked in
stub_managed_regs.
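The effect of the new parameter can be sketched like so (types and names illustrative, not the i386.c code): callers that emit inline saves pass true and skip stub-managed registers, while callers sizing the full frame pass false and still see them:

```cpp
#include <bitset>

// Stand-in for the HARD_REG_SET populated by ix86_compute_frame_layout.
static std::bitset<64> stub_managed_regs;

// Sketch of ix86_save_reg's new third parameter: when ignore_outlined is
// true, registers the out-of-line stub already handles report false.
static bool save_reg_p (unsigned regno, bool live, bool ignore_outlined)
{
  if (ignore_outlined && stub_managed_regs.test (regno))
    return false;   // the stub saves/restores this one
  return live;      // stand-in for the real liveness/ABI checks
}
```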
---
 gcc/config/i386/i386.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f39b847..cb4e688 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12321,10 +12321,14 @@ ix86_hard_regno_scratch_ok (unsigned int regno)
  && df_regs_ever_live_p (regno)));
 }
 
+/* Registers whose save & restore will be managed by stubs called from
+   pro/epilogue (inited in ix86_compute_frame_layout).  */
+static HARD_REG_SET GTY(()) stub_managed_regs;
+
 /* Return TRUE if we need to save REGNO.  */
 
 static bool
-ix86_save_reg (unsigned int regno, bool maybe_eh_return)
+ix86_save_reg (unsigned int regno, bool maybe_eh_return, bool ignore_outlined)
 {
   /* If there are no caller-saved registers, we preserve all registers,
  except for MMX and x87 registers which aren't supported when saving
@@ -12392,6 +12396,10 @@ ix86_save_reg (unsigned int regno, bool maybe_eh_return)
}
 }
 
+  if (ignore_outlined && cfun->machine->outline_ms_sysv
+  && in_hard_reg_set_p (stub_managed_regs, DImode, regno))
+return false;
+
   if (crtl->drap_reg
   && regno == REGNO (crtl->drap_reg)
   && !cfun->machine->no_drap_save_restore)
@@ -12412,7 +12420,7 @@ ix86_nsaved_regs (void)
   int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, false))
   nregs ++;
   return nregs;
 }
@@ -12428,7 +12436,7 @@ ix86_nsaved_sseregs (void)
   if (!TARGET_64BIT_MS_ABI)
 return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, false))
   nregs ++;
   return nregs;
 }
@@ -12508,6 +12516,7 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
 
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
+  CLEAR_HARD_REG_SET (stub_managed_regs);
 
   /* 64-bit MS ABI seem to require stack alignment to be always 16,
  except for function prologues, leaf functions and when the defult
@@ -12819,7 +12828,7 @@ ix86_emit_save_regs (void)
   rtx_insn *insn;
 
   for (regno = FIRST_PSEUDO_REGISTER - 1; regno-- > 0; )
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno)));
RTX_FRAME_RELATED_P (insn) = 1;
@@ -12901,7 +12910,7 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
 ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
cfa_offset -= UNITS_PER_WORD;
@@ -12916,7 +12925,7 @@ ix86_emit_save_sse_regs_using_mov (HOST_WIDE_INT cfa_offset)
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true))
+if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
   {
ix86_emit_save_reg_using_mov (V4SFmode, regno, cfa_offset);
cfa_offset -= GET_MODE_SIZE (V4SFmode);
@@ -13296,13 +13305,13 @@ get_scratch_register_on_entry (struct scratch_reg *sr)
   && !static_chain_p
   && drap_regno != CX_REG)
regno = CX_REG;
-  else if (ix86_save_reg (BX_REG, true))
+  else if (ix86_save_reg (BX_REG, true, false))
regno = BX_REG;
   /* esi is the static chain register.  */
   else if (!(regparm == 3 && static_chain_p)
-  && ix86_save_reg (SI_REG, true))
+  && ix86_save_reg (SI_REG, true, false))
regno = SI_REG;
-  else if (ix86_save_reg (DI_REG, true))
+  else if (ix86_save_reg (DI_REG, true, false))
regno = DI_REG;
   else
{
@@ -14403,7 +14412,7 @@ ix86_emit_restore_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, maybe_eh_return))
+if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, maybe_eh_return, 
true))
   {
rtx reg = gen_rtx_REG (word_mode, regno);
rtx mem;
@@ -14442,7 +14451,7 @@ ix86_emit_restore_sse_regs_using_mov (HOST_WIDE_INT cfa_offset,
   unsigned int regno;
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)

[PATCH 3/9] Add msabi pro/epilogue stubs to libgcc

2016-11-15 Thread Daniel Santos
Adds libgcc/config/i386/i386-asm.h to manage common cpp and gas macros.
stubs use the following naming convention:

  (sav|res)ms64[f][x]

save|res    Save or restore
ms64        Avoid possible name collisions with future stubs
            (specific to 64-bit msabi --> sysv scenario)
[f]         Variant for hard frame pointer (and stack realignment)
[x]         Tail-call variant (is the return from function)
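For concreteness, the scheme above yields the six stub entry points created by this patch (per the file list below); the save stubs never take the tail-call [x] suffix, since only an epilogue can be the return from the function. A minimal sketch of the resulting name set:

```cpp
#include <cassert>
#include <cstddef>

// The six stub names the (sav|res)ms64[f][x] scheme actually produces in
// this patch; "savms64x"/"savms64fx" do not exist because only the restore
// (epilogue) side can tail-call out of the function.
static const char *const stub_names[] = {
  "savms64", "savms64f",
  "resms64", "resms64f", "resms64x", "resms64fx",
};
static const std::size_t n_stubs = sizeof stub_names / sizeof stub_names[0];
```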
---
 libgcc/config.host |  2 +-
 libgcc/config/i386/i386-asm.h  | 82 ++
 libgcc/config/i386/resms64.S   | 63 
 libgcc/config/i386/resms64f.S  | 59 ++
 libgcc/config/i386/resms64fx.S | 61 +++
 libgcc/config/i386/resms64x.S  | 65 +
 libgcc/config/i386/savms64.S   | 63 
 libgcc/config/i386/savms64f.S  | 64 +
 libgcc/config/i386/t-msabi |  7 
 9 files changed, 465 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/i386/i386-asm.h
 create mode 100644 libgcc/config/i386/resms64.S
 create mode 100644 libgcc/config/i386/resms64f.S
 create mode 100644 libgcc/config/i386/resms64fx.S
 create mode 100644 libgcc/config/i386/resms64x.S
 create mode 100644 libgcc/config/i386/savms64.S
 create mode 100644 libgcc/config/i386/savms64f.S
 create mode 100644 libgcc/config/i386/t-msabi

diff --git a/libgcc/config.host b/libgcc/config.host
index 64beb21..07bb269 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1335,7 +1335,7 @@ case ${host} in
 i[34567]86-*-linux* | x86_64-*-linux* | \
   i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu | \
   i[34567]86-*-gnu*)
-   tmake_file="${tmake_file} t-tls i386/t-linux t-slibgcc-libgcc"
+   tmake_file="${tmake_file} t-tls i386/t-linux i386/t-msabi 
t-slibgcc-libgcc"
if test "$libgcc_cv_cfi" = "yes"; then
tmake_file="${tmake_file} t-stack i386/t-stack-i386"
fi
diff --git a/libgcc/config/i386/i386-asm.h b/libgcc/config/i386/i386-asm.h
new file mode 100644
index 000..73acf5c
--- /dev/null
+++ b/libgcc/config/i386/i386-asm.h
@@ -0,0 +1,82 @@
+/* Defines common preprocessor and assembly macros for use by various stubs.
+ *
+ *   Copyright (C) 2016 Free Software Foundation, Inc.
+ *   Written By Daniel Santos 
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 3, or (at your option) any
+ * later version.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Under Section 7 of GPL version 3, you are granted additional
+ * permissions described in the GCC Runtime Library Exception, version
+ * 3.1, as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License and
+ * a copy of the GCC Runtime Library Exception along with this program;
+ * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+ * .
+ */
+
+#ifndef I386_ASM_H
+#define I386_ASM_H
+
+#ifdef __ELF__
+# define ELFFN(fn) .type fn,@function
+#else
+# define ELFFN(fn)
+#endif
+
+#define FUNC_START(fn) \
+   .global fn; \
+   ELFFN (fn); \
+fn:
+
+#define HIDDEN_FUNC(fn)\
+   FUNC_START (fn) \
+   .hidden fn; \
+
+#define FUNC_END(fn) .size fn,.-fn
+
+#ifdef __SSE2__
+# ifdef __AVX__
+#  define MOVAPS vmovaps
+# else
+#  define MOVAPS movaps
+# endif
+
+/* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
+.macro SSE_SAVE off=0
+   MOVAPS %xmm15,(\off - 0x90)(%rax)
+   MOVAPS %xmm14,(\off - 0x80)(%rax)
+   MOVAPS %xmm13,(\off - 0x70)(%rax)
+   MOVAPS %xmm12,(\off - 0x60)(%rax)
+   MOVAPS %xmm11,(\off - 0x50)(%rax)
+   MOVAPS %xmm10,(\off - 0x40)(%rax)
+   MOVAPS %xmm9, (\off - 0x30)(%rax)
+   MOVAPS %xmm8, (\off - 0x20)(%rax)
+   MOVAPS %xmm7, (\off - 0x10)(%rax)
+   MOVAPS %xmm6, \off(%rax)
+.endm
+
+/* Restore SSE registers 6-15. off is the offset of rsi to get to xmm6.  */
+.macro SSE_RESTORE off=0
+   MOVAPS (\off - 0x90)(%rsi), %xmm15
+   MOVAPS (\off - 0x80)(%rsi), %xmm14
+   MOVAPS (\off - 0x70)(%rsi), %xmm13
+   MOVAPS (\off - 0x60)(%rsi), %xmm12
+   MOVAPS (\off - 0x50)(%rsi), %xmm11
+   MOVAPS (\off - 0x40)(%rsi), %xmm10
+   MOVAPS (\off - 0x30)(%rsi), %xmm9
+   MOVAPS (\off - 0x20)(%rsi), %xmm8
+   MOVAPS (\off - 0x10)(%rsi), %xmm7
+   MOVAPS \off(%rsi), %xmm6
+.endm
+
+#endif /* __SSE2__ */
+#endif /* I386_ASM_H */
diff --git a/libgcc/config/i386/resms64.S b/libgcc/config/i386/resms64.S
new file 

[PATCH 1/9] Change type of x86_64_ms_sysv_extra_clobbered_registers

2016-11-15 Thread Daniel Santos
This will need to be unsigned for a subsequent patch. Also adds the
constant NUM_X86_64_MS_CLOBBERED_REGS for brevity.
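The NUM_X86_64_MS_CLOBBERED_REGS definition relies on the usual derive-the-count-from-the-array idiom; a minimal sketch (the register numbers here are made up for illustration):

```cpp
#include <cassert>

// Hypothetical model of the real definition: the count is computed from the
// array itself, so adding an entry can never leave the constant stale.
#define MY_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

static unsigned const clobbered_regs[] = { 4, 5, 27, 28 };  // illustrative regnos
enum { NUM_CLOBBERED_REGS = MY_ARRAY_SIZE (clobbered_regs) };
```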
---
 gcc/config/i386/i386.c | 8 +++-
 gcc/config/i386/i386.h | 4 +++-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a5c4ba7..56cc67d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2421,7 +2421,7 @@ static int const x86_64_int_return_registers[4] =
 
 /* Additional registers that are clobbered by SYSV calls.  */
 
-int const x86_64_ms_sysv_extra_clobbered_registers[12] =
+unsigned const x86_64_ms_sysv_extra_clobbered_registers[12] =
 {
   SI_REG, DI_REG,
   XMM6_REG, XMM7_REG,
@@ -28209,11 +28209,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
   else if (TARGET_64BIT_MS_ABI
   && (!callarg2 || INTVAL (callarg2) != -2))
 {
-  int const cregs_size
-   = ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers);
-  int i;
+  unsigned i;
 
-  for (i = 0; i < cregs_size; i++)
+  for (i = 0; i < NUM_X86_64_MS_CLOBBERED_REGS; i++)
{
  int regno = x86_64_ms_sysv_extra_clobbered_registers[i];
  machine_mode mode = SSE_REGNO_P (regno) ? TImode : DImode;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..a45b66a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2172,7 +2172,9 @@ extern int const dbx_register_map[FIRST_PSEUDO_REGISTER];
 extern int const dbx64_register_map[FIRST_PSEUDO_REGISTER];
 extern int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER];
 
-extern int const x86_64_ms_sysv_extra_clobbered_registers[12];
+extern unsigned const x86_64_ms_sysv_extra_clobbered_registers[12];
+#define NUM_X86_64_MS_CLOBBERED_REGS \
+  (ARRAY_SIZE (x86_64_ms_sysv_extra_clobbered_registers))
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-- 
2.9.0



[PATCH 4/9] Add struct fields and option for foutline-msabi-xlogues

2016-11-15 Thread Daniel Santos
Adds foutline-msabi-xlogues to common.opt and various fields to structs
machine_function and ix86_frame
---
 gcc/common.opt |  7 +++
 gcc/config/i386/i386.c | 35 ++-
 gcc/config/i386/i386.h | 18 ++
 3 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 5e8d72d..e9570b0 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3075,4 +3075,11 @@ fipa-ra
 Common Report Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+foutline-msabi-xlogues
+Common Report Var(flag_outline_msabi_xlogues) Optimization
+Outline pro/epilogues to save/restore registers clobbered by calling
+sysv_abi functions from within a 64-bit ms_abi function.  This reduces
+.text size at the expense of a few more instructions being executed
+per function.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5ed8fb6..4cc3c8f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2449,13 +2449,37 @@ struct GTY(()) stack_local_entry {
 
saved frame pointer if frame_pointer_needed
<- HARD_FRAME_POINTER
-   [saved regs]
-   <- regs_save_offset
-   [padding0]
+   [Normal case:
 
-   [saved SSE regs]
+ [saved regs]
+   <- regs_save_offset
+ [padding0]
+
+ [saved SSE regs]
+
+   ][ms x64 --> sysv with -foutline-msabi-xlogues:
+ [padding0]
+   <- Start of out-of-line, stub-saved/restored regs
+  (see libgcc/config/i386/msabi.S)
+ [XMM6-15]
+ [RSI]
+ [RDI]
+ [?RBX]only if RBX is clobbered
+ [?RBP]only if RBP and RBX are clobbered
+ [?R12]only if R12 and all previous regs are clobbered
+ [?R13]only if R13 and all previous regs are clobbered
+ [?R14]only if R14 and all previous regs are clobbered
+ [?R15]only if R15 and all previous regs are clobbered
+   <- end of stub-saved/restored regs
+ [padding1]
+   <- outlined_save_offset
+ [saved regs]  Any remaining regs are saved in-line
+   <- regs_save_offset
+ [saved SSE regs]  not yet verified, but I *think* that there should be no
+   other SSE regs to save here.
+   ]
<- sse_regs_save_offset
-   [padding1]  |
+   [padding2]
   |<- FRAME_POINTER
[va_arg registers]  |
   |
@@ -2477,6 +2501,7 @@ struct ix86_frame
   HOST_WIDE_INT hard_frame_pointer_offset;
   HOST_WIDE_INT stack_pointer_offset;
   HOST_WIDE_INT hfp_save_offset;
+  HOST_WIDE_INT outlined_save_offset;
   HOST_WIDE_INT reg_save_offset;
   HOST_WIDE_INT sse_reg_save_offset;
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index a45b66a..e6b79df 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2575,6 +2575,24 @@ struct GTY(()) machine_function {
  pass arguments and can be used for indirect sibcall.  */
   BOOL_BITFIELD arg_reg_available : 1;
 
+  /* If true, we're out-of-lining reg save/restore for regs clobbered
+ by ms_abi functions calling a sysv function.  */
+  BOOL_BITFIELD outline_ms_sysv : 1;
+
+  /* If true, the incoming 16-byte aligned stack has an offset (of 8) and
+ needs padding.  */
+  BOOL_BITFIELD outline_ms_sysv_pad_in : 1;
+
+  /* If true, the size of the stub save area plus inline int reg saves will
+ result in an 8 byte offset, so needs padding.  */
+  BOOL_BITFIELD outline_ms_sysv_pad_out : 1;
+
+  /* This is the number of extra registers saved by stub (valid range is
+ 0-6). Each additional register is only saved/restored by the stubs
+ if all successive ones are. (Will always be zero when using a hard
+ frame pointer.) */
+  unsigned int outline_ms_sysv_extra_regs:3;
+
   /* During prologue/epilogue generation, the current frame state.
  Otherwise, the frame state at the end of the prologue.  */
   struct machine_frame_state fs;
-- 
2.9.0



[PATCH 0/9] RFC: Add optimization -foutline-msabi-xlogues (for Wine 64)

2016-11-15 Thread Daniel Santos
Due to differences between the 64-bit Microsoft and System V ABIs, any 
msabi function that calls a sysv function must consider RSI, RDI and 
XMM6-15 as clobbered. The result is that such functions are bloated with 
SSE saves/restores costing as much as 106 bytes each (up to 200-ish 
bytes per function). This patch set targets 64-bit Wine and aims to 
mitigate some of those costs.
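A minimal illustration of the ABI mismatch (assuming an x86-64 target, where GCC accepts the ms_abi attribute): the caller below follows the Microsoft ABI, so around the indirect call it must treat RSI, RDI and XMM6-15 as clobbered — that save/restore code is exactly what this patch set outlines into stubs.

```cpp
#include <cassert>

// A sysv_abi callee (the default convention on x86-64 Linux).
static long twice(long x) { return 2 * x; }

// An ms_abi caller: the compiler must save and restore RSI, RDI and
// XMM6-15 around the call, since the sysv callee may clobber them.
__attribute__((ms_abi)) static long call_sysv(long (*fn)(long), long x)
{
  return fn(x);
}
```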


A few save & restore stubs are added to the static portion of libgcc and 
the pro/epilogues of such functions uses these stubs instead, thus 
reducing .text size. While we're already tinkering with stubs, it also 
manages the save/restore of up to 6 additional registers. Analysis of 
building Wine 64 demonstrates a reduction of .text by around 20%. While 
I haven't produced performance data yet, this is my first attempt to 
modify gcc so I would rather ask for comments earlier in this process.


The basic theory is that a reduction of I-cache misses will offset the 
extra instructions required for implementation. In addition, since there 
are only a handful of stubs that will be in memory, I'm using the larger 
mov instructions instead of push/pop to facilitate better parallelization.


Here is a sample of what these prologues/epilogues look like:

Prologue (in this case, SP adjustment was properly combined with later 
stack allocation):

7b833800:   48 8d 44 24 88          lea    -0x78(%rsp),%rax
7b833805:   48 81 ec 58 01 00 00    sub    $0x158,%rsp
7b83380c:   e8 95 6f 05 00          callq  7b88a7a6 <__savms64_17>

Epilogue (r10 stores the value to restore the stack pointer to):
7b83386c:   48 8d b4 24 e0 00 00    lea    0xe0(%rsp),%rsi
7b833873:   00
7b833874:   4c 8d 56 78             lea    0x78(%rsi),%r10
7b833878:   e9 c9 6f 05 00          jmpq   7b88a846 <__resms64x_17>

Prologue, stack realignment case (this shows the uncombined SP 
modifications, described below):

7b833800:   55                      push   %rbp
7b833801:   48 8d 44 24 90          lea    -0x70(%rsp),%rax
7b833806:   48 89 e5                mov    %rsp,%rbp
7b833809:   48 83 e0 f0             and    $0xfffffffffffffff0,%rax
7b83380d:   48 8d 60 90             lea    -0x70(%rax),%rsp
7b833811:   e8 cc 79 05 00          callq  7b88b1e2 <__savms64r_17>
7b833816:   48 89 cb                mov    %rcx,%rbx        # reordered insn from body

7b833819:   48 83 ec 70             sub    $0x70,%rsp

Epilogue, stack realignment case:
7b833875:   48 8d b4 24 e0 00 00    lea    0xe0(%rsp),%rsi
7b83387c:   00
7b83387d:   e9 ac 79 05 00          jmpq   7b88b22e <__resms64rx_17>


Questions and (known) outstanding issues:

1. I have added the new -f optimization to common.opt, but being that
   it only impacts x86_64, should this be a machine-specific -m option
   instead?
2. In the prologues that realign the stack, stack pointer modifications
   aren't combining, presumably since I'm using a lea after realigning
   using rax.
3. My x86 assembly expertise is limited, so I would appreciate any
   feedback on my stubs & emitted code.
4. Documentation is still missing.
5. A Changelog entry is still missing.
6. This is my first major work on a GNU project and I have not yet
   fully reviewed all of the relevant GNU coding conventions, so I
   might still have some non-compliance code.
7. Regression tests only run on my old Phenom. Have not yet tested on
   AVX cpu (which should use vmovaps instead of movaps).
8. My test program is inadequate (and is not included in this patch
   set).  During development it failed to produce many optimization
   errors that I got when building Wine.  I've been building 64-bit
   Wine and running Wine's tests in the mean time.
9. I need to devise a meaningful benchmarking strategy.
10. I have not yet examined how this may or may not affect -flto or
   where additional optimization opportunities in the lto driver may exist.
11. There are a few more optimization opportunities that I haven't
   attempted to exploit yet and prefer to leave for later projects.
 * In the case of stack realignment and all 17 registers being
   clobbered, I can combine the majority of the prologue
   (alignment, saving frame pointer, etc.) in the stub.
 * With these stubs being in the static portion of libgcc, each
   Wine "dll" gets a separate copy. The average number of dlls a
   Windows program loads seems to be at least 15, allowing a
   mechanism for them to be linked dynamically from libwine.so
   could save a little bit more .text and icache.
 * Ultimately, good static analysis of local sysv functions can
   completely eliminate the need to save SSE registers in some cases.
12. Use of hard frame pointers disables the optimization unless we're
   also realigning the stack. I've implemented this in another (local)
   branch, but haven't tested it yet.


gcc/common.opt |   7 +
 gcc/config/i386/i386.c | 729 
++---

 

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Janne Blomqvist
On Tue, Nov 15, 2016 at 6:37 PM, Jerry DeLisle  wrote:
> All comments incorporated. Standing by for approval.

Looks good, nice job! Ok for trunk.

I was thinking that for strided arrays, it probably is faster to copy
them to dense arrays before doing the matrix multiplication. That
would also enable using an optimized blas (-fexternal-blas) for
strided arrays. But this is of course nothing that blocks this patch,
just something that might be worth looking into in the future.
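Janne's suggestion above — copying a strided argument into a dense temporary before the multiplication — amounts to a pack step like this sketch:

```cpp
#include <cassert>
#include <cstddef>

// Gather every `stride`-th element of `src` into the contiguous buffer
// `dst`, so the subsequent matmul kernel (or an external BLAS) sees unit
// stride and can use its fast path.
static void pack_strided(const double *src, std::ptrdiff_t stride,
                         std::size_t n, double *dst)
{
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i * stride];
}
```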

-- 
Janne Blomqvist


Re: [Patch] Remove variant, variant<T&> and variant<>

2016-11-15 Thread Tim Shen
On Tue, Nov 15, 2016 at 11:31 AM, Jonathan Wakely  wrote:
> On 15/11/16 12:08 +, Jonathan Wakely wrote:
>>
>> On 12/11/16 12:11 -0800, Tim Shen wrote:
>>>
>>> At Issaquah we decided to remove the supports above.
>>
>>
>> OK with a suitable ChangeLog, thanks.
>
>
> I've adjusted your ChangeLog entry to fit under 80 columns with TAB
> set to 8 spaces. I've also adjusted the pretty printer test that was
> using variant.

Thanks! This is how tabstop=2 people have a different view of the world
from the rest of us. :)

And I keep forgetting to run the whole testsuite. Sorry!

>
> Tested x86_64-linux, committed to trunk.
>



-- 
Regards,
Tim Shen
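With variant<T&> removed, the usual substitute is std::reference_wrapper; a sketch (requires a C++17 library):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <variant>

// variant<T&> is no longer valid, but a reference_wrapper alternative
// gives the same "refer to an existing object" behaviour.
static std::string demo()
{
  std::string s = "string";
  std::variant<std::reference_wrapper<std::string>, int> v{ std::ref(s) };
  std::get<0>(v).get() += "!";  // mutates s through the variant
  return s;
}
```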


Re: Add a mem_alias_size helper class

2016-11-15 Thread Eric Botcazou
> alias.c encodes memory sizes as follows:
> 
> size > 0: the exact size is known
> size == 0: the size isn't known
> size < 0: the exact size of the reference itself is known,
>   but the address has been aligned via AND.  In this case
>   "-size" includes the size of the reference and the worst-case
>   number of bytes traversed by the AND.
> 
> This patch wraps this up in a helper class and associated
> functions.  The new routines fix what seems to be a hole
> in the old logic: if the size of a reference A was unknown,
> offset_overlap_p would assume that it could conflict with any
> other reference B, even if we could prove that B comes before A.
> 
> The fallback CONSTANT_P (x) && CONSTANT_P (y) case looked incorrect.
> Either "c" is trustworthy as a distance between the two constants,
> in which case the alignment handling should work as well there as
> elsewhere, or "c" isn't trustworthy, in which case offset_overlap_p
> is unsafe.  I think the latter's true; AFAICT we have no evidence
> that "c" really is the distance between the two references, so using
> it in the check doesn't make sense.
> 
> At this point we've excluded cases for which:
> 
> (a) the base addresses are the same
> (b) x and y are SYMBOL_REFs, or SYMBOL_REF-based constants
> wrapped in a CONST
> (c) x and y are both constant integers
> 
> No useful cases should be left.  As things stood, we would
> assume that:
> 
>   (mem:SI (const_int X))
> 
> could overlap:
> 
>   (mem:SI (symbol_ref Y))
> 
> but not:
> 
>   (mem:SI (const (plus (symbol_ref Y) (const_int 4))))

Frankly this seems to be an example of counter-productive C++ization: the 
class doesn't provide any useful abstraction and the code gets obfuscated by 
all the wrapper methods.  Moreover it's mixed with real changes so very hard 
to review.  Can't you just fix what needs to be fixed first?

-- 
Eric Botcazou
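For clarity, the three-way size encoding quoted at the top of the message can be modelled directly (a sketch, not alias.c's actual helpers):

```cpp
#include <cassert>

// alias.c's convention: size > 0 means the exact size is known;
// size == 0 means unknown; size < 0 means the reference size is known but
// the address was aligned via AND, and -size bounds the bytes the access
// may touch in the worst case.
static bool size_known_p(long size) { return size != 0; }
static bool aligned_and_p(long size) { return size < 0; }
static long worst_case_bytes(long size) { return size < 0 ? -size : size; }
```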


[PATCH, i386 testsuite]: Move common 32-bit and 64-bit function specific options to an include file

2016-11-15 Thread Uros Bizjak
Hello!

Just noticed that we don't test many function specific options on
64-bit targets.

2016-11-15  Uros Bizjak  

* gcc.target/i386/funcspec-56.inc: New file.
* gcc.target/i386/funcspec-5.c: Include funcspec-56.inc.  Remove
common 32-bit and 64-bit function specific options.
* gcc.target/i386/funcspec-6.c: Ditto.

Tested on x86_64-linux-gnu {,-m32}  and committed to mainline.

Uros.
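These tests exercise the function-specific target attribute, which enables an ISA extension for a single function regardless of the translation unit's global -m flags; a minimal sketch (x86 targets):

```cpp
#include <cassert>

// Compile just this function with SSE2 enabled (a baseline x86-64
// feature), independently of the options used for the rest of the file.
__attribute__((__target__("sse2")))
static double add_sse2(double a, double b) { return a + b; }
```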
Index: gcc.target/i386/funcspec-5.c
===
--- gcc.target/i386/funcspec-5.c(revision 242427)
+++ gcc.target/i386/funcspec-5.c(working copy)
@@ -3,68 +3,8 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target ia32 } */
 
-extern void test_abm (void)		__attribute__((__target__("abm")));
-extern void test_aes (void)		__attribute__((__target__("aes")));
-extern void test_bmi (void)		__attribute__((__target__("bmi")));
-extern void test_mmx (void)		__attribute__((__target__("mmx")));
-extern void test_pclmul (void)		__attribute__((__target__("pclmul")));
-extern void test_popcnt (void)		__attribute__((__target__("popcnt")));
-extern void test_recip (void)		__attribute__((__target__("recip")));
-extern void test_sse (void)		__attribute__((__target__("sse")));
-extern void test_sse2 (void)		__attribute__((__target__("sse2")));
-extern void test_sse3 (void)		__attribute__((__target__("sse3")));
-extern void test_sse4 (void)		__attribute__((__target__("sse4")));
-extern void test_sse4_1 (void)		__attribute__((__target__("sse4.1")));
-extern void test_sse4_2 (void)		__attribute__((__target__("sse4.2")));
-extern void test_sse4a (void)		__attribute__((__target__("sse4a")));
-extern void test_fma (void)		__attribute__((__target__("fma")));
-extern void test_fma4 (void)		__attribute__((__target__("fma4")));
-extern void test_xop (void)		__attribute__((__target__("xop")));
-extern void test_ssse3 (void)		__attribute__((__target__("ssse3")));
-extern void test_tbm (void)		__attribute__((__target__("tbm")));
-extern void test_avx (void)		__attribute__((__target__("avx")));
-extern void test_avx2 (void)		__attribute__((__target__("avx2")));
-extern void test_avx512f (void)	__attribute__((__target__("avx512f")));
-extern void test_avx512vl(void)	__attribute__((__target__("avx512vl")));
-extern void test_avx512bw(void)	__attribute__((__target__("avx512bw")));
-extern void test_avx512dq(void)	__attribute__((__target__("avx512dq")));
-extern void test_avx512er(void)	__attribute__((__target__("avx512er")));
-extern void test_avx512pf(void)	__attribute__((__target__("avx512pf")));
-extern void test_avx512cd(void)	__attribute__((__target__("avx512cd")));
-extern void test_bmi (void)		__attribute__((__target__("bmi")));
-extern void test_bmi2 (void)		__attribute__((__target__("bmi2")));
+#include "funcspec-56.inc"
 
-extern void test_no_abm (void)		__attribute__((__target__("no-abm")));
-extern void test_no_aes (void)		__attribute__((__target__("no-aes")));
-extern void test_no_bmi (void)		__attribute__((__target__("no-bmi")));
-extern void test_no_mmx (void)		__attribute__((__target__("no-mmx")));
-extern void test_no_pclmul (void)	__attribute__((__target__("no-pclmul")));
-extern void test_no_popcnt (void)	__attribute__((__target__("no-popcnt")));
-extern void test_no_recip (void)	__attribute__((__target__("no-recip")));
-extern void test_no_sse (void)		__attribute__((__target__("no-sse")));
-extern void test_no_sse2 (void)	__attribute__((__target__("no-sse2")));
-extern void test_no_sse3 (void)	__attribute__((__target__("no-sse3")));
-extern void test_no_sse4 (void)	__attribute__((__target__("no-sse4")));
-extern void test_no_sse4_1 (void)	__attribute__((__target__("no-sse4.1")));
-extern void test_no_sse4_2 (void)	__attribute__((__target__("no-sse4.2")));
-extern void test_no_sse4a (void)	__attribute__((__target__("no-sse4a")));
-extern void test_no_fma (void)		__attribute__((__target__("no-fma")));
-extern void test_no_fma4 (void)	__attribute__((__target__("no-fma4")));
-extern void test_no_xop (void)		__attribute__((__target__("no-xop")));
-extern void test_no_ssse3 

[PATCH] Make std::tuple_size SFINAE-friendly (LWG 2770)

2016-11-15 Thread Jonathan Wakely

This is needed to avoid problems with the new structured bindings
feature that landed in trunk yesterday.

As part of this patch I'm removing the docs for the DR 2742 and 2748
changes that I added earlier today. The manual doesn't need to track
changes against new features that only appear in trunk builds of GCC
and working drafts of the standard. DR 2770 should be documented
though, because it affects a C++11 feature that is in several GCC
releases.

* doc/xml/manual/intro.xml: Document LWG 2770 status. Remove entries
for 2742 and 2748.
* doc/html/*: Regenerate.
* include/std/utility (__tuple_size_cv_impl): New helper to safely
detect tuple_size<T>::value, as per LWG 2770.
(tuple_size): Adjust partial specializations to derive from
__tuple_size_cv_impl.
* testsuite/20_util/tuple/cv_tuple_size.cc: Test SFINAE-friendliness.

Tested powerpc64le-linux, committed to trunk.

commit 6182fd1774f8129f127b213586807080fdc58d1b
Author: Jonathan Wakely 
Date:   Tue Nov 15 18:27:15 2016 +

Make std::tuple_size SFINAE-friendly (LWG 2770)

* doc/xml/manual/intro.xml: Document LWG 2770 status. Remove entries
for 2742 and 2748.
* doc/html/*: Regenerate.
* include/std/utility (__tuple_size_cv_impl): New helper to safely
detect tuple_size<T>::value, as per LWG 2770.
(tuple_size): Adjust partial specializations to derive from
__tuple_size_cv_impl.
* testsuite/20_util/tuple/cv_tuple_size.cc: Test SFINAE-friendliness.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index 7f2586d..d23008a 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1107,20 +1107,13 @@ requirements of the license of GCC.
 Define the value_compare typedef.
 
 
-http://www.w3.org/1999/xlink; 
xlink:href="../ext/lwg-defects.html#2742">2742:
-   Inconsistent string interface taking 
string_view
+http://www.w3.org/1999/xlink; 
xlink:href="../ext/lwg-defects.html#2770">2770:
+   tuple_sizeconst T specialization is not
+SFINAE compatible and breaks decomposition declarations

 
-Add the new constructor and additionally constrain it
-  to avoid ambiguities with non-const charT*.
-
-
-http://www.w3.org/1999/xlink; 
xlink:href="../ext/lwg-defects.html#2748">2748:
-   swappable traits for optionals
-   
-
-Disable the non-member swap overload when
-  the contained object is not swappable.
+Safely detect tuple_sizeT::value and
+  only use it if valid.
 
 
   
diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 2ca52fe..3982156 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -88,18 +88,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     struct tuple_size;
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2770. tuple_size<const T> specialization is not SFINAE compatible
+  template<typename _Tp, typename = void>
+    struct __tuple_size_cv_impl { };
+
+  template<typename _Tp>
+    struct __tuple_size_cv_impl<_Tp, __void_t<decltype(tuple_size<_Tp>::value)>>
+    : integral_constant<size_t, tuple_size<_Tp>::value> { };
+
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 2313. tuple_size should always derive from integral_constant<size_t, N>
   template<typename _Tp>
-    struct tuple_size<const _Tp>
-    : integral_constant<size_t, tuple_size<_Tp>::value> { };
+    struct tuple_size<const _Tp> : __tuple_size_cv_impl<_Tp> { };
 
   template<typename _Tp>
-    struct tuple_size<volatile _Tp>
-    : integral_constant<size_t, tuple_size<_Tp>::value> { };
+    struct tuple_size<volatile _Tp> : __tuple_size_cv_impl<_Tp> { };
 
   template<typename _Tp>
-    struct tuple_size<const volatile _Tp>
-    : integral_constant<size_t, tuple_size<_Tp>::value> { };
+    struct tuple_size<const volatile _Tp> : __tuple_size_cv_impl<_Tp> { };
 
   /// Gives the type of the ith element of a given tuple type.
   template<std::size_t __i, typename _Tp>
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size.cc
index df5e0e9..c4a1e02 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/cv_tuple_size.cc
@@ -42,3 +42,13 @@ int main()
   test01();
   return 0;
 }
+
+// LWG DR 2770. tuple_size<const T> specialization is not SFINAE compatible
+template<typename T, typename = void>
+struct has_value : std::false_type { };
+
+template<typename T>
+struct has_value<T, std::__void_t<decltype(T::value)>> : std::true_type { };
+
+static_assert( !has_value<std::tuple_size<int>>::value, "" );
+static_assert( !has_value<std::tuple_size<const int>>::value, "" );
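The SFINAE-friendliness that LWG 2770 mandates can be checked with the standard detection idiom; a self-contained sketch (with its own void_t alias so it only assumes C++14):

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

template<typename...> using my_void_t = void;

// Detect whether tuple_size<T>::value is well-formed, without triggering a
// hard error for types that have no tuple_size.
template<typename T, typename = void>
struct has_tuple_size : std::false_type { };

template<typename T>
struct has_tuple_size<T, my_void_t<decltype(std::tuple_size<T>::value)>>
  : std::true_type { };

static_assert(has_tuple_size<std::tuple<int, long>>::value, "tuples have a size");
static_assert(!has_tuple_size<int>::value, "non-tuples must not hard-error");
```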


Re: [PATCH] Add std::string constructor for substring of string_view (LWG 2742)

2016-11-15 Thread Jonathan Wakely

On 15/11/16 14:33 +, Jonathan Wakely wrote:

This is another issue resolution for C++17 features that was approved
at the recent meeting. I think this resolution is wrong too, but in
this case the fix is obvious so I've gone ahead and done it.

* doc/xml/manual/intro.xml: Document LWG 2742 status.
* doc/html/*: Regenerate.
* include/bits/basic_string.h
(basic_string(const T&, size_type, size_type, const Allocator&)): Add
constructor for substring of basic_string_view, as per LWG 2742 but
with additional constraint to fix ambiguity.
* testsuite/21_strings/basic_string/cons/char/9.cc: New test.
* testsuite/21_strings/basic_string/cons/wchar_t/9.cc: New test.

Tested powerpc64le-linux, committed to trunk.

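The new constructor can be exercised like this (requires a C++17 standard library that implements LWG 2742):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Build a std::string from characters [pos, pos+n) of a string_view, via
// the basic_string(const T&, size_type, size_type) constructor added here.
static std::string substring_of(std::string_view sv,
                                std::size_t pos, std::size_t n)
{
  return std::string(sv, pos, n);
}
```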

I forgot I already added an convenience alias template for checking
the condition in this patch.

Tested powerpc64le-linux, committed to trunk.
commit f7852de7c77f0d9cc8520d10549da0652e334dc7
Author: Jonathan Wakely 
Date:   Tue Nov 15 18:55:35 2016 +

Use existing helper for new std::string constructor

	* include/bits/basic_string.h: Reuse _If_sv alias template for new
	constructor.

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 943e88d..9af7bfb 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -585,6 +585,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	{ _M_construct(__beg, __end); }
 
 #if __cplusplus > 201402L
+      template<typename _Tp, typename _Res>
+	using _If_sv = enable_if_t<
+	  __and_<is_convertible<const _Tp&, __sv_type>,
+		 __not_<is_convertible<const _Tp*, const basic_string*>>>::value,
+	  _Res>;
+
   /**
*  @brief  Construct string from a substring of a string_view.
*  @param  __t   Source string view.
@@ -592,9 +598,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param  __n   The number of characters to copy from __t.
*  @param  __a   Allocator to use.
*/
-      template<typename _Tp, typename = _Require<is_convertible<const _Tp&, __sv_type>,
-			__not_<is_convertible<const _Tp*, const basic_string*>>>>
+      template<typename _Tp, typename = _If_sv<_Tp, void>>
 	basic_string(const _Tp& __t, size_type __pos, size_type __n,
 		 const _Alloc& __a = _Alloc())
 	: basic_string(__sv_type(__t).substr(__pos, __n), __a) { }
@@ -1252,12 +1256,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   append(__sv_type __sv)
   { return this->append(__sv.data(), __sv.size()); }
 
-      template<typename _Tp, typename _Res>
-	using _If_sv = enable_if_t<
-	  __and_<is_convertible<const _Tp&, __sv_type>,
-		 __not_<is_convertible<const _Tp*, const basic_string*>>>::value,
-	  _Res>;
-
   /**
*  @brief  Append a range of characters from a string_view.
*  @param __sv  The string_view to be appended from.


Re: [Patch] Remove variant, variant<T&> and variant<>

2016-11-15 Thread Jonathan Wakely

On 15/11/16 12:08 +, Jonathan Wakely wrote:

On 12/11/16 12:11 -0800, Tim Shen wrote:

At Issaquah we decided to remove the supports above.


OK with a suitable ChangeLog, thanks.


I've adjusted your ChangeLog entry to fit under 80 columns with TAB
set to 8 spaces. I've also adjusted the pretty printer test that was
using variant.

Tested x86_64-linux, committed to trunk.

commit ab5b0b5d8618cd03e4d21a37b782d2726361bde2
Author: Jonathan Wakely 
Date:   Tue Nov 15 19:21:02 2016 +

Adjust pretty printer test for variant

	* testsuite/libstdc++-prettyprinters/cxx17.cc: Adjust test for
	variant.

diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
index bc9b26a..96be8c7 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
@@ -86,8 +86,8 @@ main()
 // { dg-final { note-test v3 {std::variant [index 1] = {3}} } }
   variant v4{ str };
 // { dg-final { note-test v4 {std::variant [index 2] = {"string"}} } }
-  variant vref{str};
-// { dg-final { note-test vref {std::variant &> [index 0] = {"string"}} } }
+  variant vref{str};
+// { dg-final { note-test vref {std::variant [index 0] = {"string"}} } }
 
   map m{ {1, "one"} };
   map::node_type n0;


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Richard Earnshaw (lists)
On 15/11/16 16:48, Jiong Wang wrote:
> 
> 
> On 15/11/16 16:18, Jakub Jelinek wrote:
>> On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote:
>Takes one signed LEB128 offset and retrieves 8-byte contents
> from the address
>calculated by CFA plus this offset, the contents then
> authenticated as per A
>key for instruction pointer using current CFA as salt. The
> result is pushed
>onto the stack.
 I'd like to point out that especially the vendor range of DW_OP_* is
 extremely scarce resource, we have only a couple of unused values,
 so taking
 3 out of the remaining unused 12 for a single architecture is IMHO
 too much.
 Can't you use just a single opcode and encode which of the 3
 operations it is
 in say the low 2 bits of a LEB 128 operand?
 We'll likely need to do RSN some multiplexing even for the generic GNU
 opcodes if we need just a few further ones (say 0xff as an extension,
 followed by uleb128 containing the opcode - 0xff).
 In the non-vendor area we still have 54 values left, so there is
 more space
 for future expansion.
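Jakub's multiplexing suggestion — one shared vendor DW_OP whose operand carries the sub-operation in its low bits — could look like this sketch (names and bit layout are illustrative, not proposal text; real operands are signed LEB128, simplified here to non-negative values):

```cpp
#include <cassert>
#include <cstdint>

enum SubOp { OP_A = 0, OP_B = 1, OP_C = 2 };  // stand-ins for the three AArch64 ops

// Pack an offset and a 2-bit sub-opcode into the single LEB128-encoded
// operand of one shared DW_OP number.
static std::uint64_t encode_operand(std::uint64_t offset, SubOp op)
{
  return (offset << 2) | static_cast<std::uint64_t>(op);
}

static SubOp decode_subop(std::uint64_t operand)
{
  return static_cast<SubOp>(operand & 3);
}

static std::uint64_t decode_offset(std::uint64_t operand)
{
  return operand >> 2;
}
```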
>>>Separate DWARF operations are introduced instead of combining all
>>> of them into
>>> one mostly because these operations are going to be used for most
>>> of the
>>> functions once return address signing are enabled, and they are used for
>>> describing frame unwinding that they will go into unwind table for
>>> C++ program
>>> or C program compiled with -fexceptions, the impact on unwind table
>>> size is
>>> significant.  So I was trying to lower the unwind table size overhead
>>> as much as
>>> I can.
>>>
>>>IMHO, three numbers actually is not that much for one architecture
>>> in DWARF
>>> operation vendor extension space as vendors can overlap with each
>>> other.  The
>>> only painful thing from my understand is there are platform vendors,
>>> for example
>>> "GNU" and "LLVM" etc, for which architecture vendor can't overlap with.
>> For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is
>> just
>> one range, so ideally the opcodes would be unique everywhere, if not,
>> there
>> is just a single GNU vendor, there is no separate range for Aarch64, that
>> can overlap with range for x86_64, and powerpc, etc.
>>
>> Perhaps we could declare that certain opcode subrange for the GNU
>> vendor is
>> architecture specific and document that the meaning of opcodes in that
>> range
>> and count/encoding of their arguments depends on the architecture, but
>> then
>> we should document how to figure out the architecture too (e.g. for ELF
>> base it on the containing EM_*).  All the tools that look at DWARF
>> (readelf,
>> objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to
>> agree on that
>> though.
>>
>> I know nothing about the aarch64 return address signing, would all 3
>> or say
>> 2 usually appear together without any separate pc advance, or are they
>> all
>> going to appear frequently and at different pcs?
> 
>   I think it's the latter, the DW_OP_AARCH64_paciasp and
> DW_OP_AARCH64_paciasp_deref are going to appear frequently and at
> different pcs.
> For example, the following function prologue, there are three
> instructions
> at 0x0, 0x4, 0x8.
> 
>   After the first instruction at 0x0, LR/X30 will be mangled.  The
> "paciasp" always
> mangles LR register using SP as salt and writes back the value into LR. 
> We then generate
> DW_OP_AARCH64_paciasp to notify any unwinder that the original LR is
> mangled in this
> way so they can unwind the original value properly.
> 
>   After the second instruction at 0x4, The mangled value of LR/X30 will
> be pushed on
> to stack, unlike usual .cfi_offset, the unwind rule for LR/X30 becomes:
> first fetch the
> mangled value from stack offset -16, then do whatever to restore the
> original value
> from the mangled value.  This is represented by
> (DW_OP_AARCH64_paciasp_deref, offset).
> 
> .cfi_startproc
>0x0  paciasp (this instruction sign return address register LR/X30)
> .cfi_val_expression 30, DW_OP_AARCH64_paciasp
>0x4  stp x29, x30, [sp, -32]!
> .cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16
> .cfi_offset 29, -32
> .cfi_def_cfa_offset 32
>0x8  add x29, sp, 0
> 

Now I'm confused.

I was thinking that we needed one opcode for the sign operation in the
prologue and one for the unsign/validate operation in the epilogue (to
support non-call exceptions).  But why do we need a separate code to say
that a previously signed value has now been pushed on the stack?  Surely
that's just a normal store operation that can be tracked through the
unwinding state machine.

I was expecting the third opcode to be needed for the special operations
that are not frequently used by the compiler.

R.

>> Perhaps if there is just 1
>> opcode and has all the info encoded just in one bigger uleb128 or
>> something
>> similar...
> 



Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Kelvin Nilsen

> 
>> Thanks for catching this.  I think I got endian confusion inside my head
>> while I was writing the above.  I will rewrite these comments, below also.
> 
> Note the ISA calls the bits in 32-bit registers 32..63, so that 63 is
> the rightmost bit in all registers.
> 

True, but the ISA only uses the lower half of the 64-bit register, so I
have described my patterns using SI mode instead of DI mode, which is
part of the reason I was numbering my bits differently than the ISA
document.

The reason I am using SI mode is so that I don't have to disqualify the
use of these functions on a 32-bit big-endian configuration.

Do you want me to switch to DI mode for all the operands?

>>> I wonder if we really need all these predicate expanders, if it wouldn't
>>> be easier if the builtin handling code did the setb itself?
>>>
>>
>> The reason it seems most "natural" to me to use the expanders is because I
>> need to introduce a temporary CR scratch register between expansion and
>> insn matching.  Also, it seems that the *setb pattern may be of more
>> general use in the future implementation of other built-in functions.
>> I'm inclined to keep this as is, but if you still feel otherwise, I'll
>> figure out how to avoid the expansion.
> 
> The code (in rs6000.c) expanding the builtin can create two insns directly,
> so that you do not need to repeat this over and over in define_expands?
> 

The pattern I'm familiar with is to allocate the temporary scratch
register during expansion, and to use the allocated temporary at insn
match time.  I'll have to teach myself a new pattern to do all of this
at insn match time.  Feel free to point me to an example of define_insn
code that does this.

Thanks again.


-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



Re: Add a load_extend_op wrapper

2016-11-15 Thread Jeff Law

On 11/15/2016 11:56 AM, Jeff Law wrote:

On 11/15/2016 11:12 AM, Richard Sandiford wrote:

Jeff Law  writes:

On 11/15/2016 05:42 AM, Richard Sandiford wrote:

LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
than a word.  However, callers weren't consistent about which of these
checks they made beforehand, and also weren't consistent about whether
"smaller" was based on (bit)size or precision (IMO it's the latter).
This patch adds a wrapper to try to make the macro easier to use.

It's unclear to me how GET_MODE_PRECISION is different from
GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about
it, particularly in the context of vector modes and such.  I'm certainly
willing to trust your judgment on this.


In this case it's really more about scalar integer modes than vector
modes.
I think using size and precision are equivalent for MODE_INT but they can
be different for MODE_PARTIAL_INT.  Using precision allows LOAD_EXTEND_OP
to apply to (say) PSImode extensions to SImode, whereas using the size
wouldn't, since PSImode and SImode have the same (memory) size.

Ah, partial modes.  No idea what the right thing to do would be.  So
again, I'll trust your judgment.

The only target where I had to deal with partial modes was the mn102. On
that target partial modes (PSI/24 bits) are larger than the machine's
natural word size (16 bits) so LOAD_EXTEND_OP didn't apply.
More correctly, didn't apply to partial modes.  We certainly used it to 
avoid unnecessary extensions when loading 8 bit values.


jeff


Re: Add a load_extend_op wrapper

2016-11-15 Thread Jeff Law

On 11/15/2016 11:12 AM, Richard Sandiford wrote:

Jeff Law  writes:

On 11/15/2016 05:42 AM, Richard Sandiford wrote:

LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
than a word.  However, callers weren't consistent about which of these
checks they made beforehand, and also weren't consistent about whether
"smaller" was based on (bit)size or precision (IMO it's the latter).
This patch adds a wrapper to try to make the macro easier to use.

It's unclear to me how GET_MODE_PRECISION is different from
GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about
it, particularly in the context of vector modes and such.  I'm certainly
willing to trust your judgment on this.


In this case it's really more about scalar integer modes than vector modes.
I think using size and precision are equivalent for MODE_INT but they can
be different for MODE_PARTIAL_INT.  Using precision allows LOAD_EXTEND_OP
to apply to (say) PSImode extensions to SImode, whereas using the size
wouldn't, since PSImode and SImode have the same (memory) size.
Ah, partial modes.  No idea what the right thing to do would be.  So 
again, I'll trust your judgment.


The only target where I had to deal with partial modes was the mn102. 
On that target partial modes (PSI/24 bits) are larger than the machine's 
natural word size (16 bits) so LOAD_EXTEND_OP didn't apply.


jeff



Re: [PATCH v2] aarch64: Add split-stack initial support

2016-11-15 Thread Wilco Dijkstra

On 07/11/2016 16:59, Adhemerval Zanella wrote:
> On 14/10/2016 15:59, Wilco Dijkstra wrote:

> There is no limit afaik on gold split stack allocation handling,
> and I think one could be added for each backend (in the method
> override require to implement it).
> 
> In fact it is not really required to tie the nop generation with the
> instruction generated by 'aarch64_internal_mov_immediate', it is
> just a matter to simplify linker code.  

If there is no easy limit and you'll still require a nop, I think it is best 
then
to emit mov N+movk #0. Then the scheduler won't be able to reorder
them with the add/sub.

>> Is there any need to detect underflow of x10 or is there a guarantee that 
>> stacks are
>> never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's 
>> safe
>> to do a signed comparison.
> 
> I do not think so, at least none of current backend that implements
> split stack do so.

OK, well a signed comparison like in your new version works for underflow.

Now to the patch:


@@ -3316,6 +3339,28 @@ aarch64_expand_prologue (void)
   aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
 callee_adjust != 0 || frame_pointer_needed);
   aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
+
+  if (split_stack_arg_pointer_used_p ())
+{
+  /* Setup the argument pointer (x10) for -fsplit-stack code.  If
+__morestack was called, it will have left the arg pointer to the
+old stack in x28.  Otherwise, the argument pointer is the top
+of current frame.  */
+  rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
+  rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
+  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+
+  rtx not_more = gen_label_rtx ();
+
+  rtx cmp = gen_rtx_fmt_ee (LT, VOIDmode, cc_reg, const0_rtx);
+  rtx jump = emit_jump_insn (gen_condjump (cmp, cc_reg, not_more));
+  JUMP_LABEL (jump) = not_more;
+  LABEL_NUSES (not_more) += 1;
+
+  emit_move_insn (x11, x28);
+
+  emit_label (not_more);
+}

If you pass the old sp in x11 when called from __morestack you can remove
the above thunk completely.

+  /* It limits total maximum stack allocation on 2G so its value can be
+ materialized using two instructions at most (movn/movk).  It might be
+ used by the linker to add some extra space for split calling non split
+ stack functions.  */
+  allocate = cfun->machine->frame.frame_size;
+  if (allocate > ((HOST_WIDE_INT) 1 << 31))
+{
+  sorry ("Stack frame larger than 2G is not supported for -fsplit-stack");
+  return;
+}

Note a 2-instruction mov/movk can generate any immediate up to 4GB and if
we need even large sizes, we could round up to a multiple of 64KB so that 2
instructions are enough for a 48-bit stack size...

+  int ninsn = aarch64_internal_mov_immediate (reg10, GEN_INT (-allocate),
+ true, Pmode);
+  gcc_assert (ninsn == 1 || ninsn == 2);
+  if (ninsn == 1)
+emit_insn (gen_nop ());

To avoid any issues with the nop being scheduled, it's best to emit an explicit 
movk
here (0x if allocate > 0, or 0 if zero) using gen_insv_immdi.

+void
+aarch64_split_stack_space_check (rtx size, rtx label)

Isn't very similar code used in aarch64_expand_split_stack_prologue? Any 
possibility
to share/reuse?

+static void
+aarch64_live_on_entry (bitmap regs)
+{
+  if (flag_split_stack)
+bitmap_set_bit (regs, R11_REGNUM);
+}

I'm wondering whether you need extra code in aarch64_can_eliminate to deal
with the argument pointer? Also do we need to define a fixed register, or will 
GCC
automatically allocate it to a callee-save if necessary?

+++ b/libgcc/config/aarch64/morestack.S

+/* Offset from __morestack frame where the arguments size saved and
+   passed to __generic_morestack.  */
+#define ARGS_SIZE_SAVE 80

This define is unused.

+# The normal function prologue follows here, with a small addition at the
+# end to set up the argument pointer if required (the prolog):
+#
+#   [...]  # default function prologue
+#  b.lt   function:
+#  movx11, x28

We don't need this if we pass sp in x11 when calling back to the original 
function.

+   stp x8, x10, [sp, 80]
+   stp x11, x12, [sp, 96]

No need to save x11 - it just contains original sp.

+   str x28, [sp, 112]
+   .cfi_offset 28, -112
+
+   # Setup on x28 the function initial frame pointer.
+   add x28, sp, MORESTACK_FRAMESIZE

Why save x28 when x28 = x29 + MORESTACK_FRAMESIZE? You can use x29
throughout the code as it is preserved by calls.

+   # Start using new stack
+   str x29, [x0, -16]!

This has no use.

+   mov sp, x0
+
+   # Set __private_ss stack guard for the new stack.
+   ldr x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]
+   add x0, x0, BACKOFF

+   sub x0, x0, 16

Neither has this.

+   ldp 

[PATCH] rs6000: Separate shrink-wrapping for the FPRs

2016-11-15 Thread Segher Boessenkool
This implements separate shrink-wrapping for the save/restore of the
floating point registers.  It regression checks fine, but that does
not test on many big floating point routines (and neither does the
bootstrap itself).  So I'm not proposing this for trunk just yet.

I'll write a changelog if this patch actually works ;-)


Segher


---
 gcc/config/rs6000/rs6000.c | 162 -
 1 file changed, 131 insertions(+), 31 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d75d52c..ed04828 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -157,6 +157,7 @@ typedef struct GTY(()) machine_function
   /* The components already handled by separate shrink-wrapping, which should
  not be considered by the prologue and epilogue.  */
   bool gpr_is_wrapped_separately[32];
+  bool fpr_is_wrapped_separately[32];
   bool lr_is_wrapped_separately;
 } machine_function;
 
@@ -27723,17 +27724,25 @@ rs6000_get_separate_components (void)
   if (TARGET_SPE_ABI)
 return NULL;
 
-  sbitmap components = sbitmap_alloc (32);
-  bitmap_clear (components);
-
   gcc_assert (!(info->savres_strategy & SAVE_MULTIPLE)
  && !(info->savres_strategy & REST_MULTIPLE));
 
+  /* Component 0 is the save/restore of LR (done via GPR0).
+ Components 13..31 are the save/restore of GPR13..GPR31.
+ Components 46..63 are the save/restore of FPR14..FPR31.  */
+
+  int n_components = 64;
+
+  sbitmap components = sbitmap_alloc (n_components);
+  bitmap_clear (components);
+
+  int reg_size = TARGET_32BIT ? 4 : 8;
+  int fp_reg_size = 8;
+
   /* The GPRs we need saved to the frame.  */
   if ((info->savres_strategy & SAVE_INLINE_GPRS)
   && (info->savres_strategy & REST_INLINE_GPRS))
 {
-  int reg_size = TARGET_32BIT ? 4 : 8;
   int offset = info->gp_save_offset;
   if (info->push_p)
offset += info->total_size;
@@ -27758,6 +27767,23 @@ rs6000_get_separate_components (void)
   || (flag_pic && DEFAULT_ABI == ABI_DARWIN))
 bitmap_clear_bit (components, RS6000_PIC_OFFSET_TABLE_REGNUM);
 
+  /* The FPRs we need saved to the frame.  */
+  if ((info->savres_strategy & SAVE_INLINE_FPRS)
+  && (info->savres_strategy & REST_INLINE_FPRS))
+{
+  int offset = info->fp_save_offset;
+  if (info->push_p)
+   offset += info->total_size;
+
+  for (unsigned regno = info->first_fp_reg_save; regno < 64; regno++)
+   {
+ if (IN_RANGE (offset, -0x8000, 0x7fff) && save_reg_p (regno))
+   bitmap_set_bit (components, regno);
+
+ offset += fp_reg_size;
+   }
+}
+
   /* Optimize LR save and restore if we can.  This is component 0.  Any
  out-of-line register save/restore routines need LR.  */
   if (info->lr_save_p
@@ -27792,14 +27818,23 @@ rs6000_components_for_bb (basic_block bb)
   sbitmap components = sbitmap_alloc (32);
   bitmap_clear (components);
 
-  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
+  /* A register is used in a bb if it is in the IN, GEN, or KILL sets.  */
+
+  /* GPRs.  */
   for (unsigned regno = info->first_gp_reg_save; regno < 32; regno++)
 if (bitmap_bit_p (in, regno)
|| bitmap_bit_p (gen, regno)
|| bitmap_bit_p (kill, regno))
   bitmap_set_bit (components, regno);
 
-  /* LR needs to be saved around a bb if it is killed in that bb.  */
+  /* FPRs.  */
+  for (unsigned regno = info->first_fp_reg_save; regno < 64; regno++)
+if (bitmap_bit_p (in, regno)
+   || bitmap_bit_p (gen, regno)
+   || bitmap_bit_p (kill, regno))
+  bitmap_set_bit (components, regno);
+
+  /* The link register.  */
   if (bitmap_bit_p (in, LR_REGNO)
   || bitmap_bit_p (gen, LR_REGNO)
   || bitmap_bit_p (kill, LR_REGNO))
@@ -27833,13 +27868,18 @@ rs6000_emit_prologue_components (sbitmap components)
   rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
 ? HARD_FRAME_POINTER_REGNUM
 : STACK_POINTER_REGNUM);
+
+  machine_mode reg_mode = Pmode;
   int reg_size = TARGET_32BIT ? 4 : 8;
+  machine_mode fp_reg_mode = (TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT)
+? DFmode : SFmode;
+  int fp_reg_size = 8;
 
   /* Prologue for LR.  */
   if (bitmap_bit_p (components, 0))
 {
-  rtx reg = gen_rtx_REG (Pmode, 0);
-  rtx_insn *insn = emit_move_insn (reg, gen_rtx_REG (Pmode, LR_REGNO));
+  rtx reg = gen_rtx_REG (reg_mode, 0);
+  rtx_insn *insn = emit_move_insn (reg, gen_rtx_REG (reg_mode, LR_REGNO));
   RTX_FRAME_RELATED_P (insn) = 1;
   add_reg_note (insn, REG_CFA_REGISTER, NULL);
 
@@ -27849,7 +27889,7 @@ rs6000_emit_prologue_components (sbitmap components)
 
   insn = emit_insn (gen_frame_store (reg, ptr_reg, offset));
   RTX_FRAME_RELATED_P (insn) = 1;
-  rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
+  rtx lr = gen_rtx_REG (reg_mode, LR_REGNO);
   rtx mem = copy_rtx 

Re: Add a load_extend_op wrapper

2016-11-15 Thread Eric Botcazou
> 2016-11-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * rtl.h (load_extend_op): Declare.
>   * rtlanal.c (load_extend_op): New function.

I'd make it an inline function.

-- 
Eric Botcazou


Re: Fix handling of unknown sizes in rtx_addr_can_trap_p

2016-11-15 Thread Richard Sandiford
Jeff Law  writes:
> On 11/15/2016 09:21 AM, Richard Sandiford wrote:
>> If the size passed in to rtx_addr_can_trap_p was zero, the frame
>> handling would get the size from the mode instead.  However, this
>> too can be zero if the mode is BLKmode, i.e. if we have a BLKmode
>> memory reference with no MEM_SIZE (which should be rare these days).
>> This meant that the conditions for a 4-byte access at offset X were
>> stricter than those for an access of unknown size at offset X.
>>
>> This patch checks whether the size is still zero, as the
>> SYMBOL_REF handling does.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> [ This patch is part of the SVE series posted here:
>>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>>
>> gcc/
>> 2016-11-15  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>>
>>  * rtlanal.c (rtx_addr_can_trap_p_1): Handle unknown sizes.
> I guess it's conservatively correct in that claiming we can trap when we 
> can't never hurts correctness.
>
>
> I'm OK with the patch, but am quite curious how we got to this point 
> without an attached MEM_SIZE.

Yeah, I should have kept better notes...  This didn't show up on SVE
itself, but was something I tripped over while doing the before-and-after
comparison of assembly output on other targets.  A later patch in the
SVE series removes:

 if (size == 0)
   size = GET_MODE_SIZE (mode);

on the basis that if a meaningful mode size was available, the caller
would have passed it in as the size parameter.  However, that tripped
over the fact that, when the mode was BLKmode, we would treat the size
of 0 literally.  This in turn meant that that later patch caused minor
codegen changes on other targets.

Thanks,
Richard


Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Segher Boessenkool
On Tue, Nov 15, 2016 at 11:05:07AM -0700, Kelvin Nilsen wrote:
> >>* config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
> >>(UNSPEC_CMPRB2): New unspec value.
> > 
> > I wonder if you really need both?  The number of arguments will tell
> > which is which, anyway?
> 
> I appreciate your preference to avoid proliferation of special-case
> unspec constants.  However, it is not so straightforward to combine
> these two cases under the same constant value.  The issue is that though
> the two encodings conceptually represent different "numbers of
> arguments", the arguments are all packed inside of a 32-bit register.
> At the RTL level, it looks like the two different forms have the same
> number of arguments (the same number of register operands).  The
> difference is which bits serve relevant purposes within the incoming
> register operands.
> 
> So I'm inclined to keep this as is if that's ok with you.

Ah right, for some reason I thought the unspec had all the bounds as
separate args.  -ENOTENOUGHCOFFEE.

[ snip ]

> Thanks for catching this.  I think I got endian confusion inside my head
> while I was writing the above.  I will rewrite these comments, below also.

Note the ISA calls the bits in 32-bit registers 32..63, so that 63 is
the rightmost bit in all registers.

> > I wonder if we really need all these predicate expanders, if it wouldn't
> > be easier if the builtin handling code did the setb itself?
> > 
> 
> The reason it seems most "natural" to me to use the expanders is because I
> need to introduce a temporary CR scratch register between expansion and
> insn matching.  Also, it seems that the *setb pattern may be of more
> general use in the future implementation of other built-in functions.
> I'm inclined to keep this as is, but if you still feel otherwise, I'll
> figure out how to avoid the expansion.

The code (in rs6000.c) expanding the builtin can create two insns directly,
so that you do not need to repeat this over and over in define_expands?


Segher


Re: Fix instances of gen_rtx_REG (VOIDmode, ...)

2016-11-15 Thread Jeff Law

On 11/15/2016 09:52 AM, Richard Sandiford wrote:

Several definitions of INCOMING_RETURN_ADDR_RTX used
gen_rtx_REG (VOIDmode, ...), which with later patches
would trip an assert.  This patch converts them to use
Pmode instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* config/i386/i386.h (INCOMING_RETURN_ADDR_RTX): Use Pmode instead
of VOIDmode.
* config/ia64/ia64.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/iq2000/iq2000.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/m68k/m68k.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/microblaze/microblaze.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/mips/mips.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/mn10300/mn10300.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/nios2/nios2.h (INCOMING_RETURN_ADDR_RTX): Likewise.

OK.
jeff



Re: Add a load_extend_op wrapper

2016-11-15 Thread Richard Sandiford
Jeff Law  writes:
> On 11/15/2016 05:42 AM, Richard Sandiford wrote:
>> LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
>> than a word.  However, callers weren't consistent about which of these
>> checks they made beforehand, and also weren't consistent about whether
>> "smaller" was based on (bit)size or precision (IMO it's the latter).
>> This patch adds a wrapper to try to make the macro easier to use.
> It's unclear to me how GET_MODE_PRECISION is different from 
> GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about 
> it, particularly in the context of vector modes and such.  I'm certainly 
> willing to trust your judgment on this.

In this case it's really more about scalar integer modes than vector modes.
I think using size and precision are equivalent for MODE_INT but they can
be different for MODE_PARTIAL_INT.  Using precision allows LOAD_EXTEND_OP
to apply to (say) PSImode extensions to SImode, whereas using the size
wouldn't, since PSImode and SImode have the same (memory) size.

>> The patch doesn't change reload, since different checks could have
>> unforeseen consequences.
> I think the same concepts apply in reload, but I understand the 
> hesitation to twiddle that code and deal with possible fallout.

Yeah :-)  I know it's a bit of a cop-out, but given the scale of the SVE
changes as a whole, we didn't want to go looking for unnecessary trouble.

>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> [ This patch is part of the SVE series posted here:
>>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>>
>> gcc/
>> 2016-11-15  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>>
>>  * rtl.h (load_extend_op): Declare.
>>  * rtlanal.c (load_extend_op): New function.
>>  (nonzero_bits1): Use it.
>>  (num_sign_bit_copies1): Likewise.
>>  * cse.c (cse_insn): Likewise.
>>  * fold-const.c (fold_single_bit_test): Likewise.
>>  (fold_unary_loc): Likewise.
>>  * fwprop.c (free_load_extend): Likewise.
>>  * postreload.c (reload_cse_simplify_set): Likewise.
>>  (reload_cse_simplify_operands): Likewise.
>>  * combine.c (try_combine): Likewise.
>>  (simplify_set): Likewise.  Remove redundant SUBREG_BYTE and
>>  subreg_lowpart_p checks.
> OK.
> jeff

Thanks,
Richard


[hsa-branch] Replace all omp references of GPGPU with HSA grid

2016-11-15 Thread Martin Jambor
Hi,

this is the last patch to the hsa branch before using it to create the
merge-to-trunk patches.  Basically, it replaces all references to
"GPGPU gridification" in omp-low.c to "HSA gridification" as requested
by Jakub at the Cauldron.

Committed to the HSA branch, it is part of the posted patches merging
it to trunk.

Thanks,

Martin



2016-11-12  Martin Jambor  

gcc/
* omp-low.c (grid references): Replace GPGPU in the function
comment with a reference to HSA grids.
(grid_dist_follows_simple_pattern): Likewise.
(grid_dist_follows_tiling_pattern): Likewise.
(grid_target_follows_gridifiable_pattern): Likewise.
(grid_expand_target_grid_body): Update function comment.
(GRID_MISSED_MSG_PREFIX): Replace GPGPU with HSA kernel.
(grid_attempt_target_gridification): Likewise.

testsuite/
* c-c++-common/gomp/gridify-1.c: Adjusted scan dump.
* c-c++-common/gomp/gridify-2.c: Likewise.
* c-c++-common/gomp/gridify-3.c: Likewise.
* gfortran.dg/gomp/gridify-1.f90: Likewise.
---
 gcc/omp-low.c| 25 -
 gcc/testsuite/c-c++-common/gomp/gridify-1.c  |  2 +-
 gcc/testsuite/c-c++-common/gomp/gridify-2.c  |  2 +-
 gcc/testsuite/c-c++-common/gomp/gridify-3.c  |  2 +-
 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 |  2 +-
 5 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d6d5272..cf228bf 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13600,10 +13600,10 @@ expand_omp_target (struct omp_region *region)
 }
 }
 
-/* Expand KFOR loop as a GPGPU kernel, i.e. as a body only with iteration
-   variable derived from the thread number.  INTRA_GROUP means this is an
-   expansion of a loop iterating over work-items within a separate iteration
-   over groups. */
+/* Expand KFOR loop as a HSA gridified kernel, i.e. as a body only with
+   iteration variable derived from the thread number.  INTRA_GROUP means this
+   is an expansion of a loop iterating over work-items within a separate
+   iteration over groups. */
 
 static void
 grid_expand_omp_for_loop (struct omp_region *kfor, bool intra_group)
@@ -13729,7 +13729,7 @@ grid_remap_kernel_arg_accesses (tree *tp, int 
*walk_subtrees, void *data)
 static void expand_omp (struct omp_region *region);
 
 /* If TARGET region contains a kernel body for loop, remove its region from the
-   TARGET and expand it in GPGPU kernel fashion. */
+   TARGET and expand it in HSA gridified kernel fashion. */
 
 static void
 grid_expand_target_grid_body (struct omp_region *target)
@@ -17368,7 +17368,7 @@ struct grid_prop
 };
 
 #define GRID_MISSED_MSG_PREFIX "Will not turn target construct into a " \
-  "gridified GPGPU kernel because "
+  "gridified HSA kernel because "
 
 /* Return true if STMT is an assignment of a register-type into a local
VAR_DECL.  If GRID is non-NULL, the assignment additionally must not be to
@@ -17682,7 +17682,7 @@ grid_inner_loop_gridifiable_p (gomp_for *gfor, 
grid_prop *grid)
 
 /* Given distribute omp construct represented by DIST, which in the original
source forms a compound construct with a looping construct, return true if 
it
-   can be turned into a gridified GPGPU kernel.  Otherwise return false. GRID
+   can be turned into a gridified HSA kernel.  Otherwise return false. GRID
describes hitherto discovered properties of the loop that is evaluated for
possible gridification.  */
 
@@ -17867,7 +17867,7 @@ grid_handle_call_in_distribute (gimple_stmt_iterator 
*gsi)
 /* Given a sequence of statements within a distribute omp construct or a
parallel construct, which in the original source does not form a compound
construct with a looping construct, return true if it does not prevent us
-   from turning it into a gridified GPGPU kernel.  Otherwise return false. GRID
+   from turning it into a gridified HSA kernel.  Otherwise return false. GRID
describes hitherto discovered properties of the loop that is evaluated for
possible gridification.  IN_PARALLEL must be true if seq is within a
parallel construct and false if it is only within a distribute
@@ -17991,10 +17991,9 @@ grid_dist_follows_tiling_pattern (gimple_seq seq, 
grid_prop *grid,
 return true;
 }
 
-/* If TARGET follows a pattern that can be turned into a gridified GPGPU
-   kernel, return true, otherwise return false.  In the case of success, also
-   fill in GROUP_SIZE_P with the requested group size or NULL if there is
-   none.  */
+/* If TARGET follows a pattern that can be turned into a gridified HSA kernel,
+   return true, otherwise return false.  In the case of success, also fill in
+   GRID with information describing the kernel grid.  */
 
 static bool
 grid_target_follows_gridifiable_pattern (gomp_target *target, grid_prop *grid)
@@ -18530,7 +18529,7 @@ grid_attempt_target_gridification (gomp_target *target,
   location_t loc = gimple_location 

[hsa branch] Move hsa headers to plugin libgomp directory

2016-11-15 Thread Martin Jambor
Hi,

this is one of the last two commits to the hsa branch I made while
preparing the merge to trunk, it moves headers generated from HSA
documentation to the plugin directory.

Committed to the HSA branch, it is part of the posted patches merging
it to trunk.

Thanks,

Martin


2016-11-12  Martin Jambor  

* hsa.h: Moved to plugin directory.
* hsa_ext_finalize.h: Likewise.
* plugin/plugin-hsa.c: Prefixed includes of hsa headers with plugin/
---
 libgomp/{ => plugin}/hsa.h  | 0
 libgomp/{ => plugin}/hsa_ext_finalize.h | 0
 libgomp/plugin/plugin-hsa.c | 4 ++--
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename libgomp/{ => plugin}/hsa.h (100%)
 rename libgomp/{ => plugin}/hsa_ext_finalize.h (100%)

diff --git a/libgomp/hsa.h b/libgomp/plugin/hsa.h
similarity index 100%
rename from libgomp/hsa.h
rename to libgomp/plugin/hsa.h
diff --git a/libgomp/hsa_ext_finalize.h b/libgomp/plugin/hsa_ext_finalize.h
similarity index 100%
rename from libgomp/hsa_ext_finalize.h
rename to libgomp/plugin/hsa_ext_finalize.h
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index ef7a202..ecf8302 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -34,8 +34,8 @@
 #include 
 #include 
 #include 
-#include 
-#include 
+#include 
+#include 
 #include 
 #include "libgomp-plugin.h"
 #include "gomp-constants.h"
-- 
2.10.1



Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Kelvin Nilsen

Thank you very much for the prompt and thorough review.  There are a few
points below where I'd like to seek further clarification.

On 11/15/2016 04:19 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Nov 14, 2016 at 04:43:35PM -0700, Kelvin Nilsen wrote:
>>  * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
>>  (UNSPEC_CMPRB2): New unspec value.
> 
> I wonder if you really need both?  The number of arguments will tell
> which is which, anyway?

I appreciate your preference to avoid proliferation of special-case
unspec constants.  However, it is a not so straightforward to combine
these two cases under the same constant value.  The issue is that though
the two encoding conceptually represent different "numbers of
arguments", the arguments are all packed inside of a 32-bit register.
At the RTL level, it looks like the two different forms have the same
number of arguments (the same number of register operands).  The
difference is which bits serve relevant purposes within the incoming
register operands.

So I'm inclined to keep this as is if that's ok with you.

> 
>>  (cmprb_p): New expansion.
> 
> Not such a great name (now you get a gen_cmprb_p function which isn't
> a predicate itself).

I'll change these names.

> 
>>  (CMPRB): Add byte-in-range built-in function.
>>  (CMBRB2): Add byte-in-either_range built-in function.
>>  (CMPEQB): Add byte-in-set builtin-in function.
> 
> "builtin-in", and you typoed an underscore?

Thanks.


> 
>> +;; Predicate: test byte within range.
>> +;; Return in target register operand 0 a non-zero value iff the byte
>> +;; held in bits 24:31 of operand 1 is within the inclusive range
>> +;; bounded below by operand 2's bits 0:7 and above by operand 2's
>> +;; bits 8:15.
>> +(define_expand "cmprb_p"
> 
> It seems you got the bit numbers mixed up.  Maybe just call it the low
> byte, and the byte just above?
> 
> (And it always sets 0 or 1 here, you might want to make that more explicit).
> 
>> +;; Set bit 1 (the GT bit, 0x2) of CR register operand 0 to 1 iff the
> 
> That's 4, i.e. 0b0100.
> 
>> +;; Set operand 0 register to non-zero value iff the CR register named
>> +;; by operand 1 has its GT bit (0x2) or its LT bit (0x1) set.
>> +(define_insn "*setb"
> 
> LT is 8, GT is 4.  If LT is set it returns -1, otherwise if GT is set it
> returns 1, otherwise it returns 0.
> 

Thanks for catching this.  I think I got endian confusion inside my head
while I was writing the above.  I will rewrite these comments, below also.

>> +;; Predicate: test byte within two ranges.
>> +;; Return in target register operand 0 a non-zero value iff the byte
>> +;; held in bits 24:31 of operand 1 is within the inclusive range
>> +;; bounded below by operand 2's bits 0:7 and above by operand 2's
>> +;; bits 8:15 or if the byte is within the inclusive range bounded
>> +;; below by operand 2's bits 16:23 and above by operand 2's bits 24:31.
>> +(define_expand "cmprb2_p"
> 
> The high bound is higher in the reg than the low bound.  See the example
> where 0x3930 is used to do isdigit (and yes 0x3039 would be much more
> fun, but alas).
> 
>> +;; Predicate: test byte membership within set of 8 bytes.
>> +;; Return in target register operand 0 a non-zero value iff the byte
>> +;; held in bits 24:31 of operand 1 equals at least one of the eight
>> +;; byte values represented by the 64-bit register supplied as operand
>> +;; 2.  Note that the 8 byte values held within operand 2 need not be
>> +;; unique. 
> 
> (trailing space)
> 
> I wonder if we really need all these predicate expanders, if it wouldn't
> be easier if the builtin handling code did the setb itself?
> 

The reason it seems most "natural" to me to use the expanders is that I
need to introduce a temporary CR scratch register between expansion and
insn matching.  Also, it seems that the *setb pattern may be of more
general use in the future implementation of other built-in functions.
I'm inclined to keep this as is, but if you still feel otherwise, I'll
figure out how to avoid the expansion.



Re: Move misplaced assignment in num_sign_bit_copies1

2016-11-15 Thread Richard Sandiford
Eric Botcazou  writes:
>> 2016-11-15  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>> 
>>  * rtlanal.c (num_sign_bit_copies1): Calculate bitwidth after
>>  handling VOIDmode.
>
> OK, thanks, but please change the comment too, I think "For a smaller mode" 
> would be less confusing.

OK, thanks, installed with that change.

Richard


Re: [patch] remove more GCJ references

2016-11-15 Thread Matthias Klose
On 15.11.2016 16:52, Jeff Law wrote:
> On 11/15/2016 03:55 AM, Matthias Klose wrote:
>> This patch removes some references to gcj in the top level and config
>> directories and in the gcc documentation.  The change to the config directory
>> requires regenerating aclocal.m4 and configure in each sub directory.
>>
>> Ok for the trunk?
>>
>> Matthias
>>
>> 
>>
>> 2016-11-14  Matthias Klose  
>>
>> * config-ml.in: Remove references to GCJ.
>> * configure.ac: Likewise.
>> * configure: Regenerate.
>>
>> config/
>>
>> 2016-11-14  Matthias Klose  
>>
>> multi.m4: Don't set GCJ.
>>
>> gcc/
>>
>> 2016-11-14  Matthias Klose  
>>
>> * doc/install.texi: Remove references to gcj/libjava.
>> * doc/invoke.texi: Likewise.
>>
> OK.
> jeff

- committed.
- restored the accidentally regenerated files in libiberty.
- forgot to commit some files, now committed.

2016-11-15  Matthias Klose  

* Makefile.def: Remove references to GCJ.
* Makefile.tpl: Likewise.
* Makefile.in: Regenerate.


2016-11-15  Matthias Klose  

	* Makefile.def: Remove references to GCJ.
	* Makefile.tpl: Likewise.
	* Makefile.in: Regenerate.

Index: Makefile.def
===
--- Makefile.def	(revision 242433)
+++ Makefile.def	(working copy)
@@ -280,7 +280,6 @@
 flags_to_pass = { flag= CXXFLAGS_FOR_TARGET ; };
 flags_to_pass = { flag= DLLTOOL_FOR_TARGET ; };
 flags_to_pass = { flag= FLAGS_FOR_TARGET ; };
-flags_to_pass = { flag= GCJ_FOR_TARGET ; };
 flags_to_pass = { flag= GFORTRAN_FOR_TARGET ; };
 flags_to_pass = { flag= GOC_FOR_TARGET ; };
 flags_to_pass = { flag= GOCFLAGS_FOR_TARGET ; };
Index: Makefile.tpl
===
--- Makefile.tpl	(revision 242433)
+++ Makefile.tpl	(working copy)
@@ -156,7 +156,6 @@
 	CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
 	CXX="$(CXX_FOR_BUILD)"; export CXX; \
 	CXXFLAGS="$(CXXFLAGS_FOR_BUILD)"; export CXXFLAGS; \
-	GCJ="$(GCJ_FOR_BUILD)"; export GCJ; \
 	GFORTRAN="$(GFORTRAN_FOR_BUILD)"; export GFORTRAN; \
 	GOC="$(GOC_FOR_BUILD)"; export GOC; \
 	GOCFLAGS="$(GOCFLAGS_FOR_BUILD)"; export GOCFLAGS; \
@@ -194,7 +193,6 @@
 	CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
 	CXX="$(CXX)"; export CXX; \
 	CXXFLAGS="$(CXXFLAGS)"; export CXXFLAGS; \
-	GCJ="$(GCJ)"; export GCJ; \
 	GFORTRAN="$(GFORTRAN)"; export GFORTRAN; \
 	GOC="$(GOC)"; export GOC; \
 	AR="$(AR)"; export AR; \
@@ -282,7 +280,6 @@
 	CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
 	CPPFLAGS="$(CPPFLAGS_FOR_TARGET)"; export CPPFLAGS; \
 	CXXFLAGS="$(CXXFLAGS_FOR_TARGET)"; export CXXFLAGS; \
-	GCJ="$(GCJ_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GCJ; \
 	GFORTRAN="$(GFORTRAN_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GFORTRAN; \
 	GOC="$(GOC_FOR_TARGET) $(XGCC_FLAGS_FOR_TARGET) $$TFLAGS"; export GOC; \
 	DLLTOOL="$(DLLTOOL_FOR_TARGET)"; export DLLTOOL; \
@@ -348,7 +345,6 @@
 CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
 CXX_FOR_BUILD = @CXX_FOR_BUILD@
 DLLTOOL_FOR_BUILD = @DLLTOOL_FOR_BUILD@
-GCJ_FOR_BUILD = @GCJ_FOR_BUILD@
 GFORTRAN_FOR_BUILD = @GFORTRAN_FOR_BUILD@
 GOC_FOR_BUILD = @GOC_FOR_BUILD@
 LDFLAGS_FOR_BUILD = @LDFLAGS_FOR_BUILD@
@@ -488,7 +484,6 @@
 GCC_FOR_TARGET=$(STAGE_CC_WRAPPER) @GCC_FOR_TARGET@
 CXX_FOR_TARGET=$(STAGE_CC_WRAPPER) @CXX_FOR_TARGET@
 RAW_CXX_FOR_TARGET=$(STAGE_CC_WRAPPER) @RAW_CXX_FOR_TARGET@
-GCJ_FOR_TARGET=$(STAGE_CC_WRAPPER) @GCJ_FOR_TARGET@
 GFORTRAN_FOR_TARGET=$(STAGE_CC_WRAPPER) @GFORTRAN_FOR_TARGET@
 GOC_FOR_TARGET=$(STAGE_CC_WRAPPER) @GOC_FOR_TARGET@
 DLLTOOL_FOR_TARGET=@DLLTOOL_FOR_TARGET@
@@ -614,7 +609,6 @@
 	'CC=$(CC)' \
 	'CXX=$(CXX)' \
 	'DLLTOOL=$(DLLTOOL)' \
-	'GCJ=$(GCJ)' \
 	'GFORTRAN=$(GFORTRAN)' \
 	'GOC=$(GOC)' \
 	'LD=$(LD)' \
@@ -670,7 +664,6 @@
 	 $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
 	'CXXFLAGS=$$(CXXFLAGS_FOR_TARGET)' \
 	'DLLTOOL=$$(DLLTOOL_FOR_TARGET)' \
-	'GCJ=$$(GCJ_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
 	'GFORTRAN=$$(GFORTRAN_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
 	'GOC=$$(GOC_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
 	'GOCFLAGS=$$(GOCFLAGS_FOR_TARGET)' \


Re: [C++ PATCH] Some further g++.dg/cpp1z/decomp*.C tests

2016-11-15 Thread Jason Merrill
OK.

On Tue, Nov 15, 2016 at 9:13 AM, Jakub Jelinek  wrote:
> Hi!
>
> This patch adds 3 new tests.  Tested on x86_64-linux, ok for trunk?
>
> 2016-11-15  Jakub Jelinek  
>
> * g++.dg/cpp1z/decomp13.C: New test.
> * g++.dg/cpp1z/decomp14.C: New test.
> * g++.dg/cpp1z/decomp15.C: New test.
>
> --- gcc/testsuite/g++.dg/cpp1z/decomp13.C.jj2016-11-15 14:25:18.902048735 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1z/decomp13.C   2016-11-15 14:48:12.795463351 
> +0100
> @@ -0,0 +1,30 @@
> +// { dg-do compile { target c++11 } }
> +// { dg-options "" }
> +
> +struct A { int f; };
> +struct B { int b; };
> +struct C : virtual A {};
> +struct D : virtual A {};
> +struct E { int f; };
> +struct F : A { int f; };
> +struct G : A, E {};
> +struct H : C, D {};
> +struct I : A, C {};// { dg-warning "due to ambiguity" }
> +struct J : B {};
> +struct K : B, virtual J {};// { dg-warning "due to ambiguity" }
> +struct L : virtual J {};
> +struct M : virtual J, L {};
> +
> +void
> +foo (C , F , G , H , I , K , M )
> +{
> +  auto [ ci ] = c; // { dg-warning "decomposition declaration 
> only available with" "" { target c++14_down } }
> +  auto [ fi ] = f; // { dg-error "cannot decompose class type 
> 'F': both it and its base class 'A' have non-static data members" }
> +   // { dg-warning "decomposition declaration 
> only available with" "" { target c++14_down } .-1 }
> +  auto [ gi ] = g; // { dg-error "cannot decompose class type 
> 'G': its base classes 'A' and 'E' have non-static data members" }
> +   // { dg-warning "decomposition declaration 
> only available with" "" { target c++14_down } .-1 }
> +  auto [ hi ] = h; // { dg-warning "decomposition declaration 
> only available with" "" { target c++14_down } }
> +  auto [ ki ] = k; // { dg-error "'B' is an ambiguous base of 
> 'K'" }
> +   // { dg-warning "decomposition declaration 
> only available with" "" { target c++14_down } .-1 }
> +  auto [ mi ] = m; // { dg-warning "decomposition declaration 
> only available with" "" { target c++14_down } }
> +}
> --- gcc/testsuite/g++.dg/cpp1z/decomp14.C.jj2016-11-15 14:30:40.296941834 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1z/decomp14.C   2016-11-15 14:50:32.361678491 
> +0100
> @@ -0,0 +1,24 @@
> +// { dg-do compile }
> +// { dg-options "-std=c++1z" }
> +
> +struct A { bool a, b; };
> +struct B { int a, b; };
> +
> +void
> +foo ()
> +{
> +  auto [ a, b ] = A ();
> +  for (auto [ a, b ] = A (); a; )
> +;
> +  if (auto [ a, b ] = A (); a)
> +;
> +  switch (auto [ a, b ] = B (); b)
> +{
> +case 2:
> +  break;
> +}
> +  auto && [ c, d ] = A ();
> +  [[maybe_unused]] auto [ e, f ] = A ();
> +  alignas (A) auto [ g, h ] = A ();
> +  __attribute__((unused)) auto [ i, j ] = A ();
> +}
> --- gcc/testsuite/g++.dg/cpp1z/decomp15.C.jj2016-11-15 14:38:55.198602649 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1z/decomp15.C   2016-11-15 14:46:33.0 
> +0100
> @@ -0,0 +1,47 @@
> +// { dg-do compile }
> +// { dg-options "-std=c++1z" }
> +
> +struct A { bool a, b; };
> +struct B { int a, b; };
> +
> +void
> +foo ()
> +{
> +  auto [ a, b ] = A ();
> +  for (; auto [ a, b ] = A (); )   // { dg-error 
> "expected" }
> +;
> +  for (; false; auto [ a, b ] = A ())  // { dg-error 
> "expected" }
> +;
> +  if (auto [ a, b ] = A ())// { dg-error 
> "expected" }
> +;
> +  if (auto [ a, b ] = A (); auto [ c, d ] = A ())  // { dg-error 
> "expected" }
> +;
> +  if (int d = 5; auto [ a, b ] = A ()) // { dg-error 
> "expected" }
> +;
> +  switch (auto [ a, b ] = B ())// { dg-error 
> "expected" }
> +{
> +case 2:
> +  break;
> +}
> +  switch (int d = 5; auto [ a, b ] = B ()) // { dg-error 
> "expected" }
> +{
> +case 2:
> +  break;
> +}
> +  A e = A ();
> +  auto && [ c, d ] = e;
> +  auto [ i, j ] = A (), [ k, l ] = A ();   // { dg-error 
> "expected" }
> +  auto m = A (), [ n, o ] = A ();  // { dg-error 
> "expected" }
> +}
> +
> +template 
> +auto [ a, b ] = A ();  // { dg-error 
> "expected" }
> +
> +struct C
> +{
> +  auto [ e, f ] = A ();// { dg-error 
> "expected" }
> +  mutable auto [ g, h ] = A ();// { dg-error 
> "expected" }
> +  virtual auto [ i, j ] = A ();// { dg-error 
> "expected" }
> +  explicit auto [ k, l ] = A ();   // { dg-error 
> "expected" }
> +  friend auto [ m, n ] = A (); // { dg-error 
> "expected" }
> +};
>
> Jakub


Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-15 Thread Alexander Monakov
On Tue, 15 Nov 2016, Alexander Monakov wrote:
> Yep, I do see new test execution failures with both Intel MIC and PTX 
> offloading
> on device-1.f90, device-3.f90 and target2.f90.  Here's an actually-tested 
> patch
> for the first two (on target2.f90 there's a different problem).

And here's a patch for target2.f90.  I don't have a perfect understanding of
mapping clauses, but the test appears to need to explicitly map pointer
variables, at a minimum.  Also, 'map (from: r)' is missing on the last target
region.

* testsuite/libgomp.fortran/target2.f90 (foo): Add mapping clauses to
target construct.

diff --git a/libgomp/testsuite/libgomp.fortran/target2.f90 
b/libgomp/testsuite/libgomp.fortran/target2.f90
index 42f704f..7119774 100644
--- a/libgomp/testsuite/libgomp.fortran/target2.f90
+++ b/libgomp/testsuite/libgomp.fortran/target2.f90
@@ -63,7 +63,7 @@ contains
   r = r .or. (any (k(5:n-5) /= 17)) .or. (lbound (k, 1) /= 4) .or. (ubound 
(k, 1) /= n)
 !$omp end target
 if (r) call abort
-!$omp target map (to: d(2:n+1), n)
+!$omp target map (to: d(2:n+1), f, j) map (from: r)
   r = a /= 7
   r = r .or. (any (b /= 8)) .or. (lbound (b, 1) /= 3) .or. (ubound (b, 1) 
/= n)
   r = r .or. (any (c /= 9)) .or. (lbound (c, 1) /= 5) .or. (ubound (c, 1) 
/= n + 4)



Re: [PATCH/AARCH64] Have the verbose cost model output output be controllable

2016-11-15 Thread James Greenhalgh
On Tue, Nov 15, 2016 at 08:48:04AM -0800, Andrew Pinski wrote:
> On Fri, Oct 7, 2016 at 1:01 AM, Kyrill Tkachov
>  wrote:
> > Hi Andrew,
> >
> >
> > On 24/09/16 06:46, Andrew Pinski wrote:
> >>
> >> Hi,
> >>As reported in PR 61367, the aarch64 back-end is too verbose when it
> >> is dealing with the cost model.  I tend to agree, no other back-end is
> >> this verbose.  So I decided to add an option to enable this verbose
> >> output if requested.
> >>
> >> I did NOT document it in invoke.texi because I don't feel like this is
> >> an option which an user should use.  But I can add it if requested.
> >>
> >> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> >>
> >> Thanks,
> >> Andrew Pinski
> >>
> >> ChangeLog:
> >> * config/aarch64/aarch64.opt (mverbose-cost-dump): New option.
> >> * config/aarch64/aarch64.c (aarch64_rtx_costs): Use
> >> flag_aarch64_verbose_cost instead of checking for details dump.
> >> (aarch64_rtx_costs_wrapper): Likewise.
> >
> >
> > I'm okay with the idea, but I can't approve (cc'ing people who can).
> 
> Ping?

I think after having AArch64 back-end developers frustrated by the mountains
of dump output for two years it is a good time to relent...

This is OK with Kyrill's change applied.

Thanks,
James

> 
> 
> > One nit:
> >
> > +mverbose-cost-dump
> > +Common Var(flag_aarch64_verbose_cost)
> > +Enables verbose cost model dumping in the debug dump files.
> >
> > You should add "Undocumented" to that.
> > I don't think the option is major enough to warrant an entry in invoke.texi.
> > It's only for aarch64 backend developers who know exactly what they're
> > looking for.
> >
> > Cheers,
> > Kyrill
> >
> >


Re: [C++ PATCH] Add mangling for P0217R3 decompositions at namespace scope

2016-11-15 Thread Jason Merrill
OK.

On Tue, Nov 15, 2016 at 9:12 AM, Jakub Jelinek  wrote:
> Hi!
>
> On the following testcase we ICE, because the underlying artificial decls
> have NULL DECL_NAME (intentional), thus mangling is not able to figure out
> what to do.  This patch attempts to follow the
> http://sourcerytools.com/pipermail/cxx-abi-dev/2016-August/002951.html
> proposal (and for error recovery just uses  in order not to ICE).
>
> Not really sure about ABI tags though.
> I guess one can specify abi tag on the whole decomposition, perhaps
> __attribute__((abi_tag ("foobar"))) auto [ a, b ] = A ();
> And/or there could be ABI tags on the type of the artifical decl.
> What about ABI tags on the types that the decomposition resolved to
> (say if std::tuple* is involved)?  Shall all ABI tags go at the end
> of the whole decomp decl, or shall the individual source names have their
> ABI tags attached after them?
> What about the std::tuple* case where the standalone vars exist too,
> shall e.g. abi_tag attributes be copied from the decomp var to those?
> Any other attributes to copy over (e.g. unused comes to mind).
>
> In any case, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk (and the rest would be resolved incrementally)?
>
> 2016-11-15  Jakub Jelinek  
>
> * decl.c (cp_finish_decomp): For DECL_NAMESPACE_SCOPE_P decl,
> set DECL_ASSEMBLER_NAME.
> * parser.c (cp_parser_decomposition_declaration): Likewise
> if returning error_mark_node.
> * mangle.c (mangle_decomp): New function.
> * cp-tree.h (mangle_decomp): New declaration.
>
> * g++.dg/cpp1z/decomp12.C: New test.
>
> --- gcc/cp/decl.c.jj2016-11-15 09:57:00.0 +0100
> +++ gcc/cp/decl.c   2016-11-15 12:16:41.230596777 +0100
> @@ -7301,7 +7301,6 @@ get_tuple_decomp_init (tree decl, unsign
>  void
>  cp_finish_decomp (tree decl, tree first, unsigned int count)
>  {
> -  location_t loc = DECL_SOURCE_LOCATION (decl);
>if (error_operand_p (decl))
>  {
>   error_out:
> @@ -7315,9 +7314,12 @@ cp_finish_decomp (tree decl, tree first,
> }
>   first = DECL_CHAIN (first);
> }
> +  if (DECL_P (decl) && DECL_NAMESPACE_SCOPE_P (decl))
> +   SET_DECL_ASSEMBLER_NAME (decl, get_identifier (""));
>return;
>  }
>
> +  location_t loc = DECL_SOURCE_LOCATION (decl);
>if (type_dependent_expression_p (decl)
>/* This happens for range for when not in templates.
>  Still add the DECL_VALUE_EXPRs for later processing.  */
> @@ -7530,6 +7532,8 @@ cp_finish_decomp (tree decl, tree first,
> i++;
>   }
>  }
> +  if (DECL_NAMESPACE_SCOPE_P (decl))
> +SET_DECL_ASSEMBLER_NAME (decl, mangle_decomp (decl, v));
>  }
>
>  /* Returns a declaration for a VAR_DECL as if:
> --- gcc/cp/parser.c.jj  2016-11-15 10:37:56.0 +0100
> +++ gcc/cp/parser.c 2016-11-15 12:16:26.361784744 +0100
> @@ -12944,6 +12944,7 @@ cp_parser_decomposition_declaration (cp_
>tree decl = start_decl (declarator, decl_specifiers, SD_INITIALIZED,
>   NULL_TREE, decl_specifiers->attributes,
>   _scope);
> +  tree orig_decl = decl;
>
>unsigned int i;
>cp_expr e;
> @@ -13020,6 +13021,12 @@ cp_parser_decomposition_declaration (cp_
>if (pushed_scope)
>  pop_scope (pushed_scope);
>
> +  if (decl == error_mark_node && DECL_P (orig_decl))
> +{
> +  if (DECL_NAMESPACE_SCOPE_P (orig_decl))
> +   SET_DECL_ASSEMBLER_NAME (orig_decl, get_identifier (""));
> +}
> +
>return decl;
>  }
>
> --- gcc/cp/mangle.c.jj  2016-11-11 14:01:06.0 +0100
> +++ gcc/cp/mangle.c 2016-11-15 11:48:58.345751857 +0100
> @@ -3995,6 +3995,53 @@ mangle_vtt_for_type (const tree type)
>return mangle_special_for_type (type, "TT");
>  }
>
> +/* Returns an identifier for the mangled name of the decomposition
> +   artificial variable DECL.  DECLS is the vector of the VAR_DECLs
> +   for the identifier-list.  */
> +
> +tree
> +mangle_decomp (const tree decl, vec )
> +{
> +  gcc_assert (!type_dependent_expression_p (decl));
> +
> +  location_t saved_loc = input_location;
> +  input_location = DECL_SOURCE_LOCATION (decl);
> +
> +  start_mangling (decl);
> +  write_string ("_Z");
> +
> +  tree context = decl_mangling_context (decl);
> +  gcc_assert (context != NULL_TREE);
> +
> +  bool nested = false;
> +  if (DECL_NAMESPACE_STD_P (context))
> +write_string ("St");
> +  else if (context != global_namespace)
> +{
> +  nested = true;
> +  write_char ('N');
> +  write_prefix (decl_mangling_context (decl));
> +}
> +
> +  write_string ("DC");
> +  unsigned int i;
> +  tree d;
> +  FOR_EACH_VEC_ELT (decls, i, d)
> +write_unqualified_name (d);
> +  write_char ('E');
> +
> +  if (nested)
> +write_char ('E');
> +
> +  tree id = finish_mangling_get_identifier ();
> +  if (DEBUG_MANGLE)
> +fprintf (stderr, "mangle_decomp = 

Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-15 Thread Jakub Jelinek
On Tue, Nov 15, 2016 at 07:52:56PM +0300, Alexander Monakov wrote:
> On Mon, 14 Nov 2016, Alexander Monakov wrote:
> > On Mon, 14 Nov 2016, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > yesterday I forgot to send out the following patch.  The test
> > > libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90 was failing
> > > for me when I was testing the HSA branch merge but I believe the test
> > > itself is wrong and the failure is due to us now adhering to OpenMP
> > > 4.5 default mapping of scalars (i.e. firstprivate, as opposed to
> > > tofrom in 4.0) and the test itself needs to be fixed in the following
> > > way.
> > 
> > From inspection, I believe device-1.f90 in the same directory has the same
> > issue?
> 
> Yep, I do see new test execution failures with both Intel MIC and PTX 
> offloading
> on device-1.f90, device-3.f90 and target2.f90.  Here's an actually-tested 
> patch
> for the first two (on target2.f90 there's a different problem).
> 
>   Martin Jambor  
>   Alexander Monakov  
> 
>   * testsuite/libgomp.fortran/examples-4/device-1.f90 (e_57_1): Add
>   mapping clauses to target constructs.
>   * testsuite/libgomp.fortran/examples-4/device-3.f90 (e_57_3): Ditto.

Ok, thanks.

Jakub


Re: [PATCH] libiberty: Fix some demangler crashes caused by reading past end of input.

2016-11-15 Thread Ian Lance Taylor
On Mon, Nov 14, 2016 at 1:19 AM, Mark Wielaard  wrote:
> In various situations the cplus_demangle () function could read past the
> end of input causing crashes. Add checks in various places to not advance
> the demangle string location and fail early when end of string is reached.
> Add various examples of input strings to the testsuite that would crash
> test-demangle before the fixes.
>
> Found by using the American Fuzzy Lop (afl) fuzzer.
>
> libiberty/ChangeLog:
>
>* cplus-dem.c (demangle_signature): After 'H', template function,
>no success and don't advance position if end of string reached.
>(demangle_template): After 'z', template name, return zero on
>premature end of string.
>(gnu_special): Guard strchr against searching for zero characters.
>(do_type): If member, only advance mangled string when 'F' found.
>* testsuite/demangle-expected: Add examples of strings that could
>crash the demangler by reading past end of input.
> ---

This is OK.

Thanks.

Ian


Re: [PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread FX
> disabling/enabling makes this api a lot heavier
> than before, but trapping cannot be decided at
> compile-time, although the result may be cached,
> i think this should not be a frequent operation.
> 
> otoh rereading my patch i think i fail to restore
> the original exception state correctly.

Well, if we have no choice, then let’s do it. (With an updated patch)

FX

Re: [PATCH] Add map clauses to libgomp test device-3.f90

2016-11-15 Thread Alexander Monakov
On Mon, 14 Nov 2016, Alexander Monakov wrote:
> On Mon, 14 Nov 2016, Martin Jambor wrote:
> 
> > Hi,
> > 
> > yesterday I forgot to send out the following patch.  The test
> > libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90 was failing
> > for me when I was testing the HSA branch merge but I believe the test
> > itself is wrong and the failure is due to us now adhering to OpenMP
> > 4.5 default mapping of scalars (i.e. firstprivate, as opposed to
> > tofrom in 4.0) and the test itself needs to be fixed in the following
> > way.
> 
> From inspection, I believe device-1.f90 in the same directory has the same
> issue?

Yep, I do see new test execution failures with both Intel MIC and PTX offloading
on device-1.f90, device-3.f90 and target2.f90.  Here's an actually-tested patch
for the first two (on target2.f90 there's a different problem).

Martin Jambor  
Alexander Monakov  

* testsuite/libgomp.fortran/examples-4/device-1.f90 (e_57_1): Add
mapping clauses to target constructs.
* testsuite/libgomp.fortran/examples-4/device-3.f90 (e_57_3): Ditto.

diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90 
b/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90
index a411db4..30148f1 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90
@@ -9,12 +9,12 @@ program e_57_1
   a = 100
   b = 0
 
-  !$omp target if(a > 200 .and. a < 400)
+  !$omp target map(from: c) if(a > 200 .and. a < 400)
 c = omp_is_initial_device ()
   !$omp end target
 
   !$omp target data map(to: b) if(a > 200 .and. a < 400)
-!$omp target
+!$omp target map(from: b, d)
   b = 100
   d = omp_is_initial_device ()
 !$omp end target
@@ -25,12 +25,12 @@ program e_57_1
   a = a + 200
   b = 0
 
-  !$omp target if(a > 200 .and. a < 400)
+  !$omp target map(from: c) if(a > 200 .and. a < 400)
 c = omp_is_initial_device ()
   !$omp end target
 
   !$omp target data map(to: b) if(a > 200 .and. a < 400)
-!$omp target
+!$omp target map(from: b, d)
   b = 100
   d = omp_is_initial_device ()
 !$omp end target
@@ -41,12 +41,12 @@ program e_57_1
   a = a + 200
   b = 0
 
-  !$omp target if(a > 200 .and. a < 400)
+  !$omp target map(from: c) if(a > 200 .and. a < 400)
 c = omp_is_initial_device ()
   !$omp end target
 
   !$omp target data map(to: b) if(a > 200 .and. a < 400)
-!$omp target
+!$omp target map(from: b, d)
   b = 100
   d = omp_is_initial_device ()
 !$omp end target
diff --git a/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90 
b/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90
index a29f1b5..d770b91 100644
--- a/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90
+++ b/libgomp/testsuite/libgomp.fortran/examples-4/device-3.f90
@@ -8,13 +8,13 @@ program e_57_3
   integer :: default_device
 
   default_device = omp_get_default_device ()
-  !$omp target
+  !$omp target map(from: res)
 res = omp_is_initial_device ()
   !$omp end target
   if (res) call abort
 
   call omp_set_default_device (omp_get_num_devices ())
-  !$omp target
+  !$omp target map(from: res)
 res = omp_is_initial_device ()
   !$omp end target
   if (.not. res) call abort


Fix instances of gen_rtx_REG (VOIDmode, ...)

2016-11-15 Thread Richard Sandiford
Several definitions of INCOMING_RETURN_ADDR_RTX used
gen_rtx_REG (VOIDmode, ...), which with later patches
would trip an assert.  This patch converts them to use
Pmode instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* config/i386/i386.h (INCOMING_RETURN_ADDR_RTX): Use Pmode instead
of VOIDmode.
* config/ia64/ia64.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/iq2000/iq2000.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/m68k/m68k.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/microblaze/microblaze.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/mips/mips.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/mn10300/mn10300.h (INCOMING_RETURN_ADDR_RTX): Likewise.
* config/nios2/nios2.h (INCOMING_RETURN_ADDR_RTX): Likewise.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index add7a64..fdaf423 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2176,7 +2176,7 @@ extern int const 
x86_64_ms_sysv_extra_clobbered_registers[12];
 
 /* Before the prologue, RA is at 0(%esp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-  gen_rtx_MEM (VOIDmode, gen_rtx_REG (VOIDmode, STACK_POINTER_REGNUM))
+  gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
 
 /* After the prologue, RA is at -4(AP) in the current frame.  */
 #define RETURN_ADDR_RTX(COUNT, FRAME)  \
diff --git a/gcc/config/ia64/ia64.h b/gcc/config/ia64/ia64.h
index ac0cb86..c79e20b 100644
--- a/gcc/config/ia64/ia64.h
+++ b/gcc/config/ia64/ia64.h
@@ -896,7 +896,7 @@ enum reg_class
RTL is either a `REG', indicating that the return value is saved in `REG',
or a `MEM' representing a location in the stack.  This enables DWARF2
unwind info for C++ EH.  */
-#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (VOIDmode, BR_REG (0))
+#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, BR_REG (0))
 
 /* A C expression whose value is an integer giving the offset, in bytes, from
the value of the stack pointer register to the top of the stack frame at the
diff --git a/gcc/config/iq2000/iq2000.h b/gcc/config/iq2000/iq2000.h
index 3b9dceb..e79c9a7 100644
--- a/gcc/config/iq2000/iq2000.h
+++ b/gcc/config/iq2000/iq2000.h
@@ -258,7 +258,7 @@ enum reg_class
 : (rtx) 0)
 
 /* Before the prologue, RA lives in r31.  */
-#define INCOMING_RETURN_ADDR_RTX  gen_rtx_REG (VOIDmode, GP_REG_FIRST + 31)
+#define INCOMING_RETURN_ADDR_RTX  gen_rtx_REG (Pmode, GP_REG_FIRST + 31)
 
 
 /* Register That Address the Stack Frame.  */
diff --git a/gcc/config/m68k/m68k.h b/gcc/config/m68k/m68k.h
index 2aa858f..7b63bd2 100644
--- a/gcc/config/m68k/m68k.h
+++ b/gcc/config/m68k/m68k.h
@@ -768,7 +768,7 @@ do { if (cc_prev_status.flags & CC_IN_68881)
\
 
 /* Before the prologue, RA is at 0(%sp).  */
 #define INCOMING_RETURN_ADDR_RTX \
-  gen_rtx_MEM (VOIDmode, gen_rtx_REG (VOIDmode, STACK_POINTER_REGNUM))
+  gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM))
 
 /* After the prologue, RA is at 4(AP) in the current frame.  */
 #define RETURN_ADDR_RTX(COUNT, FRAME) \
diff --git a/gcc/config/microblaze/microblaze.h 
b/gcc/config/microblaze/microblaze.h
index dbfb652..849fab9 100644
--- a/gcc/config/microblaze/microblaze.h
+++ b/gcc/config/microblaze/microblaze.h
@@ -182,7 +182,7 @@ extern enum pipeline_type microblaze_pipe;
NOTE:  GDB has a workaround and expects this incorrect value.
If this is fixed, a corresponding fix to GDB is needed.  */
 #define INCOMING_RETURN_ADDR_RTX   \
-  gen_rtx_REG (VOIDmode, GP_REG_FIRST + MB_ABI_SUB_RETURN_ADDR_REGNUM)
+  gen_rtx_REG (Pmode, GP_REG_FIRST + MB_ABI_SUB_RETURN_ADDR_REGNUM)
 
 /* Use DWARF 2 debugging information by default.  */
 #define DWARF2_DEBUGGING_INFO
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 81862a9..12662a7 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1469,7 +1469,7 @@ FP_ASM_SPEC "\
 #define DWARF_FRAME_RETURN_COLUMN RETURN_ADDR_REGNUM
 
 /* Before the prologue, RA lives in r31.  */
-#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (VOIDmode, RETURN_ADDR_REGNUM)
+#define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM)
 
 /* Describe how we implement __builtin_eh_return.  */
 #define EH_RETURN_DATA_REGNO(N) \
diff --git a/gcc/config/mn10300/mn10300.h b/gcc/config/mn10300/mn10300.h
index 714c6a0..9fd3d4b 100644
--- a/gcc/config/mn10300/mn10300.h
+++ b/gcc/config/mn10300/mn10300.h
@@ -516,7 +516,7 @@ struct cum_arg
 /* The return address is saved both in the stack and in MDR.  Using
the stack location is handiest for what 

Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-15 Thread Andrew Stubbs

On 15/11/16 12:29, Segher Boessenkool wrote:

The peepholes do not support it, or maybe the define_insns do not either.
The machine of course will not care.


Oh, OK, so probably the bug is not in the peephole at all, but in the 
define_insn, or lack thereof.


More investigation required.

Thanks

Andrew


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jiong Wang

On 15/11/16 16:18, Jakub Jelinek wrote:

On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote:

   Takes one signed LEB128 offset and retrieves 8-byte contents from the address
   calculated as CFA plus this offset; the contents are then authenticated as per
   the A key for the instruction pointer, using the current CFA as salt.  The
   result is pushed onto the stack.

I'd like to point out that especially the vendor range of DW_OP_* is
extremely scarce resource, we have only a couple of unused values, so taking
3 out of the remaining unused 12 for a single architecture is IMHO too much.
Can't you use just a single opcode and encode which of the 3 operations it is
in say the low 2 bits of a LEB 128 operand?
We'll likely need to do RSN some multiplexing even for the generic GNU
opcodes if we need just a few further ones (say 0xff as an extension,
followed by uleb128 containing the opcode - 0xff).
In the non-vendor area we still have 54 values left, so there is more space
for future expansion.
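Jakub's multiplexing suggestion (a single vendor opcode whose sub-operation is carried in the low 2 bits of its ULEB128 operand) can be sketched as follows. This is a toy model for illustration only; the 2-bit tag layout is a hypothetical choice, not a DWARF definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as ULEB128 into BUF, returning the number of bytes used.  */
static size_t
uleb128_encode (uint64_t value, unsigned char *buf)
{
  size_t len = 0;
  do
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
	byte |= 0x80;
      buf[len++] = byte;
    }
  while (value != 0);
  return len;
}

/* Decode a ULEB128 from BUF, storing the number of bytes consumed in *LEN.  */
static uint64_t
uleb128_decode (const unsigned char *buf, size_t *len)
{
  uint64_t result = 0;
  unsigned int shift = 0;
  size_t i = 0;
  unsigned char byte;
  do
    {
      byte = buf[i++];
      result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  *len = i;
  return result;
}

/* Multiplex a 2-bit sub-operation tag into the low bits of the operand,
   so one DW_OP_* number can cover several related operations.  */
static size_t
encode_multiplexed (unsigned int subop, uint64_t operand, unsigned char *buf)
{
  return uleb128_encode ((operand << 2) | (subop & 3), buf);
}

static uint64_t
decode_multiplexed (const unsigned char *buf, size_t *len, unsigned int *subop)
{
  uint64_t v = uleb128_decode (buf, len);
  *subop = v & 3;
  return v >> 2;
}
```

The consumer reads a single opcode, then recovers both the sub-operation and its operand from one ULEB128, at the cost of two bits per operand.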

   Separate DWARF operations are introduced instead of combining all of them
into one mostly because these operations are going to be used for most of the
functions once return address signing is enabled, and they are used for
describing frame unwinding, so they will go into the unwind table for C++
programs or C programs compiled with -fexceptions; the impact on unwind table
size is significant.  So I was trying to lower the unwind table size overhead as
much as I can.

   IMHO, three numbers actually is not that much for one architecture in the
DWARF operation vendor extension space, as vendors can overlap with each other.
The only painful thing, from my understanding, is that there are platform
vendors, for example "GNU" and "LLVM" etc., with which an architecture vendor
can't overlap.

For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is just
one range, so ideally the opcodes would be unique everywhere, if not, there
is just a single GNU vendor, there is no separate range for Aarch64, that
can overlap with range for x86_64, and powerpc, etc.

Perhaps we could declare that certain opcode subrange for the GNU vendor is
architecture specific and document that the meaning of opcodes in that range
and count/encoding of their arguments depends on the architecture, but then
we should document how to figure out the architecture too (e.g. for ELF
base it on the containing EM_*).  All the tools that look at DWARF (readelf,
objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on 
that
though.

I know nothing about the aarch64 return address signing, would all 3 or say
2 usually appear together without any separate pc advance, or are they all
going to appear frequently and at different pcs?


 I think it's the latter: the DW_OP_AARCH64_paciasp and
DW_OP_AARCH64_paciasp_deref are going to appear frequently and at different pcs.
For example, in the following function prologue there are three instructions,
at 0x0, 0x4 and 0x8.

  After the first instruction at 0x0, LR/X30 will be mangled.  The "paciasp"
instruction always mangles the LR register using SP as salt and writes the value
back into LR.  We then generate DW_OP_AARCH64_paciasp to notify any unwinder
that the original LR is mangled in this way so it can unwind the original value
properly.

  After the second instruction at 0x4, the mangled value of LR/X30 will be
pushed onto the stack.  Unlike the usual .cfi_offset, the unwind rule for LR/X30
becomes: first fetch the mangled value from stack offset -16, then do whatever
is needed to restore the original value from the mangled value.  This is
represented by (DW_OP_AARCH64_paciasp_deref, offset).

.cfi_startproc
   0x0  paciasp (this instruction sign return address register LR/X30)
.cfi_val_expression 30, DW_OP_AARCH64_paciasp
   0x4  stp x29, x30, [sp, -32]!
.cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16
.cfi_offset 29, -32
.cfi_def_cfa_offset 32
   0x8  add x29, sp, 0
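The unwind rule above can be modeled in plain C. Note this is purely illustrative: the real PAC signing is a keyed cipher in hardware, and the XOR-based mangling below is a hypothetical stand-in, not the actual algorithm:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the A-key signing: fold the salt (the CFA) into the
   top bits of the address.  Hypothetical; NOT the PAC algorithm.  */
static uint64_t
toy_sign (uint64_t lr, uint64_t cfa)
{
  return lr ^ (cfa << 48);
}

/* Inverse of toy_sign: strip the authentication code again.  */
static uint64_t
toy_strip (uint64_t mangled, uint64_t cfa)
{
  return mangled ^ (cfa << 48);
}

/* (DW_OP_AARCH64_paciasp_deref, offset): fetch the mangled LR from
   CFA + BYTE_OFFSET, then recover the original return address using
   the frame's CFA as salt.  */
static uint64_t
unwind_lr (const uint64_t *stack_at_cfa, int64_t byte_offset, uint64_t cfa)
{
  uint64_t mangled = stack_at_cfa[byte_offset / 8];
  return toy_strip (mangled, cfa);
}
```

The point is that the consumer needs both pieces of state (the stack slot and the CFA), which is why a plain .cfi_offset cannot express the rule.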



  Perhaps if there is just 1
opcode and has all the info encoded just in one bigger uleb128 or something
similar...

Jakub




Fix vec_cmp comparison mode

2016-11-15 Thread Richard Sandiford
vec_cmps assign the result of a vector comparison to a mask.
The optab was called with the destination having mode mask_mode
but with the source (the comparison) having mode VOIDmode,
which led to invalid rtl if the source operand was used directly.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* optabs.c (vector_compare_rtx): Add a cmp_mode parameter
and use it in the final call to gen_rtx_fmt_ee.
(expand_vec_cond_expr): Update accordingly.
(expand_vec_cmp_expr): Likewise.

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 7a1f025..b135c9b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5283,14 +5283,15 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
   return code;
 }
 
-/* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators.  OPNO holds an index of the first comparison
-   operand in insn with code ICODE.  Do not generate compare instruction.  */
+/* Return a comparison rtx of mode CMP_MODE for COND.  Use UNSIGNEDP to
+   select signed or unsigned operators.  OPNO holds the index of the
+   first comparison operand for insn ICODE.  Do not generate the
+   compare instruction itself.  */
 
 static rtx
-vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-   bool unsignedp, enum insn_code icode,
-   unsigned int opno)
+vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
+   tree t_op0, tree t_op1, bool unsignedp,
+   enum insn_code icode, unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -5318,7 +5319,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
   create_input_operand (&ops[1], rtx_op1, m1);
   if (!maybe_legitimize_operands (icode, opno, 2, ops))
 gcc_unreachable ();
-  return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
+  return gen_rtx_fmt_ee (rcode, cmp_mode, ops[0].value, ops[1].value);
 }
 
 /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
@@ -5644,7 +5645,8 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
return 0;
 }
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
+  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
+  icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -5688,7 +5690,8 @@ expand_vec_cmp_expr (tree type, tree exp, rtx target)
return 0;
 }
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  comparison = vector_compare_rtx (mask_mode, tcode, op0a, op0b,
+  unsignedp, icode, 2);
   create_output_operand (&ops[0], target, mask_mode);
   create_fixed_operand (&ops[1], comparison);
   create_fixed_operand (&ops[2], XEXP (comparison, 0));



Re: [PATCH/AARCH64] Have the verbose cost model output output be controllable

2016-11-15 Thread Andrew Pinski
On Fri, Oct 7, 2016 at 1:01 AM, Kyrill Tkachov
 wrote:
> Hi Andrew,
>
>
> On 24/09/16 06:46, Andrew Pinski wrote:
>>
>> Hi,
>>As reported in PR 61367, the aarch64 back-end is too verbose when it
>> is dealing with the cost model.  I tend to agree, no other back-end is
>> this verbose.  So I decided to add an option to enable this verbose
>> output if requested.
>>
>> I did NOT document it in invoke.texi because I don't feel like this is
>> an option which an user should use.  But I can add it if requested.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * config/aarch64/aarch64.opt (mverbose-cost-dump): New option.
>> * config/aarch64/aarch64.c (aarch64_rtx_costs): Use
>> flag_aarch64_verbose_cost instead of checking for details dump.
>> (aarch64_rtx_costs_wrapper): Likewise.
>
>
> I'm okay with the idea, but I can't approve (cc'ing people who can).

Ping?


> One nit:
>
> +mverbose-cost-dump
> +Common Var(flag_aarch64_verbose_cost)
> +Enables verbose cost model dumping in the debug dump files.
>
> You should add "Undocumented" to that.
> I don't think the option is major enough to warrant an entry in invoke.texi.
> It's only for aarch64 backend developers who know exactly what they're
> looking for.
>
> Cheers,
> Kyrill
>
>


Re: Rework subreg_get_info

2016-11-15 Thread Richard Sandiford
Richard Sandiford  writes:
> This isn't intended to change the behaviour, just rewrite the
> existing logic in a different (and hopefully clearer) way.
> The new form -- particularly the part based on the "block"
> concept -- is easier to convert to polynomial sizes.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Sorry, I should have said: this was also tested by compiling the
testsuite before and after the change at -O2 -ftree-vectorize on:

aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf
epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu
mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu
powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf
rx-elf s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
x86_64-linux-gnu xtensa-elf

There were no differences in assembly output.

Thanks,
Richard


Re: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

2016-11-15 Thread Jan Hubicka
> Hi.
> 
> As seen on ppc64le during compilation of Firefox with LTO, combining inchash 
> value
> with a pointer, enum value and an integer, one can eventually get zero value.
> Thus I decided to introduce a new flag that would distinguish between not set 
> hash value
> and a valid and (possibly) zero value.
> 
> I've been running regression tests, ready to install after it finishes?
> Martin

> >From 952ca6f6c0f99bcd965825898970453fb413964e Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 11 Nov 2016 16:15:20 +0100
> Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)
> 
> gcc/ChangeLog:
> 
> 2016-11-15  Martin Liska  
> 
>   PR ipa/78309
>   * ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
>   (sem_function::get_hash): Make condition based on m_hash_set.
>   (sem_variable::get_hash): Likewise.
>   * ipa-icf.h (sem_item::m_hash_set): New property.
Yep, zero is definitely a valid hash value.

Patch is OK. We may consider backporting it to release branches.
Honza


Re: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

2016-11-15 Thread Jeff Law

On 11/15/2016 09:43 AM, Martin Liška wrote:

Hi.

As seen on ppc64le during compilation of Firefox with LTO, combining inchash 
value
with a pointer, enum value and an integer, one can eventually get zero value.
Thus I decided to introduce a new flag that would distinguish between not set 
hash value
and a valid and (possibly) zero value.

I've been running regression tests, ready to install after it finishes?
Martin


0001-Add-sem_item-m_hash_set-PR-ipa-78309.patch


From 952ca6f6c0f99bcd965825898970453fb413964e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 16:15:20 +0100
Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

gcc/ChangeLog:

2016-11-15  Martin Liska  

PR ipa/78309
* ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
(sem_function::get_hash): Make condition based on m_hash_set.
(sem_variable::get_hash): Likewise.
* ipa-icf.h (sem_item::m_hash_set): New property.

OK.

jeff



[PATCH] Add sem_item::m_hash_set (PR ipa/78309)

2016-11-15 Thread Martin Liška
Hi.

As seen on ppc64le during compilation of Firefox with LTO, combining inchash 
value
with a pointer, enum value and an integer, one can eventually get zero value.
Thus I decided to introduce a new flag that would distinguish between not set 
hash value
and a valid and (possibly) zero value.

I've been running regression tests, ready to install after it finishes?
Martin
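The failure mode is the classic zero-as-sentinel problem. A minimal C illustration (the names are hypothetical, not the actual ipa-icf code):

```c
#include <assert.h>

static int compute_calls;

/* A hash function that legitimately returns 0, as observed on
   ppc64le when combining inchash values.  */
static unsigned int
compute_hash (void)
{
  compute_calls++;
  return 0;
}

struct item
{
  unsigned int hash;
  int hash_set;		/* distinguishes "unset" from a valid 0 hash */
};

/* Lazily compute and cache the hash.  Testing hash_set instead of
   hash != 0 keeps the cache working even when the hash is 0.  */
static unsigned int
get_hash (struct item *it)
{
  if (!it->hash_set)
    {
      it->hash = compute_hash ();
      it->hash_set = 1;
    }
  return it->hash;
}
```

With the old `if (!m_hash)` style check, a zero hash would be recomputed on every call (and, worse, treated as "not yet set" by the streaming code).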
>From 952ca6f6c0f99bcd965825898970453fb413964e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 16:15:20 +0100
Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

gcc/ChangeLog:

2016-11-15  Martin Liska  

	PR ipa/78309
	* ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
	(sem_function::get_hash): Make condition based on m_hash_set.
	(sem_variable::get_hash): Likewise.
	* ipa-icf.h (sem_item::m_hash_set): New property.
---
 gcc/ipa-icf.c | 10 ++
 gcc/ipa-icf.h |  3 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 1ab67f3..4352fd0 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -140,7 +140,8 @@ sem_usage_pair::sem_usage_pair (sem_item *_item, unsigned int _index):
for bitmap memory allocation.  */
 
 sem_item::sem_item (sem_item_type _type,
-		bitmap_obstack *stack): type (_type), m_hash (0)
+		bitmap_obstack *stack): type (_type), m_hash (0),
+		m_hash_set (false)
 {
   setup (stack);
 }
@@ -151,7 +152,7 @@ sem_item::sem_item (sem_item_type _type,
 
 sem_item::sem_item (sem_item_type _type, symtab_node *_node,
 		hashval_t _hash, bitmap_obstack *stack): type(_type),
-  node (_node), m_hash (_hash)
+  node (_node), m_hash (_hash), m_hash_set (true)
 {
   decl = node->decl;
   setup (stack);
@@ -230,6 +231,7 @@ sem_item::target_supports_symbol_aliases_p (void)
 void sem_item::set_hash (hashval_t hash)
 {
   m_hash = hash;
+  m_hash_set = true;
 }
 
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */
@@ -279,7 +281,7 @@ sem_function::get_bb_hash (const sem_bb *basic_block)
 hashval_t
 sem_function::get_hash (void)
 {
-  if (!m_hash)
+  if (!m_hash_set)
 {
   inchash::hash hstate;
   hstate.add_int (177454); /* Random number for function type.  */
@@ -2116,7 +2118,7 @@ sem_variable::parse (varpool_node *node, bitmap_obstack *stack)
 hashval_t
 sem_variable::get_hash (void)
 {
-  if (m_hash)
+  if (m_hash_set)
 return m_hash;
 
   /* All WPA streamed in symbols should have their hashes computed at compile
diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h
index d8de655..8dc3d31 100644
--- a/gcc/ipa-icf.h
+++ b/gcc/ipa-icf.h
@@ -274,6 +274,9 @@ protected:
   /* Hash of item.  */
   hashval_t m_hash;
 
+  /* Indicated whether a hash value has been set or not.  */
+  bool m_hash_set;
+
 private:
   /* Initialize internal data structures. Bitmap STACK is used for
  bitmap memory allocation process.  */
-- 
2.10.1



Re: Fix handling of unknown sizes in rtx_addr_can_trap_p

2016-11-15 Thread Jeff Law

On 11/15/2016 09:21 AM, Richard Sandiford wrote:

If the size passed in to rtx_addr_can_trap_p was zero, the frame
handling would get the size from the mode instead.  However, this
too can be zero if the mode is BLKmode, i.e. if we have a BLKmode
memory reference with no MEM_SIZE (which should be rare these days).
This meant that the conditions for a 4-byte access at offset X were
stricter than those for an access of unknown size at offset X.

This patch checks whether the size is still zero, as the
SYMBOL_REF handling does.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (rtx_addr_can_trap_p_1): Handle unknown sizes.
I guess it's conservatively correct in that claiming we can trap when we 
can't never hurts correctness.



I'm OK with the patch, but am quite curious how we got to this point 
without an attached MEM_SIZE.


jeff


Rework subreg_get_info

2016-11-15 Thread Richard Sandiford
This isn't intended to change the behaviour, just rewrite the
existing logic in a different (and hopefully clearer) way.
The new form -- particularly the part based on the "block"
concept -- is easier to convert to polynomial sizes.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (subreg_get_info): Use more local variables.
Remark that for HARD_REGNO_NREGS_HAS_PADDING, each scalar unit
occupies at least one register.  Use byte_lowpart_offset to
check for big-endian offsets unless REG_WORDS_BIG_ENDIAN !=
WORDS_BIG_ENDIAN.  Share previously-duplicated if block.
Rework the main handling so that it operates on independently-
addressable YMODE-sized blocks.  Use subreg_size_lowpart_offset
to check lowpart offsets, without trying to find an equivalent
integer mode first.  Handle WORDS_BIG_ENDIAN != REG_WORDS_BIG_ENDIAN
as a final register-endianness correction.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index ca6cced..7c0acf5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
 unsigned int offset, machine_mode ymode,
 struct subreg_info *info)
 {
-  int nregs_xmode, nregs_ymode;
-  int mode_multiple, nregs_multiple;
-  int offset_adj, y_offset, y_offset_adj;
-  int regsize_xmode, regsize_ymode;
-  bool rknown;
+  unsigned int nregs_xmode, nregs_ymode;
 
   gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
 
-  rknown = false;
+  unsigned int xsize = GET_MODE_SIZE (xmode);
+  unsigned int ysize = GET_MODE_SIZE (ymode);
+  bool rknown = false;
 
   /* If there are holes in a non-scalar mode in registers, we expect
- that it is made up of its units concatenated together.  */
+ that it is made up of its units concatenated together.  Each scalar
+ unit occupies at least one register.  */
   if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode))
 {
-  machine_mode xmode_unit;
-
   nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
-  xmode_unit = GET_MODE_INNER (xmode);
+  unsigned int nunits = GET_MODE_NUNITS (xmode);
+  machine_mode xmode_unit = GET_MODE_INNER (xmode);
   gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
   gcc_assert (nregs_xmode
- == (GET_MODE_NUNITS (xmode)
+ == (nunits
  * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
   gcc_assert (hard_regno_nregs[xregno][xmode]
- == (hard_regno_nregs[xregno][xmode_unit]
- * GET_MODE_NUNITS (xmode)));
+ == hard_regno_nregs[xregno][xmode_unit] * nunits);
 
   /* You can only ask for a SUBREG of a value with holes in the middle
 if you don't cross the holes.  (Such a SUBREG should be done by
@@ -3635,11 +3632,9 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
 3 for each part, but in memory it's two 128-bit parts.
 Padding is assumed to be at the end (not necessarily the 'high part')
 of each unit.  */
-  if ((offset / GET_MODE_SIZE (xmode_unit) + 1
-  < GET_MODE_NUNITS (xmode))
+  if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
  && (offset / GET_MODE_SIZE (xmode_unit)
- != ((offset + GET_MODE_SIZE (ymode) - 1)
- / GET_MODE_SIZE (xmode_unit
+ != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit
{
  info->representable_p = false;
  rknown = true;
@@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode xmode,
   nregs_ymode = hard_regno_nregs[xregno][ymode];
 
   /* Paradoxical subregs are otherwise valid.  */
-  if (!rknown
-  && offset == 0
-  && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode))
+  if (!rknown && offset == 0 && ysize > xsize)
 {
   info->representable_p = true;
   /* If this is a big endian paradoxical subreg, which uses more
 actual hard registers than the original register, we must
 return a negative offset so that we find the proper highpart
 of the register.  */
-  if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD
- ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)
-   info->offset = nregs_xmode - nregs_ymode;
+  if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize > UNITS_PER_WORD
+ ? REG_WORDS_BIG_ENDIAN
+ : byte_lowpart_offset (ymode, xmode) != 0)
+   info->offset = (int) nregs_xmode - (int) nregs_ymode;
   else
info->offset = 0;
   info->nregs = nregs_ymode;
@@ -3673,31 +3667,23 @@ subreg_get_info 

Re: [PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread Szabolcs Nagy
On 15/11/16 16:22, FX wrote:
>> There seems to be a separate api for checking trapping support:
>> ieee_support_halting, but it only checked if the exception status
>> flags are available, so check trapping support too by enabling
>> and disabling traps.
> 
> Thanks for the patch.
> 
> I am worried about the unnecessary operations that we’re doing here: doesn’t 
> glibc have a way to tell you what it supports without having to do it (twice, 
> enabling then disabling)?
> 
> Also, the glibc doc states that: "Each of the macros FE_DIVBYZERO, 
> FE_INEXACT, FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW is defined when the 
> implementation supports handling of the corresponding exception”. It evens 
> says:
> 
>> Each constant is defined if and only if the FPU you are compiling for 
>> supports that exception, so you can test for FPU support with ‘#ifdef’.
> 
> So it seems rather clear that compile-time tests are the recommended way to 
> go.

i think that's a documentation bug then, it
should say that the macros imply the support
of fpu exception status flags, but not trapping.

(otherwise glibc could not provide iso c annex
f conforming fenv on aarch64 and arm, where FE_*
must be defined, but only status flag support
is required.)

disabling/enabling makes this api a lot heavier
than before, but trapping cannot be decided at
compile-time, although the result may be cached,
i think this should not be a frequent operation.

otoh rereading my patch i think i fail to restore
the original exception state correctly.




Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Jerry DeLisle

On 11/15/2016 07:59 AM, Jerry DeLisle wrote:

On 11/14/2016 11:22 PM, Thomas Koenig wrote:

Hi Jerry,


With these changes, OK for trunk?


Just going over this with a fine comb...

One thing just struck me:   The loop variables should be index_type, so

  const index_type m = xcount, n = ycount, k = count;

[...]

   index_type a_dim1, a_offset, b_dim1, b_offset, c_dim1, c_offset, i1, i2,
  i3, i4, i5, i6;

  /* Local variables */
  GFC_REAL_4 t1[65536], /* was [256][256] */
 f11, f12, f21, f22, f31, f32, f41, f42,
 f13, f14, f23, f24, f33, f34, f43, f44;
  index_type i, j, l, ii, jj, ll;
  index_type isec, jsec, lsec, uisec, ujsec, ulsec;

I agree that we should do the tuning of the inline limit
separately.



Several of my iterations used index_type. I found using integer gives better
performance. The reason is that index_type is ptrdiff_t, which is a 64-bit
integer. I suspect we eliminate one memory fetch for each of these and reduce
the register pressure by reducing the number of registers needed, two for one in
some situations. I will change back and retest.

and Paul commented "-ftree-vectorize turns on -ftree-loop-vectorize and
-ftree-slp-vectorize already."

I will remove those two options and keep -ftree-vectorize

I will report back my findings.



Changed back to index_type, all OK, must have been some OS stuff running in the 
background.


All comments incorporated. Standing by for approval.

Jerry



Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS

2016-11-15 Thread Jeff Law

On 11/01/2016 03:39 PM, Wilco Dijkstra wrote:

 Jeff Law  wrote:


I think you'll need to look at bz61320 before this could go in.


I had a look, but there is nothing there that is related - eventually
a latent alignment bug was fixed in IVOpt.

Excellent.  Thanks for digging into what really happened.


Note that the bswap phase
currently inserts unaligned accesses irrespectively of STRICT_ALIGNMENT
or SLOW_UNALIGNED_ACCESS:

-  if (bswap
-      && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
-      && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
-    return false;

If bswap is false no byte swap is needed, so we found a native endian load
and it will always perform the optimization by inserting an unaligned load.
This apparently works on all targets, and doesn't cause alignment traps or
huge slowdowns via trap emulation claimed by SLOW_UNALIGNED_ACCESS.
So I'm at a loss what these macros are supposed to mean and how I can query
whether a backend supports fast unaligned access for a particular mode.

What I actually want to write is something like:

 if (!FAST_UNALIGNED_LOAD (mode, align)) return false;

And know that it only accepts unaligned accesses that are efficient on the 
target.
Maybe we need a new hook like this and get rid of the old one?
As Richi indicated later, these decisions are probably made best at 
expansion time -- as long as we have the required information.  So I'd 
only go with a hook if (for example) the alignment information is lost 
by the time we get to expansion and thus we can't DTRT at expansion time.


Patch is OK.

jeff
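As background to the discussion above: at the source level, the portable way to express a possibly unaligned load is memcpy, which compilers lower to a single load instruction on targets where unaligned accesses are fast. This is a general C idiom, not part of the patch itself:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a 32-bit value from a possibly unaligned address without
   invoking undefined behaviour; the compiler turns the memcpy into
   one load (or a byte sequence) as the target dictates.  */
static uint32_t
load_u32 (const void *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

Whether the resulting access is fast is exactly what SLOW_UNALIGNED_ACCESS / STRICT_ALIGNMENT are supposed to answer at the RTL level.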


Use df_read_modify_subreg_p in cprop.c

2016-11-15 Thread Richard Sandiford
local_cprop_find_used_regs punted on all multiword registers,
with the comment:

  /* Setting a subreg of a register larger than word_mode leaves
 the non-written words unchanged.  */

But this only applies if the outer mode is smaller than the
inner mode.  If they're the same size then writes to the subreg
are a normal full update.
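The distinction can be shown with a byte-level toy model in plain C (not GCC's RTL machinery): writing a narrower value into part of a wider one preserves the bytes outside the written range, which is exactly why such a store is a read-modify-write of the inner register:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of storing a 32-bit "subreg" at BYTE_OFF into a 64-bit inner
   value: the untouched bytes keep their old contents, so the store
   depends on the previous value of INNER (a read-modify-write).
   A same-size subreg store would replace every byte instead.  */
static uint64_t
write_subreg32 (uint64_t inner, uint32_t outer, unsigned int byte_off)
{
  uint64_t result = inner;
  memcpy ((unsigned char *) &result + byte_off, &outer, sizeof outer);
  return result;
}
```

df_read_modify_subreg_p makes this size comparison directly, instead of the old word_mode heuristic.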

This patch uses df_read_modify_subreg_p instead.  A later patch
adds more uses of the same routine, but this part had a (positive)
effect on code generation for the testsuite whereas the others
seemed to be simple clean-ups.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* cprop.c (local_cprop_find_used_regs): Use df_read_modify_subreg_p.

diff --git a/gcc/cprop.c b/gcc/cprop.c
index 6b4c0b8..31868a5 100644
--- a/gcc/cprop.c
+++ b/gcc/cprop.c
@@ -1161,9 +1161,7 @@ local_cprop_find_used_regs (rtx *xptr, void *data)
   return;
 
 case SUBREG:
-  /* Setting a subreg of a register larger than word_mode leaves
-the non-written words unchanged.  */
-  if (GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x))) > BITS_PER_WORD)
+  if (df_read_modify_subreg_p (x))
return;
   break;
 



Add more subreg offset helpers

2016-11-15 Thread Richard Sandiford
Provide versions of subreg_lowpart_offset and subreg_highpart_offset
that work on mode sizes rather than modes.  Also provide a routine
that converts an lsb position to a subreg offset.

The intent (in combination with later patches) is to move the
handling of the BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN case into
just two places, so that for other combinations we don't have
to split offsets into words and subwords.
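For reference, the lowpart-offset rule being generalized here can be written as a standalone function over byte sizes. This mirrors the pre-patch logic visible in the diff; the explicit endianness and word-size parameters are added purely for illustration:

```c
#include <assert.h>

/* SUBREG_BYTE of the lowpart: 0 for little-endian targets and for
   paradoxical subregs; otherwise the word part and/or subword part of
   the size difference, depending on WORDS_BIG_ENDIAN and
   BYTES_BIG_ENDIAN.  */
static unsigned int
size_lowpart_offset (unsigned int outer_bytes, unsigned int inner_bytes,
		     int bytes_big_endian, int words_big_endian,
		     unsigned int units_per_word)
{
  if (outer_bytes >= inner_bytes)
    return 0;			/* same size or paradoxical subreg */

  unsigned int diff = inner_bytes - outer_bytes;
  unsigned int offset = 0;
  if (words_big_endian)
    offset += (diff / units_per_word) * units_per_word;
  if (bytes_big_endian)
    offset += diff % units_per_word;
  return offset;
}
```

When BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN the two terms collapse to either 0 or the full size difference, which is the simplification the new subreg_size_lowpart_offset exploits.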

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtl.h (subreg_size_offset_from_lsb): Declare.
(subreg_offset_from_lsb): Likewise.
(subreg_size_lowpart_offset): Likewise.
(subreg_size_highpart_offset): Likewise.
* emit-rtl.c (subreg_size_lowpart_offset): New function.
(subreg_lowpart_offset): Use it.
(subreg_size_highpart_offset): New function.
(subreg_highpart_offset): Use it.
* rtlanal.c (subreg_size_offset_from_lsb): New function.
(subreg_offset_from_lsb): Likewise.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 9ea0c8f..bc4e536 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1478,44 +1478,59 @@ gen_highpart_mode (machine_mode outermode, machine_mode innermode, rtx exp)
  subreg_highpart_offset (outermode, innermode));
 }
 
-/* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
+/* Return the SUBREG_BYTE for a lowpart subreg whose outer mode has
+   OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
 
 unsigned int
-subreg_lowpart_offset (machine_mode outermode, machine_mode innermode)
+subreg_size_lowpart_offset (unsigned int outer_bytes, unsigned int inner_bytes)
 {
-  unsigned int offset = 0;
-  int difference = (GET_MODE_SIZE (innermode) - GET_MODE_SIZE (outermode));
+  if (outer_bytes > inner_bytes)
+/* Paradoxical subregs always have a SUBREG_BYTE of 0.  */
+return 0;
 
-  if (difference > 0)
-{
-  if (WORDS_BIG_ENDIAN)
-   offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
-  if (BYTES_BIG_ENDIAN)
-   offset += difference % UNITS_PER_WORD;
-}
+  if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
+return inner_bytes - outer_bytes;
+  else if (!BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN)
+return 0;
+  else
+return subreg_size_offset_from_lsb (outer_bytes, inner_bytes, 0);
+}
+
+/* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
 
-  return offset;
+unsigned int
+subreg_lowpart_offset (machine_mode outermode, machine_mode innermode)
+{
+  return subreg_size_lowpart_offset (GET_MODE_SIZE (outermode),
+GET_MODE_SIZE (innermode));
 }
 
-/* Return offset in bytes to get OUTERMODE high part
-   of the value in mode INNERMODE stored in memory in target format.  */
+/* Return the SUBREG_BYTE for a highpart subreg whose outer mode has
+   OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
+
 unsigned int
-subreg_highpart_offset (machine_mode outermode, machine_mode innermode)
+subreg_size_highpart_offset (unsigned int outer_bytes,
+unsigned int inner_bytes)
 {
-  unsigned int offset = 0;
-  int difference = (GET_MODE_SIZE (innermode) - GET_MODE_SIZE (outermode));
+  gcc_assert (inner_bytes >= outer_bytes);
 
-  gcc_assert (GET_MODE_SIZE (innermode) >= GET_MODE_SIZE (outermode));
+  if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
+return 0;
+  else if (!BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN)
+return inner_bytes - outer_bytes;
+  else
+return subreg_size_offset_from_lsb (outer_bytes, inner_bytes,
+   (inner_bytes - outer_bytes)
+   * BITS_PER_UNIT);
+}
 
-  if (difference > 0)
-{
-  if (! WORDS_BIG_ENDIAN)
-   offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
-  if (! BYTES_BIG_ENDIAN)
-   offset += difference % UNITS_PER_WORD;
-}
+/* Return the SUBREG_BYTE for an OUTERMODE highpart of an INNERMODE value.  */
 
-  return offset;
+unsigned int
+subreg_highpart_offset (machine_mode outermode, machine_mode innermode)
+{
+  return subreg_size_highpart_offset (GET_MODE_SIZE (outermode),
+ GET_MODE_SIZE (innermode));
 }
 
 /* Return 1 iff X, assumed to be a SUBREG,
diff --git a/gcc/rtl.h b/gcc/rtl.h
index df5172b..2fca974 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2178,6 +2178,10 @@ extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int,
 extern unsigned int subreg_lsb (const_rtx);
 extern unsigned int subreg_lsb_1 (machine_mode, machine_mode,
  unsigned int);
+extern unsigned int subreg_size_offset_from_lsb 

C++ PATCH for c++/78358 (decltype and decomposition)

2016-11-15 Thread Jason Merrill
OK, (hopefully) one more patch for decltype and C++17 decomposition
declarations.  I hadn't been thinking that "referenced type" meant to
look through references in the tuple case, since other parts of
[dcl.decomp] define "the referenced type" directly, but that does seem
to be how it's used elsewhere in the standard.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 113051a8a3e231bb4003831a2f595cd8788eec64
Author: Jason Merrill 
Date:   Tue Nov 15 10:50:00 2016 -0500

PR c++/78358 - tuple decomposition decltype

* semantics.c (finish_decltype_type): Strip references for a tuple
decomposition.
* cp-tree.h (DECL_DECOMPOSITION_P): False for non-variables.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index edcd3b4..634efc9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -3627,10 +3627,10 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
   (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))->u.base.var_declared_inline_p \
= true)
 
-/* Nonzero if NODE is the artificial VAR_DECL for decomposition
+/* Nonzero if NODE is an artificial VAR_DECL for a C++17 decomposition
declaration.  */
 #define DECL_DECOMPOSITION_P(NODE) \
-  (DECL_LANG_SPECIFIC (VAR_DECL_CHECK (NODE))  \
+  (VAR_P (NODE) && DECL_LANG_SPECIFIC (NODE)   \
? DECL_LANG_SPECIFIC (NODE)->u.base.decomposition_p \
: false)
 #define SET_DECL_DECOMPOSITION_P(NODE) \
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 29f5233..dc5ad13 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8873,14 +8873,6 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p,
   if (identifier_p (expr))
 expr = lookup_name (expr);
 
-  /* The decltype rules for decomposition are different from the rules for
-member access; in particular, the decomposition decl gets
-cv-qualifiers from the aggregate object, whereas decltype of a member
-access expr ignores the object.  */
-  if (VAR_P (expr) && DECL_DECOMPOSITION_P (expr)
- && DECL_HAS_VALUE_EXPR_P (expr))
-   return unlowered_expr_type (DECL_VALUE_EXPR (expr));
-
   if (INDIRECT_REF_P (expr))
 /* This can happen when the expression is, e.g., "a.b". Just
look at the underlying operand.  */
@@ -8898,6 +8890,21 @@ finish_decltype_type (tree expr, bool id_expression_or_member_access_p,
 /* See through BASELINK nodes to the underlying function.  */
 expr = BASELINK_FUNCTIONS (expr);
 
+  /* decltype of a decomposition name drops references in the tuple case
+(unlike decltype of a normal variable) and keeps cv-qualifiers from
+the containing object in the other cases (unlike decltype of a member
+access expression).  */
+  if (DECL_DECOMPOSITION_P (expr))
+   {
+ if (DECL_HAS_VALUE_EXPR_P (expr))
+   /* Expr is an array or struct subobject proxy, handle
+  bit-fields properly.  */
+   return unlowered_expr_type (expr);
+ else
+   /* Expr is a reference variable for the tuple case.  */
+   return non_reference (TREE_TYPE (expr));
+   }
+
   switch (TREE_CODE (expr))
 {
 case FIELD_DECL:
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp12.C b/gcc/testsuite/g++.dg/cpp1z/decomp12.C
new file mode 100644
index 000..a5b686a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp12.C
@@ -0,0 +1,20 @@
+// PR c++/78358
+// { dg-do run }
+// { dg-options -std=c++1z }
+
+#include <tuple>
+
+template <class, class> struct same_type;
+template <class T> struct same_type<T, T> {};
+
+int main() {
+  std::tuple<int, char, double, bool> tuple = { 1, 'a', 2.3, true };
+  auto[i, c, d, b] = tuple;
+  same_type::type, decltype(i)>{};
+  same_type{};
+  same_type{};
+  same_type{};
+  same_type{};
+  if (i != 1 || c != 'a' || d != 2.3 || b != true)
+__builtin_abort ();
+}


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Andrew Senkevich
2016-11-15 17:56 GMT+03:00 Jeff Law :
> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>
>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>>>
>>> --- a/gcc/genmodes.c
>>> +++ b/gcc/genmodes.c
>>> --- a/gcc/init-regs.c
>>> +++ b/gcc/init-regs.c
>>> --- a/gcc/machmode.h
>>> +++ b/gcc/machmode.h
>>>
>>> These are middle-end changes, you will need a separate review for these.
>>
>>
>> Who could review these changes?
>
> I can.  I likely dropped the message because it looked x86 specific, so if
> you could resend it'd be appreciated.

Attached (it differs from the previous version only in fixed comment typos).


--
WBR,
Andrew


new_avx512_instructions_15.11.patch
Description: Binary data


[PATCH, Fortran, pr78356, v1] [7 Regression] [OOP] segfault allocating polymorphic variable with polymorphic component with allocatable component

2016-11-15 Thread Andre Vehreschild
Hi all,

attached patch fixes the issue raised.  The issue was that a copy of the
base class was generated and its address passed to the _vptr->copy() method,
which then accessed memory that was not present in the copy, the copy being
an object of the base class only.  The patch fixes this by making sure the
temporary handle is a pointer to the data to copy.

Sorry if that is not clear; I am not feeling so well today.  So here it is
in terms of pseudo code.  This code was formerly generated:

struct ac {};
struct a : struct ac { integer *i; };

a src, dst;
ac temp;

temp = src; // temp is now only a copy of ac

_vptr.copy(&temp, &dst); // temp does not denote memory having a pointer to i

After the patch, this code is generated:

// types as above
a src, dst;
ac *temp; // !!! Now a pointer

temp = &src;
_vptr.copy(temp, &dst); // temp now points to memory that has a pointer to i
// and is valid for copying.

Bootstraps and regtests ok on x86_64-linux/F23. Ok for trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2016-11-15  Andre Vehreschild  

PR fortran/78356
* class.c (gfc_is_class_scalar_expr): Prevent taking an array ref for
a component ref.
* trans-expr.c (gfc_trans_assignment_1): Ensure a reference to the
object to copy is generated, when assigning class objects.

gcc/testsuite/ChangeLog:

2016-11-15  Andre Vehreschild  

PR fortran/78356
* gfortran.dg/class_allocate_23.f08: New test.


diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index b42ec40..9db86b4 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -378,7 +378,8 @@ gfc_is_class_scalar_expr (gfc_expr *e)
 	&& CLASS_DATA (e->symtree->n.sym)
 	&& !CLASS_DATA (e->symtree->n.sym)->attr.dimension
 	&& (e->ref == NULL
-	|| (strcmp (e->ref->u.c.component->name, "_data") == 0
+	|| (e->ref->type == REF_COMPONENT
+		&& strcmp (e->ref->u.c.component->name, "_data") == 0
 		&& e->ref->next == NULL)))
 return true;
 
@@ -390,7 +391,8 @@ gfc_is_class_scalar_expr (gfc_expr *e)
 	&& CLASS_DATA (ref->u.c.component)
 	&& !CLASS_DATA (ref->u.c.component)->attr.dimension
 	&& (ref->next == NULL
-		|| (strcmp (ref->next->u.c.component->name, "_data") == 0
+		|| (ref->next->type == REF_COMPONENT
+		&& strcmp (ref->next->u.c.component->name, "_data") == 0
 		&& ref->next->next == NULL)))
 	return true;
 }
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 48296b8..1331b07 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -9628,6 +9628,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   int n;
   bool maybe_workshare = false;
   symbol_attribute lhs_caf_attr, rhs_caf_attr, lhs_attr;
+  bool is_poly_assign;
 
   /* Assignment of the form lhs = rhs.  */
   gfc_start_block ();
@@ -9648,6 +9649,19 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 	  || gfc_is_alloc_class_scalar_function (expr2)))
 expr2->must_finalize = 1;
 
+  /* Checking whether a class assignment is desired is quite complicated and
+ needed at two locations, so do it once only before the information is
+ needed.  */
+  lhs_attr = gfc_expr_attr (expr1);
+  is_poly_assign = (use_vptr_copy || lhs_attr.pointer
+		|| (lhs_attr.allocatable && !lhs_attr.dimension))
+		   && (expr1->ts.type == BT_CLASS
+		   || gfc_is_class_array_ref (expr1, NULL)
+		   || gfc_is_class_scalar_expr (expr1)
+		   || gfc_is_class_array_ref (expr2, NULL)
+		   || gfc_is_class_scalar_expr (expr2));
+
+
   /* Only analyze the expressions for coarray properties, when in coarray-lib
  mode.  */
   if (flag_coarray == GFC_FCOARRAY_LIB)
@@ -9676,6 +9690,10 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   if (rss == gfc_ss_terminator)
 	/* The rhs is scalar.  Add a ss for the expression.  */
 	rss = gfc_get_scalar_ss (gfc_ss_terminator, expr2);
+  /* When doing a class assign, then the handle to the rhs needs to be a
+	 pointer to allow for polymorphism.  */
+  if (is_poly_assign && expr2->rank == 0 && !UNLIMITED_POLY (expr2))
+	rss->info->type = GFC_SS_REFERENCE;
 
   /* Associate the SS with the loop.  */
   gfc_add_ss_to_loop (&loop, lss);
@@ -9835,14 +9853,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 	gfc_add_block_to_block (, );
 }
 
-  lhs_attr = gfc_expr_attr (expr1);
-  if ((use_vptr_copy || lhs_attr.pointer
-   || (lhs_attr.allocatable && !lhs_attr.dimension))
-  && (expr1->ts.type == BT_CLASS
-	  || (gfc_is_class_array_ref (expr1, NULL)
-	  || gfc_is_class_scalar_expr (expr1))
-	  || (gfc_is_class_array_ref (expr2, NULL)
-	  || gfc_is_class_scalar_expr (expr2
+  if (is_poly_assign)
 {
   tmp = trans_class_assignment (, expr1, expr2, , ,
 use_vptr_copy || 

Optimise CONCAT handling in emit_group_load

2016-11-15 Thread Richard Sandiford
The CONCAT handling in emit_group_load chooses between doing
an extraction from a single component or forcing the whole
thing to memory and extracting from there.  The condition for
the former (more efficient) option was:

  if ((bytepos == 0 && bytelen == slen0)
  || (bytepos != 0 && bytepos + bytelen <= slen))

On the one hand this seems dangerous, since the second line
allows bit ranges that start in the first component and leak
into the second.  On the other hand it seems strange to allow
references that start after the first byte of the second
component but not those that start after the first byte
of the first component.  This led to a pessimisation of
things like gcc.dg/builtins-54.c for hppa64-hp-hpux11.23.

This patch simply checks whether the reference is contained
within a single component.  It also makes sure that we do
an extraction on anything that doesn't span the whole
component (even if it's constant).
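The containment test is easy to model outside the compiler. This is an illustrative sketch with invented names; slen0 is the size in bytes of one CONCAT component, and it assumes (as the comment in emit_group_load_1 does) that both components have the same size:

```c
#include <assert.h>

/* Return nonzero if an access of BYTELEN bytes at byte BYTEPOS into a
   CONCAT of equal-sized SLEN0-byte components lies entirely within one
   component.  On success, *ELT is the component index and *SUBPOS the
   byte offset within that component.  */
static int
concat_access_in_one_component (unsigned int bytepos, unsigned int bytelen,
                                unsigned int slen0,
                                unsigned int *elt, unsigned int *subpos)
{
  *elt = bytepos / slen0;
  *subpos = bytepos % slen0;
  return *subpos + bytelen <= slen0;
}
```

For example, the old condition would have accepted an 8-byte access at offset 4 of a 16-byte CONCAT (bytepos != 0 and bytepos + bytelen <= slen) even though it leaks into the second component; the containment form rejects it.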

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* expr.c (emit_group_load_1): Tighten check for whether an
access involves only one operand of a CONCAT.  Use extract_bit_field
for constants if the bit range does not span the whole operand.

diff --git a/gcc/expr.c b/gcc/expr.c
index 0b0946d..985c2b3 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2175,19 +2175,22 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type, int ssize)
{
  unsigned int slen = GET_MODE_SIZE (GET_MODE (src));
  unsigned int slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
+ unsigned int elt = bytepos / slen0;
+ unsigned int subpos = bytepos % slen0;
 
- if ((bytepos == 0 && bytelen == slen0)
- || (bytepos != 0 && bytepos + bytelen <= slen))
+ if (subpos + bytelen <= slen0)
{
  /* The following assumes that the concatenated objects all
 have the same size.  In this case, a simple calculation
 can be used to determine the object and the bit field
 to be extracted.  */
- tmps[i] = XEXP (src, bytepos / slen0);
- if (! CONSTANT_P (tmps[i])
- && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode))
+ tmps[i] = XEXP (src, elt);
+ if (subpos != 0
+ || subpos + bytelen != slen0
+ || (!CONSTANT_P (tmps[i])
+ && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode)))
tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
-(bytepos % slen0) * BITS_PER_UNIT,
+subpos * BITS_PER_UNIT,
 1, NULL_RTX, mode, mode, false);
}
  else



RE: [PATCH] MIPS/GCC: Mark text contents as code or data

2016-11-15 Thread Matthew Fortune
Maciej Rozycki  writes:
>   gcc/
>   * config/mips/mips-protos.h (mips_set_text_contents_type): New
>   prototype.
>   * config/mips/mips.h (ASM_OUTPUT_BEFORE_CASE_LABEL): New macro.
>   (ASM_OUTPUT_CASE_END): Likewise.
>   * config/mips/mips.c (mips_set_text_contents_type): New
>   function.
>   (mips16_emit_constants): Record the pool's initial label number
>   with the `consttable' insn.  Emit a `consttable_end' insn at the
>   end.
>   (mips_final_prescan_insn): Call `mips_set_text_contents_type'
>   for `consttable' insns.
>   (mips_final_postscan_insn): Call `mips_set_text_contents_type'
>   for `consttable_end' insns.
>   * config/mips/mips.md (unspec): Add UNSPEC_CONSTTABLE_END enum
>   value.
>   (consttable): Add operand.
>   (consttable_end): New insn.
> 
>   gcc/testsuite/
>   * gcc.target/mips/data-sym-jump.c: New test case.
>   * gcc.target/mips/data-sym-pool.c: New test case.
>   * gcc.target/mips/insn-pseudo-4.c: Adjust for constant pool
>   annotation.

Thanks for working on this; it is really useful functionality.

I'm a little concerned the expected output tests may be fragile over
time but let's wait and see.

OK to commit.

Thanks,
Matthew



Re: [PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread FX
Hi,

> There seems to be a separate api for checking trapping support:
> ieee_support_halting, but it only checked if the exception status
> flags are available, so check trapping support too by enabling
> and disabling traps.

Thanks for the patch.

I am worried about the unnecessary operations that we’re doing here: doesn’t 
glibc have a way to tell you what it supports without having to do it (twice, 
enabling then disabling)?

Also, the glibc doc states that: "Each of the macros FE_DIVBYZERO, FE_INEXACT,
FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW is defined when the implementation
supports handling of the corresponding exception". It even says:

> Each constant is defined if and only if the FPU you are compiling for 
> supports that exception, so you can test for FPU support with ‘#ifdef’.

So it seems rather clear that compile-time tests are the recommended way to go.
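A minimal sketch of that compile-time style, assuming only the standard C99 <fenv.h>. Note that the FE_* macros only advertise support for the exception flag itself; whether traps can actually be enabled (glibc's feenableexcept) is the separate question the patch is probing at run time:

```c
#include <fenv.h>

/* Compile-time probe: FE_INVALID is defined only when the FPU supports
   the invalid-operation exception, so no floating-point state needs to
   be touched at run time to answer this part of the question.  */
static int
fpu_supports_invalid_flag (void)
{
#ifdef FE_INVALID
  return 1;
#else
  return 0;
#endif
}
```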

FX

Re: Use MEM_SIZE rather than GET_MODE_SIZE in dce.c

2016-11-15 Thread Jeff Law

On 11/15/2016 09:17 AM, Richard Sandiford wrote:

Using MEM_SIZE is more general, since it copes with cases where
targets are forced to use BLKmode references for whatever reason.

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* dce.c (check_argument_store): Pass the size instead of
the memory reference.
(find_call_stack_args): Pass MEM_SIZE to check_argument_store.

OK.

Jeff



Fix handling of unknown sizes in rtx_addr_can_trap_p

2016-11-15 Thread Richard Sandiford
If the size passed in to rtx_addr_can_trap_p was zero, the frame
handling would get the size from the mode instead.  However, this
too can be zero if the mode is BLKmode, i.e. if we have a BLKmode
memory reference with no MEM_SIZE (which should be rare these days).
This meant that the conditions for a 4-byte access at offset X were
stricter than those for an access of unknown size at offset X.

This patch checks whether the size is still zero, as the
SYMBOL_REF handling does.
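The fixed fallback logic can be modelled stand-alone with invented names (1 meaning "may trap", as in rtx_addr_can_trap_p_1): a zero explicit size falls back to the mode size, and if that is also zero (a BLKmode reference with no MEM_SIZE), the access is conservatively treated as able to trap:

```c
#include <assert.h>

/* Model of the corrected size handling: fall back from an explicit
   size of zero to the mode size, and treat a still-unknown size as
   potentially trapping instead of applying the known-size checks.  */
static int
frame_access_may_trap (long explicit_size, long mode_size)
{
  long size = explicit_size != 0 ? explicit_size : mode_size;
  if (size == 0)
    return 1;   /* unknown extent: cannot be proven safe */
  return 0;     /* known size: the usual offset checks would follow */
}
```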

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* rtlanal.c (rtx_addr_can_trap_p_1): Handle unknown sizes.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index a9d3960..889b14d 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -543,6 +543,8 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST_WIDE_INT offset, HOST_WIDE_INT size,
 
  if (size == 0)
size = GET_MODE_SIZE (mode);
+ if (size == 0)
+   return 1;
 
  if (x == frame_pointer_rtx)
{



Re: Some backward threader refactoring

2016-11-15 Thread Jeff Law

On 11/14/2016 02:39 AM, Jeff Law wrote:



I was looking at the possibility of dropping threading from VRP1/VRP2 or
DOM1/DOM2 in favor of the backwards threader -- the obvious idea being
to recover some compile-time for gcc-7.

Of the old-style threader passes (VRP1, VRP2, DOM1, DOM2), VRP2 is by
far the least useful.  But I can't see a path to removing it in the
gcc-7 timeframe.

Looking at what is caught by VRP and DOM threaders is quite interesting.
 VRP obviously catches stuff with ranges, some fairly complex.  While
you might think that querying range info in the backwards threader would
work, the problem is we lose way too much information as we drop
ASSERT_EXPRs.  (Recall that the threader runs while we're still in VRP
and thus has access to the ASSERT_EXPRs).

The DOM threaders catch stuff through state, simplifications and
bi-directional propagation of equivalences created by conditionals.

The most obvious limitation of the backwards walking threader is that it
only looks at PHIs, copies and constant initializations.  Other
statements are ignored and stop the backwards walk.

I've got a fair amount of support for walking through unary and limited
form binary expressions that I believe can be extended based on needs.
But that's not quite ready for stage1 close.  However, some of the
refactoring to make those changes easier to implement is ready.

This patch starts to break down fsm_find_control_statement_thread_paths
into more manageable hunks.

One such hunk is sub-path checking.  Essentially we're looking to add a
range of blocks to the thread path as we move from one def site to
another in the IL.  There aren't any functional changes in that
refactoring.  It's really just to make f_f_c_s_t_p easier to grok.

f_f_c_s_t_p has inline code to recursively walk backwards through PHI
nodes as well as assignments that are copies and constant initialization
terminals.  Pulling that handling out results in a f_f_c_s_t_p that fits
on a page.  It's just a hell of a lot easier to see what's going on.

The handling of assignments is slightly improved in this patch.
Essentially we only considered a constant initialization using an
INTEGER_CST as a proper terminal node.  But certainly other constants
are useful -- ADDR_EXPR in particular -- and those are now handled.  I'll
mirror that improvement in the PHI node routines tomorrow.

Anyway, this is really just meant to make it easier to start extending
the GIMPLE_ASSIGN handling.

Bootstrapped and regression tested on x86_64-linux-gnu.

I've got function comments for the new routines on a local branch.  I'll
get those installed before committing.
Final version attached.  Only change was allowing tcc_constant rather 
than just INTEGER_CST in PHIs and the addition of comments.


Bootstrapped and regression tested on x86, installing on the trunk.

Jeff
commit 4cbde473b184922d6c8423a7a63bdbb86de32b33
Author: Jeff Law 
Date:   Tue Nov 15 09:16:26 2016 -0700

* tree-ssa-threadbackward.c (fsm_find_thread_path): Remove unneeded
parameter.  Callers changed.
(check_subpath_and_update_thread_path): Extracted from
fsm_find_control_statement_thread_paths.
(handle_phi, handle_assignment, handle_assignment_p): Likewise.
(handle_phi, handle_assignment): Allow any constant node, not
just INTEGER_CST.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1e8475f..a54423a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-11-15  Jeff Law  
+
+   * tree-ssa-threadbackward.c (fsm_find_thread_path): Remove unneeded
+   parameter.  Callers changed.
+   (check_subpath_and_update_thread_path): Extracted from
+   fsm_find_control_statement_thread_paths.
+   (handle_phi, handle_assignment, handle_assignment_p): Likewise.
+   (handle_phi, handle_assignment): Allow any constant node, not
+   just INTEGER_CST.
+
 2016-11-15  Claudiu Zissulescu  
 
* config/arc/arc-arch.h: New file.
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index fd7d855..203e20e 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -62,14 +62,12 @@ get_gimple_control_stmt (basic_block bb)
 /* Return true if the CFG contains at least one path from START_BB to END_BB.
When a path is found, record in PATH the blocks from END_BB to START_BB.
VISITED_BBS is used to make sure we don't fall into an infinite loop.  Bound
-   the recursion to basic blocks belonging to LOOP.
-   SPEED_P indicate that we could increase code size to improve the code path */
+   the recursion to basic blocks belonging to LOOP.  */
 
 static bool
 fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
  vec *,
- hash_set *visited_bbs, loop_p loop,
- bool speed_p)
+ hash_set *visited_bbs, loop_p loop)
 {
   if (loop != 

Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-15 Thread Jakub Jelinek
On Tue, Nov 15, 2016 at 04:00:40PM +, Jiong Wang wrote:
> >>   Takes one signed LEB128 offset and retrieves 8-byte contents from the 
> >> address
> >>   calculated by CFA plus this offset, the contents then authenticated as 
> >> per A
> >>   key for instruction pointer using current CFA as salt. The result is 
> >> pushed
> >>   onto the stack.
> >I'd like to point out that especially the vendor range of DW_OP_* is
> >extremely scarce resource, we have only a couple of unused values, so taking
> >3 out of the remaining unused 12 for a single architecture is IMHO too much.
> >Can't you use just a single opcode and encode which of the 3 operations it is
> >in say the low 2 bits of a LEB 128 operand?
> >We'll likely need to do RSN some multiplexing even for the generic GNU
> >opcodes if we need just a few further ones (say 0xff as an extension,
> >followed by uleb128 containing the opcode - 0xff).
> >In the non-vendor area we still have 54 values left, so there is more space
> >for future expansion.
> 
>   Separate DWARF operations are introduced, instead of combining all of them
> into one, mostly because these operations are going to be used for most
> functions once return address signing is enabled.  They are used for
> describing frame unwinding, so they will go into the unwind table for C++
> programs or C programs compiled with -fexceptions, and the impact on unwind
> table size is significant.  So I was trying to lower the unwind table size
> overhead as much as I can.
> 
>   IMHO, three numbers actually is not that much for one architecture in the
> DWARF operation vendor extension space, as architecture vendors can overlap
> with each other.  The only painful thing, from my understanding, is that
> there are platform vendors, for example "GNU" and "LLVM" etc., with which
> architecture vendors can't overlap.

For DW_OP_*, there aren't two vendor ranges like e.g. in ELF, there is just
one range, so ideally the opcodes would be unique everywhere, if not, there
is just a single GNU vendor, there is no separate range for Aarch64, that
can overlap with range for x86_64, and powerpc, etc.

Perhaps we could declare that certain opcode subrange for the GNU vendor is
architecture specific and document that the meaning of opcodes in that range
and count/encoding of their arguments depends on the architecture, but then
we should document how to figure out the architecture too (e.g. for ELF
base it on the containing EM_*).  All the tools that look at DWARF (readelf,
objdump, eu-readelf, libdw, libunwind, gdb, dwz, ...) would need to agree on 
that
though.

I know nothing about the aarch64 return address signing, would all 3 or say
2 usually appear together without any separate pc advance, or are they all
going to appear frequently and at different pcs?  Perhaps if there is just 1
opcode and has all the info encoded just in one bigger uleb128 or something
similar...
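A stand-alone sketch of that multiplexing idea, with invented names: one vendor opcode whose single ULEB128 operand carries the sub-operation in its low 2 bits and the payload in the remaining bits. (A signed offset payload, as in the proposal, would need sign handling on top of this unsigned model.)

```c
#include <assert.h>
#include <stddef.h>

/* Standard ULEB128 encoding: 7 payload bits per byte, high bit set on
   all bytes except the last.  */
static size_t
encode_uleb128 (unsigned long value, unsigned char *buf)
{
  size_t len = 0;
  do
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;
      buf[len++] = byte;
    }
  while (value != 0);
  return len;
}

static size_t
decode_uleb128 (const unsigned char *buf, unsigned long *value)
{
  unsigned long result = 0;
  int shift = 0;
  size_t len = 0;
  unsigned char byte;
  do
    {
      byte = buf[len++];
      result |= (unsigned long) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  *value = result;
  return len;
}

/* Pack a sub-operation number (0-3) and its payload into the one
   operand of a single hypothetical vendor opcode.  */
static unsigned long
pack_operand (unsigned int subop, unsigned long payload)
{
  return (payload << 2) | (subop & 3);
}
```

The space cost relative to three separate opcodes is at most one extra ULEB128 byte per use when the shifted payload crosses a 7-bit boundary, which is the trade-off being weighed against the scarce opcode numbers.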

Jakub


Use MEM_SIZE rather than GET_MODE_SIZE in dce.c

2016-11-15 Thread Richard Sandiford
Using MEM_SIZE is more general, since it copes with cases where
targets are forced to use BLKmode references for whatever reason.

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* dce.c (check_argument_store): Pass the size instead of
the memory reference.
(find_call_stack_args): Pass MEM_SIZE to check_argument_store.

diff --git a/gcc/dce.c b/gcc/dce.c
index 154469c..16340b64 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -234,16 +234,17 @@ mark_nonreg_stores (rtx body, rtx_insn *insn, bool fast)
 }
 
 
-/* Return true if store to MEM, starting OFF bytes from stack pointer,
+/* Return true if a store to SIZE bytes, starting OFF bytes from stack pointer,
is a call argument store, and clear corresponding bits from SP_BYTES
bitmap if it is.  */
 
 static bool
-check_argument_store (rtx mem, HOST_WIDE_INT off, HOST_WIDE_INT min_sp_off,
- HOST_WIDE_INT max_sp_off, bitmap sp_bytes)
+check_argument_store (HOST_WIDE_INT size, HOST_WIDE_INT off,
+ HOST_WIDE_INT min_sp_off, HOST_WIDE_INT max_sp_off,
+ bitmap sp_bytes)
 {
   HOST_WIDE_INT byte;
-  for (byte = off; byte < off + GET_MODE_SIZE (GET_MODE (mem)); byte++)
+  for (byte = off; byte < off + size; byte++)
 {
   if (byte < min_sp_off
  || byte >= max_sp_off
@@ -468,8 +469,8 @@ find_call_stack_args (rtx_call_insn *call_insn, bool do_mark, bool fast,
break;
}
 
-  if (GET_MODE_SIZE (GET_MODE (mem)) == 0
- || !check_argument_store (mem, off, min_sp_off,
+  if (!MEM_SIZE_KNOWN_P (mem)
+ || !check_argument_store (MEM_SIZE (mem), off, min_sp_off,
max_sp_off, sp_bytes))
break;
 



Tweak LRA handling of shared spill slots

2016-11-15 Thread Richard Sandiford
The previous code processed the users of a stack slot in order of
decreasing size and allocated the slot based on the first user.
This seems a bit dangerous, since the ordering is based on the
mode of the biggest reference while the allocation is based also
on the size of the register itself (which I think could be larger).

That scheme doesn't scale well to polynomial sizes, since there's
no guarantee that the order of the sizes is known at compile time.
This patch instead records an upper bound on the size required
by all users of a slot.  It also records the maximum alignment
requirement.
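The bookkeeping change can be modelled in isolation: instead of sizing the slot from whichever user happens to be processed first, fold each user's size and alignment into a running maximum. The struct and function names here are illustrative, not the patch's:

```c
#include <assert.h>

/* Model of the new per-slot fields: the maximum alignment (in bits)
   and maximum size (in bytes) over all pseudos assigned to the slot.  */
struct slot_model
{
  unsigned int align;   /* maximum alignment required by any user */
  long size;            /* maximum size required by any user */
};

/* Fold one user's requirements into the slot; allocation then uses
   the accumulated maxima rather than the first user's mode.  */
static void
add_user_to_slot (struct slot_model *s, long user_size,
                  unsigned int user_align)
{
  if (user_align > s->align)
    s->align = user_align;
  if (user_size > s->size)
    s->size = user_size;
}
```

With this scheme the order in which users are visited no longer matters, which is what makes it workable when sizes are polynomial and not comparable at compile time.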

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* function.h (spill_slot_alignment): Declare.
* function.c (spill_slot_alignment): New function.
* lra-spills.c (slot): Add align and size fields.
(assign_mem_slot): Use them in the call to assign_stack_local.
(add_pseudo_to_slot): Update the fields.
(assign_stack_slot_num_and_sort_pseudos): Initialise the fields.

diff --git a/gcc/function.c b/gcc/function.c
index 0b1d168..b009a0d 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -246,6 +246,14 @@ frame_offset_overflow (HOST_WIDE_INT offset, tree func)
   return FALSE;
 }
 
+/* Return the minimum spill slot alignment for a register of mode MODE.  */
+
+unsigned int
+spill_slot_alignment (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return STACK_SLOT_ALIGNMENT (NULL_TREE, mode, GET_MODE_ALIGNMENT (mode));
+}
+
 /* Return stack slot alignment in bits for TYPE and MODE.  */
 
 static unsigned int
diff --git a/gcc/function.h b/gcc/function.h
index e854c7f..6898f7f 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -567,6 +567,8 @@ extern HOST_WIDE_INT get_frame_size (void);
return FALSE.  */
 extern bool frame_offset_overflow (HOST_WIDE_INT, tree);
 
+extern unsigned int spill_slot_alignment (machine_mode);
+
 extern rtx assign_stack_local_1 (machine_mode, HOST_WIDE_INT, int, int);
 extern rtx assign_stack_local (machine_mode, HOST_WIDE_INT, int);
 extern rtx assign_stack_temp_for_type (machine_mode, HOST_WIDE_INT, tree);
diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 6e044cd..9f1d5e9 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -104,6 +104,10 @@ struct slot
   /* Hard reg into which the slot pseudos are spilled. The value is
  negative for pseudos spilled into memory. */
   int hard_regno;
+  /* Maximum alignment required by all users of the slot.  */
+  unsigned int align;
+  /* Maximum size required by all users of the slot.  */
+  HOST_WIDE_INT size;
   /* Memory representing the all stack slot.  It can be different from
  memory representing a pseudo belonging to give stack slot because
  pseudo can be placed in a part of the corresponding stack slot.
@@ -128,51 +132,23 @@ assign_mem_slot (int i)
 {
   rtx x = NULL_RTX;
   machine_mode mode = GET_MODE (regno_reg_rtx[i]);
-  unsigned int inherent_size = PSEUDO_REGNO_BYTES (i);
-  unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
-  unsigned int max_ref_width = GET_MODE_SIZE (lra_reg_info[i].biggest_mode);
-  unsigned int total_size = MAX (inherent_size, max_ref_width);
-  unsigned int min_align = max_ref_width * BITS_PER_UNIT;
-  int adjust = 0;
+  HOST_WIDE_INT inherent_size = PSEUDO_REGNO_BYTES (i);
+  machine_mode wider_mode
+= (GET_MODE_SIZE (mode) >= GET_MODE_SIZE (lra_reg_info[i].biggest_mode)
+   ? mode : lra_reg_info[i].biggest_mode);
+  HOST_WIDE_INT total_size = GET_MODE_SIZE (wider_mode);
+  HOST_WIDE_INT adjust = 0;
 
   lra_assert (regno_reg_rtx[i] != NULL_RTX && REG_P (regno_reg_rtx[i])
  && lra_reg_info[i].nrefs != 0 && reg_renumber[i] < 0);
 
-  x = slots[pseudo_slots[i].slot_num].mem;
-
-  /* We can use a slot already allocated because it is guaranteed the
- slot provides both enough inherent space and enough total
- space.  */
-  if (x)
-    ;
-  /* Each pseudo has an inherent size which comes from its own mode,
- and a total size which provides room for paradoxical subregs
- which refer to the pseudo reg in wider modes.  We allocate a new
- slot, making sure that it has enough inherent space and total
- space.  */
-  else
+  unsigned int slot_num = pseudo_slots[i].slot_num;
+  x = slots[slot_num].mem;
+  if (!x)
 {
-  rtx stack_slot;
-
-  /* No known place to spill from => no slot to reuse.  */
-  x = assign_stack_local (mode, total_size,
- min_align > inherent_align
- || total_size > inherent_size ? -1 : 0);
-  stack_slot = x;
-  /* Cancel the big-endian correction done in assign_stack_local.
-Get the address of the beginning 
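The patch's new per-slot align and size fields amount to a running maximum over the slot's users, updated as each pseudo is assigned. A toy sketch of that bookkeeping, with hypothetical names rather than lra-spills.c's own:

```c
#include <assert.h>

/* Toy bookkeeping for the patch's new slot fields: a shared spill slot
   must record the maximum size and alignment over all pseudos placed in
   it.  Names are illustrative, not lra-spills.c's.  */
struct toy_slot { unsigned int align; long size; };

static void
toy_add_pseudo (struct toy_slot *s, unsigned int align, long size)
{
  if (align > s->align)
    s->align = align;           /* keep the strictest alignment */
  if (size > s->size)
    s->size = size;             /* keep the largest extent */
}
```

A slot shared by a pseudo needing 8 bytes at 4-byte alignment and another needing 4 bytes at 16-byte alignment ends up 8 bytes large and 16-byte aligned.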

Re: Use simplify_gen_binary in canon_rtx

2016-11-15 Thread Jeff Law

On 11/15/2016 09:07 AM, Richard Sandiford wrote:

After simplifying the operands of a PLUS, canon_rtx checked only
for cases in which one of the simplified operands was a constant,
falling back to gen_rtx_PLUS otherwise.  This left the PLUS in a
non-canonical order if one of the simplified operands was
(plus (reg R1) (const_int X)); we'd end up with:

   (plus (plus (reg R1) (const_int Y)) (reg R2))

rather than:

   (plus (plus (reg R1) (reg R2)) (const_int Y))

Fixing this exposed new DSE opportunities on spu-elf in
gcc.c-torture/execute/builtins/strcat-chk.c, but otherwise
it doesn't seem to have much practical effect.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* alias.c (canon_rtx): Use simplify_gen_binary.

OK.
jeff



[PATCH][PR libgfortran/78314] Fix ieee_support_halting

2016-11-15 Thread Szabolcs Nagy
When FPU trapping is enabled in libgfortran, the return value of
feenableexcept is not checked.  Glibc reports failure there if the
operation was unsuccessful, which happens when the target has no
trapping support.

There is a separate API for checking trapping support,
ieee_support_halting, but it only checked whether the exception status
flags are available, so check trapping support too by enabling
and disabling traps.

Updated the test that changes trapping to use ieee_support_halting.
(I think this is better than XFAILing the test case, as it tests
things that work fine without trapping support.)

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

gcc/testsuite/
2016-11-15  Szabolcs Nagy  

PR libgfortran/78314
* gfortran.dg/ieee/ieee_6.f90: Use ieee_support_halting.

libgfortran/
2016-11-15  Szabolcs Nagy  

PR libgfortran/78314
* config/fpu-glibc.h (support_fpu_trap): Use feenableexcept.
diff --git a/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90 b/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90
index 8fb4f6f..43aa3bf 100644
--- a/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90
+++ b/gcc/testsuite/gfortran.dg/ieee/ieee_6.f90
@@ -9,7 +9,7 @@
   implicit none
 
   type(ieee_status_type) :: s1, s2
-  logical :: flags(5), halt(5)
+  logical :: flags(5), halt(5), haltworks
   type(ieee_round_type) :: mode
   real :: x
 
@@ -18,6 +18,7 @@
   call ieee_set_flag(ieee_all, .false.)
   call ieee_set_rounding_mode(ieee_down)
   call ieee_set_halting_mode(ieee_all, .false.)
+  haltworks = ieee_support_halting(ieee_overflow)
 
   call ieee_get_status(s1)
   call ieee_set_status(s1)
@@ -46,7 +47,7 @@
   call ieee_get_rounding_mode(mode)
   if (mode /= ieee_to_zero) call abort
   call ieee_get_halting_mode(ieee_all, halt)
-  if ((.not. halt(1)) .or. any(halt(2:))) call abort
+  if ((haltworks .and. .not. halt(1)) .or. any(halt(2:))) call abort
 
   call ieee_set_status(s2)
 
@@ -58,7 +59,7 @@
   call ieee_get_rounding_mode(mode)
   if (mode /= ieee_to_zero) call abort
   call ieee_get_halting_mode(ieee_all, halt)
-  if ((.not. halt(1)) .or. any(halt(2:))) call abort
+  if ((haltworks .and. .not. halt(1)) .or. any(halt(2:))) call abort
 
   call ieee_set_status(s1)
 
@@ -79,6 +80,6 @@
   call ieee_get_rounding_mode(mode)
   if (mode /= ieee_to_zero) call abort
   call ieee_get_halting_mode(ieee_all, halt)
-  if ((.not. halt(1)) .or. any(halt(2:))) call abort
+  if ((haltworks .and. .not. halt(1)) .or. any(halt(2:))) call abort
 
 end
diff --git a/libgfortran/config/fpu-glibc.h b/libgfortran/config/fpu-glibc.h
index 6e505da..e254fb1 100644
--- a/libgfortran/config/fpu-glibc.h
+++ b/libgfortran/config/fpu-glibc.h
@@ -121,7 +121,43 @@ get_fpu_trap_exceptions (void)
 int
 support_fpu_trap (int flag)
 {
-  return support_fpu_flag (flag);
+  int exceptions = 0;
+  int old, ret;
+
+  if (!support_fpu_flag (flag))
+    return 0;
+
+#ifdef FE_INVALID
+  if (flag & GFC_FPE_INVALID) exceptions |= FE_INVALID;
+#endif
+
+#ifdef FE_DIVBYZERO
+  if (flag & GFC_FPE_ZERO) exceptions |= FE_DIVBYZERO;
+#endif
+
+#ifdef FE_OVERFLOW
+  if (flag & GFC_FPE_OVERFLOW) exceptions |= FE_OVERFLOW;
+#endif
+
+#ifdef FE_UNDERFLOW
+  if (flag & GFC_FPE_UNDERFLOW) exceptions |= FE_UNDERFLOW;
+#endif
+
+#ifdef FE_DENORMAL
+  if (flag & GFC_FPE_DENORMAL) exceptions |= FE_DENORMAL;
+#endif
+
+#ifdef FE_INEXACT
+  if (flag & GFC_FPE_INEXACT) exceptions |= FE_INEXACT;
+#endif
+
+  old = fedisableexcept (exceptions);
+  if (old == -1)
+    return 0;
+
+  ret = feenableexcept (exceptions) != -1;
+  feenableexcept (old);
+  return ret;
 }
 
 


Use simplify_gen_binary in canon_rtx

2016-11-15 Thread Richard Sandiford
After simplifying the operands of a PLUS, canon_rtx checked only
for cases in which one of the simplified operands was a constant,
falling back to gen_rtx_PLUS otherwise.  This left the PLUS in a
non-canonical order if one of the simplified operands was
(plus (reg R1) (const_int X)); we'd end up with:

   (plus (plus (reg R1) (const_int Y)) (reg R2))

rather than:

   (plus (plus (reg R1) (reg R2)) (const_int Y))

Fixing this exposed new DSE opportunities on spu-elf in
gcc.c-torture/execute/builtins/strcat-chk.c, but otherwise
it doesn't seem to have much practical effect.
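The canonical ordering can be illustrated with a toy model: treat a simplified operand as a register part plus a constant displacement, and require that combining two operands keep the constant outermost. This only sketches the invariant, not GCC's RTL machinery; all names are illustrative:

```c
#include <assert.h>

/* Toy model of PLUS canonicalisation: an "address" is some registers
   plus a constant displacement.  Combining two simplified operands
   must keep the constant outermost, i.e.
   (plus (plus R1 R2) (const_int Y)) rather than
   (plus (plus R1 Y) R2).  Names here are illustrative only.  */
struct toy_addr { int nregs; long offset; };

static struct toy_addr
toy_canon_plus (struct toy_addr a, struct toy_addr b)
{
  struct toy_addr r;
  r.nregs = a.nregs + b.nregs;    /* register part stays innermost */
  r.offset = a.offset + b.offset; /* constants fold into one outer term */
  return r;
}
```

Combining (plus (reg R1) (const_int 7)) with (reg R2) in this model yields two registers and an outer displacement of 7, matching the canonical form above.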

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* alias.c (canon_rtx): Use simplify_gen_binary.

diff --git a/gcc/alias.c b/gcc/alias.c
index 486d06a..74df23c 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -1800,13 +1800,7 @@ canon_rtx (rtx x)
   rtx x1 = canon_rtx (XEXP (x, 1));
 
   if (x0 != XEXP (x, 0) || x1 != XEXP (x, 1))
-   {
- if (CONST_INT_P (x0))
-   return plus_constant (GET_MODE (x), x1, INTVAL (x0));
- else if (CONST_INT_P (x1))
-   return plus_constant (GET_MODE (x), x0, INTVAL (x1));
- return gen_rtx_PLUS (GET_MODE (x), x0, x1);
-   }
+   return simplify_gen_binary (PLUS, GET_MODE (x), x0, x1);
 }
 
   /* This gives us much better alias analysis when called from



Add a mem_alias_size helper class

2016-11-15 Thread Richard Sandiford
alias.c encodes memory sizes as follows:

size > 0: the exact size is known
size == 0: the size isn't known
size < 0: the exact size of the reference itself is known,
  but the address has been aligned via AND.  In this case
  "-size" includes the size of the reference and the worst-case
  number of bytes traversed by the AND.
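The sign encoding can be sketched with plain integers; the helper names below are hypothetical stand-ins for the new class's interface, not alias.c's actual API:

```c
#include <assert.h>

/* Sketch of the size encoding described above: > 0 exact, == 0 unknown,
   < 0 exact but AND-aligned (with -size covering the worst case).
   Helper names are hypothetical.  */
typedef long alias_size;

static alias_size size_exact (long n)   { return n; }  /* case: exact   */
static alias_size size_unknown (void)   { return 0; }  /* case: unknown */
static alias_size size_aligned (long n) { return -n; } /* case: aligned */

static int exact_p (alias_size s)          { return s > 0; }
static int max_size_known_p (alias_size s) { return s != 0; }
static int aligned_p (alias_size s)        { return s < 0; }

/* (and X (const_int -N)) may step the address back by up to N - 1
   bytes, so the covered range grows to SIZE + N - 1.  */
static alias_size
size_after_and (long size, long n)
{
  return size_aligned (size + n - 1);
}
```

For example, a 4-byte access behind (and X (const_int -8)) covers at worst 4 + 7 = 11 bytes starting 7 bytes below X, encoded here as -11.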

This patch wraps this up in a helper class and associated
functions.  The new routines fix what seems to be a hole
in the old logic: if the size of a reference A was unknown,
offset_overlap_p would assume that it could conflict with any
other reference B, even if we could prove that B comes before A.

The fallback CONSTANT_P (x) && CONSTANT_P (y) case looked incorrect.
Either "c" is trustworthy as a distance between the two constants,
in which case the alignment handling should work as well there as
elsewhere, or "c" isn't trustworthy, in which case offset_overlap_p
is unsafe.  I think the latter's true; AFAICT we have no evidence
that "c" really is the distance between the two references, so using
it in the check doesn't make sense.

At this point we've excluded cases for which:

(a) the base addresses are the same
(b) x and y are SYMBOL_REFs, or SYMBOL_REF-based constants
wrapped in a CONST
(c) x and y are both constant integers

No useful cases should be left.  As things stood, we would
assume that:

  (mem:SI (const_int X))

could overlap:

  (mem:SI (symbol_ref Y))

but not:

  (mem:SI (const (plus (symbol_ref Y) (const_int 4))))

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* alias.c (mem_alias_size): New class.
(mem_alias_size::mode): New function.
(mem_alias_size::exact_p): Likewise.
(mem_alias_size::max_size_known_p): Likewise.
(align_to): Likewise.
(alias_may_gt): Likewise.
(addr_side_effect_eval): Change type of size argument to
mem_alias_size.  Use plus_constant.
(offset_overlap_p): Change type of xsize and ysize to
mem_alias_size.  Use alias_may_gt.  Don't assume an overlap
between an access of unknown size and an access that's known
to be earlier than it.
(memrefs_conflict_p): Change type of xsize and ysize to
mem_alias_size.  Remove fallback CONSTANT_P (x) && CONSTANT_P (y)
handling.

diff --git a/gcc/alias.c b/gcc/alias.c
index 1ea2417..486d06a 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -148,7 +148,6 @@ struct GTY(()) alias_set_entry {
 };
 
 static int rtx_equal_for_memref_p (const_rtx, const_rtx);
-static int memrefs_conflict_p (int, rtx, int, rtx, HOST_WIDE_INT);
 static void record_set (rtx, const_rtx, void *);
 static int base_alias_check (rtx, rtx, rtx, rtx, machine_mode,
 machine_mode);
@@ -176,11 +175,104 @@ static struct {
   unsigned long long num_disambiguated;
 } alias_stats;
 
+/* Represents the size of a memory reference during alias analysis.
+   There are three possibilities:
 
-/* Set up all info needed to perform alias analysis on memory references.  */
+   (1) the size needs to be treated as completely unknown
+   (2) the size is known exactly and no alignment is applied to the address
+   (3) the size is known exactly but an alignment is applied to the address
+
+   (3) is used for aligned addresses of the form (and X (const_int -N)),
+   which can subtract something in the range [0, N) from the original
+   address X.  We handle this by subtracting N - 1 from X and adding N - 1
+   to the size, so that the range spans all possible bytes.  */
+class mem_alias_size {
+public:
+  /* Return an unknown size (case (1) above).  */
+  static mem_alias_size unknown () { return (HOST_WIDE_INT) 0; }
+
+  /* Return an exact size (case (2) above).  */
+  static mem_alias_size exact (HOST_WIDE_INT size) { return size; }
+
+  /* Return a worst-case size after alignment (case (3) above).
+ SIZE includes the maximum adjustment applied by the alignment.  */
+  static mem_alias_size aligned (HOST_WIDE_INT size) { return -size; }
+
+  /* Return the size of memory reference X.  */
+  static mem_alias_size mem (const_rtx x) { return MEM_SIZE (x); }
+
+  static mem_alias_size mode (machine_mode m);
+
+  /* Return true if the exact size of the memory is known.  */
+  bool exact_p () const { return m_value > 0; }
+  bool exact_p (HOST_WIDE_INT *) const;
+
+  /* Return true if an upper bound on the memory size is known;
+ i.e. not case (1) above.  */
+  bool max_size_known_p () const { return m_value != 0; }
+  bool max_size_known_p (HOST_WIDE_INT *) const;
+
+  /* Return true if the size is subject to alignment.  */
+  bool aligned_p () const { return m_value < 0; }
+
+private:
+  mem_alias_size 
