Re: [PATCH-v3] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-07-04 Thread Sebastian Huber



On 04/07/17 15:38, Daniel Cederman wrote:



On 2017-06-30 07:11, Sebastian Huber wrote:

On 29/06/17 18:05, David Miller wrote:


From: Daniel Cederman 
Date: Thu, 29 Jun 2017 17:15:43 +0200


I'm not thrilled with this, it's undocumented, the other workaround
don't have
it and I don't think that we really need it.

The B2BST errata workaround requires more changes to assembler
routines commonly used by operating systems, such as for example
register window handling, than what the UT699 workaround needed. It
would be nice to have a way to only enable these modification when the
-mfix- flag is used. The alternative would be to provide a define
directly on the compiler command line in conjunction with -mfix
flag. But if more changes are required later on it would be good to
have the define more closely tied to the flag to minimize the number
of changes to Makefiles and etc.

Personally, I have never seen compiler based CPP defines as ever being
useful for tailoring OS assembler code.  Ever.

In most cases you will want to support several families of CPUs and
therefore sort out the individual cpu support assembler routines
internally in the kernel sources.


This depends on the operating system you use. For some  embedded 
systems where the application and the operating system are one 
executable it is quite common to use compiler provided defines in 
assembly code.


For example:

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/machine/arm/memcpy-armv7a.S;h=cd7962e075a30cb90ec073d77b177c3536429b9b;hb=HEAD 



For a software development kit, the run-time libraries are built for 
a set of multilibs. Each assembler file may use multilib specific 
compiler defines, e.g. floating point unit present or not, errata XYZ 
present or not, etc.




We can drop the define if necessary, but we would like to keep the two 
flags. Would that be OK to apply?




If I read the GRLIB-TN-0009 correctly, then we have to adjust the 
context switch, interrupt processing and window management code in 
RTEMS. So, we definitely need this define.


Since this errata affects actually the GRLIB, which is used in many 
products, should we really start adding -mfix-some-processor options? 
The GRLIB affected by this errata may be used in custom designs as well. 
I suggest to simply add a


-mfix-leon3ft-b2bst

option which enables the workaround and adds a builtin define

#define __FIX_LEON3FT_B2BST

The documentation for this option should mention this and also reference 
the GRLIB-TN-0009 and maybe the affected known processors.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [Patch, fortran] PR34640 - ICE when assigning item of a derived-component to a pointer

2017-07-04 Thread Thomas Koenig

Hi Paul,

first, this patch looks really good - it certainly fixes a lot of the
ICEs.

I have a few points (a part already mentioned in private mail).

Consider the test case:

module x
  use iso_c_binding
  implicit none
  type foo
 complex :: c
 integer :: i
  end type foo
contains
  subroutine printit(c)
complex, pointer, dimension(:) :: c
integer :: i
integer(kind=8) :: a
a = transfer(c_loc(c(1)),a)
print '(A,Z16)',"Adrress of first element is ", a
  end subroutine printit

  subroutine p2(c)
complex, dimension(:), target :: c
integer :: i
integer(kind=8) :: a
a = transfer(c_loc(c(1)),a)
print '(A,Z16)',"Adrress of first element is ", a
  end subroutine p2

end module x

program main
  use x
  use iso_c_binding
  implicit none
  type(foo), dimension(5), target :: a
  integer :: i
  complex, dimension(:), pointer :: pc
  complex, dimension(4), target :: v
  integer(kind=8) :: s1, s2
  a%i = 0
  do i=1,5
 a(i)%c = cmplx(i**2,i)
  end do
  pc => a%c
  print *,"Pointer to complex passed to pointer argument:"
  call printit(pc)
  print *,"Pointer to complex passed to array argument"
  call p2(pc)
  s1 = transfer(c_loc(a(1)),s1)
  print '(A,Z16,/)',"Main program: Address of first element: ", s1

  pc => v
  print *,"Pointer to complex passed to pointer argument:"
  call printit(pc)
  print *,"Complex array passed to array argument"
  call p2(v)
  s1 = transfer(c_loc(v(1)),s1)
  print '(A,Z16)',"Address of first element: ", s1
end program main

This yields:

 Pointer to complex passed to pointer argument:
Adrress of first element is  10021C90FF0
 Pointer to complex passed to array argument
Adrress of first element is  10021C90FF0
Main program: Address of first element: 3FFFCEC599A4

 Pointer to complex passed to pointer argument:
Adrress of first element is  10021C90FF0
 Complex array passed to array argument
Adrress of first element is 3FFFCEC59A20
Address of first element: 3FFFCEC59A20

It appears that a temporary is created when passing
a pointer array to a pointer array dummy argument.
I think this would be wrong code, because the
subroutine could stash away the pointer and later
access data through it.

The same seems to happen when passing a pointer to
a normal argument - a temporary copy appears to be made.

While this code is correct, I am wodering if it
is intentional.  Is the span field in the array
descriptor used in the called subroutine?

Regards

Thomas


[PATCH, i386]: Fix PR 81300, -fpeephole2 breaks __builtin_ia32_sbb_u64, _subborrow_u64 on AMD64

2017-07-04 Thread Uros Bizjak
Hello!

Attached patch tightens peephole2 condition to prevent unwanted
flags_reg clobbering by insn patterns, emitted by ix86_expand_clear.

2017-07-04  Uros Bizjak  

PR target/81300
* config/i386/i386.md (setcc + movzbl/and to xor + setcc peepholes):
Require dead FLAGS_REG at the beginning of a peephole.

testsuite/ChangeLog:

2017-07-04  Uros Bizjak  

PR target/81300
* gcc.target/i386/pr81300.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 249971)
+++ config/i386/i386.md (working copy)
@@ -11754,7 +11754,8 @@
(zero_extend (match_dup 1)))]
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
-   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
+   && ! reg_overlap_mentioned_p (operands[3], operands[0])
+   && peep2_regno_dead_p (0, FLAGS_REG)"
   [(set (match_dup 4) (match_dup 0))
(set (strict_low_part (match_dup 5))
(match_dup 2))]
@@ -11775,7 +11776,8 @@
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
&& ! reg_overlap_mentioned_p (operands[3], operands[0])
-   && ! reg_set_p (operands[3], operands[4])"
+   && ! reg_set_p (operands[3], operands[4])
+   && peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))
@@ -11797,7 +11799,8 @@
   (and:SI (match_dup 3) (const_int 255)))
  (clobber (reg:CC FLAGS_REG))])]
   "REGNO (operands[1]) == REGNO (operands[3])
-   && ! reg_overlap_mentioned_p (operands[3], operands[0])"
+   && ! reg_overlap_mentioned_p (operands[3], operands[0])
+   && peep2_regno_dead_p (0, FLAGS_REG)"
   [(set (match_dup 4) (match_dup 0))
(set (strict_low_part (match_dup 5))
(match_dup 2))]
@@ -11819,7 +11822,8 @@
   "(peep2_reg_dead_p (3, operands[1])
 || operands_match_p (operands[1], operands[3]))
&& ! reg_overlap_mentioned_p (operands[3], operands[0])
-   && ! reg_set_p (operands[3], operands[4])"
+   && ! reg_set_p (operands[3], operands[4])
+   && peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 5) (match_dup 0))
  (match_dup 4)])
(set (strict_low_part (match_dup 6))
Index: testsuite/gcc.target/i386/pr81300.c
===
--- testsuite/gcc.target/i386/pr81300.c (nonexistent)
+++ testsuite/gcc.target/i386/pr81300.c (working copy)
@@ -0,0 +1,30 @@
+/* PR target/81300 */
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+int
+__attribute__((noinline, noclone))
+foo (void)
+{
+  unsigned long long _discard = 0, zero = 0, maxull = 0;
+  unsigned char zero1 = __builtin_ia32_addcarryx_u64 (0, 0, 0, &_discard);
+  unsigned char zero2 = __builtin_ia32_addcarryx_u64 (zero1, 0, 0, );
+  __builtin_ia32_sbb_u64 (0x0, 2, -1, &_discard);
+  unsigned char one = __builtin_ia32_sbb_u64 (0, zero, 1, );
+  unsigned long long x = __builtin_ia32_sbb_u64 (one, zero2, 0, &_discard);
+
+  unsigned long long z1 = 0;
+  __asm__ ("mov{q}\t{%1, %0|%0, %1}" : "+r" (z1) : "r" (x));
+  unsigned long long z2 = 3;
+  __asm__ ("mov{q}\t{%1, %0|%0, %1}" : "+r" (z2) : "r" (x));
+
+  return 1 - (z1 | z2);
+}
+
+int main ()
+{
+  if (foo ())
+__builtin_abort ();
+
+  return 0;
+}


Re: [PATCH, i386] Fix PR 81294, _subborrow_u64 argument order inconsistent with intrinsic reference

2017-07-04 Thread Jakub Jelinek
On Tue, Jul 04, 2017 at 10:41:26PM +0200, Uros Bizjak wrote:
> Hello!
> 
> Apparently, Intel changed operand order with the new intrinsic
> reference release version. Attached patch updates gcc intrinsic
> headers accordingly.
> 
> 2017-07-04  Uros Bizjak  
> 
> PR target/81294
> * config/i386/adxintrin.h (_subborrow_u32): Swap _X and _Y
> arguments in the call to __builtin_ia32_sbb_u32.
> (_subborrow_u64): Swap _X and _Y arguments in the call to
> __builtin_ia32_sbb_u64.
> 
> testsuite/ChangeLog:
> 
> 2017-07-04  Uros Bizjak  
> 
> PR target/81249
> * gcc.target/i386/adx_addcarryx32-2.c (adx_test): Swap
> x and y arguments in the call to _subborrow_u32.
> * gcc.target/i386/adx_addcarryx64-2.c (adx_test): Swap
> x and y arguments in the call to _subborrow_u64.
> * gcc.target/i386/pr81294-1.c: New test.
> * gcc.target/i386/pr81294-2.c: Ditto.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> Committed to mainline SVN, willbe backported to release branches.

When it goes into release branches, it needs changes.html changes
for each of those to make users aware of that.
7.2 will have rc after mid July, so if you want it in 7.2, it should be
committed before that.  5.5 + 5.x branch closing will be around that time
too.

Jakub


[PATCH, i386] Fix PR 81294, _subborrow_u64 argument order inconsistent with intrinsic reference

2017-07-04 Thread Uros Bizjak
Hello!

Apparently, Intel changed operand order with the new intrinsic
reference release version. Attached patch updates gcc intrinsic
headers accordingly.

2017-07-04  Uros Bizjak  

PR target/81294
* config/i386/adxintrin.h (_subborrow_u32): Swap _X and _Y
arguments in the call to __builtin_ia32_sbb_u32.
(_subborrow_u64): Swap _X and _Y arguments in the call to
__builtin_ia32_sbb_u64.

testsuite/ChangeLog:

2017-07-04  Uros Bizjak  

PR target/81249
* gcc.target/i386/adx_addcarryx32-2.c (adx_test): Swap
x and y arguments in the call to _subborrow_u32.
* gcc.target/i386/adx_addcarryx64-2.c (adx_test): Swap
x and y arguments in the call to _subborrow_u64.
* gcc.target/i386/pr81294-1.c: New test.
* gcc.target/i386/pr81294-2.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN, willbe backported to release branches.

Uros.
Index: config/i386/adxintrin.h
===
--- config/i386/adxintrin.h (revision 249971)
+++ config/i386/adxintrin.h (working copy)
@@ -33,7 +33,7 @@
 _subborrow_u32 (unsigned char __CF, unsigned int __X,
unsigned int __Y, unsigned int *__P)
 {
-  return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
+  return __builtin_ia32_sbb_u32 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char
@@ -58,7 +58,7 @@
 _subborrow_u64 (unsigned char __CF, unsigned long long __X,
unsigned long long __Y, unsigned long long *__P)
 {
-  return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
+  return __builtin_ia32_sbb_u64 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char
Index: testsuite/gcc.target/i386/adx-addcarryx32-2.c
===
--- testsuite/gcc.target/i386/adx-addcarryx32-2.c   (revision 249971)
+++ testsuite/gcc.target/i386/adx-addcarryx32-2.c   (working copy)
@@ -44,9 +44,9 @@
   sum_ref = 0x0;
 
   /* X = 0x0001, Y = 0x, C = 0.  */
-  c = _subborrow_u32 (c, x, y, );
+  c = _subborrow_u32 (c, y, x, );
   /* X = 0x, Y = 0x, C = 1.  */
-  c = _subborrow_u32 (c, x, y, );
+  c = _subborrow_u32 (c, y, x, );
   /* X = 0x, Y = 0x, C = 1.  */
 
   if (x != sum_ref)
Index: testsuite/gcc.target/i386/adx-addcarryx64-2.c
===
--- testsuite/gcc.target/i386/adx-addcarryx64-2.c   (revision 249971)
+++ testsuite/gcc.target/i386/adx-addcarryx64-2.c   (working copy)
@@ -44,9 +44,9 @@
   sum_ref = 0x0LL;
 
   /* X = 0x0001, Y = 0x, C = 0.  */
-  c = _subborrow_u64 (c, x, y, );
+  c = _subborrow_u64 (c, y, x, );
   /* X = 0x, Y = 0x, C = 1.  */
-  c = _subborrow_u64 (c, x, y, );
+  c = _subborrow_u64 (c, y, x, );
   /* X = 0x, Y = 0x, C = 1.  */
 
   if (x != sum_ref)
Index: testsuite/gcc.target/i386/pr81294-1.c
===
--- testsuite/gcc.target/i386/pr81294-1.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr81294-1.c   (working copy)
@@ -0,0 +1,29 @@
+/* PR target/81294 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int
+main ()
+{
+  volatile unsigned char c;
+  unsigned int x;
+  volatile unsigned int y, sum_ref;
+
+  c = 0;
+  x = 1;
+  y = 0;
+  sum_ref = 0x0;
+
+  /* X = 0x0001, Y = 0x, C = 0.  */
+  c = _subborrow_u32 (c, y, x, );
+  /* X = 0x, Y = 0x, C = 1.  */
+  c = _subborrow_u32 (c, y, x, );
+  /* X = 0x, Y = 0x, C = 1.  */
+
+  if (x != sum_ref)
+__builtin_abort ();
+
+  return 0;
+}
Index: testsuite/gcc.target/i386/pr81294-2.c
===
--- testsuite/gcc.target/i386/pr81294-2.c   (nonexistent)
+++ testsuite/gcc.target/i386/pr81294-2.c   (working copy)
@@ -0,0 +1,28 @@
+/* PR target/81294 */
+/* { dg-do run { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int main ()
+{
+  volatile unsigned char c;
+  unsigned long long x;
+  volatile unsigned long long y, sum_ref;
+
+  c = 0;
+  x = 1LL;
+  y = 0LL;
+  sum_ref = 0x0LL;
+
+  /* X = 0x0001, Y = 0x, C = 0.  */
+  c = _subborrow_u64 (c, y, x, );
+  /* X = 0x, Y = 0x, C = 1.  */
+  c = _subborrow_u64 (c, y, x, );
+  /* X = 0x, Y = 0x, C = 1.  */
+
+  if (x != sum_ref)
+__builtin_abort ();
+
+  return 0;
+}


Re: [patch,avr] Add support for devices with flash accessible by LD.

2017-07-04 Thread Richard Sandiford
Georg-Johann Lay  writes:
> Hi,
>
> This patch adds support for devices that can access flash memory
> by LD* instructions, hence there is no need to put .rodata in RAM.
>
> The default linker script for the new multilib versions already
> supports this feature, it's similar to avrtiny, cf.
>
> https://sourceware.org/PR21472
>
> This patch does the following:
>
> * Add multilib variants avrxmega3 and avrxmega3/short-calls.
>
> * Add new option -mshort-calls for multilib selection between
>devices with <= 8KiB flash and > 8KiB flash.
>
> * Add specs handling for -mshort-calls:  The compiler knows
>if this option is needed or not appropriate (similar to -msp8).
>
> * Add new ISA feature AVR_ISA_RCALL for multilib selection
>via -mshort-calls.
>
> * Add a new row to architecture description that contains the
>start address of flash memory in the RAM address range.
>(The actual value is not needed).
>
> * For devices with flash in RAM space, don't let .rodata
>objects trigger need for __do_copy_data.
>
> * Add some devices.
>
> * Add configure test for Binutils PR21472.

Sorry if this has already been discussed, but it's useful to be
able to do things like:

  .../configure --target=avr-elf --with-cpu=arc700
  make -j... all-gcc

as a basic sanity test of a pan-target patch.  (I usually do
before-and-after assembly comparisons too if no changes are
expected.)  The way the configure test is written means that
it's no longer possible to do this without first building a
trunk version of binutils for avr-elf.

Thanks,
Richard


Re: [PATHC][x86] Scalar mask and round RTL templates

2017-07-04 Thread Kirill Yukhin
Hello Sebastian,
On 23 Jun 09:00, Peryt, Sebastian wrote:
> Hi,
> 
> This patch adds three extra RTL meta-templates for scalar round and mask. 
> Additionally fixes errors caused by previous mask and round usage in some of 
> the intrinsics that I found.
Could you pls point which intrinsics did you fixed (or which errors)?
I see only MD changes in your patch.

> 
> 2017-06-23  Sebastian Peryt  
> 
> gcc/
>   * config/i386/subst.md (mask_scalar, round_scalar, 
> round_saeonly_scalar): New templates.
I'd call it meta-templates.
>   (mask_scalar_name, mask_scalar_operand3, round_scalar_name,
>   round_scalar_mask_operand3, round_scalar_mask_op3,
>   round_scalar_constraint, round_scalar_prefix, round_saeonly_scalar_name,
>   round_saeonly_scalar_mask_operand3, round_saeonly_scalar_mask_op3,
>   round_saeonly_scalar_constraint, round_saeonly_scalar_prefix): New 
> subst attribute.
>   * config/i386/sse.md
>   (_vm3): Renamed to ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to 
> ...
>   _vm3 
> ... this.
>   (_vm3): Renamed to ...
>   _vm3 ... 
> this.
>   (v\t{%2, %1, 
> %0|%0, %1, %2}): Changed 
> to ...
>   v\t{%2, 
> %1, %0|%0, %1, 
> %2} ... this.
>   (v\t{%2, %1, 
> %0|%0, %1, %2}): Changed 
> to ...
>   v\t{%2, 
> %1, %0|%0, %1, 
> %2} ... this.
>   (v\t{%2, %1, 
> %0|%0, %1, %2}): 
> Changed to ...
>   
> v\t{%2, %1, 
> %0|%0, %1, 
> %2} ... this.
We need to obey conventions. Pls break long lines here.

--
Thanks, K
> 
> Is it ok for trunk?
> 
> Thanks,
> Sebastian




Re: [PATCH] [AArch64] Fix PR71112

2017-07-04 Thread Ramana Radhakrishnan
On Wed, Nov 23, 2016 at 5:25 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> Please find attached the patch that fixes PR71112.
>
> The current implementation that handles SYMBOL_SMALL_GOT_28K in
> aarch64_load_symref_appropriately access the high part of RTX for Big-Endian
> mode which results in ICE for ILP32.
>
> The attached patch modifies it by accessing the lower part for both Endian
> and fixes the issue.
>
> Please review the patch and let me know if its okay?
>

PR71112 is still open - should this be backported to GCC-6 ?


regards
Ramana

>
> 2016-11-23  Andrew PInski  
>
> gcc
> * config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
> Access the lower part of RTX appropriately.
>
> gcc/testsuite
> * gcc.target/aarch64/pr71112.c : New Testcase.


Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Ramana Radhakrishnan
On Tue, Jul 4, 2017 at 2:53 PM, Michael Matz  wrote:
> Hi,
>
> On Tue, 4 Jul 2017, Wilco Dijkstra wrote:
>
>> > You'll probably also have to set GNATBIND and GNATMAKE to the
>> > appropriately suffixed variants.  Just saying, because that's what I'm
>> > usually forgetting and end up with strange errors :)
>>
>> Configure seems to be able to find gnatbind/gnatmake as they are in /usr/bin.
>
> Sure, but if you use gcc-5 as compiler you'll want to use gnatbind-5 as
> well, not gnatbind (which presumably is from the default 4.8 compiler you
> have).
>
>> Compiling a simple .adb file as suggested gives:
>>
>> > /usr/bin/gcc-5 tmp.adb -S
>> gcc-5: error trying to exec 'gnat1': execvp: No such file or directory
>>
>> Is this part of the GCC driver, GNAT or something else that needs to be
>> installed first?
>
> Don't know the package naming convention on Ubuntu (?).  For us it's e.g.
> gcc5-ada that you'd need to install in addition.  google tells me gnat-5
> should be it.

gnat-7 was on the machine instead of gnat-5 and that was probably the
cause of the fun and games.

Ramana

>
>
> Ciao,
> Michael.


Re: [PATCH GCC][2/2]Refine CFG and bound information for split loops

2017-07-04 Thread Bin.Cheng
On Fri, Jun 30, 2017 at 5:09 PM, Jeff Law  wrote:
> On 06/14/2017 07:08 AM, Bin Cheng wrote:
>> Hi,
>> Loop split currently generates below control flow graph for split loops:
>> +
>> +   .-- guard1  --.
>> +   v v
>> + pre1(loop1).-->pre2(loop2)
>> +  | ||
>> +.--->h1 |   h2<.
>> +| | || |
>> +|ex1---.|   .---ex2|
>> +|/ v|   | \|
>> +'---l1 guard2---'   | l2---'
>> +   ||
>> +   ||
>> +   '--->join<---'
>> +
>> In which,
>> +   LOOP2 is the second loop after split, GUARD1 and GUARD2 are the two bbs
>> +   controling if loop2 should be executed.
>>
>> Take added test case as an example, the second loop only iterates for 1 time,
>> as a result, the CFG and loop niter bound information can be refined.  In 
>> general,
>> guard1/guard2 can be folded to true/false if loop2's niters is known at 
>> compilation
>> time.  This patch does such improvement by analyzing and refining niters of
>> loop2; as well as using that information to simplify CFG.  With this patch,
>> the second split loop as in test can be completely unrolled by later passes.
> In your testcase the second loop iterates precisely once, right?  In
> fact, we know it always executes precisely one time regardless of the
> incoming value of LEN.  That's the property you're trying to detect and
> exploit, right?
>
>
>>
>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>
>> Thanks,
>> bin
>> 2017-06-12  Bin Cheng  
>>
>>   * tree-ssa-loop-split.c (compute_new_first_bound): Compute and
>>   return bound information for the second split loop.
>>   (adjust_loop_split): New function.
>>   (split_loop): Update calls to above functions.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-12  Bin Cheng  
>>
>>   * gcc.dg/loop-split-1.c: New test.
>>
>>
>> 0002-lsplit-refine-cfg-niter-bound-20170601.txt
>>
>>
>> From 61855c74c7db6178008f007198aedee9a03f13e6 Mon Sep 17 00:00:00 2001
>> From: amker 
>> Date: Sun, 4 Jun 2017 02:26:34 +0800
>> Subject: [PATCH 2/2] lsplit-refine-cfg-niter-bound-20170601.txt
>>
>> ---
>>  gcc/testsuite/gcc.dg/loop-split-1.c |  16 +++
>>  gcc/tree-ssa-loop-split.c   | 197 
>> 
>>  2 files changed, 194 insertions(+), 19 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/loop-split-1.c
>>
>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> index f8fe8e6..73c7dc2 100644
>> --- a/gcc/tree-ssa-loop-split.c
>> +++ b/gcc/tree-ssa-loop-split.c
>> @@ -387,28 +386,41 @@ connect_loops (struct loop *loop1, struct loop *loop2)
>>
>> Depending on the direction of the IVs and if the exit tests
>> are strict or non-strict we need to use MIN or MAX,
>> -   and add or subtract 1.  This routine computes newend above.  */
>> +   and add or subtract 1.  This routine computes newend above.
>> +
>> +   After computing the new bound (on j), we may be able to compare the
>> +   first loop's iteration space against the original loop's.  If it is
>> +   comparable at compilation time, we may be able to compute the niter
>> +   bound of the second loop.  Record the second loop's iteration bound
>> +   to SECOND_LOOP_NITER_BOUND which has below meaning:
>> +
>> + -3: Don't know anything about the second loop;
>> + -2: The second loop must not be executed;
>> + -1: The second loop must be executed, but niter bound is unknown;
>> +  n: The second loop must be executed, niter bound is n (>= 0);
>> +
>> +   Note we compute niter bound for the second loop's latch.  */
> How hard would it be to have a test for each of the 4 cases above?  I
> don't always ask for this, but ISTM this is a good chance to make sure
> most of the new code gets covered by testing.
>
> Does case -2 really occur in practice or are you just being safe?  ISTM
> that if case -2 occurs then the loop shouldn't have been split to begin
> with.  If this happens in practice, do we clean up all the dead code if
> the bounds are set properly.
>
> Similarly for N = 0, presumably meaning the second loop iterates exactly
> once, but never traverses its backedge, is there any value in changing
> the loop latch test so that it's always false?  Or will that get cleaned
> up later?
>
>
>>
>>  static tree
>> -compute_new_first_bound (gimple_seq *stmts, struct tree_niter_desc *niter,
>> -  tree border,
>> -  enum tree_code guard_code, tree guard_init)
>> +compute_new_first_bound (struct tree_niter_desc *niter, tree border,
>> +  enum tree_code guard_code, tree guard_init,
>> +  tree step, HOST_WIDE_INT *second_loop_niter_bound)
>>  {

Re: [PATCH GCC8][33/33]Fix PR69710/PR68030 by reassociate vect base address and a simple CSE pass

2017-07-04 Thread Bin.Cheng
On Mon, Jul 3, 2017 at 5:12 PM, Jeff Law  wrote:
> On 04/18/2017 04:54 AM, Bin Cheng wrote:
>> Hi,
>> This is the same patch posted at 
>> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02000.html,
>> after rebase against this patch series.  This patch was blocked because 
>> without this patch
>> series, it could generate worse code on targets with limited addressing mode 
>> support, like
>> AArch64.
>> There was some discussion about alternative fix for PRs, but after thinking 
>> twice I think
>> this fix is in the correct direction.  A CSE interface is useful to clean up 
>> code generated
>> in vectorizer, and we should improve this CSE interface into a region base 
>> one.  for the
>> moment, optimal code is not generated on targets like x86, I believe it's 
>> because the CSE
>> is weak and doesn't cover all basic blocks generated by vectorizer, the 
>> issue should be
>> fixed if region-based CSE is implemented.
>> Is it OK?
>>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  
>>
>>   PR tree-optimization/68030
>>   PR tree-optimization/69710
>>   * tree-ssa-dom.c (cse_bbs): New function.
>>   * tree-ssa-dom.h (cse_bbs): New declaration.
>>   * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
>>   Re-associate address by splitting constant offset.
>>   (vect_create_data_ref_ptr, vect_setup_realignment): Record changed
>>   basic block.
>>   * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Record
>>   changed basic block.
>>   * tree-vectorizer.c (tree-ssa-dom.h): Include header file.
>>   (changed_bbs): New variable.
>>   (vectorize_loops): Allocate and free CHANGED_BBS.  Call cse_bbs.
>>   * tree-vectorizer.h (changed_bbs): New declaration.
>>
> So are you still interested in moving this forward Bin?  I know you did
> a minor update in response to Michael Meissner's problems.  Is there
> another update planned?
Hi,
Yes I want to move this forward, but better with confirmation whether
the regression is real or just u-arch hazard.  OTOH, given the
proceeding 32 patches have been applied, I think there is no need to
hold this one.  It is proceeding patch rather than this one causes
regression I guess.

>
> THe only obvious thing I'd suggest changing in the DOM interface is to
> have continue to walk the dominator tree, but do nothing for blocks that
> are not in changed_bbs.  That way you walk blocks in changed_bbs in
> dominator order rather than in bb->index order.
I will update patch as you suggested, but may take some time.

Thanks,
bin
>
> Jeff
>


Re: [PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Jeff Law
On 07/04/2017 06:41 AM, Jakub Jelinek wrote:
> On Tue, Jul 04, 2017 at 02:00:13PM +0200, Richard Biener wrote:
>>> That was intentional.  If a->e != NULL, then we know that b->e != NULL,
>>> because we have
>>>   else if (a->e != NULL && b->e == NULL)
>>> return -1;
>>> earlier.  Similarly, if a->e == NULL, then we know that b-> == NULL, because
>>> we have:
>>>   if (a->e == NULL && b->e != NULL)
>>> return 1;
>>> earlier.
>>
>> Ah, ok.  Twisty ;)  I suppose jump threading will have eliminated
>> the extra test.
> 
> In the first case maybe, I doubt it would do that after the
> iterative_hash_expr calls which are likely not pure.
Correct.  It'll have to assume that the memory objects changed.

Jeff


Re: [PATCH][AArch64] Fix strict aliasing issue in gcc.target/aarch64/simd/vminmaxnm_1.c

2017-07-04 Thread Richard Earnshaw (lists)
On 04/07/17 15:39, Kyrill Tkachov wrote:
> Hi all,
> 
> While doing some unrelated work the
> gcc.target/aarch64/simd/vminmaxnm_1.c testcase started failing for me.
> Upon investigation it turns out that it breaks the C strict aliasing
> rules in the CHECK macro by casting
> a pointer to an incompatible type and dereferencing it. GCC even warns
> about it if compiled with -Wstrict-aliasing.
> 
> This patch fixes the testcase by making it use memcmp to compare the
> vector elements.
> This avoids the undefined behaviour.
> 
> The testcase still passes on trunk.
> Ok to commit?
> 
> Thanks,
> Kyrill
> 
> 2017-07-04  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/simd/vminmaxnm_1.c: Fix strict aliasing issues.
> 

OK.

R.

> vmaxmin-aliasing.patch
> 
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
> index 192bad9..8fd4281 100644
> --- a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
> @@ -7,12 +7,10 @@
>  
>  extern void abort ();
>  
> -#define CHECK(T, N, R, E) \
> +#define CHECK(R, E) \
>{\
> -int i = 0;\
> -for (; i < N; i++)\
> -  if (* (T *) [i] != * (T *) [i])\
> - abort ();\
> +if (__builtin_memcmp (, , sizeof (R)) != 0)\
> +  abort ();\
>}
>  
>  int
> @@ -26,8 +24,8 @@ main (int argc, char **argv)
>float32x2_t f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
>float32x2_t f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
>  
> -  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
> -  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
> +  CHECK (f32x2_ret_minnm, f32x2_exp_minnm);
> +  CHECK (f32x2_ret_maxnm, f32x2_exp_maxnm);
>  
>/* v{min|max}nm_f32 NaN.  */
>f32x2_input1 = vdup_n_f32 (__builtin_nanf (""));
> @@ -37,8 +35,8 @@ main (int argc, char **argv)
>f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
>f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
>  
> -  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
> -  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
> +  CHECK (f32x2_ret_minnm, f32x2_exp_minnm);
> +  CHECK (f32x2_ret_maxnm, f32x2_exp_maxnm);
>  
>/* v{min|max}nmq_f32 normal.  */
>float32x4_t f32x4_input1 = vdupq_n_f32 (-1024.0);
> @@ -48,8 +46,8 @@ main (int argc, char **argv)
>float32x4_t f32x4_ret_minnm  = vminnmq_f32 (f32x4_input1, f32x4_input2);
>float32x4_t f32x4_ret_maxnm  = vmaxnmq_f32 (f32x4_input1, f32x4_input2);
>  
> -  CHECK (uint32_t, 4, f32x4_ret_minnm, f32x4_exp_minnm);
> -  CHECK (uint32_t, 4, f32x4_ret_maxnm, f32x4_exp_maxnm);
> +  CHECK (f32x4_ret_minnm, f32x4_exp_minnm);
> +  CHECK (f32x4_ret_maxnm, f32x4_exp_maxnm);
>  
>/* v{min|max}nmq_f32 NaN.  */
>f32x4_input1 = vdupq_n_f32 (-__builtin_nanf (""));
> @@ -59,8 +57,8 @@ main (int argc, char **argv)
>f32x4_ret_minnm  = vminnmq_f32 (f32x4_input1, f32x4_input2);
>f32x4_ret_maxnm  = vmaxnmq_f32 (f32x4_input1, f32x4_input2);
>  
> -  CHECK (uint32_t, 4, f32x4_ret_minnm, f32x4_exp_minnm);
> -  CHECK (uint32_t, 4, f32x4_ret_maxnm, f32x4_exp_maxnm);
> +  CHECK (f32x4_ret_minnm, f32x4_exp_minnm);
> +  CHECK (f32x4_ret_maxnm, f32x4_exp_maxnm);
>  
>/* v{min|max}nm_f64 normal.  */
>float64x1_t f64x1_input1 = vdup_n_f64 (1.23);
> @@ -69,16 +67,16 @@ main (int argc, char **argv)
>float64x1_t f64x1_exp_maxnm  = vdup_n_f64 (4.56);
>float64x1_t f64x1_ret_minnm  = vminnm_f64 (f64x1_input1, f64x1_input2);
>float64x1_t f64x1_ret_maxnm  = vmaxnm_f64 (f64x1_input1, f64x1_input2);
> -  CHECK (uint64_t, 1, f64x1_ret_minnm, f64x1_exp_minnm);
> -  CHECK (uint64_t, 1, f64x1_ret_maxnm, f64x1_exp_maxnm);
> +  CHECK (f64x1_ret_minnm, f64x1_exp_minnm);
> +  CHECK (f64x1_ret_maxnm, f64x1_exp_maxnm);
>  
>/* v{min|max}_f64 normal.  */
>float64x1_t f64x1_exp_min  = vdup_n_f64 (1.23);
>float64x1_t f64x1_exp_max  = vdup_n_f64 (4.56);
>float64x1_t f64x1_ret_min  = vmin_f64 (f64x1_input1, f64x1_input2);
>float64x1_t f64x1_ret_max  = vmax_f64 (f64x1_input1, f64x1_input2);
> -  CHECK (uint64_t, 1, f64x1_ret_min, f64x1_exp_min);
> -  CHECK (uint64_t, 1, f64x1_ret_max, f64x1_exp_max);
> +  CHECK (f64x1_ret_min, f64x1_exp_min);
> +  CHECK (f64x1_ret_max, f64x1_exp_max);
>  
>/* v{min|max}nmq_f64 normal.  */
>float64x2_t f64x2_input1 = vdupq_n_f64 (1.23);
> @@ -87,8 +85,8 @@ main (int argc, char **argv)
>float64x2_t f64x2_exp_maxnm  = vdupq_n_f64 (4.56);
>float64x2_t f64x2_ret_minnm  = vminnmq_f64 (f64x2_input1, f64x2_input2);
>float64x2_t f64x2_ret_maxnm  = vmaxnmq_f64 (f64x2_input1, f64x2_input2);
> -  CHECK (uint64_t, 2, f64x2_ret_minnm, f64x2_exp_minnm);
> -  CHECK (uint64_t, 2, f64x2_ret_maxnm, f64x2_exp_maxnm);
> +  CHECK (f64x2_ret_minnm, f64x2_exp_minnm);
> +  CHECK (f64x2_ret_maxnm, f64x2_exp_maxnm);
>  
>/* v{min|max}nm_f64 NaN.  */
>f64x1_input1 = vdup_n_f64 

[PATCH][arm] Move some generated files out of the source tree

2017-07-04 Thread Richard Earnshaw (lists)
When I originally started work on the new options framework for ARM I'd
worked on the assumption that AWK might not be available on every build
machine (only on developer's machines).  However, looking again I notice
that all the options framework relies on it being present for every
build.  This means that some of the generated files that come from
running parsecpu.awk do not need to be kept under revision control.

Unfortunately, it's not _all_ generated files.  The build infrastructure
assumes that all .md fragments are in the source tree and similarly that
all .opt fragments are there as well.

Still, eliminating the very big .h files is a step forward as they are
very regular in structure and diff/patch/merge tools can sometimes make
mistakes when resolving conflicts.

So this patch removes the generated .h files from the source tree and
tweaks the make rules accordingly.  I've also changed the build rules to
use the stamp technique to eliminate some false dependencies in a rebuild.

Top-level:

* contrib/gcc_update: Remove stamp rules for arm-specific
auto-generated header files.

gcc:
* common/config/arm/arm-common.c: Adjust include path for
arm-cpu-cdata.h
* t-arm (TM_H): Adjust path for arm-cpu.h.
(arm-cpu.h): Create in build directory.  Adjust dependency rules.
(arm-cpu-data.h): Likewise.
(arm-cpu-cdata.h): Likewise.
* config/arm/arm-cpu.h: Delete.
* config/arm/arm-cpu-cdata.h: Delete.
* config/arm/arm-cpu-data.h: Delete.

Applied to trunk.

diff --git a/contrib/gcc_update b/contrib/gcc_update
index fe643af..082462f 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -82,9 +82,6 @@ gcc/fixinc/fixincl.x: gcc/fixinc/fixincl.tpl gcc/fixinc/inclhack.def
 gcc/config/aarch64/aarch64-tune.md: gcc/config/aarch64/aarch64-cores.def gcc/config/aarch64/gentune.sh
 gcc/config/arm/arm-tune.md: gcc/config/arm/arm-cpus.in gcc/config/arm/parsecpu.awk
 gcc/config/arm/arm-tables.opt: gcc/config/arm/arm-cpus.in gcc/config/arm/parsecpu.awk
-gcc/config/arm/arm-cpu.h: gcc/config/arm/arm-cpus.in gcc/config/arm/parsecpu.awk
-gcc/config/arm/arm-cpu-data.h: gcc/config/arm/arm-cpus.in gcc/config/arm/parsecpu.awk
-gcc/config/arm/arm-cpu-cdata.h: gcc/config/arm/arm-cpus.in gcc/config/arm/parsecpu.awk
 gcc/config/avr/avr-tables.opt: gcc/config/avr/avr-mcus.def gcc/config/avr/genopt.sh
 gcc/config/avr/t-multilib: gcc/config/avr/avr-mcus.def gcc/config/avr/genmultilib.awk
 gcc/config/c6x/c6x-tables.opt: gcc/config/c6x/c6x-isas.def gcc/config/c6x/genopt.sh
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index b6244d6..38bd3a7 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -144,7 +144,7 @@ arm_rewrite_march (int argc, const char **argv)
   return arm_rewrite_selected_arch (argv[argc - 1]);
 }
 
-#include "config/arm/arm-cpu-cdata.h"
+#include "arm-cpu-cdata.h"
 
 /* Scan over a raw feature array BITS checking for BIT being present.
This is slower than the normal bitmask checks, but we would spend longer
diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h
deleted file mode 100644
index acd36d4..000
--- a/gcc/config/arm/arm-cpu-cdata.h
+++ /dev/null
@@ -1,2684 +0,0 @@
-/* -*- buffer-read-only: t -*-
-   Generated automatically by parsecpu.awk from arm-cpus.in.
-   Do not edit.
-
-   Copyright (C) 2011-2017 Free Software Foundation, Inc.
-
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as
-   published by the Free Software Foundation; either version 3,
-   or (at your option) any later version.
-
-   GCC is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public
-   License along with GCC; see the file COPYING3.  If not see
-   .  */
-
-static const cpu_arch_extension cpu_opttab_arm9e[] = {
-  {
-"nofp", true, false,
-{ ISA_ALL_FP, isa_nobit }
-  },
-  { NULL, false, false, {isa_nobit}}
-};
-
-static const cpu_arch_extension cpu_opttab_arm946es[] = {
-  {
-"nofp", true, false,
-{ ISA_ALL_FP, isa_nobit }
-  },
-  { NULL, false, false, {isa_nobit}}
-};
-
-static const cpu_arch_extension cpu_opttab_arm966es[] = {
-  {
-"nofp", true, false,
-{ ISA_ALL_FP, isa_nobit }
-  },
-  { NULL, false, false, {isa_nobit}}
-};
-
-static const cpu_arch_extension cpu_opttab_arm968es[] = {
-  {
-"nofp", true, false,
-{ ISA_ALL_FP, isa_nobit }
-  },
-  { NULL, false, false, {isa_nobit}}
-};
-
-static const cpu_arch_extension cpu_opttab_arm10e[] = {
-  {
-"nofp", true, false,
-{ ISA_ALL_FP, isa_nobit }
-  },
-  { NULL, false, false, 

[PATCH][AArch64] Fix strict aliasing issue in gcc.target/aarch64/simd/vminmaxnm_1.c

2017-07-04 Thread Kyrill Tkachov

Hi all,

While doing some unrelated work the gcc.target/aarch64/simd/vminmaxnm_1.c 
testcase started failing for me.
Upon investigation it turns out that it breaks the C strict aliasing rules in 
the CHECK macro by casting
a pointer to an incompatible type and dereferencing it. GCC even warns about it 
if compiled with -Wstrict-aliasing.

This patch fixes the testcase by making it use memcmp to compare the vector 
elements.
This avoids the undefined behaviour.

The testcase still passes on trunk.
Ok to commit?

Thanks,
Kyrill

2017-07-04  Kyrylo Tkachov  

* gcc.target/aarch64/simd/vminmaxnm_1.c: Fix strict aliasing issues.
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
index 192bad9..8fd4281 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
@@ -7,12 +7,10 @@
 
 extern void abort ();
 
-#define CHECK(T, N, R, E) \
+#define CHECK(R, E) \
   {\
-int i = 0;\
-for (; i < N; i++)\
-  if (* (T *) [i] != * (T *) [i])\
-	abort ();\
+if (__builtin_memcmp (, , sizeof (R)) != 0)\
+  abort ();\
   }
 
 int
@@ -26,8 +24,8 @@ main (int argc, char **argv)
   float32x2_t f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
   float32x2_t f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
 
-  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
-  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+  CHECK (f32x2_ret_minnm, f32x2_exp_minnm);
+  CHECK (f32x2_ret_maxnm, f32x2_exp_maxnm);
 
   /* v{min|max}nm_f32 NaN.  */
   f32x2_input1 = vdup_n_f32 (__builtin_nanf (""));
@@ -37,8 +35,8 @@ main (int argc, char **argv)
   f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
   f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
 
-  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
-  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+  CHECK (f32x2_ret_minnm, f32x2_exp_minnm);
+  CHECK (f32x2_ret_maxnm, f32x2_exp_maxnm);
 
   /* v{min|max}nmq_f32 normal.  */
   float32x4_t f32x4_input1 = vdupq_n_f32 (-1024.0);
@@ -48,8 +46,8 @@ main (int argc, char **argv)
   float32x4_t f32x4_ret_minnm  = vminnmq_f32 (f32x4_input1, f32x4_input2);
   float32x4_t f32x4_ret_maxnm  = vmaxnmq_f32 (f32x4_input1, f32x4_input2);
 
-  CHECK (uint32_t, 4, f32x4_ret_minnm, f32x4_exp_minnm);
-  CHECK (uint32_t, 4, f32x4_ret_maxnm, f32x4_exp_maxnm);
+  CHECK (f32x4_ret_minnm, f32x4_exp_minnm);
+  CHECK (f32x4_ret_maxnm, f32x4_exp_maxnm);
 
   /* v{min|max}nmq_f32 NaN.  */
   f32x4_input1 = vdupq_n_f32 (-__builtin_nanf (""));
@@ -59,8 +57,8 @@ main (int argc, char **argv)
   f32x4_ret_minnm  = vminnmq_f32 (f32x4_input1, f32x4_input2);
   f32x4_ret_maxnm  = vmaxnmq_f32 (f32x4_input1, f32x4_input2);
 
-  CHECK (uint32_t, 4, f32x4_ret_minnm, f32x4_exp_minnm);
-  CHECK (uint32_t, 4, f32x4_ret_maxnm, f32x4_exp_maxnm);
+  CHECK (f32x4_ret_minnm, f32x4_exp_minnm);
+  CHECK (f32x4_ret_maxnm, f32x4_exp_maxnm);
 
   /* v{min|max}nm_f64 normal.  */
   float64x1_t f64x1_input1 = vdup_n_f64 (1.23);
@@ -69,16 +67,16 @@ main (int argc, char **argv)
   float64x1_t f64x1_exp_maxnm  = vdup_n_f64 (4.56);
   float64x1_t f64x1_ret_minnm  = vminnm_f64 (f64x1_input1, f64x1_input2);
   float64x1_t f64x1_ret_maxnm  = vmaxnm_f64 (f64x1_input1, f64x1_input2);
-  CHECK (uint64_t, 1, f64x1_ret_minnm, f64x1_exp_minnm);
-  CHECK (uint64_t, 1, f64x1_ret_maxnm, f64x1_exp_maxnm);
+  CHECK (f64x1_ret_minnm, f64x1_exp_minnm);
+  CHECK (f64x1_ret_maxnm, f64x1_exp_maxnm);
 
   /* v{min|max}_f64 normal.  */
   float64x1_t f64x1_exp_min  = vdup_n_f64 (1.23);
   float64x1_t f64x1_exp_max  = vdup_n_f64 (4.56);
   float64x1_t f64x1_ret_min  = vmin_f64 (f64x1_input1, f64x1_input2);
   float64x1_t f64x1_ret_max  = vmax_f64 (f64x1_input1, f64x1_input2);
-  CHECK (uint64_t, 1, f64x1_ret_min, f64x1_exp_min);
-  CHECK (uint64_t, 1, f64x1_ret_max, f64x1_exp_max);
+  CHECK (f64x1_ret_min, f64x1_exp_min);
+  CHECK (f64x1_ret_max, f64x1_exp_max);
 
   /* v{min|max}nmq_f64 normal.  */
   float64x2_t f64x2_input1 = vdupq_n_f64 (1.23);
@@ -87,8 +85,8 @@ main (int argc, char **argv)
   float64x2_t f64x2_exp_maxnm  = vdupq_n_f64 (4.56);
   float64x2_t f64x2_ret_minnm  = vminnmq_f64 (f64x2_input1, f64x2_input2);
   float64x2_t f64x2_ret_maxnm  = vmaxnmq_f64 (f64x2_input1, f64x2_input2);
-  CHECK (uint64_t, 2, f64x2_ret_minnm, f64x2_exp_minnm);
-  CHECK (uint64_t, 2, f64x2_ret_maxnm, f64x2_exp_maxnm);
+  CHECK (f64x2_ret_minnm, f64x2_exp_minnm);
+  CHECK (f64x2_ret_maxnm, f64x2_exp_maxnm);
 
   /* v{min|max}nm_f64 NaN.  */
   f64x1_input1 = vdup_n_f64 (-__builtin_nanf (""));
@@ -98,8 +96,8 @@ main (int argc, char **argv)
   f64x1_ret_minnm  = vminnm_f64 (f64x1_input1, f64x1_input2);
   f64x1_ret_maxnm  = vmaxnm_f64 (f64x1_input1, f64x1_input2);
 
-  CHECK (uint64_t, 1, f64x1_ret_minnm, f64x1_exp_minnm);
-  CHECK (uint64_t, 1, f64x1_ret_maxnm, f64x1_exp_maxnm);
+  CHECK 

Re: [PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Richard Biener
On July 4, 2017 2:41:52 PM GMT+02:00, Jakub Jelinek  wrote:
>On Tue, Jul 04, 2017 at 02:00:13PM +0200, Richard Biener wrote:
>> > That was intentional.  If a->e != NULL, then we know that b->e !=
>NULL,
>> > because we have
>> >   else if (a->e != NULL && b->e == NULL)
>> > return -1;
>> > earlier.  Similarly, if a->e == NULL, then we know that b-> ==
>NULL, because
>> > we have:
>> >   if (a->e == NULL && b->e != NULL)
>> > return 1;
>> > earlier.
>> 
>> Ah, ok.  Twisty ;)  I suppose jump threading will have eliminated
>> the extra test.
>
>In the first case maybe, I doubt it would do that after the
>iterative_hash_expr calls which are likely not pure.
>
>> > > Otherwise looks ok to me.  I wonder if we should merge the two
>> > > sorting functions and change behavior with a global var or a
>> > > template parameter instead (to reduce source duplication).  Does
>> > > 
>> > > vec.qsort (function_template);
>> > > 
>> > > work?
>> > 
>> > Let me try that.
>
>Seems to work, so like this if it passes bootstrap/regtest?

OK.

Richard.

>2017-07-04  Jakub Jelinek  
>
>   PR debug/81278
>   * tree-vrp.c (compare_assert_loc): Turn into a function template
>   with stable template parameter.  Only test if a->e is NULL,
>   !a->e == !b->e has been verified already.  Use e == NULL or
>   e != NULL instead of e or ! e tests.  If stable is true, don't use
>   iterative_hash_expr, on the other side allow a or b or both NULL
>   and sort the NULLs last.
>   (process_assert_insertions): Sort using compare_assert_loc
>   instead of compare_assert_loc, later sort using
>   compare_assert_loc before calling process_assert_insertions_for
>   in a loop.  Use break instead of continue once seen NULL pointer.
>
>--- gcc/tree-vrp.c.jj  2017-07-04 10:43:48.627706528 +0200
>+++ gcc/tree-vrp.c 2017-07-04 14:37:06.823101453 +0200
>@@ -6393,20 +6393,37 @@ process_assert_insertions_for (tree name
>   gcc_unreachable ();
> }
> 
>-/* Qsort helper for sorting assert locations.  */
>+/* Qsort helper for sorting assert locations.  If stable is true,
>don't
>+   use iterative_hash_expr because it can be unstable for
>-fcompare-debug,
>+   on the other side some pointers might be NULL.  */
> 
>+template 
> static int
> compare_assert_loc (const void *pa, const void *pb)
> {
>   assert_locus * const a = *(assert_locus * const *)pa;
>   assert_locus * const b = *(assert_locus * const *)pb;
>-  if (! a->e && b->e)
>+
>+  /* If stable, some asserts might be optimized away already, sort
>+ them last.  */
>+  if (stable)
>+{
>+  if (a == NULL)
>+  return b != NULL;
>+  else if (b == NULL)
>+  return -1;
>+}
>+
>+  if (a->e == NULL && b->e != NULL)
> return 1;
>-  else if (a->e && ! b->e)
>+  else if (a->e != NULL && b->e == NULL)
> return -1;
> 
>+  /* After the above checks, we know that (a->e == NULL) == (b->e ==
>NULL),
>+ no need to test both a->e and b->e.  */
>+
>   /* Sort after destination index.  */
>-  if (! a->e && ! b->e)
>+  if (a->e == NULL)
> ;
>   else if (a->e->dest->index > b->e->dest->index)
> return 1;
>@@ -6419,11 +6436,27 @@ compare_assert_loc (const void *pa, cons
>   else if (a->comp_code < b->comp_code)
> return -1;
> 
>+  hashval_t ha, hb;
>+
>+  /* E.g. if a->val is ADDR_EXPR of a VAR_DECL, iterative_hash_expr
>+ uses DECL_UID of the VAR_DECL, so sorting might differ between
>+ -g and -g0.  When doing the removal of redundant assert exprs
>+ and commonization to successors, this does not matter, but for
>+ the final sort needs to be stable.  */
>+  if (stable)
>+{
>+  ha = 0;
>+  hb = 0;
>+}
>+  else
>+{
>+  ha = iterative_hash_expr (a->expr, iterative_hash_expr (a->val,
>0));
>+  hb = iterative_hash_expr (b->expr, iterative_hash_expr (b->val,
>0));
>+}
>+
>   /* Break the tie using hashing and source/bb index.  */
>-  hashval_t ha = iterative_hash_expr (a->expr, iterative_hash_expr
>(a->val, 0));
>-  hashval_t hb = iterative_hash_expr (b->expr, iterative_hash_expr
>(b->val, 0));
>   if (ha == hb)
>-return (a->e && b->e
>+return (a->e != NULL
>   ? a->e->src->index - b->e->src->index
>   : a->bb->index - b->bb->index);
>   return ha - hb;
>@@ -6452,7 +6485,7 @@ process_assert_insertions (void)
>   auto_vec asserts;
>   for (; loc; loc = loc->next)
>   asserts.safe_push (loc);
>-  asserts.qsort (compare_assert_loc);
>+  asserts.qsort (compare_assert_loc);
> 
>/* Push down common asserts to successors and remove redundant ones. 
>*/
>   unsigned ecnt = 0;
>@@ -6506,11 +6539,14 @@ process_assert_insertions (void)
>   }
>   }
> 
>+  /* The asserts vector sorting above might be unstable for
>+   -fcompare-debug, sort again to ensure a stable sort.  */
>+  asserts.qsort (compare_assert_loc);
>   for (unsigned j = 0; j < asserts.length (); ++j)
>   {
> 

Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Andreas Schwab
On Jul 04 2017, Wilco Dijkstra  wrote:

> Configure seems to be able to find gnatbind/gnatmake as they are in /usr/bin.

Strange there are ada tools, but no ada compiler.  Are you sure you
installed all relevant ada packages?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Profile upating in simd_clone_adjust

2017-07-04 Thread Jakub Jelinek
On Tue, Jul 04, 2017 at 03:46:26PM +0200, Jan Hubicka wrote:
> Hi,
> this is the last occurence of missing probability update during x86-64
> bootstrap  I am not sure what is really going on here,
> is haing probability as almost never executed OK?

This is for the test whether the particular iteration has been masked off
and nothing should be done for that iteration.  Never is a bad guess,
if the mask is always true on all iterations, then one should be using
a different simd clone function without the mask argument which is more
efficient to gain performance.  I think the mask non-zero (i.e. do
something) is still more probable than zero, but not significantly much.
The hope is we can vectorize using AVX512 masked operations, but in reality
in many cases we can't.

> Index: omp-simd-clone.c
> ===
> --- omp-simd-clone.c  (revision 249928)
> +++ omp-simd-clone.c  (working copy)
> @@ -1240,7 +1240,9 @@ simd_clone_adjust (struct cgraph_node *n
>g = gimple_build_cond (EQ_EXPR, mask, build_zero_cst (TREE_TYPE 
> (mask)),
>NULL, NULL);
>gsi_insert_after (, g, GSI_CONTINUE_LINKING);
> -  make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
> +  edge e = make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
> +  /* FIXME: What is real porbability here?  */
> +  e->probability = profile_probability::guessed_never ();
>FALLTHRU_EDGE (loop->header)->flags = EDGE_FALSE_VALUE;
>  }
>  

Jakub


Re: [PATCH 1/3, GCC/ARM] Add MIDR info for ARM Cortex-R7 and Cortex-R8

2017-07-04 Thread Kyrill Tkachov


On 29/06/17 14:55, Thomas Preudhomme wrote:

Hi,

The driver is missing MIDR information for processors ARM Cortex-R7 and
Cortex-R8 to support -march/-mcpu/-mtune=native on the command line.
This patch adds the missing information.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

* config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM
Cortex-R7 and Cortex-R8 processors.

Is this ok for master?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH 1/3, GCC/ARM, ping] Add MIDR info for ARM Cortex-R7 and Cortex-R8

2017-07-04 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 29/06/17 14:55, Thomas Preudhomme wrote:

Hi,

The driver is missing MIDR information for processors ARM Cortex-R7 and
Cortex-R8 to support -march/-mcpu/-mtune=native on the command line.
This patch adds the missing information.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-01-31  Thomas Preud'homme  

 * config/arm/driver-arm.c (arm_cpu_table): Add entry for ARM
 Cortex-R7 and Cortex-R8 processors.

Is this ok for master?

Best regards,

Thomas
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index b034f13fda63f5892bbd9879d72f4b02e2632d69..29873d57a1e45fd989f6ff01dd4a2ae7320d93bb 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -54,6 +54,8 @@ static struct vendor_cpu arm_cpu_table[] = {
 {"0xd09", "armv8-a+crc", "cortex-a73"},
 {"0xc14", "armv7-r", "cortex-r4"},
 {"0xc15", "armv7-r", "cortex-r5"},
+{"0xc17", "armv7-r", "cortex-r7"},
+{"0xc18", "armv7-r", "cortex-r8"},
 {"0xc20", "armv6-m", "cortex-m0"},
 {"0xc21", "armv6-m", "cortex-m1"},
 {"0xc23", "armv7-m", "cortex-m3"},


Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Michael Matz
Hi,

On Tue, 4 Jul 2017, Wilco Dijkstra wrote:

> > You'll probably also have to set GNATBIND and GNATMAKE to the 
> > appropriately suffixed variants.  Just saying, because that's what I'm 
> > usually forgetting and end up with strange errors :)
> 
> Configure seems to be able to find gnatbind/gnatmake as they are in /usr/bin.

Sure, but if you use gcc-5 as compiler you'll want to use gnatbind-5 as 
well, not gnatbind (which presumably is from the default 4.8 compiler you 
have).

> Compiling a simple .adb file as suggested gives:
> 
> > /usr/bin/gcc-5 tmp.adb -S
> gcc-5: error trying to exec 'gnat1': execvp: No such file or directory
> 
> Is this part of the GCC driver, GNAT or something else that needs to be 
> installed first?

Don't know the package naming convention on Ubuntu (?).  For us it's e.g. 
gcc5-ada that you'd need to install in addition.  google tells me gnat-5 
should be it.


Ciao,
Michael.

Re: Profile upating in simd_clone_adjust

2017-07-04 Thread Rainer Orth
Jan Hubicka  writes:

> Hi,
> this is the last occurence of missing probability update during x86-64
> bootstrap  I am not sure what is really going on here,
> is haing probability as almost never executed OK?
>
> Honza
>
> Index: omp-simd-clone.c
> ===
> --- omp-simd-clone.c  (revision 249928)
> +++ omp-simd-clone.c  (working copy)
> @@ -1240,7 +1240,9 @@ simd_clone_adjust (struct cgraph_node *n
>g = gimple_build_cond (EQ_EXPR, mask, build_zero_cst (TREE_TYPE 
> (mask)),
>NULL, NULL);
>gsi_insert_after (, g, GSI_CONTINUE_LINKING);
> -  make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
> +  edge e = make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
> +  /* FIXME: What is real porbability here?  */
 ^ typo

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Profile upating in simd_clone_adjust

2017-07-04 Thread Jan Hubicka
Hi,
this is the last occurence of missing probability update during x86-64
bootstrap  I am not sure what is really going on here,
is haing probability as almost never executed OK?

Honza

Index: omp-simd-clone.c
===
--- omp-simd-clone.c(revision 249928)
+++ omp-simd-clone.c(working copy)
@@ -1240,7 +1240,9 @@ simd_clone_adjust (struct cgraph_node *n
   g = gimple_build_cond (EQ_EXPR, mask, build_zero_cst (TREE_TYPE (mask)),
 NULL, NULL);
   gsi_insert_after (, g, GSI_CONTINUE_LINKING);
-  make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
+  edge e = make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
+  /* FIXME: What is real porbability here?  */
+  e->probability = profile_probability::guessed_never ();
   FALLTHRU_EDGE (loop->header)->flags = EDGE_FALSE_VALUE;
 }
 


Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Wilco Dijkstra
Michael Matz wrote:
>
> You'll probably also have to set GNATBIND and GNATMAKE to the 
> appropriately suffixed variants.  Just saying, because that's what I'm 
> usually forgetting and end up with strange errors :)

Configure seems to be able to find gnatbind/gnatmake as they are in /usr/bin.

Compiling a simple .adb file as suggested gives:

> /usr/bin/gcc-5 tmp.adb -S
gcc-5: error trying to exec 'gnat1': execvp: No such file or directory

Is this part of the GCC driver, GNAT or something else that needs to be 
installed first?

Wilco

Re: [PATCH-v3] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-07-04 Thread Daniel Cederman



On 2017-06-30 07:11, Sebastian Huber wrote:

On 29/06/17 18:05, David Miller wrote:


From: Daniel Cederman 
Date: Thu, 29 Jun 2017 17:15:43 +0200


I'm not thrilled with this, it's undocumented, the other workaround
don't have
it and I don't think that we really need it.

The B2BST errata workaround requires more changes to assembler
routines commonly used by operating systems, such as for example
register window handling, than what the UT699 workaround needed. It
would be nice to have a way to only enable these modification when the
-mfix- flag is used. The alternative would be to provide a define
directly on the compiler command line in conjunction with -mfix
flag. But if more changes are required later on it would be good to
have the define more closely tied to the flag to minimize the number
of changes to Makefiles and etc.

Personally, I have never seen compiler based CPP defines as ever being
useful for tailoring OS assembler code.  Ever.

In most cases you will want to support several families of CPUs and
therefore sort out the individual cpu support assembler routines
internally in the kernel sources.


This depends on the operating system you use. For some  embedded systems 
where the application and the operating system are one executable it is 
quite common to use compiler provided defines in assembly code.


For example:

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/machine/arm/memcpy-armv7a.S;h=cd7962e075a30cb90ec073d77b177c3536429b9b;hb=HEAD 



For a software development kit, the run-time libraries are built for a 
set of multilibs. Each assembler file may use multilib specific compiler 
defines, e.g. floating point unit present or not, errata XYZ present or 
not, etc.




We can drop the define if necessary, but we would like to keep the two 
flags. Would that be OK to apply?


--
Daniel Cederman
Cobham Gaisler



Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Michael Matz
Hi,

On Tue, 4 Jul 2017, Ramana Radhakrishnan wrote:

> Yeah it turns out that on the machine Wilco was using, we are running 
> 14.04 which has a gcc 4.8 base compiler that didn't have Ada on for 
> AArch64.
> 
> I think we can work around by installing a gcc-5 package and then 
> setting CC to gcc-5.

You'll probably also have to set GNATBIND and GNATMAKE to the 
appropriately suffixed variants.  Just saying, because that's what I'm 
usually forgetting and end up with strange errors :)


Ciao,
Michael.


Re: [PATCH v9] add -fpatchable-function-entry=N,M option

2017-07-04 Thread Torsten Duwe
On Tue, Jul 04, 2017 at 02:27:00PM +0100, Richard Earnshaw (lists) wrote:
> > 
> > How about omitting the recording step and document that? This way the
> > instrumentation can still be useful on e.g. a.out format; the framework
> > would then have to check around each function entry whether the NOPs are
> > there.
> > 
> 
> I'm happy with that if the result is still something useful.  Frankly, I
> don't think GCC has many targets now that don't support named sections,
> so I wouldn't loose too much sleep over being unable to support this
> feature on such targets.  However, if the implementation is straight
> forward then doing what you suggest is fine.

Ok, will do it this way then.

> 
> >> The template NOP in default_print_patchable_function_entry needs to be
> >> added as a GC root to prevent it being discarded during garbage collection.
> > 
> > Are you sure? For the nop_templ I'm only digging my way to the platform-
> > dependent constant string, which I find amongst the constant strings in
> > .rodata (or .text, for platforms without named sections ;-).
> > Is the memory region pointed to by nop_templ really endangered?
> > 
> 
> I'd missed that this was trying to recover a string.  That's probably OK
> from a GC perspective then, but it isn't safe to hold this between
> functions since the nop used for one function might not be the same when
> we come to the next.  Consider, for example, the ARM port where we can
> switch from generating ARM code to generating Thumb code: it might work
> with the current implementation, but this isn't guaranteed to be the case.

I'm "caching" the string, currently. Now that you say it...
It will slow things down a bit, but you're right.



Re: [PATCH v9] add -fpatchable-function-entry=N,M option

2017-07-04 Thread Michael Matz
Hi,

On Tue, 4 Jul 2017, Michael Matz wrote:

> I don't think so: get_insn_template() should always return strings in 
> .rodata, even for output statements, and should never point into GC 
> memory.

Bah, ignore that.  I should refetch mail before answering mid-thread ;)


Ciao,
Michael.


Re: [PATCH v9] add -fpatchable-function-entry=N,M option

2017-07-04 Thread Michael Matz
Hello Richard,

On Tue, 4 Jul 2017, Richard Earnshaw (lists) wrote:

> > +void
> > +default_print_patchable_function_entry (FILE *file,
> > +   unsigned HOST_WIDE_INT patch_area_size,
> > +   bool record_p)
> > +{
> > +  static const char *nop_templ = 0;
> 
> You need to record this pointer as a GC root.  Otherwise garbage
> collection might destroy your template NOP.

I don't think so: get_insn_template() should always return strings in 
.rodata, even for output statements, and should never point into GC 
memory.

> > +  /* We use the template alone, relying on the (currently sane) assumption
> > + that the NOP template does not have variable operands.  */
> > +  if (!nop_templ)
> > +{
> > +  int code_num;
> > +  rtx_insn *my_nop = make_insn_raw (gen_nop ());
> > +
> > +  code_num = recog_memoized (my_nop);
> > +  nop_templ = get_insn_template (code_num, my_nop);
> > +}


Ciao,
Michael.


Re: [PATCH v9] add -fpatchable-function-entry=N,M option

2017-07-04 Thread Richard Earnshaw (lists)
On 04/07/17 14:14, Torsten Duwe wrote:
> On Tue, Jul 04, 2017 at 12:02:47PM +0100, Richard Earnshaw (lists) wrote:
>> On 13/06/17 18:00, Torsten Duwe wrote:
>>> Changes since v8:
>>>
>>>   * Documentation changes as requested by Sandra
>>>   * 3 functional test cases added
>>>
>>> Torsten
>>>
>>>
>>> gcc/c-family/ChangeLog
>>> 2017-06-13  Torsten Duwe  
>>>
>>> * c-attribs.c (c_common_attribute_table): Add entry for
>>> "patchable_function_entry".
>>>
>>> gcc/lto/ChangeLog
>>> 2017-06-13  Torsten Duwe  
>>>
>>> * lto-lang.c (lto_attribute_table): Add entry for
>>> "patchable_function_entry".
>>>
>>> gcc/ChangeLog
>>> 2017-06-13  Torsten Duwe  
>>>
>>> * common.opt: Introduce -fpatchable-function-entry
>>> command line option, and its variables function_entry_patch_area_size
>>> and function_entry_patch_area_start.
>>> * opts.c (common_handle_option): Add -fpatchable_function_entry_ case,
>>> including a two-value parser.
>>> * target.def (print_patchable_function_entry): New target hook.
>>> * targhooks.h (default_print_patchable_function_entry): New function.
>>> * targhooks.c (default_print_patchable_function_entry): Likewise.
>>> * toplev.c (process_options): Switch off IPA-RA if
>>> patchable function entries are being generated.
>>> * varasm.c (assemble_start_function): Look at the
>>> patchable-function-entry command line switch and current
>>> function attributes and maybe generate NOP instructions by
>>> calling the print_patchable_function_entry hook.
>>> * doc/extend.texi: Document patchable_function_entry attribute.
>>> * doc/invoke.texi: Document -fpatchable_function_entry
>>> command line option.
>>> * doc/tm.texi.in (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
>>> New target hook.
>>> * doc/tm.texi: Likewise.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2017-06-13  Torsten Duwe  
>>>
>>> * c-c++-common/patchable_function_entry-default.c: New test.
>>> * c-c++-common/patchable_function_entry-decl.c: Likewise.
>>> * c-c++-common/patchable_function_entry-definition.c: Likewise.
>>
>> I think we're nearly there with this one, but there are a couple of nits
>> still to sort out.
>>
>> I can't see anything in the patch to deal with targets that set
>> TARGET_HAVE_NAMED_SECTIONS to false.  We'll probably ICE in that case
>> since the compiler will be unable to record the location of the patch
>> region in the explicitly named section.  I think the correct thing to do
>> there is to reject the option during option validation when we can't
>> support it properly.
> 
> How about omitting the recording step and document that? This way the
> instrumentation can still be useful on e.g. a.out format; the framework
> would then have to check around each function entry whether the NOPs are
> there.
> 

I'm happy with that if the result is still something useful.  Frankly, I
don't think GCC has many targets now that don't support named sections,
so I wouldn't loose too much sleep over being unable to support this
feature on such targets.  However, if the implementation is straight
forward then doing what you suggest is fine.


>> The template NOP in default_print_patchable_function_entry needs to be
>> added as a GC root to prevent it being discarded during garbage collection.
> 
> Are you sure? For the nop_templ I'm only digging my way to the platform-
> dependent constant string, which I find amongst the constant strings in
> .rodata (or .text, for platforms without named sections ;-).
> Is the memory region pointed to by nop_templ really endangered?
> 

I'd missed that this was trying to recover a string.  That's probably OK
from a GC perspective then, but it isn't safe to hold this between
functions since the nop used for one function might not be the same when
we come to the next.  Consider, for example, the ARM port where we can
switch from generating ARM code to generating Thumb code: it might work
with the current implementation, but this isn't guaranteed to be the case.

>> There are a couple of documentation tweaks, which I think help make the
>> text a little more comprehensible.  See inline.
> 
> Ack for those, in toto. I wrote this stuff, and am actively using it.
> I find it sort of hard to imagine to have 0 knowledge about it, and phrase
> documentation accordingly. I'd add the half-sentence for non named sections,
> see below.
> 
>> R.
>>
>>>
>>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>>> index f2a88e147ba..31137ce0433 100644
>>> --- a/gcc/c-family/c-attribs.c
>>> +++ b/gcc/c-family/c-attribs.c
>>> @@ -139,6 +139,8 @@ static tree handle_bnd_variable_size_attribute (tree *, 
>>> tree, tree, int, bool *)
>>>  static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
>>>  static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
>>>  static tree handle_fallthrough_attribute (tree *, tree, tree, 

Fix ChangeLog format in r247584

2017-07-04 Thread Thomas Preudhomme

Hi,

This patch fixes relative pathnames in gcc/ChangeLog for r247584. Committed as 
obvious to trunk, GCC 5, 6 and 7.


Best regards,

Thomas
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f9e00198bbfd352960685b5c72193570e232e68a..39bdcb12ebbad3cdbdce6b9d4dd87c28610e37fe 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -5826,7 +5826,7 @@
 
 2017-05-04  Prakhar Bahuguna  
 
-	* gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
+	* config/arm/arm-builtins.c (arm_init_builtins): Rename
 	__builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
 	__builtin_arm_stfscr to __builtin_arm_set_fpscr.
 


Re: [PATCH v9] add -fpatchable-function-entry=N,M option

2017-07-04 Thread Torsten Duwe
On Tue, Jul 04, 2017 at 12:02:47PM +0100, Richard Earnshaw (lists) wrote:
> On 13/06/17 18:00, Torsten Duwe wrote:
> > Changes since v8:
> > 
> >   * Documentation changes as requested by Sandra
> >   * 3 functional test cases added
> > 
> > Torsten
> > 
> > 
> > gcc/c-family/ChangeLog
> > 2017-06-13  Torsten Duwe  
> > 
> > * c-attribs.c (c_common_attribute_table): Add entry for
> > "patchable_function_entry".
> > 
> > gcc/lto/ChangeLog
> > 2017-06-13  Torsten Duwe  
> > 
> > * lto-lang.c (lto_attribute_table): Add entry for
> > "patchable_function_entry".
> > 
> > gcc/ChangeLog
> > 2017-06-13  Torsten Duwe  
> > 
> > * common.opt: Introduce -fpatchable-function-entry
> > command line option, and its variables function_entry_patch_area_size
> > and function_entry_patch_area_start.
> > * opts.c (common_handle_option): Add -fpatchable_function_entry_ case,
> > including a two-value parser.
> > * target.def (print_patchable_function_entry): New target hook.
> > * targhooks.h (default_print_patchable_function_entry): New function.
> > * targhooks.c (default_print_patchable_function_entry): Likewise.
> > * toplev.c (process_options): Switch off IPA-RA if
> > patchable function entries are being generated.
> > * varasm.c (assemble_start_function): Look at the
> > patchable-function-entry command line switch and current
> > function attributes and maybe generate NOP instructions by
> > calling the print_patchable_function_entry hook.
> > * doc/extend.texi: Document patchable_function_entry attribute.
> > * doc/invoke.texi: Document -fpatchable_function_entry
> > command line option.
> > * doc/tm.texi.in (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
> > New target hook.
> > * doc/tm.texi: Likewise.
> > 
> > gcc/testsuite/ChangeLog
> > 2017-06-13  Torsten Duwe  
> > 
> > * c-c++-common/patchable_function_entry-default.c: New test.
> > * c-c++-common/patchable_function_entry-decl.c: Likewise.
> > * c-c++-common/patchable_function_entry-definition.c: Likewise.
> 
> I think we're nearly there with this one, but there are a couple of nits
> still to sort out.
> 
> I can't see anything in the patch to deal with targets that set
> TARGET_HAVE_NAMED_SECTIONS to false.  We'll probably ICE in that case
> since the compiler will be unable to record the location of the patch
> region in the explicitly named section.  I think the correct thing to do
> there is to reject the option during option validation when we can't
> support it properly.

How about omitting the recording step and document that? This way the
instrumentation can still be useful on e.g. a.out format; the framework
would then have to check around each function entry whether the NOPs are
there.

> The template NOP in default_print_patchable_function_entry needs to be
> added as a GC root to prevent it being discarded during garbage collection.

Are you sure? For the nop_templ I'm only digging my way to the platform-
dependent constant string, which I find amongst the constant strings in
.rodata (or .text, for platforms without named sections ;-).
Is the memory region pointed to by nop_templ really endangered?

> There are a couple of documentation tweaks, which I think help make the
> text a little more comprehensible.  See inline.

Ack for those, in toto. I wrote this stuff, and am actively using it.
I find it sort of hard to imagine to have 0 knowledge about it, and phrase
documentation accordingly. I'd add the half-sentence for non named sections,
see below.

> R.
> 
> > 
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index f2a88e147ba..31137ce0433 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -139,6 +139,8 @@ static tree handle_bnd_variable_size_attribute (tree *, 
> > tree, tree, int, bool *)
> >  static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
> >  static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
> >  static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
> > +static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
> > +  int, bool *);
> >  
> >  /* Table of machine-independent attributes common to all C-like languages.
> >  
> > @@ -345,6 +347,9 @@ const struct attribute_spec c_common_attribute_table[] =
> >   handle_bnd_instrument, false },
> >{ "fallthrough",   0, 0, false, false, false,
> >   handle_fallthrough_attribute, false },
> > +  { "patchable_function_entry",1, 2, true, false, false,
> > + handle_patchable_function_entry_attribute,
> > + false },
> >{ NULL, 0, 0, false, false, false, NULL, false }
> >  };
> >  
> > @@ -3173,3 +3178,10 @@ 

Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Ramana Radhakrishnan
On Tue, Jul 4, 2017 at 1:56 PM, Arnaud Charlet  wrote:
> On Tue, Jul 04, 2017 at 12:19:35PM +, Wilco Dijkstra wrote:
>> Andreas Schwab wrote:
>> > @@ -5207,6 +5209,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>> >
>> >case MEM:
>> >  output_address (GET_MODE (x), XEXP (x, 0));
>> > +   gcc_assert (GET_MODE (XEXP (x, 0)) == Pmode);
>> >  break;
>> >
>> >case CONST:
>>
>> > That breaks a lot of gnat tests in ilp32 mode:
>>
>> That's interesting since it works fine in C and C++, and unless there are 
>> some
>> ADA specific MD patterns, it seems unlikely it could only affect ADA.
>>
>> So how do you build the ADA backend? GCC can't even get past the configure as
>> it doesn't appear to understand GNAT is installed after finding it...
>> Is there some more magic setup required for ADA?
>>
>> Wilco
>>
>> checking for gnatbind... gnatbind
>> checking for gnatmake... gnatmake
>> checking whether compiler driver understands Ada... no
>
> The line above means that using $CC -c dummy_ada_file.adb
> generates an error.
>
> $CC is "gcc" by default, some installs use gnatgcc instead of gcc to
> provide an Ada compiler.
>
> In other words, try to do:
>
> gcc -c hello.adb
>
> manually and you'll see the error for yourself, or check your config.log
> file.
>
> hello.adb can be as simple as the following single line:
>
> procedure hello is begin null; end;

Yeah it turns out that on the machine Wilco was using, we are running
14.04 which has a gcc 4.8 base compiler that didn't have Ada on for
AArch64.

I think we can work around by installing a gcc-5 package and then
setting CC to gcc-5.

Ramana


Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Arnaud Charlet
On Tue, Jul 04, 2017 at 12:19:35PM +, Wilco Dijkstra wrote:
> Andreas Schwab wrote:
> > @@ -5207,6 +5209,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
> >  
> >    case MEM:
> >  output_address (GET_MODE (x), XEXP (x, 0));
> > +   gcc_assert (GET_MODE (XEXP (x, 0)) == Pmode);
> >  break;
> >  
> >    case CONST:
> 
> > That breaks a lot of gnat tests in ilp32 mode:
> 
> That's interesting since it works fine in C and C++, and unless there are some
> ADA specific MD patterns, it seems unlikely it could only affect ADA.
> 
> So how do you build the ADA backend? GCC can't even get past the configure as
> it doesn't appear to understand GNAT is installed after finding it...
> Is there some more magic setup required for ADA?
> 
> Wilco
> 
> checking for gnatbind... gnatbind
> checking for gnatmake... gnatmake
> checking whether compiler driver understands Ada... no

The line above means that using $CC -c dummy_ada_file.adb
generates an error.

$CC is "gcc" by default, some installs use gnatgcc instead of gcc to
provide an Ada compiler.

In other words, try to do:

gcc -c hello.adb

manually and you'll see the error for yourself, or check your config.log
file.

hello.adb can be as simple as the following single line:

procedure hello is begin null; end;


Re: [PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Jakub Jelinek
On Tue, Jul 04, 2017 at 02:00:13PM +0200, Richard Biener wrote:
> > That was intentional.  If a->e != NULL, then we know that b->e != NULL,
> > because we have
> >   else if (a->e != NULL && b->e == NULL)
> > return -1;
> > earlier.  Similarly, if a->e == NULL, then we know that b-> == NULL, because
> > we have:
> >   if (a->e == NULL && b->e != NULL)
> > return 1;
> > earlier.
> 
> Ah, ok.  Twisty ;)  I suppose jump threading will have eliminated
> the extra test.

In the first case maybe, I doubt it would do that after the
iterative_hash_expr calls which are likely not pure.

> > > Otherwise looks ok to me.  I wonder if we should merge the two
> > > sorting functions and change behavior with a global var or a
> > > template parameter instead (to reduce source duplication).  Does
> > > 
> > > vec.qsort (function_template);
> > > 
> > > work?
> > 
> > Let me try that.

Seems to work, so like this if it passes bootstrap/regtest?

2017-07-04  Jakub Jelinek  

PR debug/81278
* tree-vrp.c (compare_assert_loc): Turn into a function template
with stable template parameter.  Only test if a->e is NULL,
!a->e == !b->e has been verified already.  Use e == NULL or
e != NULL instead of e or ! e tests.  If stable is true, don't use
iterative_hash_expr, on the other side allow a or b or both NULL
and sort the NULLs last.
(process_assert_insertions): Sort using compare_assert_loc
instead of compare_assert_loc, later sort using
compare_assert_loc before calling process_assert_insertions_for
in a loop.  Use break instead of continue once seen NULL pointer.

--- gcc/tree-vrp.c.jj   2017-07-04 10:43:48.627706528 +0200
+++ gcc/tree-vrp.c  2017-07-04 14:37:06.823101453 +0200
@@ -6393,20 +6393,37 @@ process_assert_insertions_for (tree name
   gcc_unreachable ();
 }
 
-/* Qsort helper for sorting assert locations.  */
+/* Qsort helper for sorting assert locations.  If stable is true, don't
+   use iterative_hash_expr because it can be unstable for -fcompare-debug,
+   on the other side some pointers might be NULL.  */
 
+template 
 static int
 compare_assert_loc (const void *pa, const void *pb)
 {
   assert_locus * const a = *(assert_locus * const *)pa;
   assert_locus * const b = *(assert_locus * const *)pb;
-  if (! a->e && b->e)
+
+  /* If stable, some asserts might be optimized away already, sort
+ them last.  */
+  if (stable)
+{
+  if (a == NULL)
+   return b != NULL;
+  else if (b == NULL)
+   return -1;
+}
+
+  if (a->e == NULL && b->e != NULL)
 return 1;
-  else if (a->e && ! b->e)
+  else if (a->e != NULL && b->e == NULL)
 return -1;
 
+  /* After the above checks, we know that (a->e == NULL) == (b->e == NULL),
+ no need to test both a->e and b->e.  */
+
   /* Sort after destination index.  */
-  if (! a->e && ! b->e)
+  if (a->e == NULL)
 ;
   else if (a->e->dest->index > b->e->dest->index)
 return 1;
@@ -6419,11 +6436,27 @@ compare_assert_loc (const void *pa, cons
   else if (a->comp_code < b->comp_code)
 return -1;
 
+  hashval_t ha, hb;
+
+  /* E.g. if a->val is ADDR_EXPR of a VAR_DECL, iterative_hash_expr
+ uses DECL_UID of the VAR_DECL, so sorting might differ between
+ -g and -g0.  When doing the removal of redundant assert exprs
+ and commonization to successors, this does not matter, but for
+ the final sort needs to be stable.  */
+  if (stable)
+{
+  ha = 0;
+  hb = 0;
+}
+  else
+{
+  ha = iterative_hash_expr (a->expr, iterative_hash_expr (a->val, 0));
+  hb = iterative_hash_expr (b->expr, iterative_hash_expr (b->val, 0));
+}
+
   /* Break the tie using hashing and source/bb index.  */
-  hashval_t ha = iterative_hash_expr (a->expr, iterative_hash_expr (a->val, 
0));
-  hashval_t hb = iterative_hash_expr (b->expr, iterative_hash_expr (b->val, 
0));
   if (ha == hb)
-return (a->e && b->e
+return (a->e != NULL
? a->e->src->index - b->e->src->index
: a->bb->index - b->bb->index);
   return ha - hb;
@@ -6452,7 +6485,7 @@ process_assert_insertions (void)
   auto_vec asserts;
   for (; loc; loc = loc->next)
asserts.safe_push (loc);
-  asserts.qsort (compare_assert_loc);
+  asserts.qsort (compare_assert_loc);
 
   /* Push down common asserts to successors and remove redundant ones.  */
   unsigned ecnt = 0;
@@ -6506,11 +6539,14 @@ process_assert_insertions (void)
}
}
 
+  /* The asserts vector sorting above might be unstable for
+-fcompare-debug, sort again to ensure a stable sort.  */
+  asserts.qsort (compare_assert_loc);
   for (unsigned j = 0; j < asserts.length (); ++j)
{
  loc = asserts[j];
  if (! loc)
-   continue;
+   break;
  update_edges_p |= process_assert_insertions_for (ssa_name (i), loc);
  num_asserts++;
  free (loc);


Re: [Patch ARM] Add initial tuning for Cortex-A55 and Cortex-A75

2017-07-04 Thread Richard Earnshaw (lists)
On 04/07/17 12:28, James Greenhalgh wrote:
> 
> Much like my AArch64 patch a few weeks ago, this patch adds support
> for the ARM Cortex-A75 and Cortex-A55 processors through the
> -mcpu/-mtune values cortex-a55 and cortex-a75, and an
> ARM DynamIQ big.LITTLE configuration of these two processors through
> the -mcpu/-mtune value cortex-a75.cortex-a55
> 
> Both Cortex-A55 and Cortex-A75 support ARMv8-A with the ARM8.1-A and
> ARMv8.2-A extensions. This is reflected in the patch, -mcpu=cortex-a75 is
> treated as equivalent to passing -mtune=cortex-a75 -march=armv8.2-a+fp16
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2017-07-04  James Greenhalgh  
> 
>   * config/arm/arm-cpus.in (cortex-a55): New.
>   (cortex-a75): Likewise.
>   (cortex-a75.cortex-a55): Likewise.
>   * config/arm/driver-arm.c (arm_cpu_table): Add cortex-a55 and
>   cortex-a75.
>   * doc/invoke.texi (-mcpu): Document cortex-a55 and cortex-a75.
> 
>   * config/arm/arm-cpu-cdata.h: Regenerate.
>   * config/arm/arm-cpu-data.h: Regenerate.
>   * config/arm/arm-cpu.h: Regenerate.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
> 
> 
> 0001-Patch-ARM-Add-initial-tuning-for-Cortex-A55-and-Cort.patch
> 
> 
> diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h
> index 1cf1149..acd36d4 100644
> --- a/gcc/config/arm/arm-cpu-cdata.h
> +++ b/gcc/config/arm/arm-cpu-cdata.h
> @@ -388,6 +388,34 @@ static const cpu_arch_extension 
> cpu_opttab_cortexa73cortexa53[] = {
>{ NULL, false, false, {isa_nobit}}
>  };
>  
> +static const cpu_arch_extension cpu_opttab_cortexa55[] = {
> +  {
> +"crypto", false, false,
> +{ ISA_FP_ARMv8,ISA_CRYPTO, isa_nobit }
> +  },
> +  {
> +"nofp", true, false,
> +{ ISA_ALL_FP, isa_nobit }
> +  },
> +  { NULL, false, false, {isa_nobit}}
> +};
> +
> +static const cpu_arch_extension cpu_opttab_cortexa75[] = {
> +  {
> +"crypto", false, false,
> +{ ISA_FP_ARMv8,ISA_CRYPTO, isa_nobit }
> +  },
> +  { NULL, false, false, {isa_nobit}}
> +};
> +
> +static const cpu_arch_extension cpu_opttab_cortexa75cortexa55[] = {
> +  {
> +"crypto", false, false,
> +{ ISA_FP_ARMv8,ISA_CRYPTO, isa_nobit }
> +  },
> +  { NULL, false, false, {isa_nobit}}
> +};
> +
>  static const cpu_arch_extension cpu_opttab_cortexm33[] = {
>{
>  "nofp", true, false,
> @@ -1624,6 +1652,45 @@ const cpu_option all_cores[] =
>},
>{
>  {
> +  "cortex-a55",
> +  cpu_opttab_cortexa55,
> +  {
> +ISA_ARMv8_2a,
> +isa_bit_fp16,ISA_FP_ARMv8,ISA_NEON,
> +ISA_FP_ARMv8,ISA_NEON,
> +isa_nobit
> +  }
> +},
> +TARGET_ARCH_armv8_2_a
> +  },
> +  {
> +{
> +  "cortex-a75",
> +  cpu_opttab_cortexa75,
> +  {
> +ISA_ARMv8_2a,
> +isa_bit_fp16,ISA_FP_ARMv8,ISA_NEON,
> +ISA_FP_ARMv8,ISA_NEON,
> +isa_nobit
> +  }
> +},
> +TARGET_ARCH_armv8_2_a
> +  },
> +  {
> +{
> +  "cortex-a75.cortex-a55",
> +  cpu_opttab_cortexa75cortexa55,
> +  {
> +ISA_ARMv8_2a,
> +isa_bit_fp16,ISA_FP_ARMv8,ISA_NEON,
> +ISA_FP_ARMv8,ISA_NEON,
> +isa_nobit
> +  }
> +},
> +TARGET_ARCH_armv8_2_a
> +  },
> +  {
> +{
>"cortex-m23",
>NULL,
>{
> diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h
> index d42021d..1e05522 100644
> --- a/gcc/config/arm/arm-cpu-data.h
> +++ b/gcc/config/arm/arm-cpu-data.h
> @@ -552,6 +552,21 @@ static const cpu_tune all_tunes[] =
>  (TF_LDSCHED),
>  _cortex_a73_tune
>},
> +  { /* cortex-a55.  */
> +TARGET_CPU_cortexa53,
> +(TF_LDSCHED),
> +_cortex_a53_tune
> +  },
> +  { /* cortex-a75.  */
> +TARGET_CPU_cortexa57,
> +(TF_LDSCHED),
> +_cortex_a73_tune
> +  },
> +  { /* cortex-a75.cortex-a55.  */
> +TARGET_CPU_cortexa53,
> +(TF_LDSCHED),
> +_cortex_a73_tune
> +  },
>{ /* cortex-m23.  */
>  TARGET_CPU_cortexm23,
>  (TF_LDSCHED),
> diff --git a/gcc/config/arm/arm-cpu.h b/gcc/config/arm/arm-cpu.h
> index e27634c..8fda717 100644
> --- a/gcc/config/arm/arm-cpu.h
> +++ b/gcc/config/arm/arm-cpu.h
> @@ -128,6 +128,9 @@ enum processor_type
>TARGET_CPU_cortexa72cortexa53,
>TARGET_CPU_cortexa73cortexa35,
>TARGET_CPU_cortexa73cortexa53,
> +  TARGET_CPU_cortexa55,
> +  TARGET_CPU_cortexa75,
> +  TARGET_CPU_cortexa75cortexa55,
>TARGET_CPU_cortexm23,
>TARGET_CPU_cortexm33,
>TARGET_CPU_arm_none
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 3231740..946d543 100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -1206,7 +1206,6 @@ begin cpu xgene1
>   costs xgene1
>  end cpu xgene1
>  
> -
>  # V8 A-profile big.LITTLE implementations
>  begin cpu cortex-a57.cortex-a53
>   cname cortexa57cortexa53
> @@ -1249,6 +1248,40 @@ begin cpu cortex-a73.cortex-a53
>  end cpu 

Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom

2017-07-04 Thread Richard Biener
On Tue, Jul 4, 2017 at 2:06 PM, Bin.Cheng  wrote:
> On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener
>  wrote:
>> On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng  wrote:
>>> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
>>>  wrote:
 On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng  wrote:
> Hi,
> For the moment, tree-predcom.c only supports 
> invariant/load-loads/store-loads chains.
> This patch generalizes dead store elimination (or store motion) across 
> loop iterations in
> predictive commoning pass by supporting store-store chain.  As comment in 
> the patch:
>
>Apart from predictive commoning on Load-Load and Store-Load chains, we
>also support Store-Store chains -- stores killed by other store can be
>eliminated.  Given below example:
>
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  a[i+2] = 2;
>}
>
>It can be replaced with:
>
>  t0 = a[0];
>  t1 = a[1];
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  t2 = 2;
>  t0 = t1;
>  t1 = t2;
>}
>  a[n] = t0;
>  a[n+1] = t1;
>
>If the loop runs more than 1 iterations, it can be further simplified 
> into:
>
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>}
>  a[n] = 2;
>  a[n+1] = 2;
>
>The interesting part is this can be viewed either as general store 
> motion
>or general dead store elimination in either intra/inter-iterations way.
>
> There are number of interesting facts about this enhancement:
> a) This patch supports dead store elimination for both across-iteration 
> case and single-iteration
>  case.  For the latter, it is dead store elimination.
> b) There are advantages supporting dead store elimination in predcom, for 
> example, it has
>  complete information about memory address.  On the contrary, DSE 
> pass can only handle
>  memory references with exact the same memory address expression.
> c) It's cheap to support store-stores chain in predcom based on existing 
> code.
> d) As commented, the enhancement can be viewed as either generalized dead 
> store elimination
>  or generalized store motion.  I prefer DSE here.
>
> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?

 Looks mostly ok.  I have a few questions though.

 +  /* Don't do store elimination if loop has multiple exit edges.  */
 +  bool eliminate_store_p = single_exit (loop) != NULL;

 handling this would be an enhancement?  IIRC LIM store-motion handles this
 just fine by emitting code on all exits.
>>> It is an enhancement with a little bit more complication.  We would
>>> need to setup/record finalizer memory references for different exit
>>> edges.  I added TODO description for this (and following one).  Is it
>>> okay to pick up this in the future?
>>
>> Yes.
>>

 @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec chains)
  {
if (chain->type == CT_INVARIANT)
 continue;
 +  /* Don't unroll when eliminating stores.  */
 +  else if (chain->type == CT_STORE_STORE)
 +   return 1;

 this is a hard exit value so we do not handle the case where another chain
 in the loop would want to unroll? (enhancement?)  I'd have expected to do
 the same as for CT_INVARIANT here.
>>> I didn't check what change is needed in case of unrolling.  I am not
>>> very sure if we should prefer unroll for *load chains or prefer not
>>> unroll for store-store chains, because unroll in general increases
>>> loop-carried register pressure for store-store chains rather than
>>> decreases register pressure for *load chains.
>>> I was also thinking if it's possible to restrict unrolling somehow in
>>> order to enable predcom at O2.  BTW, this is not common, it only
>>> happens once in spec2k6 with factor forced to 1.  So okay if as it is
>>> now?
>>
>> I think it is ok for now with a TODO added.  Please change the comment
>> to "we can't handle unrolling when eliminating stores" (it's not clear if we
>> can -- did you try?  maybe add a testcase that would ICE)
>>
>>>

 +  tree init = ref_at_iteration (dr, (int) 0 - i, );
 +  if (!chain->all_always_accessed && tree_could_trap_p (init))
 +   {
 + gimple_seq_discard (stmts);
 + return false;

 so this is the only place that remotely cares for not always performed 
 stores.
 But as-is the patch doesn't seem to avoid speculating stores and thus
 violates the C++ memory model, aka, introduces store-data-races?  The LIM
 store-motion 

Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Andreas Schwab
On Jul 04 2017, Wilco Dijkstra  wrote:

> checking whether compiler driver understands Ada... no

You need to fix that first.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [7/7] Pool alignment information for common bases

2017-07-04 Thread Richard Biener
On Tue, Jul 4, 2017 at 2:01 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Jul 3, 2017 at 9:49 AM, Richard Sandiford
>>  wrote:
>>> @@ -2070,8 +2143,7 @@ vect_find_same_alignment_drs (struct dat
>>>if (dra == drb)
>>>  return;
>>>
>>> -  if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb),
>>> -   OEP_ADDRESS_OF)
>>> +  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
>>
>> Why this change?  It's semantically weaker after your change.
>
> It's because the DR_BASE_OBJECT comes from the access_fn analysis
> while the DR_BASE_ADDRESS comes from the innermost_loop_behavior.
> I hadn't realised when adding the original code how different the
> two were, and since all the other parts are based on the
> innermost_loop_behavior, I think this check should be too.
> E.g. it doesn't really make sense to compare DR_INITs based
> on DR_BASE_OBJECT.

Ah ok, makes sense now.

> I guess it should have been a separate patch though.

No need.

>>>|| !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0)
>>>|| !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
>>>  return;
>>> @@ -2129,6 +2201,7 @@ vect_analyze_data_refs_alignment (loop_v
>>>vec datarefs = vinfo->datarefs;
>>>struct data_reference *dr;
>>>
>>> +  vect_record_base_alignments (vinfo);
>>>FOR_EACH_VEC_ELT (datarefs, i, dr)
>>>  {
>>>stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
>>> @@ -3327,7 +3400,8 @@ vect_analyze_data_refs (vec_info *vinfo,
>>> {
>>>   struct data_reference *newdr
>>> = create_data_ref (NULL, loop_containing_stmt (stmt),
>>> - DR_REF (dr), stmt, maybe_scatter ? false : true);
>>> +  DR_REF (dr), stmt, !maybe_scatter,
>>> +  DR_IS_CONDITIONAL_IN_STMT (dr));
>>>   gcc_assert (newdr != NULL && DR_REF (newdr));
>>>   if (DR_BASE_ADDRESS (newdr)
>>>   && DR_OFFSET (newdr)
>>> Index: gcc/tree-vect-slp.c
>>> ===
>>> --- gcc/tree-vect-slp.c 2017-07-03 08:20:56.404763323 +0100
>>> +++ gcc/tree-vect-slp.c 2017-07-03 08:42:51.149380545 +0100
>>> @@ -2358,6 +2358,7 @@ new_bb_vec_info (gimple_stmt_iterator re
>>>gimple_stmt_iterator gsi;
>>>
>>>res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>>> +  new (>base_alignments) vec_base_alignments ();
>>
>> Ick.  I'd rather make this proper C++ and do
>>
>>res = new _bb_vec_info;
>>
>> and add a constructor to the vec_info base initializing the hashtable.
>> The above smells fishy.
>
> I knew I pushing my luck with that one.  I just didn't want to have to
> convert the current xcalloc of loop_vec_info into a long list of explicit
> zero initializers.  (OK, we have a lot of explicit zero assignments already,
> but certainly some fields rely on the calloc zeroing.)
>
>> Looks like vec<> are happy with .create () being called on a zeroed struct.
>>
>> Alternatively add .create () to hashtable/map.
>
> The difference is that vec<> is explicitly meant to be POD, whereas
> hash_map isn't (and I don't think we want it to be).

Ah, indeed.  So your patch makes *vec_info no longer POD (no problem
but then use new/delete and constructors).

> Ah well.  I'll do a separate pre-patch to C++-ify the structures.

Thanks.
Richard.

> Thanks,
> Richard


Re: [PATCH][AArch64] Fix ILP32 memory access

2017-07-04 Thread Wilco Dijkstra
Andreas Schwab wrote:
> @@ -5207,6 +5209,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>  
>    case MEM:
>  output_address (GET_MODE (x), XEXP (x, 0));
> +   gcc_assert (GET_MODE (XEXP (x, 0)) == Pmode);
>  break;
>  
>    case CONST:

> That breaks a lot of gnat tests in ilp32 mode:

That's interesting since it works fine in C and C++, and unless there are some
ADA specific MD patterns, it seems unlikely it could only affect ADA.

So how do you build the ADA backend? GCC can't even get past the configure as
it doesn't appear to understand GNAT is installed after finding it...
Is there some more magic setup required for ADA?

Wilco

checking for gnatbind... gnatbind
checking for gnatmake... gnatmake
checking whether compiler driver understands Ada... no
checking how to compare bootstrapped objects... cmp --ignore-initial=16 $$f1 
$$f2
checking for objdir... .libs
configure: WARNING: using in-tree isl, disabling version check
configure: error: GNAT is required to build ada

>gnat -v
GNAT 7.1.0
Copyright 1996-2017, Free Software Foundation, Inc.

List of available commands
...

Re: C PATCH to fix ICE-on-invalid with __atomic_load (PR c/81231)

2017-07-04 Thread Joseph Myers
On Tue, 4 Jul 2017, Marek Polacek wrote:

> This patch fixes an ICE-on-invalid with __atomic_*.  We should check
> that we're dealing with a complete type before we're accessing its
> TYPE_SIZE_UNIT.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom

2017-07-04 Thread Bin.Cheng
On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener
 wrote:
> On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng  wrote:
>> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
>>  wrote:
>>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng  wrote:
 Hi,
 For the moment, tree-predcom.c only supports 
 invariant/load-loads/store-loads chains.
 This patch generalizes dead store elimination (or store motion) across 
 loop iterations in
 predictive commoning pass by supporting store-store chain.  As comment in 
 the patch:

Apart from predictive commoning on Load-Load and Store-Load chains, we
also support Store-Store chains -- stores killed by other store can be
eliminated.  Given below example:

  for (i = 0; i < n; i++)
{
  a[i] = 1;
  a[i+2] = 2;
}

It can be replaced with:

  t0 = a[0];
  t1 = a[1];
  for (i = 0; i < n; i++)
{
  a[i] = 1;
  t2 = 2;
  t0 = t1;
  t1 = t2;
}
  a[n] = t0;
  a[n+1] = t1;

If the loop runs more than 1 iterations, it can be further simplified 
 into:

  for (i = 0; i < n; i++)
{
  a[i] = 1;
}
  a[n] = 2;
  a[n+1] = 2;

The interesting part is this can be viewed either as general store 
 motion
or general dead store elimination in either intra/inter-iterations way.

 There are number of interesting facts about this enhancement:
 a) This patch supports dead store elimination for both across-iteration 
 case and single-iteration
  case.  For the latter, it is dead store elimination.
 b) There are advantages supporting dead store elimination in predcom, for 
 example, it has
  complete information about memory address.  On the contrary, DSE pass 
 can only handle
  memory references with exact the same memory address expression.
 c) It's cheap to support store-stores chain in predcom based on existing 
 code.
 d) As commented, the enhancement can be viewed as either generalized dead 
 store elimination
  or generalized store motion.  I prefer DSE here.

 Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>>
>>> Looks mostly ok.  I have a few questions though.
>>>
>>> +  /* Don't do store elimination if loop has multiple exit edges.  */
>>> +  bool eliminate_store_p = single_exit (loop) != NULL;
>>>
>>> handling this would be an enhancement?  IIRC LIM store-motion handles this
>>> just fine by emitting code on all exits.
>> It is an enhancement with a little bit more complication.  We would
>> need to setup/record finalizer memory references for different exit
>> edges.  I added TODO description for this (and following one).  Is it
>> okay to pick up this in the future?
>
> Yes.
>
>>>
>>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec chains)
>>>  {
>>>if (chain->type == CT_INVARIANT)
>>> continue;
>>> +  /* Don't unroll when eliminating stores.  */
>>> +  else if (chain->type == CT_STORE_STORE)
>>> +   return 1;
>>>
>>> this is a hard exit value so we do not handle the case where another chain
>>> in the loop would want to unroll? (enhancement?)  I'd have expected to do
>>> the same as for CT_INVARIANT here.
>> I didn't check what change is needed in case of unrolling.  I am not
>> very sure if we should prefer unroll for *load chains or prefer not
>> unroll for store-store chains, because unroll in general increases
>> loop-carried register pressure for store-store chains rather than
>> decreases register pressure for *load chains.
>> I was also thinking if it's possible to restrict unrolling somehow in
>> order to enable predcom at O2.  BTW, this is not common, it only
>> happens once in spec2k6 with factor forced to 1.  So okay if as it is
>> now?
>
> I think it is ok for now with a TODO added.  Please change the comment
> to "we can't handle unrolling when eliminating stores" (it's not clear if we
> can -- did you try?  maybe add a testcase that would ICE)
>
>>
>>>
>>> +  tree init = ref_at_iteration (dr, (int) 0 - i, );
>>> +  if (!chain->all_always_accessed && tree_could_trap_p (init))
>>> +   {
>>> + gimple_seq_discard (stmts);
>>> + return false;
>>>
>>> so this is the only place that remotely cares for not always performed 
>>> stores.
>>> But as-is the patch doesn't seem to avoid speculating stores and thus
>>> violates the C++ memory model, aka, introduces store-data-races?  The LIM
>>> store-motion code was fixed to avoid this by keeping track of whether a BB
>>> has executed to guard the stores done in the compensation code on the loop
>>> exit.
>>>
>>> That said, to "fix" this all && 

Re: [7/7] Pool alignment information for common bases

2017-07-04 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Jul 3, 2017 at 9:49 AM, Richard Sandiford
>  wrote:
>> @@ -2070,8 +2143,7 @@ vect_find_same_alignment_drs (struct dat
>>if (dra == drb)
>>  return;
>>
>> -  if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb),
>> -   OEP_ADDRESS_OF)
>> +  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0)
>
> Why this change?  It's semantically weaker after your change.

It's because the DR_BASE_OBJECT comes from the access_fn analysis
while the DR_BASE_ADDRESS comes from the innermost_loop_behavior.
I hadn't realised when adding the original code how different the
two were, and since all the other parts are based on the
innermost_loop_behavior, I think this check should be too.
E.g. it doesn't really make sense to compare DR_INITs based
on DR_BASE_OBJECT.

I guess it should have been a separate patch though.

>>|| !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0)
>>|| !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
>>  return;
>> @@ -2129,6 +2201,7 @@ vect_analyze_data_refs_alignment (loop_v
>>vec datarefs = vinfo->datarefs;
>>struct data_reference *dr;
>>
>> +  vect_record_base_alignments (vinfo);
>>FOR_EACH_VEC_ELT (datarefs, i, dr)
>>  {
>>stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
>> @@ -3327,7 +3400,8 @@ vect_analyze_data_refs (vec_info *vinfo,
>> {
>>   struct data_reference *newdr
>> = create_data_ref (NULL, loop_containing_stmt (stmt),
>> - DR_REF (dr), stmt, maybe_scatter ? false : true);
>> +  DR_REF (dr), stmt, !maybe_scatter,
>> +  DR_IS_CONDITIONAL_IN_STMT (dr));
>>   gcc_assert (newdr != NULL && DR_REF (newdr));
>>   if (DR_BASE_ADDRESS (newdr)
>>   && DR_OFFSET (newdr)
>> Index: gcc/tree-vect-slp.c
>> ===
>> --- gcc/tree-vect-slp.c 2017-07-03 08:20:56.404763323 +0100
>> +++ gcc/tree-vect-slp.c 2017-07-03 08:42:51.149380545 +0100
>> @@ -2358,6 +2358,7 @@ new_bb_vec_info (gimple_stmt_iterator re
>>gimple_stmt_iterator gsi;
>>
>>res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> +  new (>base_alignments) vec_base_alignments ();
>
> Ick.  I'd rather make this proper C++ and do
>
>res = new _bb_vec_info;
>
> and add a constructor to the vec_info base initializing the hashtable.
> The above smells fishy.

I knew I pushing my luck with that one.  I just didn't want to have to
convert the current xcalloc of loop_vec_info into a long list of explicit
zero initializers.  (OK, we have a lot of explicit zero assignments already,
but certainly some fields rely on the calloc zeroing.)

> Looks like vec<> are happy with .create () being called on a zeroed struct.
>
> Alternatively add .create () to hashtable/map.

The difference is that vec<> is explicitly meant to be POD, whereas
hash_map isn't (and I don't think we want it to be).

Ah well.  I'll do a separate pre-patch to C++-ify the structures.

Thanks,
Richard


Re: [PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Richard Biener
On Tue, 4 Jul 2017, Jakub Jelinek wrote:

> On Tue, Jul 04, 2017 at 01:46:25PM +0200, Richard Biener wrote:
> > > 2017-07-04  Jakub Jelinek  
> > > 
> > >   PR debug/81278
> > >   * tree-vrp.c (compare_assert_loc): Only test if a->e is NULL,
> > >   !a->e == !b->e has been verified already.  Use e == NULL or
> > >   e != NULL instead of e or ! e tests.
> > >   (compare_assert_loc_stable): New function.
> > >   (process_assert_insertions): Sort using compare_assert_loc_stable
> > >   before calling process_assert_insertions_for in a loop.  Use break
> > >   instead of continue once seen NULL pointer.
> > > 
> > > --- gcc/tree-vrp.c.jj 2017-07-03 19:03:23.817824263 +0200
> > > +++ gcc/tree-vrp.c2017-07-04 10:30:36.403204757 +0200
> > > @@ -6400,13 +6400,13 @@ compare_assert_loc (const void *pa, cons
> > >  {
> > >assert_locus * const a = *(assert_locus * const *)pa;
> > >assert_locus * const b = *(assert_locus * const *)pb;
> > > -  if (! a->e && b->e)
> > > +  if (a->e == NULL && b->e != NULL)
> > >  return 1;
> > > -  else if (a->e && ! b->e)
> > > +  else if (a->e != NULL && b->e == NULL)
> > >  return -1;
> > >  
> > >/* Sort after destination index.  */
> > > -  if (! a->e && ! b->e)
> > > +  if (a->e == NULL)
> > >  ;
> > >else if (a->e->dest->index > b->e->dest->index)
> > >  return 1;
> > 
> > so this will now ICE if b->e is NULL, did you forget the && b->e == NULL
> > above?
> 
> That was intentional.  If a->e != NULL, then we know that b->e != NULL,
> because we have
>   else if (a->e != NULL && b->e == NULL)
> return -1;
> earlier.  Similarly, if a->e == NULL, then we know that b-> == NULL, because
> we have:
>   if (a->e == NULL && b->e != NULL)
> return 1;
> earlier.

Ah, ok.  Twisty ;)  I suppose jump threading will have eliminated
the extra test.

> > > @@ -6506,11 +6547,12 @@ process_assert_insertions (void)
> > >   }
> > >   }
> > >  
> > > +  asserts.qsort (compare_assert_loc_stable);
> > 
> > please add a comment here why we re-sort.
> 
> Ok, will do.
> 
> > >for (unsigned j = 0; j < asserts.length (); ++j)
> > >   {
> > > loc = asserts[j];
> > > if (! loc)
> > > - continue;
> > > + break;
> > > update_edges_p |= process_assert_insertions_for (ssa_name (i), loc);
> > > num_asserts++;
> > > free (loc);
> > 
> > Otherwise looks ok to me.  I wonder if we should merge the two
> > sorting functions and change behavior with a global var or a
> > template parameter instead (to reduce source duplication).  Does
> > 
> > vec.qsort (function_template);
> > 
> > work?
> 
> Let me try that.

Thanks,
Richard.


Re: [PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Jakub Jelinek
On Tue, Jul 04, 2017 at 01:46:25PM +0200, Richard Biener wrote:
> > 2017-07-04  Jakub Jelinek  
> > 
> > PR debug/81278
> > * tree-vrp.c (compare_assert_loc): Only test if a->e is NULL,
> > !a->e == !b->e has been verified already.  Use e == NULL or
> > e != NULL instead of e or ! e tests.
> > (compare_assert_loc_stable): New function.
> > (process_assert_insertions): Sort using compare_assert_loc_stable
> > before calling process_assert_insertions_for in a loop.  Use break
> > instead of continue once seen NULL pointer.
> > 
> > --- gcc/tree-vrp.c.jj   2017-07-03 19:03:23.817824263 +0200
> > +++ gcc/tree-vrp.c  2017-07-04 10:30:36.403204757 +0200
> > @@ -6400,13 +6400,13 @@ compare_assert_loc (const void *pa, cons
> >  {
> >assert_locus * const a = *(assert_locus * const *)pa;
> >assert_locus * const b = *(assert_locus * const *)pb;
> > -  if (! a->e && b->e)
> > +  if (a->e == NULL && b->e != NULL)
> >  return 1;
> > -  else if (a->e && ! b->e)
> > +  else if (a->e != NULL && b->e == NULL)
> >  return -1;
> >  
> >/* Sort after destination index.  */
> > -  if (! a->e && ! b->e)
> > +  if (a->e == NULL)
> >  ;
> >else if (a->e->dest->index > b->e->dest->index)
> >  return 1;
> 
> so this will now ICE if b->e is NULL, did you forget the && b->e == NULL
> above?

That was intentional.  If a->e != NULL, then we know that b->e != NULL,
because we have
  else if (a->e != NULL && b->e == NULL)
return -1;
earlier.  Similarly, if a->e == NULL, then we know that b-> == NULL, because
we have:
  if (a->e == NULL && b->e != NULL)
return 1;
earlier.

> > @@ -6506,11 +6547,12 @@ process_assert_insertions (void)
> > }
> > }
> >  
> > +  asserts.qsort (compare_assert_loc_stable);
> 
> please add a comment here why we re-sort.

Ok, will do.

> >for (unsigned j = 0; j < asserts.length (); ++j)
> > {
> >   loc = asserts[j];
> >   if (! loc)
> > -   continue;
> > +   break;
> >   update_edges_p |= process_assert_insertions_for (ssa_name (i), loc);
> >   num_asserts++;
> >   free (loc);
> 
> Otherwise looks ok to me.  I wonder if we should merge the two
> sorting functions and change behavior with a global var or a
> template parameter instead (to reduce source duplication).  Does
> 
> vec.qsort (function_template);
> 
> work?

Let me try that.

Jakub


Re: [PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Richard Biener
On Tue, 4 Jul 2017, Jakub Jelinek wrote:

> Hi!
> 
> The compare_assert_loc added recently to sort assert exprs that could
> operand_equal_p the expressions/values in there unfortunately broke
> -fcompare-debug.  The problem is that DECL_UIDs don't have to be the same
> between -g and -g0, and thus what iterative_hash_expr returns might not be
> the same.  For the removal of duplicate assert exprs or moving assert expr
> to the dominator if present on all edges this doesn't matter, because
> all we care about there are the adjacent vector entries and code generation
> is not affected by the traversal order, but when we actually
> process_assert_insertions_for afterwards, the order matters a lot for
> code generation (different SSA_NAME_VERSION on the ASSERT_EXPR lhs will
> mean different order of release_ssa_name afterwards and might result
> in different SSA_NAME versions created later on in other passes and that
> in many cases affects code generation.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?  No testcase, as the reduced testcase doesn't reproduce it (at least
> for me) and the original is way too large).
> 
> 2017-07-04  Jakub Jelinek  
> 
>   PR debug/81278
>   * tree-vrp.c (compare_assert_loc): Only test if a->e is NULL,
>   !a->e == !b->e has been verified already.  Use e == NULL or
>   e != NULL instead of e or ! e tests.
>   (compare_assert_loc_stable): New function.
>   (process_assert_insertions): Sort using compare_assert_loc_stable
>   before calling process_assert_insertions_for in a loop.  Use break
>   instead of continue once seen NULL pointer.
> 
> --- gcc/tree-vrp.c.jj 2017-07-03 19:03:23.817824263 +0200
> +++ gcc/tree-vrp.c2017-07-04 10:30:36.403204757 +0200
> @@ -6400,13 +6400,13 @@ compare_assert_loc (const void *pa, cons
>  {
>assert_locus * const a = *(assert_locus * const *)pa;
>assert_locus * const b = *(assert_locus * const *)pb;
> -  if (! a->e && b->e)
> +  if (a->e == NULL && b->e != NULL)
>  return 1;
> -  else if (a->e && ! b->e)
> +  else if (a->e != NULL && b->e == NULL)
>  return -1;
>  
>/* Sort after destination index.  */
> -  if (! a->e && ! b->e)
> +  if (a->e == NULL)
>  ;
>else if (a->e->dest->index > b->e->dest->index)
>  return 1;

so this will now ICE if b->e is NULL, did you forget the && b->e == NULL
above?

> @@ -6423,12 +6423,53 @@ compare_assert_loc (const void *pa, cons
>hashval_t ha = iterative_hash_expr (a->expr, iterative_hash_expr (a->val, 
> 0));
>hashval_t hb = iterative_hash_expr (b->expr, iterative_hash_expr (b->val, 
> 0));
>if (ha == hb)
> -return (a->e && b->e
> +return (a->e != NULL
>   ? a->e->src->index - b->e->src->index
>   : a->bb->index - b->bb->index);
>return ha - hb;
>  }

Likewise.

> +/* Qsort helper for sorting assert locations.  Like the above, but
> +   don't use expression hashes for the sorting to make the sorting
> +   stable for -fcompare-debug.  Some assert_locus pointers could
> +   be NULL, sort those last.  */
> +
> +static int
> +compare_assert_loc_stable (const void *pa, const void *pb)
> +{
> +  assert_locus * const a = *(assert_locus * const *)pa;
> +  assert_locus * const b = *(assert_locus * const *)pb;
> +
> +  if (a == NULL)
> +return b != NULL;
> +  else if (b == NULL)
> +return -1;
> +
> +  if (a->e == NULL && b->e != NULL)
> +return 1;
> +  else if (a->e != NULL && b->e == NULL)
> +return -1;
> +
> +  /* Sort after destination index.  */
> +  if (a->e == NULL)

See above.  The changelog says it has been verified already but
I can't find where.  A comment in the code is warranted IMHO.

> +;
> +  else if (a->e->dest->index > b->e->dest->index)
> +return 1;
> +  else if (a->e->dest->index < b->e->dest->index)
> +return -1;
> +
> +  /* Sort after comp_code.  */
> +  if (a->comp_code > b->comp_code)
> +return 1;
> +  else if (a->comp_code < b->comp_code)
> +return -1;
> +
> +  /* Break the tie using source/bb index.  */
> +  return (a->e != NULL
> +   ? a->e->src->index - b->e->src->index
> +   : a->bb->index - b->bb->index);
> +}
> +
>  /* Process all the insertions registered for every name N_i registered
> in NEED_ASSERT_FOR.  The list of assertions to be inserted are
> found in ASSERTS_FOR[i].  */
> @@ -6506,11 +6547,12 @@ process_assert_insertions (void)
>   }
>   }
>  
> +  asserts.qsort (compare_assert_loc_stable);

please add a comment here why we re-sort.

>for (unsigned j = 0; j < asserts.length (); ++j)
>   {
> loc = asserts[j];
> if (! loc)
> - continue;
> + break;
> update_edges_p |= process_assert_insertions_for (ssa_name (i), loc);
> num_asserts++;
> free (loc);

Otherwise looks ok to me.  I wonder if we should merge the two
sorting functions and change behavior with a global var or a
template 

Re: [PATCH] Fix bootstrap with brig FE

2017-07-04 Thread Richard Biener
On Tue, 4 Jul 2017, Jakub Jelinek wrote:

> Hi!
> 
> Seems tree-cfg.h now requires profile-count.h (or some header that includes
> it like basic-block.h) to be included first (the flattened headers without
> including their dependencies and without aggregate headers are really terrible
> idea), so right now bootstrap fails in brig FE.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Obvious.

Yes.
Richard.

> 2017-07-04  Jakub Jelinek  
> 
>   * brigfrontend/brig-function.cc: Include profile-count.h.
>   * brigfrontend/brig-to-generic.cc: Likewise.
> 
> --- gcc/brig/brigfrontend/brig-function.cc.jj 2017-01-26 09:15:50.0 
> +0100
> +++ gcc/brig/brigfrontend/brig-function.cc2017-07-04 11:07:10.899930814 
> +0200
> @@ -38,6 +38,7 @@
>  #include "phsa.h"
>  #include "tree-pretty-print.h"
>  #include "dumpfile.h"
> +#include "profile-count.h"
>  #include "tree-cfg.h"
>  #include "errors.h"
>  #include "function.h"
> --- gcc/brig/brigfrontend/brig-to-generic.cc.jj   2017-02-01 
> 16:25:19.0 +0100
> +++ gcc/brig/brigfrontend/brig-to-generic.cc  2017-07-04 11:06:56.413106803 
> +0200
> @@ -45,6 +45,7 @@
>  #include "phsa.h"
>  #include "tree-pretty-print.h"
>  #include "dumpfile.h"
> +#include "profile-count.h"
>  #include "tree-cfg.h"
>  #include "errors.h"
>  #include "fold-const.h"
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


C PATCH to fix ICE-on-invalid with __atomic_load (PR c/81231)

2017-07-04 Thread Marek Polacek
This patch fixes an ICE-on-invalid with __atomic_*.  We should check
that we're dealing with a complete type before we're accessing its
TYPE_SIZE_UNIT.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-07-04  Marek Polacek  

PR c/81231
* c-common.c (sync_resolve_size): Give error for pointers to incomplete
types.

* gcc.dg/atomic-pr81231.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index f6a9d05..1b6ac8c 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -6485,6 +6485,9 @@ sync_resolve_size (tree function, vec 
*params, bool fetch)
   if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type))
 goto incompatible;
 
+  if (!COMPLETE_TYPE_P (type))
+goto incompatible;
+
   if (fetch && TREE_CODE (type) == BOOLEAN_TYPE)
 goto incompatible;
 
diff --git gcc/testsuite/gcc.dg/atomic-pr81231.c 
gcc/testsuite/gcc.dg/atomic-pr81231.c
index e69de29..304e428 100644
--- gcc/testsuite/gcc.dg/atomic-pr81231.c
+++ gcc/testsuite/gcc.dg/atomic-pr81231.c
@@ -0,0 +1,12 @@
+/* PR c/81231 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+enum E;
+
+void
+foo (void)
+{
+  __atomic_load_n ((enum E *) 0, 0); /* { dg-error "incompatible" } */
+  __atomic_load_n ((enum X *) 0, 0); /* { dg-error "incompatible" } */
+}

Marek


Re: PR 81292: ICE on related strlens after r249880

2017-07-04 Thread Jakub Jelinek
On Tue, Jul 04, 2017 at 12:14:48PM +0100, Richard Sandiford wrote:
> r249880 installed the result of a strlen in a strinfo if the strinfo
> wasn't previously a full string.  But as Jakub says in the PR comments,
> we can't just do that in isolation, because there are no vdefs on the
> call that would invalidate any related strinfos.
> 
> This patch updates the related strinfos if the adjustment is simple and
> invalidates them otherwise.  As elsewhere, we treat adjustments of the
> form strlen +/- INTEGER_CST as simple but anything else as too complex.
> 
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
> 
> Sorry for the breakage.
> 
> Richard
> 
> 
> gcc/
>   PR tree-optimization/81292
>   * tree-ssa-strlen.c (handle_builtin_strlen): When setting
>   full_string_p, also call adjust_related_strinfos if the adjustment
>   is simple, otherwise invalidate related strinfos.
> 
> gcc/testsuite/
>   PR tree-optimization/81292
>   * gcc.dg/pr81292-1.c: New test.
>   * gcc.dg/pr81292-2.c: Likewise.

Ok, thanks.

Jakub


[PATCH] Fix bootstrap with brig FE

2017-07-04 Thread Jakub Jelinek
Hi!

Seems tree-cfg.h now requires profile-count.h (or some header that includes
it like basic-block.h) to be included first (the flattened headers without
including their dependencies and without aggregate headers are really terrible
idea), so right now bootstrap fails in brig FE.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-07-04  Jakub Jelinek  

* brigfrontend/brig-function.cc: Include profile-count.h.
* brigfrontend/brig-to-generic.cc: Likewise.

--- gcc/brig/brigfrontend/brig-function.cc.jj   2017-01-26 09:15:50.0 
+0100
+++ gcc/brig/brigfrontend/brig-function.cc  2017-07-04 11:07:10.899930814 
+0200
@@ -38,6 +38,7 @@
 #include "phsa.h"
 #include "tree-pretty-print.h"
 #include "dumpfile.h"
+#include "profile-count.h"
 #include "tree-cfg.h"
 #include "errors.h"
 #include "function.h"
--- gcc/brig/brigfrontend/brig-to-generic.cc.jj 2017-02-01 16:25:19.0 
+0100
+++ gcc/brig/brigfrontend/brig-to-generic.cc2017-07-04 11:06:56.413106803 
+0200
@@ -45,6 +45,7 @@
 #include "phsa.h"
 #include "tree-pretty-print.h"
 #include "dumpfile.h"
+#include "profile-count.h"
 #include "tree-cfg.h"
 #include "errors.h"
 #include "fold-const.h"

Jakub


[RFC] Add -fmap-abort-to-trap

2017-07-04 Thread Tom de Vries

Hi,

In gcc we map __builtin_trap to abort if there's no trap insn:
...
Built-in Function: void __builtin_trap (void)

 This function causes the program to exit abnormally. GCC 
implements this function by using a target-dependent mechanism (such as 
intentionally executing an illegal instruction) or by calling abort. The 
mechanism used may vary from release to release so you should not rely 
on any particular implementation.

...

This patch adds an option -fmap-abort-to-trap that does the opposite: it 
maps a call to either abort or __builtin_abort to __builtin_trap.


[ The patch is incomplete: it lacks documentation of:
- the option fmap-abort-to-trap, and
- the effective target trap_insn
]

The patch is tested by:
- running the test-cases in the patch on x86_64
- running the libgomp/testsuite/libgomp.oacc-c-c++-common/abort-*
  testcases on x86_64 with nvptx accelerator

I'm using this patch atm for the purpose of debugging libgomp openacc 
testcases on nvptx. There are many test-cases where abort is the only 
function called in offloaded code, and this option changes the offload 
functions from non-leaf into leaf.


I'm not sure if there's a general usefulness to this patch, but I'm 
posting it here in case anybody finds it useful.


Thanks,
- Tom
Add -fmap-abort-to-trap

2017-06-21  Tom de Vries  

	* builtins.c (expand_builtin): Map abort and __builtin_abort to
	__builtin_trap for -fmap-abort-to-trap.
	* common.opt (fmap-abort-to-trap): New option.
	* config/nvptx/nvptx.c (nvptx_option_override): Set fmap-abort-to-trap
	by default.

	* lib/target-supports.exp (check_effective_target_trap_insn): New proc.
	* gcc.dg/fmap-abort-to-trap.inc: New include file.
	* gcc.dg/fmap-abort-to-trap.c: New test.
	* gcc.dg/fmap-abort-to-trap-2.c: New test.
	* gcc.dg/fmap-abort-to-trap-3.c: New test.
	* gcc.dg/fmap-abort-to-trap-4.c: New test.

---
 gcc/builtins.c  |  6 ++
 gcc/common.opt  |  4 
 gcc/config/nvptx/nvptx.c|  3 +++
 gcc/testsuite/gcc.dg/fmap-abort-to-trap-2.c |  7 +++
 gcc/testsuite/gcc.dg/fmap-abort-to-trap-3.c |  6 ++
 gcc/testsuite/gcc.dg/fmap-abort-to-trap-4.c |  6 ++
 gcc/testsuite/gcc.dg/fmap-abort-to-trap.c   |  7 +++
 gcc/testsuite/gcc.dg/fmap-abort-to-trap.inc | 13 +
 gcc/testsuite/lib/target-supports.exp   |  8 
 9 files changed, 60 insertions(+)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index ce657bf..8db29fc 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6513,6 +6513,8 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
  set of builtins.  */
   if (!optimize
   && !called_as_built_in (fndecl)
+  && (fcode != BUILT_IN_ABORT
+	  || !flag_map_abort_to_trap)
   && fcode != BUILT_IN_FORK
   && fcode != BUILT_IN_EXECL
   && fcode != BUILT_IN_EXECV
@@ -6996,6 +6998,10 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	}
   break;
 
+case BUILT_IN_ABORT:
+  if (!flag_map_abort_to_trap)
+	break;
+  /* FALLTHRU.  */
 case BUILT_IN_TRAP:
   expand_builtin_trap ();
   return const0_rtx;
diff --git a/gcc/common.opt b/gcc/common.opt
index 4f9c3dc..77143fa 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1793,6 +1793,10 @@ flto-report-wpa
 Common Report Var(flag_lto_report_wpa) Init(0)
 Report various link-time optimization statistics for WPA only.
 
+fmap-abort-to-trap
+Common Report Var(flag_map_abort_to_trap) Init(0)
+Map abort and __builtin_abort to __builtin_trap
+
 fmath-errno
 Common Report Var(flag_errno_math) Init(1) Optimization SetByCombined
 Set errno after built-in math functions.
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index daeec27..4db23b3 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -181,6 +181,9 @@ nvptx_option_override (void)
   /* Assumes that it will see only hard registers.  */
   flag_var_tracking = 0;
 
+  if (!global_options_set.x_flag_map_abort_to_trap)
+flag_map_abort_to_trap = 1;
+
   if (nvptx_optimize < 0)
 nvptx_optimize = optimize > 0;
 
diff --git a/gcc/testsuite/gcc.dg/fmap-abort-to-trap-2.c b/gcc/testsuite/gcc.dg/fmap-abort-to-trap-2.c
new file mode 100644
index 000..d678d5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fmap-abort-to-trap-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fmap-abort-to-trap -fdump-rtl-expand" } */
+
+#include "fmap-abort-to-trap.inc"
+
+/* { dg-final { scan-rtl-dump-not "call_insn" "expand" { target trap_insn } } } */
+/* { dg-final { scan-rtl-dump-times "call_insn" 2 "expand" { target { ! trap_insn } } } } */
diff --git a/gcc/testsuite/gcc.dg/fmap-abort-to-trap-3.c b/gcc/testsuite/gcc.dg/fmap-abort-to-trap-3.c
new file mode 100644
index 000..22e8889
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fmap-abort-to-trap-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 

[PATCH] Fix -fcompare-debug issues caused by recent VRP assert expr sorting changes (PR debug/81278)

2017-07-04 Thread Jakub Jelinek
Hi!

The compare_assert_loc added recently to sort assert exprs that could
operand_equal_p the expressions/values in there unfortunately broke
-fcompare-debug.  The problem is that DECL_UIDs don't have to be the same
between -g and -g0, and thus what iterative_hash_expr returns might not be
the same.  For the removal of duplicate assert exprs or moving assert expr
to the dominator if present on all edges this doesn't matter, because
all we care about there are the adjacent vector entries and code generation
is not affected by the traversal order, but when we actually
process_assert_insertions_for afterwards, the order matters a lot for
code generation (different SSA_NAME_VERSION on the ASSERT_EXPR lhs will
mean different order of release_ssa_name afterwards and might result
in different SSA_NAME versions created later on in other passes and that
in many cases affects code generation.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?  No testcase, as the reduced testcase doesn't reproduce it (at least
for me) and the original is way too large).

2017-07-04  Jakub Jelinek  

PR debug/81278
* tree-vrp.c (compare_assert_loc): Only test if a->e is NULL,
!a->e == !b->e has been verified already.  Use e == NULL or
e != NULL instead of e or ! e tests.
(compare_assert_loc_stable): New function.
(process_assert_insertions): Sort using compare_assert_loc_stable
before calling process_assert_insertions_for in a loop.  Use break
instead of continue once seen NULL pointer.

--- gcc/tree-vrp.c.jj   2017-07-03 19:03:23.817824263 +0200
+++ gcc/tree-vrp.c  2017-07-04 10:30:36.403204757 +0200
@@ -6400,13 +6400,13 @@ compare_assert_loc (const void *pa, cons
 {
   assert_locus * const a = *(assert_locus * const *)pa;
   assert_locus * const b = *(assert_locus * const *)pb;
-  if (! a->e && b->e)
+  if (a->e == NULL && b->e != NULL)
 return 1;
-  else if (a->e && ! b->e)
+  else if (a->e != NULL && b->e == NULL)
 return -1;
 
   /* Sort after destination index.  */
-  if (! a->e && ! b->e)
+  if (a->e == NULL)
 ;
   else if (a->e->dest->index > b->e->dest->index)
 return 1;
@@ -6423,12 +6423,53 @@ compare_assert_loc (const void *pa, cons
   hashval_t ha = iterative_hash_expr (a->expr, iterative_hash_expr (a->val, 
0));
   hashval_t hb = iterative_hash_expr (b->expr, iterative_hash_expr (b->val, 
0));
   if (ha == hb)
-return (a->e && b->e
+return (a->e != NULL
? a->e->src->index - b->e->src->index
: a->bb->index - b->bb->index);
   return ha - hb;
 }
 
+/* Qsort helper for sorting assert locations.  Like the above, but
+   don't use expression hashes for the sorting to make the sorting
+   stable for -fcompare-debug.  Some assert_locus pointers could
+   be NULL, sort those last.  */
+
+static int
+compare_assert_loc_stable (const void *pa, const void *pb)
+{
+  assert_locus * const a = *(assert_locus * const *)pa;
+  assert_locus * const b = *(assert_locus * const *)pb;
+
+  if (a == NULL)
+return b != NULL;
+  else if (b == NULL)
+return -1;
+
+  if (a->e == NULL && b->e != NULL)
+return 1;
+  else if (a->e != NULL && b->e == NULL)
+return -1;
+
+  /* Sort after destination index.  */
+  if (a->e == NULL)
+;
+  else if (a->e->dest->index > b->e->dest->index)
+return 1;
+  else if (a->e->dest->index < b->e->dest->index)
+return -1;
+
+  /* Sort after comp_code.  */
+  if (a->comp_code > b->comp_code)
+return 1;
+  else if (a->comp_code < b->comp_code)
+return -1;
+
+  /* Break the tie using source/bb index.  */
+  return (a->e != NULL
+ ? a->e->src->index - b->e->src->index
+ : a->bb->index - b->bb->index);
+}
+
 /* Process all the insertions registered for every name N_i registered
in NEED_ASSERT_FOR.  The list of assertions to be inserted are
found in ASSERTS_FOR[i].  */
@@ -6506,11 +6547,12 @@ process_assert_insertions (void)
}
}
 
+  asserts.qsort (compare_assert_loc_stable);
   for (unsigned j = 0; j < asserts.length (); ++j)
{
  loc = asserts[j];
  if (! loc)
-   continue;
+   break;
  update_edges_p |= process_assert_insertions_for (ssa_name (i), loc);
  num_asserts++;
  free (loc);

Jakub


Re: [7/7] Pool alignment information for common bases

2017-07-04 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:49 AM, Richard Sandiford
 wrote:
> This patch is a follow-on to the fix for PR81136.  The testcase for that
> PR shows that we can (correctly) calculate different base alignments
> for two data_references but still tell that their misalignments wrt the
> vector size are equal.  This is because we calculate the base alignments
> for each dr individually, without looking at the other drs, and in
> general the alignment we calculate is only guaranteed if the dr's DR_REF
> actually occurs.
>
> This is working as designed, but it does expose a missed opportunity.
> We know that if a vectorised loop is reached, all statements in that
> loop execute at least once, so it should be safe to pool the alignment
> information for all the statements we're vectorising.  The only catch is
> that DR_REFs for masked loads and stores only occur if the mask value is
> nonzero.  For example, in:
>
>   struct s __attribute__((aligned(32))) {
> int misaligner;
> int array[N];
>   };
>
>   int *ptr;
>   for (int i = 0; i < n; ++i)
> ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0;
>
> we can only guarantee that ptr points to a "struct s" if at least
> one c[i] is true.
>
> This patch adds a DR_IS_CONDITIONAL_IN_STMT flag to record whether
> the DR_REF is guaranteed to occur every time that the statement
> executes to completion.  It then pools the alignment information
> for references that aren't conditional in this sense.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h: Include tree-hash-traits.h.
> (vec_base_alignments): New typedef.
> (vec_info): Add a base_alignments field.
> (vect_record_base_alignments: Declare.
> * tree-data-ref.h (data_reference): Add an is_conditional_in_stmt
> field.
> (DR_IS_CONDITIONAL_IN_STMT): New macro.
> (create_data_ref): Add an is_conditional_in_stmt argument.
> * tree-data-ref.c (create_data_ref): Likewise.  Use it to initialize
> the is_conditional_in_stmt field.
> (data_ref_loc): Add an is_conditional_in_stmt field.
> (get_references_in_stmt): Set the is_conditional_in_stmt field.
> (find_data_references_in_stmt): Update call to create_data_ref.
> (graphite_find_data_references_in_stmt): Likewise.
> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise.
> (vect_record_base_alignment): New function.
> (vect_record_base_alignments): Likewise.
> (vect_compute_data_ref_alignment): Adjust base_addr and aligned_to
> for nested statements even if we fail to compute a misalignment.
> Use pooled base alignments for unconditional references.
> (vect_find_same_alignment_drs): Compare base addresses instead
> of base objects.
> (vect_compute_data_ref_alignment): Call vect_record_base_alignments.
> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise.
> (new_bb_vec_info): Initialize base_alignments.
> * tree-vect-loop.c (new_loop_vec_info): Likewise.
> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments.
>
> gcc/testsuite/
> * gcc.dg/vect/pr81136.c: Add scan test.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-07-03 08:42:50.186422191 +0100
> +++ gcc/tree-vectorizer.h   2017-07-03 08:45:24.571165851 +0100
> @@ -22,6 +22,7 @@ Software Foundation; either version 3, o
>  #define GCC_TREE_VECTORIZER_H
>
>  #include "tree-data-ref.h"
> +#include "tree-hash-traits.h"
>  #include "target.h"
>
>  /* Used for naming of new temporaries.  */
> @@ -84,6 +85,11 @@ struct stmt_info_for_cost {
>
>  typedef vec stmt_vector_for_cost;
>
> +/* Maps base addresses to an innermost_loop_behavior that gives the maximum
> +   known alignment for that base.  */
> +typedef hash_map +innermost_loop_behavior *> vec_base_alignments;
> +
>  /
>SLP
>   /
> @@ -156,6 +162,10 @@ struct vec_info {
>/* All data references.  */
>vec datarefs;
>
> +  /* Maps base addresses to an innermost_loop_behavior that gives the maximum
> + known alignment for that base.  */
> +  vec_base_alignments base_alignments;
> +
>/* All data dependences.  */
>vec ddrs;
>
> @@ -1132,6 +1142,7 @@ extern bool vect_prune_runtime_alias_tes
>  extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
>gather_scatter_info *);
>  extern bool vect_analyze_data_refs (vec_info *, int *);
> +extern void 

[Patch ARM] Add initial tuning for Cortex-A55 and Cortex-A75

2017-07-04 Thread James Greenhalgh

Much like my AArch64 patch a few weeks ago, this patch adds support
for the ARM Cortex-A75 and Cortex-A55 processors through the
-mcpu/-mtune values cortex-a55 and cortex-a75, and an
ARM DynamIQ big.LITTLE configuration of these two processors through
the -mcpu/-mtune value cortex-a75.cortex-a55

Both Cortex-A55 and Cortex-A75 support ARMv8-A with the ARM8.1-A and
ARMv8.2-A extensions. This is reflected in the patch, -mcpu=cortex-a75 is
treated as equivalent to passing -mtune=cortex-a75 -march=armv8.2-a+fp16

OK?

Thanks,
James

---
2017-07-04  James Greenhalgh  

* config/arm/arm-cpus.in (cortex-a55): New.
(cortex-a75): Likewise.
(cortex-a75.cortex-a55): Likewise.
* config/arm/driver-arm.c (arm_cpu_table): Add cortex-a55 and
cortex-a75.
* doc/invoke.texi (-mcpu): Document cortex-a55 and cortex-a75.

* config/arm/arm-cpu-cdata.h: Regenerate.
* config/arm/arm-cpu-data.h: Regenerate.
* config/arm/arm-cpu.h: Regenerate.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.

diff --git a/gcc/config/arm/arm-cpu-cdata.h b/gcc/config/arm/arm-cpu-cdata.h
index 1cf1149..acd36d4 100644
--- a/gcc/config/arm/arm-cpu-cdata.h
+++ b/gcc/config/arm/arm-cpu-cdata.h
@@ -388,6 +388,34 @@ static const cpu_arch_extension cpu_opttab_cortexa73cortexa53[] = {
   { NULL, false, false, {isa_nobit}}
 };
 
+static const cpu_arch_extension cpu_opttab_cortexa55[] = {
+  {
+"crypto", false, false,
+{ ISA_FP_ARMv8,ISA_CRYPTO, isa_nobit }
+  },
+  {
+"nofp", true, false,
+{ ISA_ALL_FP, isa_nobit }
+  },
+  { NULL, false, false, {isa_nobit}}
+};
+
+static const cpu_arch_extension cpu_opttab_cortexa75[] = {
+  {
+"crypto", false, false,
+{ ISA_FP_ARMv8,ISA_CRYPTO, isa_nobit }
+  },
+  { NULL, false, false, {isa_nobit}}
+};
+
+static const cpu_arch_extension cpu_opttab_cortexa75cortexa55[] = {
+  {
+"crypto", false, false,
+{ ISA_FP_ARMv8,ISA_CRYPTO, isa_nobit }
+  },
+  { NULL, false, false, {isa_nobit}}
+};
+
 static const cpu_arch_extension cpu_opttab_cortexm33[] = {
   {
 "nofp", true, false,
@@ -1624,6 +1652,45 @@ const cpu_option all_cores[] =
   },
   {
 {
+  "cortex-a55",
+  cpu_opttab_cortexa55,
+  {
+ISA_ARMv8_2a,
+isa_bit_fp16,ISA_FP_ARMv8,ISA_NEON,
+ISA_FP_ARMv8,ISA_NEON,
+isa_nobit
+  }
+},
+TARGET_ARCH_armv8_2_a
+  },
+  {
+{
+  "cortex-a75",
+  cpu_opttab_cortexa75,
+  {
+ISA_ARMv8_2a,
+isa_bit_fp16,ISA_FP_ARMv8,ISA_NEON,
+ISA_FP_ARMv8,ISA_NEON,
+isa_nobit
+  }
+},
+TARGET_ARCH_armv8_2_a
+  },
+  {
+{
+  "cortex-a75.cortex-a55",
+  cpu_opttab_cortexa75cortexa55,
+  {
+ISA_ARMv8_2a,
+isa_bit_fp16,ISA_FP_ARMv8,ISA_NEON,
+ISA_FP_ARMv8,ISA_NEON,
+isa_nobit
+  }
+},
+TARGET_ARCH_armv8_2_a
+  },
+  {
+{
   "cortex-m23",
   NULL,
   {
diff --git a/gcc/config/arm/arm-cpu-data.h b/gcc/config/arm/arm-cpu-data.h
index d42021d..1e05522 100644
--- a/gcc/config/arm/arm-cpu-data.h
+++ b/gcc/config/arm/arm-cpu-data.h
@@ -552,6 +552,21 @@ static const cpu_tune all_tunes[] =
 (TF_LDSCHED),
 _cortex_a73_tune
   },
+  { /* cortex-a55.  */
+TARGET_CPU_cortexa53,
+(TF_LDSCHED),
+_cortex_a53_tune
+  },
+  { /* cortex-a75.  */
+TARGET_CPU_cortexa57,
+(TF_LDSCHED),
+_cortex_a73_tune
+  },
+  { /* cortex-a75.cortex-a55.  */
+TARGET_CPU_cortexa53,
+(TF_LDSCHED),
+_cortex_a73_tune
+  },
   { /* cortex-m23.  */
 TARGET_CPU_cortexm23,
 (TF_LDSCHED),
diff --git a/gcc/config/arm/arm-cpu.h b/gcc/config/arm/arm-cpu.h
index e27634c..8fda717 100644
--- a/gcc/config/arm/arm-cpu.h
+++ b/gcc/config/arm/arm-cpu.h
@@ -128,6 +128,9 @@ enum processor_type
   TARGET_CPU_cortexa72cortexa53,
   TARGET_CPU_cortexa73cortexa35,
   TARGET_CPU_cortexa73cortexa53,
+  TARGET_CPU_cortexa55,
+  TARGET_CPU_cortexa75,
+  TARGET_CPU_cortexa75cortexa55,
   TARGET_CPU_cortexm23,
   TARGET_CPU_cortexm33,
   TARGET_CPU_arm_none
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 3231740..946d543 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1206,7 +1206,6 @@ begin cpu xgene1
  costs xgene1
 end cpu xgene1
 
-
 # V8 A-profile big.LITTLE implementations
 begin cpu cortex-a57.cortex-a53
  cname cortexa57cortexa53
@@ -1249,6 +1248,40 @@ begin cpu cortex-a73.cortex-a53
 end cpu cortex-a73.cortex-a53
 
 
+# ARMv8.2 A-profile Architecture Processors
+begin cpu cortex-a55
+ cname cortexa55
+ tune for cortex-a53
+ tune flags LDSCHED
+ architecture armv8.2-a+fp16
+ fpu neon-fp-armv8
+ option crypto add FP_ARMv8 CRYPTO
+ option nofp remove ALL_FP
+ costs cortex_a53
+end cpu cortex-a55
+
+begin cpu cortex-a75
+ cname cortexa75
+ tune for cortex-a57
+ tune flags LDSCHED
+ architecture armv8.2-a+fp16
+ fpu neon-fp-armv8
+ option 

Re: Update profile for haifa-sched's recovery blocks

2017-07-04 Thread Jan Hubicka
> On Tue, 4 Jul 2017, Jan Hubicka wrote:
> 
> > Hi,
> > this is another bug I noticed while looking into Itanium rgression.
> > There is no profile attached to recovery blocks in scheduler.
> > I made them very unlikely, but I wonder if we can do better? After all
> > we probably know the probability of path that will lead for speculation
> > to suceed?
> 
> I don't really understand why/how that would help (after all, when doing
> speculation we must have already decided that failure is very unlikely),
> but to answer the last question, scheduler's estimation of speculation
> success is tracked using the 'ds_t' data type - see comments regarding
> 'dw_t' and 'ds_t' in sched-int.h.

Well, the probabilities/counts controls later passes. For example if chance
for speculation to fail is 1/100 one does not want to offload the block
into cold section or agressively optimize it for size.
If failure happens only over path that is predicted to not be taken at
all during execution run (for example because it leads to abort) then
such transforms are fine.

For this to work we would need to actually track profile_probability not
just sum of values convered to reg_br_prob_base. So perhaps I will keep code
as it is and add a TODO comment.  OK?

Honza
> 
> I think in haifa-sched.c 'todo_spec' variable in create_check_block_twin
> holds the desired estimation, in sel-sched.c it's 'check_ds' in
> create_speculation_check.
> 
> Alexander


Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom

2017-07-04 Thread Richard Biener
On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng  wrote:
> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
>  wrote:
>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng  wrote:
>>> Hi,
>>> For the moment, tree-predcom.c only supports 
>>> invariant/load-loads/store-loads chains.
>>> This patch generalizes dead store elimination (or store motion) across loop 
>>> iterations in
>>> predictive commoning pass by supporting store-store chain.  As comment in 
>>> the patch:
>>>
>>>Apart from predictive commoning on Load-Load and Store-Load chains, we
>>>also support Store-Store chains -- stores killed by other store can be
>>>eliminated.  Given below example:
>>>
>>>  for (i = 0; i < n; i++)
>>>{
>>>  a[i] = 1;
>>>  a[i+2] = 2;
>>>}
>>>
>>>It can be replaced with:
>>>
>>>  t0 = a[0];
>>>  t1 = a[1];
>>>  for (i = 0; i < n; i++)
>>>{
>>>  a[i] = 1;
>>>  t2 = 2;
>>>  t0 = t1;
>>>  t1 = t2;
>>>}
>>>  a[n] = t0;
>>>  a[n+1] = t1;
>>>
>>>If the loop runs more than 1 iterations, it can be further simplified 
>>> into:
>>>
>>>  for (i = 0; i < n; i++)
>>>{
>>>  a[i] = 1;
>>>}
>>>  a[n] = 2;
>>>  a[n+1] = 2;
>>>
>>>The interesting part is this can be viewed either as general store motion
>>>or general dead store elimination in either intra/inter-iterations way.
>>>
>>> There are number of interesting facts about this enhancement:
>>> a) This patch supports dead store elimination for both across-iteration 
>>> case and single-iteration
>>>  case.  For the latter, it is dead store elimination.
>>> b) There are advantages supporting dead store elimination in predcom, for 
>>> example, it has
>>>  complete information about memory address.  On the contrary, DSE pass 
>>> can only handle
>>>  memory references with exact the same memory address expression.
>>> c) It's cheap to support store-stores chain in predcom based on existing 
>>> code.
>>> d) As commented, the enhancement can be viewed as either generalized dead 
>>> store elimination
>>>  or generalized store motion.  I prefer DSE here.
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>
>> Looks mostly ok.  I have a few questions though.
>>
>> +  /* Don't do store elimination if loop has multiple exit edges.  */
>> +  bool eliminate_store_p = single_exit (loop) != NULL;
>>
>> handling this would be an enhancement?  IIRC LIM store-motion handles this
>> just fine by emitting code on all exits.
> It is an enhancement with a little bit more complication.  We would
> need to setup/record finalizer memory references for different exit
> edges.  I added TODO description for this (and following one).  Is it
> okay to pick up this in the future?

Yes.

>>
>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec chains)
>>  {
>>if (chain->type == CT_INVARIANT)
>> continue;
>> +  /* Don't unroll when eliminating stores.  */
>> +  else if (chain->type == CT_STORE_STORE)
>> +   return 1;
>>
>> this is a hard exit value so we do not handle the case where another chain
>> in the loop would want to unroll? (enhancement?)  I'd have expected to do
>> the same as for CT_INVARIANT here.
> I didn't check what change is needed in case of unrolling.  I am not
> very sure if we should prefer unroll for *load chains or prefer not
> unroll for store-store chains, because unroll in general increases
> loop-carried register pressure for store-store chains rather than
> decreases register pressure for *load chains.
> I was also thinking if it's possible to restrict unrolling somehow in
> order to enable predcom at O2.  BTW, this is not common, it only
> happens once in spec2k6 with factor forced to 1.  So okay if as it is
> now?

I think it is ok for now with a TODO added.  Please change the comment
to "we can't handle unrolling when eliminating stores" (it's not clear if we
can -- did you try?  maybe add a testcase that would ICE)

>
>>
>> +  tree init = ref_at_iteration (dr, (int) 0 - i, );
>> +  if (!chain->all_always_accessed && tree_could_trap_p (init))
>> +   {
>> + gimple_seq_discard (stmts);
>> + return false;
>>
>> so this is the only place that remotely cares for not always performed 
>> stores.
>> But as-is the patch doesn't seem to avoid speculating stores and thus
>> violates the C++ memory model, aka, introduces store-data-races?  The LIM
>> store-motion code was fixed to avoid this by keeping track of whether a BB
>> has executed to guard the stores done in the compensation code on the loop
>> exit.
>>
>> That said, to "fix" this all && tree_could_trap_p cases would need to be 
>> removed
>> (or similarly flag vars be introduced).  Speculating loads that do not
>> trap is ok
>> (might only introduce false uninit use reports by tools like valgrind).
> Hmm, not 

Re: Update profile for haifa-sched's recovery blocks

2017-07-04 Thread Alexander Monakov
On Tue, 4 Jul 2017, Jan Hubicka wrote:

> Hi,
> this is another bug I noticed while looking into Itanium rgression.
> There is no profile attached to recovery blocks in scheduler.
> I made them very unlikely, but I wonder if we can do better? After all
> we probably know the probability of path that will lead for speculation
> to suceed?

I don't really understand why/how that would help (after all, when doing
speculation we must have already decided that failure is very unlikely),
but to answer the last question, scheduler's estimation of speculation
success is tracked using the 'ds_t' data type - see comments regarding
'dw_t' and 'ds_t' in sched-int.h.

I think in haifa-sched.c 'todo_spec' variable in create_check_block_twin
holds the desired estimation, in sel-sched.c it's 'check_ds' in
create_speculation_check.

Alexander


PR 81292: ICE on related strlens after r249880

2017-07-04 Thread Richard Sandiford
r249880 installed the result of a strlen in a strinfo if the strinfo
wasn't previously a full string.  But as Jakub says in the PR comments,
we can't just do that in isolation, because there are no vdefs on the
call that would invalidate any related strinfos.

This patch updates the related strinfos if the adjustment is simple and
invalidates them otherwise.  As elsewhere, we treat adjustments of the
form strlen +/- INTEGER_CST as simple but anything else as too complex.

Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?

Sorry for the breakage.

Richard


gcc/
PR tree-optimization/81292
* tree-ssa-strlen.c (handle_builtin_strlen): When setting
full_string_p, also call adjust_related_strinfos if the adjustment
is simple, otherwise invalidate related strinfos.

gcc/testsuite/
PR tree-optimization/81292
* gcc.dg/pr81292-1.c: New test.
* gcc.dg/pr81292-2.c: Likewise.

Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c   2017-07-03 21:00:21.438433126 +0100
+++ gcc/tree-ssa-strlen.c   2017-07-03 21:00:34.971809881 +0100
@@ -1214,8 +1214,23 @@ handle_builtin_strlen (gimple_stmt_itera
  /* Until now we only had a lower bound on the string length.
 Install LHS as the actual length.  */
  si = unshare_strinfo (si);
+ tree old = si->nonzero_chars;
  si->nonzero_chars = lhs;
  si->full_string_p = true;
+ if (TREE_CODE (old) == INTEGER_CST)
+   {
+ location_t loc = gimple_location (stmt);
+ old = fold_convert_loc (loc, TREE_TYPE (lhs), old);
+ tree adj = fold_build2_loc (loc, MINUS_EXPR,
+ TREE_TYPE (lhs), lhs, old);
+ adjust_related_strinfos (loc, si, adj);
+   }
+ else
+   {
+ si->first = 0;
+ si->prev = 0;
+ si->next = 0;
+   }
}
  return;
}
Index: gcc/testsuite/gcc.dg/pr81292-1.c
===
--- /dev/null   2017-07-03 19:11:31.344941082 +0100
+++ gcc/testsuite/gcc.dg/pr81292-1.c2017-07-03 21:00:34.971809881 +0100
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+char a[10];
+
+int __attribute__ ((noinline, noclone))
+f1 (int n)
+{
+  a[0] = '1';
+  a[1] = '2';
+  return strlen (a + 1) < n ? strlen (a) : 100;
+}
+
+int __attribute__ ((noinline, noclone))
+f2 (char *a, int n)
+{
+  a[0] = '1';
+  a[1] = '2';
+  return strlen (a + 1) < n ? strlen (a) : 100;
+}
+
+int
+main (void)
+{
+  char b[10];
+  strcpy (a + 2, "345");
+  strcpy (b + 2, "34567");
+  if (f1 (100) != 5 || f2 (b, 100) != 7)
+__builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen" } } */
Index: gcc/testsuite/gcc.dg/pr81292-2.c
===
--- /dev/null   2017-07-03 19:11:31.344941082 +0100
+++ gcc/testsuite/gcc.dg/pr81292-2.c2017-07-03 21:00:34.971809881 +0100
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+char a[] = { 0, 'a', 0, 'b', 'c', 0, 'd', 'e', 'f', 0 };
+
+int __attribute__ ((noinline, noclone))
+f1 (void)
+{
+  a[0] = '1';
+  a[strlen (a)] = '2';
+  a[strlen (a)] = '3';
+  return strlen (a);
+}
+
+int __attribute__ ((noinline, noclone))
+f2 (char *a)
+{
+  a[0] = '1';
+  a[strlen (a)] = '2';
+  a[strlen (a)] = '3';
+  return strlen (a);
+}
+
+int
+main (void)
+{
+  char b[] = { 0, 0, 'a', 'b', 0, 0 };
+  if (f1 () != 9 || f2 (b) != 5)
+__builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 6 "strlen" } } */


Re: [PATCH] Transform (m1 > m2) * d into m1> m2 ? d : 0

2017-07-04 Thread Hurugalawadi, Naveen
Hi,

Thanks for the review and comments on the patch.

>> The proposed patch handled both the same.  This means the pattern
>> shouldn't use range-info but instead match a more complex

The patch handles as per the discussion by matching the pattern
in match.pd.

Bootstrapped and Regression tested on AArch64 and X86_64.
Please review the patch and let us know if its okay?

Thanks,
Naveen

2017-07-04  Naveen H.S  

gcc
* match.pd (((m1 >/=/<= m2) * d -> (m1 >/=/<= m2) ? d : 0) New
pattern.

gcc/testsuite
* gcc.dg/tree-ssa/vrp116.c: New Test.diff --git a/gcc/match.pd b/gcc/match.pd
index 4c64b21..d914db1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1088,6 +1088,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (convert (bit_and (bit_not @1) @0
 
+/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
+(for cmp (gt lt ge le)
+(simplify
+ (mult (convert (cmp @0 @1)) @2)
+  (cond (cmp @0 @1) @2 { build_zero_cst (type); })))
+
 /* For integral types with undefined overflow and C != 0 fold
x * C EQ/NE y * C into x EQ/NE y.  */
 (for cmp (eq ne)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
new file mode 100644
index 000..d9d7b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp116.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 > m2;
+  int e = d * c;
+  return e ? m1 : m2;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "vrp1" } } */


Re: [PATCH][1/2] Early LTO debug, simple-object part

2017-07-04 Thread Richard Biener
On Tue, 20 Jun 2017, Richard Biener wrote:

> On Wed, 7 Jun 2017, Richard Biener wrote:
> 
> > On Fri, 19 May 2017, Richard Biener wrote:
> > 
> > > 
> > > This is a repost (unchanged) of the simple-object ELF support for
> > > early LTO debug transfer from IL object to a separate debug-only object 
> > > file.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > Ping.
> 
> Ping^2.

Ping^3.

> > > Richard.
> > > 
> > > 2017-05-19  Richard Biener  
> > > 
> > >   include/
> > >   * simple-object.h (simple_object_copy_lto_debug_sections): New
> > >   function.
> > > 
> > >   libiberty/
> > >   * simple-object-common.h (struct simple_object_functions): Add
> > >   copy_lto_debug_sections hook.
> > >   * simple-object.c: Include fcntl.h.
> > >   (handle_lto_debug_sections): New helper function.
> > >   (simple_object_copy_lto_debug_sections): New function copying
> > >   early LTO debug sections to regular debug sections in a new file.
> > >   (simple_object_start_write): Handle NULL segment_name.
> > >   * simple-object-coff.c (simple_object_coff_functions): Adjust
> > >   for not implemented copy_lto_debug_sections hook.
> > >   * simple-object-mach-o.c (simple_object_mach_o_functions): Likewise.
> > >   * simple-object-xcoff.c (simple_object_xcoff_functions): Likewise.
> > >   * simple-object-elf.c (SHT_NULL, SHT_SYMTAB, SHT_RELA, SHT_REL,
> > >   SHT_GROUP): Add various sectopn header types.
> > >   (SHF_EXCLUDE): Add flag.
> > >   (Elf32_External_Sym, Elf64_External_Sym): Add symbol struct.
> > >   (ELF_ST_BIND, ELF_ST_TYPE, ELF_ST_INFO): Add accessors.
> > >   (STT_OBJECT, STT_FUNC, STT_TLS, STT_GNU_IFUNC): Add Symbol types.
> > >   (STV_DEFAULT): Add symbol visibility.
> > >   (SHN_COMMON): Add special section index name.
> > >   (struct simple_object_elf_write): New.
> > >   (simple_object_elf_start_write): Adjust for new private data.
> > >   (simple_object_elf_write_shdr): Pass in values for all fields
> > >   we write.
> > >   (simple_object_elf_write_to_file): Adjust.  Copy from recorded
> > >   section headers if requested.
> > >   (simple_object_elf_release_write): Release private data.
> > >   (simple_object_elf_copy_lto_debug_sections): Copy and rename sections
> > >   as denoted by PFN and all their dependences, symbols and relocations
> > >   to the empty destination file.
> > >   (simple_object_elf_functions): Adjust for copy_lto_debug_sections hook.
> > > 
> > > Index: early-lto-debug/include/simple-object.h
> > > ===
> > > --- early-lto-debug.orig/include/simple-object.h  2016-10-19 
> > > 13:19:58.012326431 +0200
> > > +++ early-lto-debug/include/simple-object.h   2016-10-20 
> > > 10:51:49.861722998 +0200
> > > @@ -197,6 +197,14 @@ simple_object_write_to_file (simple_obje
> > >  extern void
> > >  simple_object_release_write (simple_object_write *);
> > >  
> > > +/* Copy LTO debug sections from SRC_OBJECT to DEST.
> > > +   If an error occurs, return the errno value in ERR and an error 
> > > string.  */
> > > +
> > > +extern const char *
> > > +simple_object_copy_lto_debug_sections (simple_object_read *src_object,
> > > +const char *dest,
> > > +int *err);
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > Index: early-lto-debug/libiberty/simple-object-common.h
> > > ===
> > > --- early-lto-debug.orig/libiberty/simple-object-common.h 2016-10-19 
> > > 13:19:58.012326431 +0200
> > > +++ early-lto-debug/libiberty/simple-object-common.h  2016-10-20 
> > > 10:51:49.865723045 +0200
> > > @@ -141,6 +141,12 @@ struct simple_object_functions
> > >  
> > >/* Release the private data for an simple_object_write.  */
> > >void (*release_write) (void *);
> > > +
> > > +  /* Copy LTO debug sections.  */
> > > +  const char *(*copy_lto_debug_sections) (simple_object_read *sobj,
> > > +   simple_object_write *dobj,
> > > +   int (*pfn) (const char **),
> > > +   int *err);
> > >  };
> > >  
> > >  /* The known object file formats.  */
> > > Index: early-lto-debug/libiberty/simple-object-elf.c
> > > ===
> > > --- early-lto-debug.orig/libiberty/simple-object-elf.c2016-10-19 
> > > 13:19:58.012326431 +0200
> > > +++ early-lto-debug/libiberty/simple-object-elf.c 2016-10-20 
> > > 10:51:49.865723045 +0200
> > > @@ -183,8 +183,55 @@ typedef struct {
> > >  
> > >  /* Values for sh_type field.  */
> > >  
> > > +#define SHT_NULL 0   /* Section header table entry unused */
> > >  #define SHT_PROGBITS 1   /* Program data */
> > > +#define SHT_SYMTAB   2   /* Link editing symbol table */
> > >  #define SHT_STRTAB   3   /* A string 

Re: [PATCH][2/2] early LTO debug, main part

2017-07-04 Thread Richard Biener
On Tue, 20 Jun 2017, Richard Biener wrote:

> On Wed, 7 Jun 2017, Richard Biener wrote:
> 
> > On Fri, 19 May 2017, Richard Biener wrote:
> > 
> > > 
> > > This is a repost of the main part of the early LTO debug support.
> > > The only changes relative to the last post is in the dwarf2out.c
> > > pieces due to Jasons review and Jakubs introduction of
> > > DW_OP_GNU_variable_value.
> > > 
> > > I've also adjusted testcases for fallout (the asan backtraces do
> > > give files / line numbers because libbacktrace doesn't understand
> > > the DWARF) plus added a -flto run over the libstdc++ pretty printer
> > > testsuite -- after all the goal was to make those work with LTO,
> > > and they now nicely do.
> > > 
> > > [LTO-]bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > 
> > > I've also tested with -flto -g and compared to before the patch and
> > > the outcome doesn't contain any surprises.
> > > 
> > > I've also ran the gdb testsuite with no differences (but I guess
> > > it doesn't exercise LTO).
> > > 
> > > I've also built SPEC 2k6 with -flto -g.
> > > 
> > > I've also debugged optimized LTO bootstrapped cc1 a bit - not that
> > > debugging (LTO) optimized cc1 is a pleasant experience, but at least
> > > gdb doesn't crash.
> > > 
> > > Ok for trunk?
> > 
> > Ping.
> 
> Ping^2.

Ping^3.

> > > Both darwin and mingw maintainers were not concerned about LTO with -g
> > > being broken for them.
> > > 
> > > This patch allows us to go forward with freeing more stuff after
> > > the frontend finished, in particular remove LTO streaming of a lot
> > > of type information that is referenced from trees (and, as a first
> > > step, enable free-lang-data for non-LTO compiles).
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 2017-05-19  Richard Biener  
> > > 
> > > * debug.h (struct gcc_debug_hooks): Add die_ref_for_decl and
> > > register_external_die hooks.
> > > (debug_false_tree_charstarstar_uhwistar): Declare.
> > > (debug_nothing_tree_charstar_uhwi): Likewise.
> > > * debug.c (do_nothing_debug_hooks): Adjust.
> > > (debug_false_tree_charstarstar_uhwistar): New do nothing.
> > > (debug_nothing_tree_charstar_uhwi): Likewise.
> > > * dbxout.c (dbx_debug_hooks): Adjust.
> > > (xcoff_debug_hooks): Likewise.
> > > * sdbout.c (sdb_debug_hooks): Likewise.
> > > * vmsdbgout.c (vmsdbg_debug_hooks): Likewise.
> > > 
> > > * dwarf2out.c (macinfo_label_base): New global.
> > >   (dwarf2out_register_external_die): New function for the
> > >   register_external_die hook.
> > > (dwarf2out_die_ref_for_decl): Likewise for die_ref_for_decl.
> > > (dwarf2_debug_hooks): Use them.
> > > (dwarf2_lineno_debug_hooks): Adjust.
> > > (struct die_struct): Add with_offset flag.
> > > (DEBUG_LTO_DWO_INFO_SECTION, DEBUG_LTO_INFO_SECTION,
> > > DEBUG_LTO_DWO_ABBREV_SECTION, DEBUG_LTO_ABBREV_SECTION,
> > > DEBUG_LTO_DWO_MACINFO_SECTION, DEBUG_LTO_MACINFO_SECTION,
> > > DEBUG_LTO_DWO_MACRO_SECTION, DEBUG_LTO_MACRO_SECTION,
> > > DEBUG_LTO_LINE_SECTION, DEBUG_LTO_DWO_STR_OFFSETS_SECTION,
> > > DEBUG_LTO_STR_DWO_SECTION, DEBUG_STR_LTO_SECTION): New macros
> > > defining section names for the early LTO debug variants.
> > >   (reset_indirect_string): New helper.
> > > (add_AT_external_die_ref): Helper for 
> > > dwarf2out_register_external_die.
> > > (print_dw_val): Add support for offsetted symbol references.
> > > (compute_section_prefix_1): Split out worker to distinguish
> > > the comdat from the LTO case.
> > > (compute_section_prefix): Wrap old comdat case here.
> > > (output_die): Skip DIE symbol output for the LTO added one.
> > > Handle DIE symbol references with offset.
> > > (output_comp_unit): Guard section name mangling properly.
> > > For LTO debug sections emit a symbol at the section beginning
> > > which we use to refer to its DIEs.
> > > (add_abstract_origin_attribute): For DIEs registered via
> > > dwarf2out_register_external_die directly refer to the early
> > > DIE rather than indirectly through the shadow one we created.
> > > (gen_array_type_die): When generating early LTO debug do
> > > not emit DW_AT_string_length.
> > > (gen_formal_parameter_die): Do not re-create DIEs for PARM_DECLs
> > > late when in LTO.
> > > (gen_subprogram_die): Adjust the check for whether we face
> > > a concrete instance DIE for an inline we can reuse for the
> > > late LTO case.  Likewise avoid another specification DIE
> > > for early built declarations/definitions for the late LTO case.
> > > (gen_variable_die): Add type references for late duplicated VLA 
> > > dies
> > > when in late LTO.
> > > (gen_inlined_subroutine_die): Do not call 
> 

Re: [PATCH v9] add -fpatchable-function-entry=N,M option

2017-07-04 Thread Richard Earnshaw (lists)
On 13/06/17 18:00, Torsten Duwe wrote:
> Changes since v8:
> 
>   * Documentation changes as requested by Sandra
>   * 3 functional test cases added
> 
>   Torsten
> 
> 
> gcc/c-family/ChangeLog
> 2017-06-13  Torsten Duwe  
> 
>   * c-attribs.c (c_common_attribute_table): Add entry for
>   "patchable_function_entry".
> 
> gcc/lto/ChangeLog
> 2017-06-13  Torsten Duwe  
> 
>   * lto-lang.c (lto_attribute_table): Add entry for
>   "patchable_function_entry".
> 
> gcc/ChangeLog
> 2017-06-13  Torsten Duwe  
> 
>   * common.opt: Introduce -fpatchable-function-entry
>   command line option, and its variables function_entry_patch_area_size
>   and function_entry_patch_area_start.
>   * opts.c (common_handle_option): Add -fpatchable_function_entry_ case,
>   including a two-value parser.
>   * target.def (print_patchable_function_entry): New target hook.
>   * targhooks.h (default_print_patchable_function_entry): New function.
>   * targhooks.c (default_print_patchable_function_entry): Likewise.
>   * toplev.c (process_options): Switch off IPA-RA if
>   patchable function entries are being generated.
>   * varasm.c (assemble_start_function): Look at the
>   patchable-function-entry command line switch and current
>   function attributes and maybe generate NOP instructions by
>   calling the print_patchable_function_entry hook.
>   * doc/extend.texi: Document patchable_function_entry attribute.
>   * doc/invoke.texi: Document -fpatchable_function_entry
>   command line option.
>   * doc/tm.texi.in (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
>   New target hook.
>   * doc/tm.texi: Likewise.
> 
> gcc/testsuite/ChangeLog
> 2017-06-13  Torsten Duwe  
> 
>   * c-c++-common/patchable_function_entry-default.c: New test.
>   * c-c++-common/patchable_function_entry-decl.c: Likewise.
>   * c-c++-common/patchable_function_entry-definition.c: Likewise.

I think we're nearly there with this one, but there are a couple of nits
still to sort out.

I can't see anything in the patch to deal with targets that set
TARGET_HAVE_NAMED_SECTIONS to false.  We'll probably ICE in that case
since the compiler will be unable to record the location of the patch
region in the explicitly named section.  I think the correct thing to do
there is to reject the option during option validation when we can't
support it properly.

The template NOP in default_print_patchable_function_entry needs to be
added as a GC root to prevent it being discarded during garbage collection.

There are a couple of documentation tweaks, which I think help make the
text a little more comprehensible.  See inline.

R.

> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index f2a88e147ba..31137ce0433 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -139,6 +139,8 @@ static tree handle_bnd_variable_size_attribute (tree *, 
> tree, tree, int, bool *)
>  static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
>  static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
>  static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
> +int, bool *);
>  
>  /* Table of machine-independent attributes common to all C-like languages.
>  
> @@ -345,6 +347,9 @@ const struct attribute_spec c_common_attribute_table[] =
> handle_bnd_instrument, false },
>{ "fallthrough", 0, 0, false, false, false,
> handle_fallthrough_attribute, false },
> +  { "patchable_function_entry",  1, 2, true, false, false,
> +   handle_patchable_function_entry_attribute,
> +   false },
>{ NULL, 0, 0, false, false, false, NULL, false }
>  };
>  
> @@ -3173,3 +3178,10 @@ handle_fallthrough_attribute (tree *, tree name, tree, 
> int,
>*no_add_attrs = true;
>return NULL_TREE;
>  }
> +
> +static tree
> +handle_patchable_function_entry_attribute (tree *, tree, tree, int, bool *)
> +{
> +  /* Nothing to be done here.  */
> +  return NULL_TREE;
> +}
> diff --git a/gcc/common.opt b/gcc/common.opt
> index a5c3aeaa336..f542590650c 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -163,6 +163,13 @@ bool flag_stack_usage_info = false
>  Variable
>  int flag_debug_asm
>  
> +; How many NOP insns to place at each function entry by default
> +Variable
> +HOST_WIDE_INT function_entry_patch_area_size
> +
> +; And how far the real asm entry point is into this area
> +Variable
> +HOST_WIDE_INT function_entry_patch_area_start
>  
>  ; Balance between GNAT encodings and standard DWARF to emit.
>  Variable
> @@ -2022,6 +2029,10 @@ fprofile-reorder-functions
>  Common Report 

Re: [PATCH] Enable addressable params sanitization with --param asan-stack=1.

2017-07-04 Thread Martin Liška
On 07/04/2017 09:59 AM, Jakub Jelinek wrote:
> On Tue, Jul 04, 2017 at 09:47:29AM +0200, Martin Liška wrote:
>> As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81040#c15, the 
>> sanitization is
>> done only when one uses use-after-scope. That's caused by fact that I 
>> decorated the newly
>> created auto variables with DECL_ARTIFICIAL = 1. Because of that
>>
>> static inline bool
>> asan_protect_stack_decl (tree decl)
>> {
>>   return DECL_P (decl)
>> && (!DECL_ARTIFICIAL (decl)
>>  || (asan_sanitize_use_after_scope () && TREE_ADDRESSABLE (decl)));
>> }
>>
>> returns false. I hope not marking the variable as DECL_ARTIFICIAL will work 
>> fine?
>> Or am I missing something?
> 
> Well, you should make sure the debug info is correct.
> Which means ideally that there is just one DW_TAG_formal_parameter and no
> DW_TAG_variable for the parameter.
> For the addressable parameters I hope the corresponding artificial
> vars just live in memory for the whole rest of the scope, at least for the
> case where you emit a debug bind (hope it is after the assignment to the
> artificial var) I think it should be fine to set DECL_IGNORED_P on the
> artificial var instead of DECL_ARTIFICIAL.
> For the other case where there is DECL_VALUE_EXPR, perhaps try it too and
> see what you get.
> 
>   Jakub
> 

Using DECL_IGNORED_P works for me.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

I'm going to install the patch.
Martin
>From 20d69fbf4076add09df363ffb9d03cd243f8190d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 4 Jul 2017 09:22:23 +0200
Subject: [PATCH] Enable addressable params sanitization with --param
 asan-stack=1.

gcc/ChangeLog:

2017-07-04  Martin Liska  

	PR sanitizer/81040
	* sanopt.c (sanitize_rewrite_addressable_params): Mark the
	newly created variable as DECL_IGNORED_P.

gcc/testsuite/ChangeLog:

2017-07-04  Martin Liska  

	PR sanitizer/81040
	* g++.dg/asan/function-argument-1.C: Run the test-case w/o
	use-after-scope sanitization.
---
 gcc/sanopt.c| 2 +-
 gcc/testsuite/g++.dg/asan/function-argument-1.C | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index 7692f6a9db7..b7740741d43 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -905,7 +905,7 @@ sanitize_rewrite_addressable_params (function *fun)
 	  tree var = build_decl (DECL_SOURCE_LOCATION (arg),
  VAR_DECL, DECL_NAME (arg), type);
 	  TREE_ADDRESSABLE (var) = 1;
-	  DECL_ARTIFICIAL (var) = 1;
+	  DECL_IGNORED_P (var) = 1;
 
 	  gimple_add_tmp_var (var);
 
diff --git a/gcc/testsuite/g++.dg/asan/function-argument-1.C b/gcc/testsuite/g++.dg/asan/function-argument-1.C
index 148c4628316..bdbb37a44a4 100644
--- a/gcc/testsuite/g++.dg/asan/function-argument-1.C
+++ b/gcc/testsuite/g++.dg/asan/function-argument-1.C
@@ -1,5 +1,6 @@
 // { dg-do run }
 // { dg-shouldfail "asan" }
+// { dg-options "-fsanitize=address -fno-sanitize-address-use-after-scope" }
 
 struct A
 {
-- 
2.13.2



[PATCH][OBVIOUS] Use xstrdup_for_dump in ipa-inline.c (PR ipa/81293).

2017-07-04 Thread Martin Liška
Hello.

It's obvious patch for case where we call cxx_printable_name_internal twice. 
Once a function
name is found in buffer in a slot that then release for purpose of the second 
function name
buffering.

Installed as obvious.

Martin

gcc/ChangeLog:

2017-07-04  Martin Liska  

PR ipa/81293
* ipa-inline.c (inline_small_functions):
Use xstrdup_for_dump.
---
 gcc/ipa-inline.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index fb20d3723cc..3ec7fd19993 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2048,8 +2048,8 @@ inline_small_functions (void)
 	  fprintf (dump_file,
 		   " Inlined %s into %s which now has time %f and size %i, "
 		   "net change of %+i.\n",
-		   edge->callee->name (),
-		   edge->caller->name (),
+		   xstrdup_for_dump (edge->callee->name ()),
+		   xstrdup_for_dump (edge->caller->name ()),
 		   ipa_fn_summaries->get (edge->caller)->time.to_double (),
 		   ipa_fn_summaries->get (edge->caller)->size,
 		   overall_size - old_size);



[PATCH, 3/3] Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx plugin

2017-07-04 Thread Tom de Vries

On 07/04/2017 12:05 PM, Tom de Vries wrote:

On 07/03/2017 04:24 PM, Tom de Vries wrote:

On 07/03/2017 04:08 PM, Thomas Schwinge wrote:

Hi!

On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  
wrote:

On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:

On Mon, 26 Jun 2017, Tom de Vries wrote:

2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx 
plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]


Why the "OPENACC" in these names?


I took the format from 'GOMP_OPENACC_DIM'.


Doesn't this debugging aid apply to
any variant of offloading?


I guess you're right. These environment variables would also be 
applicable for f.i. offloading via openmp on nvptx. I'll strip the 
'OPENACC_' bit from the variables.



The filename used for dumping the module is plugin-nvptx..cubin.


Also, I suggest to make these names similar to their controlling 
options,

that is: "gomp-nvptx*", for example.



Makes sense, will do.


Changes in the patch series:
- removed OPENACC_ from environment variable names
- made temp files use gomp-nvptx prefix.
- fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
- merged the three GOMP_NVPTX_JIT patches into one
- rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
   invocation if GOMP_NVPTX_JIT if not defined, removing the need for
   hardcoding default values
- added CU_JIT_TARGET to plugin/cuda/cuda.h

Build on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).

The patch series now looks like:
1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
plugin

I'll repost the patch series in reply to this email.



3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
   plugin
   ( combination of 3 GOMP_NVPTX_JIT patches originally submitted at:
 https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01921.html
 https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01920.html
 https://gcc.gnu.org/ml/gcc-patches/2017-06/msg02407.html )

Thanks,
- Tom
Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/cuda/cuda.h (enum CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL,
	CU_JIT_NEW_SM3X_OPT and CU_JIT_TARGET.
	* plugin/plugin-nvptx.c (parse_number): New function.
	(process_GOMP_NVPTX_JIT): New function.
	(link_ptx): Add CU_JIT_OPTIMIZATION_LEVEL, CU_JIT_NEW_SM3X_OPT and
	CU_JIT_TARGET to opts if specified.

---
 libgomp/plugin/cuda/cuda.h|   5 +-
 libgomp/plugin/plugin-nvptx.c | 108 --
 2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d19..7d190f1 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,7 +88,10 @@ typedef enum {
   CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
-  CU_JIT_LOG_VERBOSE = 12
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
+  CU_JIT_TARGET = 9,
+  CU_JIT_LOG_VERBOSE = 12,
+  CU_JIT_NEW_SM3X_OPT = 15
 } CUjit_option;
 
 typedef enum {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index cc2ee5e..f5b9502 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -144,6 +144,10 @@ init_cuda_lib (void)
 
 #include "secure_getenv.h"
 
+#if CUDA_VERSION < 8000
+#define CU_JIT_NEW_SM3X_OPT 15
+#endif
+
 /* Convenience macros for the frequently used CUDA library call and
error handling sequence as well as CUDA library calls that
do the error checking themselves or don't do it at all.  */
@@ -1106,11 +1110,77 @@ post_process_ptx (unsigned num, const char **res_code, size_t *res_size)
 }
 
 static bool
+parse_number (const char *c, unsigned long* resp, char **end)
+{
+  unsigned long res;
+
+  errno = 0;
+  res = strtoul (c, end, 10);
+  if (errno)
+return false;
+
+  *resp = res;
+  return true;
+}
+
+static void
+process_GOMP_NVPTX_JIT (intptr_t *gomp_nvptx_o, intptr_t *gomp_nvptx_ori,
+			uintptr_t *gomp_nvptx_target)
+{
+  const char *var_name = "GOMP_NVPTX_JIT";
+  const char *env_var = getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var == NULL)
+return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+{
+  while (*c == ' ')
+	c++;
+
+  if (c[0] == '-' && c[1] == 'O'
+	  && '0' <= c[2] && c[2] <= '4'
+	  && (c[3] == '\0' || c[3] == ' '))
+	{
+	  *gomp_nvptx_o = c[2] - '0';
+	  c += 3;
+	  continue;
+	}
+
+  if (c[0] == '-' && c[1] == 'o' && c[2] == 'r' && c[3] == 'i'
+	  && (c[4] == '\0' || c[4] == ' '))
+	{
+	  *gomp_nvptx_ori = 1;
+	  c += 4;
+	  continue;
+	}
+
+  if (c[0] == '-' && c[1] == 'a' && c[2] == 'r' && c[3] == 'c'
+	  && c[4] == 'h' && c[5] == '=')
+	{
+	  const char *end;
+	  unsigned long 

Update profile for haifa-sched's recovery blocks

2017-07-04 Thread Jan Hubicka
Hi,
this is another bug I noticed while looking into Itanium rgression.
There is no profile attached to recovery blocks in scheduler.
I made them very unlikely, but I wonder if we can do better? After all
we probably know the probability of path that will lead for speculation
to suceed?

Honza

* haifa-sched.c (sched_create_recovery_edges): Update profile after
creating recovery edges.
Index: haifa-sched.c
===
--- haifa-sched.c   (revision 249928)
+++ haifa-sched.c   (working copy)
@@ -8302,7 +8302,15 @@ sched_create_recovery_edges (basic_block
   else
 edge_flags = 0;
 
-  make_edge (first_bb, rec, edge_flags);
+  edge e2 = single_succ_edge (first_bb);
+  edge e = make_edge (first_bb, rec, edge_flags);
+  e->probability = profile_probability::very_unlikely ();
+  e->count = first_bb->count.apply_probability (e->probability);
+  rec->count = e->count;
+  rec->frequency = EDGE_FREQUENCY (e);
+  e2->probability = e->probability.invert ();
+  e2->count = first_bb->count - e2->count;
+
   rtx_code_label *label = block_label (second_bb);
   rtx_jump_insn *jump = emit_jump_insn_after (targetm.gen_jump (label),
  BB_END (rec));


[PATCH, 2/3] Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin

2017-07-04 Thread Tom de Vries

On 07/04/2017 12:05 PM, Tom de Vries wrote:

On 07/03/2017 04:24 PM, Tom de Vries wrote:

On 07/03/2017 04:08 PM, Thomas Schwinge wrote:

Hi!

On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  
wrote:

On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:

On Mon, 26 Jun 2017, Tom de Vries wrote:

2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx 
plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]


Why the "OPENACC" in these names?


I took the format from 'GOMP_OPENACC_DIM'.


Doesn't this debugging aid apply to
any variant of offloading?


I guess you're right. These environment variables would also be 
applicable for f.i. offloading via openmp on nvptx. I'll strip the 
'OPENACC_' bit from the variables.



The filename used for dumping the module is plugin-nvptx..cubin.


Also, I suggest to make these names similar to their controlling 
options,

that is: "gomp-nvptx*", for example.



Makes sense, will do.


Changes in the patch series:
- removed OPENACC_ from environment variable names
- made temp files use gomp-nvptx prefix.
- fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
- merged the three GOMP_NVPTX_JIT patches into one
- rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
   invocation if GOMP_NVPTX_JIT if not defined, removing the need for
   hardcoding default values
- added CU_JIT_TARGET to plugin/cuda/cuda.h

Build on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).

The patch series now looks like:
1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
plugin

I'll repost the patch series in reply to this email.


2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
   ( original submission at
 https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01992.html )

Thanks,
- Tom
Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin

2017-06-27  Tom de Vries  

	* plugin/plugin-nvptx.c (post_process_ptx): New function.
	(link_ptx): Call post_process_ptx.

---
 libgomp/plugin/plugin-nvptx.c | 132 +-
 1 file changed, 130 insertions(+), 2 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3e33c5b..cc2ee5e 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -980,6 +980,131 @@ debug_linkout (void *linkout, size_t linkoutsize)
 }
 }
 
+/* If environment variable GOMP_NVPTX_PTXRW=[Ww], write *RES_CODE to file
+   gomp-nvptx..ptx.  If it is [Rr], read *RES_CODE from file
+   instead.  */
+
+static void
+post_process_ptx (unsigned num, const char **res_code, size_t *res_size)
+{
+  static int gomp_nvptx_ptxrw = -1;
+
+  if (gomp_nvptx_ptxrw == -1)
+{
+  const char *var_name = "GOMP_NVPTX_PTXRW";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+
+  gomp_nvptx_ptxrw = 0;
+  if (env_var == NULL)
+	;
+  else if ((env_var[0] == 'w' || env_var[0] == 'W')
+	   && env_var[1] == '\0')
+	gomp_nvptx_ptxrw = 1;
+  else if ((env_var[0] == 'r' || env_var[0] == 'R')
+	   && env_var[1] == '\0')
+	gomp_nvptx_ptxrw = 2;
+  else
+	GOMP_PLUGIN_error ("Error parsing %s", var_name);
+}
+
+  if (gomp_nvptx_ptxrw == 0)
+return;
+
+  const char *code = *res_code;
+  size_t size = *res_size;
+
+  const char *prefix = "gomp-nvptx.";
+  const char *postfix = ".ptx";
+  const int len =	(strlen (prefix)
+			 + 10 /* %u.  */
+			 + strlen (postfix)
+			 + 1  /* '\0'.  */);
+  char file_name[len];
+  int res = snprintf (file_name, len, "%s%u%s", prefix,
+		  num, postfix);
+  assert (res < len); /* Assert there's no truncation.  */
+
+  GOMP_PLUGIN_debug (0, "%s %s \n",
+		 (gomp_nvptx_ptxrw == 1 ? "Writing" : "Reading"),
+		 file_name);
+
+  if (gomp_nvptx_ptxrw == 1)
+{
+  FILE *ptx_file = fopen (file_name, "w");
+  if (ptx_file == NULL)
+	{
+	  GOMP_PLUGIN_debug (0, "Opening %s failed\n", file_name);
+	  return;
+	}
+
+  int res = fprintf (ptx_file, "%s", code);
+  unsigned int write_succeeded = res == size - 1;
+  if (!write_succeeded)
+	GOMP_PLUGIN_debug (0,
+			   "Writing %s failed: written %d but expected %zu\n",
+			   file_name, res, size - 1);
+
+  res = fclose (ptx_file);
+  if (res != 0)
+	GOMP_PLUGIN_debug (0, "Closing %s failed\n", file_name);
+
+  return;
+}
+
+  if (gomp_nvptx_ptxrw == 2)
+{
+  FILE *ptx_file = fopen (file_name, "r");
+  if (ptx_file == NULL)
+	{
+	  GOMP_PLUGIN_debug (0, "Opening %s failed\n", file_name);
+	  return;
+	}
+
+  if (fseek (ptx_file, 0L, SEEK_END) != 0)
+	{
+	  GOMP_PLUGIN_debug (0, "Seeking end of %s failed\n", file_name);
+	  return;
+	}
+
+  long bufsize = ftell (ptx_file);
+  if (bufsize == -1)
+	{
+	  

Fix bb-reorder code size regression on Itanium

2017-07-04 Thread Jan Hubicka
Hi,
better_edge_p will always return false when best_prob is uninitialized.  This
patch makes it to chose first edge with initialized probability instead.
This solves code size regression seen on our Itanium tester at least for
gzip's deflate.

Will commit it after bootstrapping/regtesting x86_64-linux.

Honza

* bb-reorder.c (better_edge_p): Deal with uninitialized best_prob.

Index: bb-reorder.c
===
--- bb-reorder.c(revision 249928)
+++ bb-reorder.c(working copy)
@@ -957,7 +957,7 @@ better_edge_p (const_basic_block bb, con
 return !cur_best_edge
   || cur_best_edge->dest->index > e->dest->index;
 
-  if (prob > best_prob + diff_prob)
+  if (prob > best_prob + diff_prob || !best_prob.initialized_p ())
 /* The edge has higher probability than the temporary best edge.  */
 is_better_edge = true;
   else if (prob < best_prob - diff_prob)


[PATCH, 1/3] Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-07-04 Thread Tom de Vries

On 07/04/2017 12:05 PM, Tom de Vries wrote:

On 07/03/2017 04:24 PM, Tom de Vries wrote:

On 07/03/2017 04:08 PM, Thomas Schwinge wrote:

Hi!

On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  
wrote:

On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:

On Mon, 26 Jun 2017, Tom de Vries wrote:

2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx 
plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]


Why the "OPENACC" in these names?


I took the format from 'GOMP_OPENACC_DIM'.


Doesn't this debugging aid apply to
any variant of offloading?


I guess you're right. These environment variables would also be 
applicable for f.i. offloading via openmp on nvptx. I'll strip the 
'OPENACC_' bit from the variables.



The filename used for dumping the module is plugin-nvptx..cubin.


Also, I suggest to make these names similar to their controlling 
options,

that is: "gomp-nvptx*", for example.



Makes sense, will do.


Changes in the patch series:
- removed OPENACC_ from environment variable names
- made temp files use gomp-nvptx prefix.
- fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
- merged the three GOMP_NVPTX_JIT patches into one
- rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
   invocation if GOMP_NVPTX_JIT if not defined, removing the need for
   hardcoding default values
- added CU_JIT_TARGET to plugin/cuda/cuda.h

Build on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).

The patch series now looks like:
1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
plugin

I'll repost the patch series in reply to this email.


1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
   ( original submission at
 https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01918.html )

Thanks,
- Tom
Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/plugin-nvptx.c (do_prog, debug_linkout): New function.
	(link_ptx): Use debug_linkout.

---
 libgomp/plugin/plugin-nvptx.c | 106 ++
 1 file changed, 106 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 71630b5..3e33c5b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -31,6 +31,7 @@
is not clear as to what that state might be.  Or how one might
propagate it from one thread to another.  */
 
+#define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
 #include "libgomp-plugin.h"
@@ -47,6 +48,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #if PLUGIN_NVPTX_DYNAMIC
 # include 
@@ -138,6 +142,8 @@ init_cuda_lib (void)
 # define init_cuda_lib() true
 #endif
 
+#include "secure_getenv.h"
+
 /* Convenience macros for the frequently used CUDA library call and
error handling sequence as well as CUDA library calls that
do the error checking themselves or don't do it at all.  */
@@ -876,6 +882,104 @@ notify_var (const char *var_name, const char *env_var)
 GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
 }
 
+static void
+do_prog (const char *prog, const char *arg)
+{
+  pid_t pid = fork ();
+
+  if (pid == -1)
+{
+  GOMP_PLUGIN_error ("Fork failed");
+  return;
+}
+  else if (pid > 0)
+{
+  int status;
+  waitpid (pid, , 0);
+  if (!WIFEXITED (status))
+	GOMP_PLUGIN_error ("Running %s %s failed", prog, arg);
+}
+  else
+{
+  execlp (prog, prog /* argv[0] */, arg, NULL);
+  abort ();
+}
+}
+
+static void
+debug_linkout (void *linkout, size_t linkoutsize)
+{
+  static int gomp_nvptx_disasm = -1;
+  if (gomp_nvptx_disasm == -1)
+{
+  const char *var_name = "GOMP_NVPTX_DISASM";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+  gomp_nvptx_disasm
+	= ((env_var != NULL && env_var[0] == '1' && env_var[1] == '\0')
+	   ? 1 : 0);
+}
+
+  static int gomp_nvptx_save_temps = -1;
+  if (gomp_nvptx_save_temps == -1)
+{
+  const char *var_name = "GOMP_NVPTX_SAVE_TEMPS";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+  gomp_nvptx_save_temps
+	= ((env_var != NULL && env_var[0] == '1' && env_var[1] == '\0')
+	   ? 1 : 0);
+}
+
+  if (gomp_nvptx_disasm == 0
+  && gomp_nvptx_save_temps == 0)
+return;
+
+  const char *prefix = "gomp-nvptx.";
+  const char *postfix = ".cubin";
+  const int len =	(strlen (prefix)
+			 + 20 /* %lld.  */
+			 + strlen (postfix)
+			 + 1  /* '\0'.  */);
+  char file_name[len];
+  int res = snprintf (file_name, len, "%s%lld%s", prefix,
+		  (long long)getpid (), postfix);
+  assert (res < len); /* Assert there's no truncation.  */
+
+  GOMP_PLUGIN_debug (0, 

Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-07-04 Thread Tom de Vries

On 07/03/2017 04:24 PM, Tom de Vries wrote:

On 07/03/2017 04:08 PM, Thomas Schwinge wrote:

Hi!

On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  
wrote:

On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:

On Mon, 26 Jun 2017, Tom de Vries wrote:

2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx 
plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]


Why the "OPENACC" in these names?


I took the format from 'GOMP_OPENACC_DIM'.


Doesn't this debugging aid apply to
any variant of offloading?


I guess you're right. These environment variables would also be 
applicable for f.i. offloading via openmp on nvptx. I'll strip the 
'OPENACC_' bit from the variables.



The filename used for dumping the module is plugin-nvptx..cubin.


Also, I suggest to make these names similar to their controlling options,
that is: "gomp-nvptx*", for example.



Makes sense, will do.


Changes in the patch series:
- removed OPENACC_ from environment variable names
- made temp files use gomp-nvptx prefix.
- fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
- merged the three GOMP_NVPTX_JIT patches into one
- rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
  invocation if GOMP_NVPTX_JIT if not defined, removing the need for
  hardcoding default values
- added CU_JIT_TARGET to plugin/cuda/cuda.h

Build on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).

The patch series now looks like:
1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
   plugin

I'll repost the patch series in reply to this email.

Thanks,
- Tom



Re: Handle data dependence relations with different bases

2017-07-04 Thread Eric Botcazou
[Sorry for missing the previous messages]

> Thanks.  Just been retesting, and I think I must have forgotten
> to include Ada last time.  It turns out that the patch causes a dg-scan
> regression in gnat.dg/vect17.adb, because we now think that if the
> array RECORD_TYPEs *do* alias in:
> 
>procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is
>begin
>   for I in Sarray'Range loop
>  R(I) := X(I) + Y(I);
>   end loop;
>end;
> 
> then the dependence distance must be zero.  Eric, does that hold true
> for Ada?  I.e. if X and R (or Y and R) alias, must it be the case that
> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)?

Yes, I'd think so (even without the artificial RECORD_TYPE around the arrays).

> 2017-06-07  Richard Sandiford  
> 
> gcc/testsuite/
>   * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5.
>   * gnat.dg/vect17.adb (Add): Create a dependence distance of 1
>   when X = R or Y = R.

I think that you need to modify vect15 and vect16 the same way.

-- 
Eric Botcazou


Re: [PATCH] warn on mem calls modifying objects of non-trivial types (PR 80560)

2017-07-04 Thread Richard Earnshaw (lists)
On 29/06/17 17:15, Jan Hubicka wrote:
> Hello,
>> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
>> index 0f7e21a..443d16c 100644
>> --- a/gcc/hash-table.h
>> +++ b/gcc/hash-table.h
>> @@ -803,7 +803,10 @@ hash_table::empty_slow ()
>>m_size_prime_index = nindex;
>>  }
>>else
>> -memset (entries, 0, size * sizeof (value_type));
>> +{
>> +  for ( ; size; ++entries, --size)
>> +*entries = value_type ();
>> +}
>>m_n_deleted = 0;
>>m_n_elements = 0;
>>  }
> 
> This change sends our periodic testers into an infinite loop.  It is fault of 
> gcc 4.2 being used
> as bootstrap compiler, but perhaps that can be worked around?
> 
> Honza
> 

Same is also true on RHE5 (gcc-4.1).

R.


Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)

2017-07-04 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 10:18:20AM +0200, Richard Biener wrote:
> > Ok (of course, will handle this separately from the rest).
> 
> Yes.  Note I didn't look at the actual patch (yet).

I'd like to ping the -fsanitize=pointer-overflow patch (though if you're
busy, it can certainly wait a few weeks).

Jakub


Re: [PATCH][testsuite] Add dg-require-stack-check

2017-07-04 Thread Christophe Lyon
On 3 July 2017 at 17:30, Jeff Law  wrote:
> On 07/03/2017 09:00 AM, Christophe Lyon wrote:
>> Hi,
>>
>> This is a follow-up to
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html
>>
>> This patch adds dg-require-stack-check and updates the tests that use
>> dg-options "-fstack-check" to avoid failures on configurations that to
>> not support it.
>>
>> I merely copied what we currently do to check if visibility flags are
>> supported, and cross-tested on aarch64 and arm targets with the
>> results I expected.
>>
>> This means that my testing does not cover the changes I propose for
>> i386 and gnat.
>>
>> Is it OK nonetheless?
>>
>> Thanks,
>>
>> Christophe
>>
>>
>> stack-check-et.chlog.txt
>>
>>
>> 2017-07-03  Christophe Lyon  
>>
>>   * lib/target-supports-dg.exp (dg-require-stack-check): New.
>>   * lib/target-supports.exp (check_stack_check_available): New.
>>   * g++.dg/other/i386-9.C: Add dg-require-stack-check.
>>   * gcc.c-torture/compile/stack-check-1.c: Likewise.
>>   * gcc.dg/graphite/run-id-pr47653.c: Likewise.
>>   * gcc.dg/pr47443.c: Likewise.
>>   * gcc.dg/pr48134.c: Likewise.
>>   * gcc.dg/pr70017.c: Likewise.
>>   * gcc.target/aarch64/stack-checking.c: Likewise.
>>   * gcc.target/arm/stack-checking.c: Likewise.
>>   * gcc.target/i386/pr48723.c: Likewise.
>>   * gcc.target/i386/pr55672.c: Likewise.
>>   * gcc.target/i386/pr67265-2.c: Likewise.
>>   * gcc.target/i386/pr67265.c: Likewise.
>>   * gnat.dg/opt49.adb: Likewise.
>>   * gnat.dg/stack_check1.adb: Likewise.
>>   * gnat.dg/stack_check2.adb: Likewise.
>>   * gnat.dg/stack_check3.adb: Likewise.
> ACK once you address Rainer's comments.  I've got further stack-check
> tests in the queue which I'll update once your change goes in.
>
> jeff

Here is an updated version, which adds documentation for dg-require-stack-check.

I also ran make-check on and x86_64 with ada enabled and checked the logs:
the updated i386/* and gnat.dg* tests all pass, and are preceded by
the compilation
of the "stack_check" sample.

OK?

Thanks,

Christophe
2017-07-04  Christophe Lyon  

gcc/
* doc/sourcebuild.texi (Test Directives, Variants of
dg-require-support): Add documentation for dg-require-stack-check.

gcc/testsuite/
* lib/target-supports-dg.exp (dg-require-stack-check): New.
* lib/target-supports.exp (check_stack_check_available): New.
* g++.dg/other/i386-9.C: Add dg-require-stack-check.
* gcc.c-torture/compile/stack-check-1.c: Likewise.
* gcc.dg/graphite/run-id-pr47653.c: Likewise.
* gcc.dg/pr47443.c: Likewise.
* gcc.dg/pr48134.c: Likewise.
* gcc.dg/pr70017.c: Likewise.
* gcc.target/aarch64/stack-checking.c: Likewise.
* gcc.target/arm/stack-checking.c: Likewise.
* gcc.target/i386/pr48723.c: Likewise.
* gcc.target/i386/pr55672.c: Likewise.
* gcc.target/i386/pr67265-2.c: Likewise.
* gcc.target/i386/pr67265.c: Likewise.
* gnat.dg/opt49.adb: Likewise.
* gnat.dg/stack_check1.adb: Likewise.
* gnat.dg/stack_check2.adb: Likewise.
* gnat.dg/stack_check3.adb: Likewise.
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 4136a68..85af877 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2311,6 +2311,11 @@ the codeset to convert to.
 Skip the test if the target does not support profiling with option
 @var{profopt}.
 
+@item dg-require-stack-check @var{check}
+Skip the test if the target does not support the @code{-fstack-check}
+option.  If @var{check} is @code{""}, support for @code{-fstack-check}
+is checked, for @code{-fstack-check=("@var{check}")} otherwise.
+
 @item dg-require-visibility @var{vis}
 Skip the test if the target does not support the @code{visibility} attribute.
 If @var{vis} is @code{""}, support for @code{visibility("hidden")} is
diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
b/gcc/testsuite/lib/target-supports-dg.exp
index 6400d64..d50d8b0 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -265,6 +265,21 @@ proc dg-require-linker-plugin { args } {
 }
 }
 
+# If this target does not support the "stack-check" option, skip this
+# test.
+
+proc dg-require-stack-check { args } {
+set stack_check_available [ check_stack_check_available [lindex $args 1 ] ]
+if { $stack_check_available == -1 } {
+   upvar name name
+   unresolved "$name"
+}
+if { $stack_check_available != 1 } {
+   upvar dg-do-what dg-do-what
+   set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+}
+
 # Add any target-specific flags needed for accessing the given list
 # of features.  This must come after all dg-options.
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp

Re: [PATCH] ASAN: handle addressable params (PR sanitize/81040).

2017-07-04 Thread Jakub Jelinek
On Fri, Jun 30, 2017 at 11:21:48AM +0200, Martin Liška wrote:
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/asan/function-argument-3.C
> @@ -0,0 +1,27 @@
> +// { dg-do run }
> +// { dg-shouldfail "asan" }
> +
> +typedef int v4si __attribute__ ((vector_size (16)));
> +
> +static __attribute__ ((noinline)) int
> +goo (v4si *a)
> +{
> +  return (*(volatile v4si *) (a + 1))[2];
> +}
> +
> +__attribute__ ((noinline)) int
> +foo (v4si arg)
> +{
> +  return goo ();
> +}

This test fails on i686-linux when -msse2 is not on by default,
fixed thusly, committed as obvious:

2017-07-04  Jakub Jelinek  

* g++.dg/asan/function-argument-3.C: Add -Wno-psabi to additional
options.

--- gcc/testsuite/g++.dg/asan/function-argument-3.C.jj  2017-07-03 
19:03:25.0 +0200
+++ gcc/testsuite/g++.dg/asan/function-argument-3.C 2017-07-04 
10:42:18.558716171 +0200
@@ -1,5 +1,6 @@
 // { dg-do run }
 // { dg-shouldfail "asan" }
+// { dg-additional-options "-Wno-psabi" }
 
 typedef int v4si __attribute__ ((vector_size (16)));
 


Jakub


Re: [PATCH 0/7] Support for the SPARC M8 cpu

2017-07-04 Thread Rainer Orth
Hi Jose,

> This patch serie adds support for the SPARC M8 processor to GCC.
> The SPARC M8 processor implements the Oracle SPARC Architecture 2017.
[...]
> Note that full binutils support for M8 was upstreamed in May 19.
> Bootstrapped and tested in sparc64-linux-gnu.  No regressions.

since the patch is touching Solaris-specific files, too, please also
regtest it on Solaris (S12 with Studio 12.6 fbe would be best, I
suppose).

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Patch ping (Re: [PATCH] Fix PR81175, make gather builtins pure)

2017-07-04 Thread Richard Biener
On Tue, 4 Jul 2017, Jakub Jelinek wrote:

> Hi!
> 
> On Tue, Jun 27, 2017 at 12:27:25PM +0200, Jakub Jelinek wrote:
> > Fixed thusly, ok for trunk?  Perhaps we should add another testcase to check
> > similarly gatherpf builtin without the lhs, but we'd need different options.
> 
> I'd like to ping this patch, ok for trunk?

Ok (I thought it was quite obvious).

Thanks,
Richard.

> > 2017-06-27  Jakub Jelinek  
> > 
> > PR target/81175
> > * gcc.target/i386/pr69255-2.c (foo): Use the return value of the
> > gather.
> > 
> > --- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj2017-05-05 
> > 09:19:48.0 +0200
> > +++ gcc/testsuite/gcc.target/i386/pr69255-2.c   2017-06-27 
> > 12:20:31.697944761 +0200
> > @@ -12,7 +12,8 @@ __attribute__ ((__vector_size__ (16))) i
> >  void
> >  foo (const long long *p)
> >  {
> > -  __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { dg-error 
> > "needs isa option -m32 -mavx512vl" } */
> > +  volatile __attribute__ ((__vector_size__ (32))) long long c;
> > +  c = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);/* { 
> > dg-error "needs isa option -m32 -mavx512vl" } */
> >/* { dg-warning "AVX vector return without AVX enabled changes the ABI" 
> > "" { target *-*-* } .-1 } */
> >/* { dg-warning "AVX vector argument without AVX enabled changes the 
> > ABI" "" { target *-*-* } .-2 } */
> >  }


Re: Patch ping (Re: [PATCH] Fix PR81175, make gather builtins pure)

2017-07-04 Thread Uros Bizjak
On Tue, Jul 4, 2017 at 10:35 AM, Jakub Jelinek  wrote:
> Hi!
>
> On Tue, Jun 27, 2017 at 12:27:25PM +0200, Jakub Jelinek wrote:
>> Fixed thusly, ok for trunk?  Perhaps we should add another testcase to check
>> similarly gatherpf builtin without the lhs, but we'd need different options.
>
> I'd like to ping this patch, ok for trunk?
>
>> 2017-06-27  Jakub Jelinek  
>>
>>   PR target/81175
>>   * gcc.target/i386/pr69255-2.c (foo): Use the return value of the
>>   gather.

OK.

Thanks,
Uros.

>> --- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj  2017-05-05 
>> 09:19:48.0 +0200
>> +++ gcc/testsuite/gcc.target/i386/pr69255-2.c 2017-06-27 12:20:31.697944761 
>> +0200
>> @@ -12,7 +12,8 @@ __attribute__ ((__vector_size__ (16))) i
>>  void
>>  foo (const long long *p)
>>  {
>> -  __builtin_ia32_gather3siv4di (a, p, b, 1, 1);  /* { dg-error 
>> "needs isa option -m32 -mavx512vl" } */
>> +  volatile __attribute__ ((__vector_size__ (32))) long long c;
>> +  c = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);  /* { dg-error 
>> "needs isa option -m32 -mavx512vl" } */
>>/* { dg-warning "AVX vector return without AVX enabled changes the ABI" 
>> "" { target *-*-* } .-1 } */
>>/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" 
>> "" { target *-*-* } .-2 } */
>>  }
>
> Jakub


move cpu specific settings for VxWorks to cpu specific config file

2017-07-04 Thread Olivier Hainque
Hello,

This fixes an inaccuracy introduced in the recent series
of patches preparing support for 64bit VxWorks configurations
together with VxWorks 7.

Alternative definitions for SIZE_TYPE, PTRDIFF_TYPE and
RTP library options were provided through a common VxWorks
configuration file, using conditional macros typically
provided in cpu specific config files, e.g. TARGET_LP64
taken from the x86 family of targets, which isn't quite
correct.

This was noticed by way of a build failure of the arm-wrs-vxworks
port, reported by Jeff (thanks !).

This change lets the arm-wrs-vxworks build get past the original
failure and I verified that the x86_64-vxworks7 port remains functional
afterwards. Committing.

The ARM build hits another failure now, which I think is unrelated to
the recent vxworks specific series of changes, but related to the large
ARM port specific cleanups currently underway.

Olivier


2017-07-04  Olivier Hainque  

* config/vxworks.h (PTRDIFF_TYPE, SIZE_TYPE): Restore
unconditional basic definitions.
(VXWORKS_LIBS_RTP): Likewise, prefixed by VXWORKS_SYSCALL_LIBS_RTP,
empty by default.
* config/i386/vxworks.h (PTRDIFF_TYPE, SIZE_TYPE): Redefine,
accounting for 64bit ABIs using cpu specific macros available for
this purpose.
(VXWORKS_SYSCALL_LIBS_RTP): Likewise.



vx-cpu64.diff
Description: Binary data


Patch ping (Re: [PATCH] Fix PR81175, make gather builtins pure)

2017-07-04 Thread Jakub Jelinek
Hi!

On Tue, Jun 27, 2017 at 12:27:25PM +0200, Jakub Jelinek wrote:
> Fixed thusly, ok for trunk?  Perhaps we should add another testcase to check
> similarly gatherpf builtin without the lhs, but we'd need different options.

I'd like to ping this patch, ok for trunk?

> 2017-06-27  Jakub Jelinek  
> 
>   PR target/81175
>   * gcc.target/i386/pr69255-2.c (foo): Use the return value of the
>   gather.
> 
> --- gcc/testsuite/gcc.target/i386/pr69255-2.c.jj  2017-05-05 
> 09:19:48.0 +0200
> +++ gcc/testsuite/gcc.target/i386/pr69255-2.c 2017-06-27 12:20:31.697944761 
> +0200
> @@ -12,7 +12,8 @@ __attribute__ ((__vector_size__ (16))) i
>  void
>  foo (const long long *p)
>  {
> -  __builtin_ia32_gather3siv4di (a, p, b, 1, 1);  /* { dg-error 
> "needs isa option -m32 -mavx512vl" } */
> +  volatile __attribute__ ((__vector_size__ (32))) long long c;
> +  c = __builtin_ia32_gather3siv4di (a, p, b, 1, 1);  /* { dg-error 
> "needs isa option -m32 -mavx512vl" } */
>/* { dg-warning "AVX vector return without AVX enabled changes the ABI" "" 
> { target *-*-* } .-1 } */
>/* { dg-warning "AVX vector argument without AVX enabled changes the ABI" 
> "" { target *-*-* } .-2 } */
>  }

Jakub


Re: [PATCH] Enable addressable params sanitization with --param asan-stack=1.

2017-07-04 Thread Jakub Jelinek
On Tue, Jul 04, 2017 at 09:47:29AM +0200, Martin Liška wrote:
> As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81040#c15, the 
> sanitization is
> done only when one uses use-after-scope. That's caused by fact that I 
> decorated the newly
> created auto variables with DECL_ARTIFICIAL = 1. Because of that
> 
> static inline bool
> asan_protect_stack_decl (tree decl)
> {
>   return DECL_P (decl)
> && (!DECL_ARTIFICIAL (decl)
>   || (asan_sanitize_use_after_scope () && TREE_ADDRESSABLE (decl)));
> }
> 
> returns false. I hope not marking the variable as DECL_ARTIFICIAL will work 
> fine?
> Or am I missing something?

Well, you should make sure the debug info is correct.
Which means ideally that there is just one DW_TAG_formal_parameter and no
DW_TAG_variable for the parameter.
For the addressable parameters I hope the corresponding artificial
vars just live in memory for the whole rest of the scope, at least for the
case where you emit a debug bind (hope it is after the assignment to the
artificial var) I think it should be fine to set DECL_IGNORED_P on the
artificial var instead of DECL_ARTIFICIAL.
For the other case where there is DECL_VALUE_EXPR, perhaps try it too and
see what you get.

Jakub


[PATCH] Enable addressable params sanitization with --param asan-stack=1.

2017-07-04 Thread Martin Liška
Hello.

As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81040#c15, the 
sanitization is
done only when one uses use-after-scope. That's caused by fact that I decorated 
the newly
created auto variables with DECL_ARTIFICIAL = 1. Because of that

static inline bool
asan_protect_stack_decl (tree decl)
{
  return DECL_P (decl)
&& (!DECL_ARTIFICIAL (decl)
|| (asan_sanitize_use_after_scope () && TREE_ADDRESSABLE (decl)));
}

returns false. I hope not marking the variable as DECL_ARTIFICIAL will work 
fine?
Or am I missing something?

Thanks,
Martin

>From b79133e3c9ad41b44f0a12c574fc1d0b8348ad89 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 4 Jul 2017 09:22:23 +0200
Subject: [PATCH] Enable addressable params sanitization with --param
 asan-stack=1.

gcc/ChangeLog:

2017-07-04  Martin Liska  

	PR sanitizer/81040
	* sanopt.c (sanitize_rewrite_addressable_params): Do not
	decorate variable as DECL_ARTIFICIAL in order to sanitize it.

gcc/testsuite/ChangeLog:

2017-07-04  Martin Liska  

	PR sanitizer/81040
	* g++.dg/asan/function-argument-1.C: Run the test-case w/o
	use-after-scope sanitization.
---
 gcc/sanopt.c| 1 -
 gcc/testsuite/g++.dg/asan/function-argument-1.C | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index 7692f6a9db7..8c80ff37d4d 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -905,7 +905,6 @@ sanitize_rewrite_addressable_params (function *fun)
 	  tree var = build_decl (DECL_SOURCE_LOCATION (arg),
  VAR_DECL, DECL_NAME (arg), type);
 	  TREE_ADDRESSABLE (var) = 1;
-	  DECL_ARTIFICIAL (var) = 1;
 
 	  gimple_add_tmp_var (var);
 
diff --git a/gcc/testsuite/g++.dg/asan/function-argument-1.C b/gcc/testsuite/g++.dg/asan/function-argument-1.C
index 148c4628316..bdbb37a44a4 100644
--- a/gcc/testsuite/g++.dg/asan/function-argument-1.C
+++ b/gcc/testsuite/g++.dg/asan/function-argument-1.C
@@ -1,5 +1,6 @@
 // { dg-do run }
 // { dg-shouldfail "asan" }
+// { dg-options "-fsanitize=address -fno-sanitize-address-use-after-scope" }
 
 struct A
 {
-- 
2.13.2



Re: [PATCH] Fix removal of ifunc (PR ipa/81214).

2017-07-04 Thread Martin Liška
On 07/03/2017 03:44 PM, Rainer Orth wrote:
> Hi Martin,
> 
>> Following patch fixes the issue where we do not emit ifunc and resolver
>> for function that are not called in a compilation unit or and not
>> referenced.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> i386.exp tests work on x86_64-linux-gnu.
> 
> your patch caused a testsuite regression on various targets:
> 
> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> UNRESOLVED: gcc.target/i386/mvc6.c scan-assembler vpshufb

Hello.

Thanks for the patch, actually I was working on the patch when you committed :)

> 
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/mvc6.c:8:1: error: 
> the call requires ifunc, which is not supported by this target
> 
> I'm seeing it on i386-pc-solaris2.12, Dominique reported it in the PR on
> Darwin/x86, and there are also testsuite results on FreeBSD/x86.
> 
> Unlike most other __attribute__((target_clones)) tests, this one lacked
> { dg-require-ifunc "" } and didn't need it before.
> 
> The following patch fixes this.  Tested with the appropriate runtest
> invocation on i386-pc-solaris2.12 and x86_64-pc-linux-gnu, installed on
> mainline.
> 
> While I was at it, I checked the other testcases with
> __attribute__((target_clones)):
> 
> g++.dg/ext/mvc1.C dg-require-ifunc ""
> g++.dg/ext/mvc2.C 00 (dg-warning)
> g++.dg/ext/mvc3.C 00 (dg-warning)
> g++.dg/ext/mvc4.C dg-require-ifunc ""
> gcc.dg/tree-prof/pr66295.cdg-require-ifunc ""
> gcc.target/i386/mvc1.cdg-require-ifunc ""
> gcc.target/i386/mvc2.c00
> gcc.target/i386/mvc3.c00 (dg-error)
> gcc.target/i386/mvc4.cdg-require-ifunc ""
> gcc.target/i386/mvc5.cdg-require-ifunc ""
> gcc.target/i386/mvc6.c00
> gcc.target/i386/mvc7.cdg-require-ifunc ""
> gcc.target/i386/mvc8.cdg-require-ifunc ""
> gcc.target/i386/mvc9.cdg-require-ifunc ""
> gcc.target/i386/pr78419.c dg-require-ifunc ""
> gcc.target/i386/pr80732.c dg-require-ifunc ""
> gcc.target/i386/pr81214.c dg-require-ifunc ""
> gcc.target/powerpc/clone1.c   powerpc*-*-linux* && lp64
> 
> Of those without dg-require-ifunc, the powerpc one is (sort of) ok since
> it's restricted to Linux, and those with dg-warning/dg-error are too
> since the warnings is emitted before the error about missing ifunc
> support.  That leaves us with gcc.target/i386/mvc2.c, which is sort of
> weird because it emits no code at all.  No idea if this intended,
> though.

I prefer to add ifunc to all these tests. I'll do it in attached patch that I'm 
going
to install.

Thanks and sorry for the fallout.
Martin

> 
>   Rainer
> 

>From 8b3d6abf45c6f1fdfe6780869eb0c8e6a0380f5a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 4 Jul 2017 09:39:10 +0200
Subject: [PATCH] Add dg-require ifunc for mvc test-cases.

gcc/testsuite/ChangeLog:

2017-07-04  Martin Liska  

	PR ipa/81214
	* g++.dg/ext/mvc2.C: Add dg-require ifunc.
	* g++.dg/ext/mvc3.C: Likewise.
	* gcc.target/i386/mvc2.c: Likewise.
	* gcc.target/i386/mvc3.c: Likewise.
---
 gcc/testsuite/g++.dg/ext/mvc2.C  | 1 +
 gcc/testsuite/g++.dg/ext/mvc3.C  | 1 +
 gcc/testsuite/gcc.target/i386/mvc2.c | 1 +
 gcc/testsuite/gcc.target/i386/mvc3.c | 1 +
 4 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/g++.dg/ext/mvc2.C b/gcc/testsuite/g++.dg/ext/mvc2.C
index e7abab81d95..1b8c6f4d6e9 100644
--- a/gcc/testsuite/g++.dg/ext/mvc2.C
+++ b/gcc/testsuite/g++.dg/ext/mvc2.C
@@ -1,4 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("avx","arch=slm","default")))
 __attribute__((target("avx")))
diff --git a/gcc/testsuite/g++.dg/ext/mvc3.C b/gcc/testsuite/g++.dg/ext/mvc3.C
index 05bebf7d4fb..d32b2c93aa0 100644
--- a/gcc/testsuite/g++.dg/ext/mvc3.C
+++ b/gcc/testsuite/g++.dg/ext/mvc3.C
@@ -1,4 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target("avx")))
 __attribute__((target_clones("avx","arch=slm","default")))
diff --git a/gcc/testsuite/gcc.target/i386/mvc2.c b/gcc/testsuite/gcc.target/i386/mvc2.c
index 9635ec83fac..34a777c1d5e 100644
--- a/gcc/testsuite/gcc.target/i386/mvc2.c
+++ b/gcc/testsuite/gcc.target/i386/mvc2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
 int foo ();
diff --git a/gcc/testsuite/gcc.target/i386/mvc3.c b/gcc/testsuite/gcc.target/i386/mvc3.c
index f940cdbbf55..1c7755fabbe 100644
--- a/gcc/testsuite/gcc.target/i386/mvc3.c
+++ b/gcc/testsuite/gcc.target/i386/mvc3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-ifunc "" } */
 
 

Re: [PATCH] Save and restore EDGE_DFS_BACK in draw_cfg_edges

2017-07-04 Thread Richard Biener
On Tue, 4 Jul 2017, Tom de Vries wrote:

> [was: Re: [PATCH] Add dotfn ]
> 
> On 07/03/2017 12:23 PM, Richard Biener wrote:
> > > Btw, I think this needs fixing:
> > > ...
> > > /* Draw all edges in the CFG.  Retreating edges are drawin as not
> > > constraining, this makes the layout of the graph better.
> > > (??? Calling mark_dfs_back may change the compiler's behavior when
> > > dumping, but computing back edges here for ourselves is also not
> > > desirable.)  */
> > > 
> > > static void
> > > draw_cfg_edges (pretty_printer *pp, struct function *fun)
> > > {
> > >basic_block bb;
> > >mark_dfs_back_edges ();
> > >FOR_ALL_BB_FN (bb, cfun)
> > >  draw_cfg_node_succ_edges (pp, fun->funcdef_no, bb);
> > > ...
> > > 
> > > We don't want that calling a debug function changes compiler behavior
> > > (something I ran into while debugging PR81192).
> > > 
> > > Any suggestion on how to address this? We could allocate a bitmap before
> > > and
> > > save the edge flag for all edges, and restore afterwards.
> 
> > Something like that, yes.
> > 
> 
> This patch implements that approach.
> 
> I've tried it with the PR81192 example and calling DOTFN in tail-merge, like
> this:
> 1. Just compiling the example without any patches gives a tail-merge
>sigsegv.
> 2. Compiling with the DOTFN call in tail-merge makes the sigsegv go
>away.
> 3. Adding this patch makes the sigsegv come back.
> 
> OK for trunk if bootstrap and reg-test on x86_64 succeeds?

You don't need to iterate over both preds and succs, succs
is enough.

Ok with that change.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)