Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-18 Thread Dimitar Dimitrov
On Mon, Dec 17 2018 20:15:02 EET Bernd Edlinger wrote:
> out of curiosity I looked at the clobber statement in
> gdb/nat/linux-ptrace.c:
> 
>asm volatile ("pushq %0;"
>  ".globl linux_ptrace_test_ret_to_nx_instr;"
>  "linux_ptrace_test_ret_to_nx_instr:"
>  "ret"
>  : : "r" ((uint64_t) (uintptr_t) return_address)
>  : "%rsp", "memory");
> 
> it turns out to be a far jump, instruction.

GDB functionality should not be affected if SP clobber is removed, even if the 
generated code is slightly different. Please see this comment:
http://sourceware.org/ml/gdb-patches/2018-12/msg00204.html

As I understand it, this particular code is never meant to return. It should 
either stop due to the NX mapping of return_address/%0, or hit the breakpoint 
placed at return_address/%0.

Regards,
Dimitar


[PATCH] Shuffle ieee_arithmetic.F90

2018-12-18 Thread Steve Kargl
The attach patch shuffles lines around to eliminate 80
lines of #ifdef...#endif; thereby making the file more
readable.  Tested on i586-*-freebsd and x86_64-*-freebsd.
Patch is a pre-requisite to fixing issues and adding 
missing functionality.  OK to commit?

2018-12-18  Steven G. Kargl  

* libgfortran/ieee/ieee_arithmetic.F90: Re-organize file to 
eliminate excessive #ifdef ... #endif.

-- 
Steve
Index: libgfortran/ieee/ieee_arithmetic.F90
===
--- libgfortran/ieee/ieee_arithmetic.F90	(revision 267191)
+++ libgfortran/ieee/ieee_arithmetic.F90	(working copy)
@@ -228,38 +228,28 @@ module IEEE_ARITHMETIC
   end function
 
   interface
-COPYSIGN_MACRO(4,4)
-COPYSIGN_MACRO(4,8)
-#ifdef HAVE_GFC_REAL_10
-COPYSIGN_MACRO(4,10)
-#endif
 #ifdef HAVE_GFC_REAL_16
-COPYSIGN_MACRO(4,16)
-#endif
-COPYSIGN_MACRO(8,4)
-COPYSIGN_MACRO(8,8)
+COPYSIGN_MACRO(16,16)
 #ifdef HAVE_GFC_REAL_10
-COPYSIGN_MACRO(8,10)
+COPYSIGN_MACRO(16,10)
+COPYSIGN_MACRO(10,16)
 #endif
-#ifdef HAVE_GFC_REAL_16
+COPYSIGN_MACRO(16,8)
+COPYSIGN_MACRO(16,4)
 COPYSIGN_MACRO(8,16)
+COPYSIGN_MACRO(4,16)
 #endif
 #ifdef HAVE_GFC_REAL_10
-COPYSIGN_MACRO(10,4)
-COPYSIGN_MACRO(10,8)
 COPYSIGN_MACRO(10,10)
-#ifdef HAVE_GFC_REAL_16
-COPYSIGN_MACRO(10,16)
+COPYSIGN_MACRO(10,8)
+COPYSIGN_MACRO(10,4)
+COPYSIGN_MACRO(8,10)
+COPYSIGN_MACRO(4,10)
 #endif
-#endif
-#ifdef HAVE_GFC_REAL_16
-COPYSIGN_MACRO(16,4)
-COPYSIGN_MACRO(16,8)
-#ifdef HAVE_GFC_REAL_10
-COPYSIGN_MACRO(16,10)
-#endif
-COPYSIGN_MACRO(16,16)
-#endif
+COPYSIGN_MACRO(8,8)
+COPYSIGN_MACRO(8,4)
+COPYSIGN_MACRO(4,8)
+COPYSIGN_MACRO(4,4)
   end interface
 
   interface IEEE_COPY_SIGN
@@ -268,32 +258,22 @@ COPYSIGN_MACRO(16,16)
   _gfortran_ieee_copy_sign_16_16, &
 #ifdef HAVE_GFC_REAL_10
   _gfortran_ieee_copy_sign_16_10, &
+  _gfortran_ieee_copy_sign_10_16, &
 #endif
   _gfortran_ieee_copy_sign_16_8, &
   _gfortran_ieee_copy_sign_16_4, &
+  _gfortran_ieee_copy_sign_8_16, &
+  _gfortran_ieee_copy_sign_4_16, &
 #endif
 #ifdef HAVE_GFC_REAL_10
-#ifdef HAVE_GFC_REAL_16
-  _gfortran_ieee_copy_sign_10_16, &
-#endif
   _gfortran_ieee_copy_sign_10_10, &
   _gfortran_ieee_copy_sign_10_8, &
   _gfortran_ieee_copy_sign_10_4, &
-#endif
-#ifdef HAVE_GFC_REAL_16
-  _gfortran_ieee_copy_sign_8_16, &
-#endif
-#ifdef HAVE_GFC_REAL_10
   _gfortran_ieee_copy_sign_8_10, &
+  _gfortran_ieee_copy_sign_4_10, &
 #endif
   _gfortran_ieee_copy_sign_8_8, &
   _gfortran_ieee_copy_sign_8_4, &
-#ifdef HAVE_GFC_REAL_16
-  _gfortran_ieee_copy_sign_4_16, &
-#endif
-#ifdef HAVE_GFC_REAL_10
-  _gfortran_ieee_copy_sign_4_10, &
-#endif
   _gfortran_ieee_copy_sign_4_8, &
   _gfortran_ieee_copy_sign_4_4
   end interface
@@ -309,38 +289,28 @@ COPYSIGN_MACRO(16,16)
   end function
 
   interface
-UNORDERED_MACRO(4,4)
-UNORDERED_MACRO(4,8)
-#ifdef HAVE_GFC_REAL_10
-UNORDERED_MACRO(4,10)
-#endif
 #ifdef HAVE_GFC_REAL_16
-UNORDERED_MACRO(4,16)
-#endif
-UNORDERED_MACRO(8,4)
-UNORDERED_MACRO(8,8)
+UNORDERED_MACRO(16,16)
 #ifdef HAVE_GFC_REAL_10
-UNORDERED_MACRO(8,10)
+UNORDERED_MACRO(16,10)
+UNORDERED_MACRO(10,16)
 #endif
-#ifdef HAVE_GFC_REAL_16
+UNORDERED_MACRO(16,8)
+UNORDERED_MACRO(16,4)
 UNORDERED_MACRO(8,16)
+UNORDERED_MACRO(4,16)
 #endif
 #ifdef HAVE_GFC_REAL_10
-UNORDERED_MACRO(10,4)
-UNORDERED_MACRO(10,8)
 UNORDERED_MACRO(10,10)
-#ifdef HAVE_GFC_REAL_16
-UNORDERED_MACRO(10,16)
+UNORDERED_MACRO(10,8)
+UNORDERED_MACRO(10,4)
+UNORDERED_MACRO(8,10)
+UNORDERED_MACRO(4,10)
 #endif
-#endif
-#ifdef HAVE_GFC_REAL_16
-UNORDERED_MACRO(16,4)
-UNORDERED_MACRO(16,8)
-#ifdef HAVE_GFC_REAL_10
-UNORDERED_MACRO(16,10)
-#endif
-UNORDERED_MACRO(16,16)
-#endif
+UNORDERED_MACRO(8,8)
+UNORDERED_MACRO(8,4)
+UNORDERED_MACRO(4,8)
+UNORDERED_MACRO(4,4)
   end interface
 
   interface IEEE_UNORDERED
@@ -349,32 +319,22 @@ UNORDERED_MACRO(16,16)
   _gfortran_ieee_unordered_16_16, &
 #ifdef HAVE_GFC_REAL_10
   _gfortran_ieee_unordered_16_10, &
+  _gfortran_ieee_unordered_10_16, &
 #endif
   _gfortran_ieee_unordered_16_8, &
   _gfortran_ieee_unordered_16_4, &
+  _gfortran_ieee_unordered_8_16, &
+  _gfortran_ieee_unordered_4_16, &
 #endif
 #ifdef HAVE_GFC_REAL_10
-#ifdef HAVE_GFC_REAL_16
-  _gfortran_ieee_unordered_10_16, &
-#endif
   _gfortran_ieee_unordered_10_10, &
   _gfortran_ieee_unordered_10_8, &
   _gfortran_ieee_unordered_10_4, &
-#endif
-#ifdef HAVE_GFC_REAL_16
-  _gfortran_ieee_unordered_8_16, &
-#endif
-#ifdef HAVE_GFC_REAL_10
   _gfortran_ieee_unordered_8_10, &
+  _gfortran_ieee_unordered_4_10, &
 #endif
   _gfortran_ieee_unordered_8_8, &
   

[rs6000] Fix x86 SSSE3 compatibility implementations and testcases

2018-12-18 Thread Paul Clarke
This patch is the analog to r266868-r266870, but for SSSE3.
The SSSE3 tests had been inadvertently made to PASS without actually running
the test code. Actually running the code turned up some previously undetected
issues.

This patch fixes some issues in the implementations, fixes up the tests
to use a union for the test data, which avoids strict aliasing issues,
and enables the tests to actually run (by removing a dependency on
__BUILTIN_CPU_SUPPORTS).

Also, there's a fairly insignificant change in the testcases that walk
through the data as pairs of vectors from:
  [0] and [1]
  [2] and [3]
  ...
  [n-4] and [n-3]
  [n-2] and [n-1]

to:
  [0] and [1]
  [1] and [2]
  ...
  [n-3] and [n-2]
  [n-2] and [n-1]

Since the testcases compute the correct answers based on the input, no
other changes were necessary to effect the change.

2018-12-18  Paul A. Clarke  

[gcc]

* config/rs6000/tmmintrin.h (_mm_hadds_epi16): Vector lanes swapped.
(_mm_hsub_epi32): Likewise.
(_mm_shuffle_epi8): Fix reversed interpretation of parameters.
(_mm_shuffle_pi8): Likewise.
(_mm_addubs_pi16): Likewise.

[gcc/testsuite]

* gcc.target/powerpc/ssse3-check.h: Enable tests to run.
* gcc.target/powerpc/ssse3-pabsb.c: Code fixes for strict aliasing
issues.
* gcc.target/powerpc/ssse3-pabsd.c: Likewise.
* gcc.target/powerpc/ssse3-palignr.c: Likewise.
* gcc.target/powerpc/ssse3-phaddd.c: Likewise.
* gcc.target/powerpc/ssse3-phaddsw.c: Likewise.
* gcc.target/powerpc/ssse3-phaddw.c: Likewise.
* gcc.target/powerpc/ssse3-phsubd.c: Likewise.
* gcc.target/powerpc/ssse3-phsubw.c: Likewise.
* gcc.target/powerpc/ssse3-pmulhrsw.c: Likewise.
* gcc.target/powerpc/ssse3-pshufb.c: Likewise.
* gcc.target/powerpc/ssse3-psignb.c: Likewise.
* gcc.target/powerpc/ssse3-psignd.c: Likewise.
* gcc.target/powerpc/ssse3-psignw.c: Likewise.
* gcc.target/powerpc/ssse3-vals.h: Provide input data as a union.

Index: gcc/config/rs6000/tmmintrin.h
===
diff --git a/trunk/gcc/config/rs6000/tmmintrin.h 
b/trunk/gcc/config/rs6000/tmmintrin.h
--- a/trunk/gcc/config/rs6000/tmmintrin.h   (revision 267245)
+++ b/trunk/gcc/config/rs6000/tmmintrin.h   (working copy)
@@ -228,7 +228,7 @@ _mm_hadds_epi16 (__m128i __A, __m128i __B)
   __v4si __C = { 0 }, __D = { 0 };
   __C = vec_sum4s ((__v8hi) __A, __C);
   __D = vec_sum4s ((__v8hi) __B, __D);
-  __C = (__v4si) vec_packs (__D, __C);
+  __C = (__v4si) vec_packs (__C, __D);
   return (__m128i) __C;
 }
 
@@ -264,8 +264,8 @@ _mm_hsub_epi32 (__m128i __A, __m128i __B)
 {  0,  1,  2,  3,  8,  9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27 };
   const __v16qu __Q =
 {  4,  5,  6,  7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31 };
-  __v4si __C = vec_perm ((__v4si) __B, (__v4si) __A, __P);
-  __v4si __D = vec_perm ((__v4si) __B, (__v4si) __A, __Q);
+  __v4si __C = vec_perm ((__v4si) __A, (__v4si) __B, __P);
+  __v4si __D = vec_perm ((__v4si) __A, (__v4si) __B, __Q);
   return (__m128i) vec_sub (__C, __D);
 }
 
@@ -332,7 +332,7 @@ __attribute__((__gnu_inline__, __always_inline__,
 _mm_shuffle_epi8 (__m128i __A, __m128i __B)
 {
   const __v16qi __zero = { 0 };
-  __vector __bool char __select = vec_cmplt ((__v16qi) __A, __zero);
+  __vector __bool char __select = vec_cmplt ((__v16qi) __B, __zero);
   __v16qi __C = vec_perm ((__v16qi) __A, (__v16qi) __A, (__v16qu) __B);
   return (__m128i) vec_sel (__C, __zero, __select);
 }
@@ -344,7 +344,7 @@ _mm_shuffle_pi8 (__m64 __A, __m64 __B)
   const __v16qi __zero = { 0 };
   __v16qi __C = (__v16qi) (__v2du) { __A, __A };
   __v16qi __D = (__v16qi) (__v2du) { __B, __B };
-  __vector __bool char __select = vec_cmplt ((__v16qi) __C, __zero);
+  __vector __bool char __select = vec_cmplt ((__v16qi) __D, __zero);
   __C = vec_perm ((__v16qi) __C, (__v16qi) __C, (__v16qu) __D);
   __C = vec_sel (__C, __zero, __select);
   return (__m64) ((__v2du) (__C))[0];
@@ -423,11 +423,11 @@ extern __inline __m128i
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maddubs_epi16 (__m128i __A, __m128i __B)
 {
-  __v8hi __C = vec_unpackh ((__v16qi) __A);
-  __v8hi __D = vec_unpackl ((__v16qi) __A);
   __v8hi __unsigned = vec_splats ((signed short) 0x00ff);
-  __v8hi __E = vec_and (vec_unpackh ((__v16qi) __B), __unsigned);
-  __v8hi __F = vec_and (vec_unpackl ((__v16qi) __B), __unsigned);
+  __v8hi __C = vec_and (vec_unpackh ((__v16qi) __A), __unsigned);
+  __v8hi __D = vec_and (vec_unpackl ((__v16qi) __A), __unsigned);
+  __v8hi __E = vec_unpackh ((__v16qi) __B);
+  __v8hi __F = vec_unpackl ((__v16qi) __B);
   __C = vec_mul (__C, __E);
   __D = vec_mul (__D, __F);
   const __v16qu __odds  =
@@ -445,10 +445,10 @@ _mm_maddubs_pi16 (__m64 __A, __m64 __B)
 {
   __v8hi __C = (__v8hi) (__v2du) { __A, __A };
   __C = vec_unpackl ((__v16qi) __C);
+  const __v8hi 

Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Bin.Cheng
On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen  wrote:
>
> On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng  wrote:
> > >
> > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen  wrote:
> > > >
> > > > "bin.cheng"  writes:
> > > >
> > > > > Hi,
> > > > >
> > > > > Due to ICE and mal-functional bugs, indirect call value profile 
> > > > > transformation
> > > > > is disabled on GCC-7/8/trunk.  This patch restores the 
> > > > > transformation.  The
> > > > > main issue is AutoFDO should store cgraph_node's profile_id of callee 
> > > > > func in
> > > > > the first histogram value's counter, rather than pointer to callee's 
> > > > > name string
> > > > > as it is now.
> > > > > With the patch, some "Indirect call -> direct call" tests pass with 
> > > > > autofdo, while
> > > > > others are unstable.  I think the instability is caused by poor perf 
> > > > > data collected
> > > > > during regrets run, and can confirm these tests pass if good perf 
> > > > > data could be
> > > > > collected in manual experiments.
> > > >
> > > > Would be good to make the tests stable, otherwise we'll just have
> > > > regressions in the future again.
> > > >
> > > > The problem is that the tests don't run long enough and don't get 
> > > > enough samples?
> > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > -  int i;
> > > -  for (i = 0; i < 1000; i++)
> > > +  int i, j;
> > > +  for (i = 0; i < 100; i++)
> > > +for (j = 0; j < 50; j++)
> > >   g += tc->foo();
> > > if (g<100) g++;
> > >  }
> > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > >  static __attribute__((always_inline))
> > >  void test2 (B *tc)
> > >  {
> > > -  int i;
> > > +  int i, j;
> > >for (i = 0; i < 100; i++)
> > > +for (j = 0; j < 50; j++)
> > >
> > > I have to increase loop count like this to get stable pass on my
> > > machine.  The original count (1000) is too small to be sampled.
> > >
> > > >
> > > > Could add some loop?
> > > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > > Maybe, I will have a try.
> > Turned out all "Indirect call" test can be resolved by adding -c 100
> > to perf command line:
> > diff --git a/gcc/config/i386/gcc-auto-profile 
> > b/gcc/config/i386/gcc-auto-profile
> > ...
> > -exec perf record -e $E -b "$@"
> > +exec perf record -e $E -c 100 -b "$@"
> >
> > Is 100 too small here?  Or is it fine for all scenarios?
>
> -c 100 is risky because it can cause perf throttling, which
> makes it lose data.
Right, it looks suspicious to me too.

>
> perf has a limiter that if the PMU handler uses too much CPU
> time it stops measuring for some time. A PMI is 10k+ cycles,
> so doing one every 100 branches is a lot of CPU time.
>
> I wouldn't go down that low. It is better to increase the
> iteration count.
We can combine the two together, increasing iteration count and
decreasing perf count at the same time.  What count would you suggest
from your experience?

Thanks,
bin
>
> -Andi


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Andi Kleen
On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng  wrote:
> >
> > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen  wrote:
> > >
> > > "bin.cheng"  writes:
> > >
> > > > Hi,
> > > >
> > > > Due to ICE and mal-functional bugs, indirect call value profile 
> > > > transformation
> > > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  
> > > > The
> > > > main issue is AutoFDO should store cgraph_node's profile_id of callee 
> > > > func in
> > > > the first histogram value's counter, rather than pointer to callee's 
> > > > name string
> > > > as it is now.
> > > > With the patch, some "Indirect call -> direct call" tests pass with 
> > > > autofdo, while
> > > > others are unstable.  I think the instability is caused by poor perf 
> > > > data collected
> > > > during regrets run, and can confirm these tests pass if good perf data 
> > > > could be
> > > > collected in manual experiments.
> > >
> > > Would be good to make the tests stable, otherwise we'll just have
> > > regressions in the future again.
> > >
> > > The problem is that the tests don't run long enough and don't get enough 
> > > samples?
> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > -  int i;
> > -  for (i = 0; i < 1000; i++)
> > +  int i, j;
> > +  for (i = 0; i < 100; i++)
> > +for (j = 0; j < 50; j++)
> >   g += tc->foo();
> > if (g<100) g++;
> >  }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> >  static __attribute__((always_inline))
> >  void test2 (B *tc)
> >  {
> > -  int i;
> > +  int i, j;
> >for (i = 0; i < 100; i++)
> > +for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine.  The original count (1000) is too small to be sampled.
> >
> > >
> > > Could add some loop?
> > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > Maybe, I will have a try.
> Turned out all "Indirect call" test can be resolved by adding -c 100
> to perf command line:
> diff --git a/gcc/config/i386/gcc-auto-profile 
> b/gcc/config/i386/gcc-auto-profile
> ...
> -exec perf record -e $E -b "$@"
> +exec perf record -e $E -c 100 -b "$@"
> 
> Is 100 too small here?  Or is it fine for all scenarios?

-c 100 is risky because it can cause perf throttling, which
makes it lose data.

perf has a limiter that if the PMU handler uses too much CPU
time it stops measuring for some time. A PMI is 10k+ cycles,
so doing one every 100 branches is a lot of CPU time.

I wouldn't go down that low. It is better to increase the
iteration count.

-Andi


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Andi Kleen
On Wed, Dec 19, 2018 at 09:26:51AM +0800, Bin.Cheng wrote:
> On Wed, Dec 19, 2018 at 5:27 AM Andi Kleen  wrote:
> >
> > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > -  int i;
> > > -  for (i = 0; i < 1000; i++)
> > > +  int i, j;
> > > +  for (i = 0; i < 100; i++)
> > > +for (j = 0; j < 50; j++)
> > >   g += tc->foo();
> > > if (g<100) g++;
> > >  }
> > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > >  static __attribute__((always_inline))
> > >  void test2 (B *tc)
> > >  {
> > > -  int i;
> > > +  int i, j;
> > >for (i = 0; i < 100; i++)
> > > +for (j = 0; j < 50; j++)
> > >
> > > I have to increase loop count like this to get stable pass on my
> > > machine.  The original count (1000) is too small to be sampled.
> >
> > IIRC It was originally higher, but people running on slow simulators 
> > complained,
> > so it was reduced.  Perhaps we need some way to detect in the test suite
> > that the test runs on a real CPU.
> Is there concise way to do this, given gcc may be run on all kinds of
> virtual scenarios?

Virtual should be fine too, just simulators are too slow.

I hope there is, because we certainly need a solution for production
ready autofdo.

Or perhaps could just check if perf is working and only
run the tests if that is true. The TCL code already
checks that. Just would need to pass that information
somehow as a define.

Overall I suspect far more test coverage is needed
to make it solid. The existing tests are not that great.


> 
> >
> > >
> > > > > FYI, an update about AutoFDO status:
> > > > > All AutoFDO ICEs in regtest are fixed, while several tests still 
> > > > > failing fall in below
> > > > > three categories:
> > > >
> > > > Great!
> > > >
> > > > Of course it still ICEs with LTO?
> > > >
> > > > Right now there is no test case for this I think. Probably one should 
> > > > be added.
> >
> >
> > Any comments on this?
> We'd like to further investigate AutoFDO+LTO, may I ask what the
> status is (or was)?  Any background elaboration about this would be
> appreciated.

It just never worked and ICEs very quickly if you try it.  

There's an open PR (PR71672)

There are some other open issues with autofdo BTW, e.g. the
old 4.9 google branch still has more features than mainline.
For example it supported discriminators, so can distinguish more
than one basic block per source line.

The last time I tested the gains with mainline autofdo
were also significantly less than 4.9-google, so there might
be other tunings missing.

-Andi



Re: [PATCH, C++] Fix PR c++/88261

2018-12-18 Thread Jason Merrill

On 12/15/18 3:36 AM, Bernd Edlinger wrote:

this patch implements an error message, for non-static initialization of a 
flexible array member.
This duplicates the existing error message from the C-FE, to avoid ICE and 
wrong code generation
issues, as pointed out in the PR.

It is a bit funny that a non-functional feature like that has already rather 
much test coverage.
The most easy adjustment seems to change the existing test cases to use static 
declarations.


Martin, thoughts?

Jason


Re: [C++ PATCH] Constexpr fold even some TREE_CONSTANT ctors (PR c++/87934)

2018-12-18 Thread Jason Merrill

On 12/18/18 6:19 PM, Jakub Jelinek wrote:

On Tue, Dec 18, 2018 at 05:40:03PM -0500, Jason Merrill wrote:

On 12/18/18 3:45 PM, Jakub Jelinek wrote:

The following testcase FAILs, because parsing creates a TREE_CONSTANT
CONSTRUCTOR that contains CONST_DECL elts.  cp_fold_r can handle that,
but constexpr evaluation doesn't touch those CONSTRUCTORs.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.  I also wonder if store_init_value should use cp_fold_r rather than just
cp_fully_fold.


I've been thinking about that already when working on the PR88410 bug.

Do you mean something like following completely untested patch?
Perhaps I could add a helper inline so that there is no code repetition
between cp_fully_fold and this new function.


Something like that, yes.

Jason


Re: [gofrontend-dev] Re: libgo patch committed: Add precise stack scan support

2018-12-18 Thread Matthias Klose
Cherry, see
https://gcc.gnu.org/ml/gcc-testresults/2018-12/msg02241.html
https://gcc.gnu.org/ml/gcc-testresults/2018-12/msg02240.html

still shows ~180 test failures on ARM32.  Sorry, no access to an ARM32 box until
next year.

Matthias

On 13.12.18 00:27, Ian Lance Taylor wrote:
> On Wed, Dec 12, 2018 at 8:10 AM Cherry Zhang  wrote:
>>
>> Thank you, Matthias!
>>
>> From the log, essentially all the tests aborted. The only place the new code 
>> can cause abort on all programs that I can think of is in the runtime 
>> startup code, probestackmaps, which calls value_size, which aborts due to an 
>> unhandled case. I haven't been able to try out on an ARM machine, but I 
>> managed to cross-compile a Go program and visually inspect the exception 
>> table. The type table's encoding is DW_EH_PE_absptr, which is indeed not 
>> handled, which will cause abort.
>>
>> I send https://go-review.googlesource.com/c/gofrontend/+/153857 (also as 
>> below). Hopefully this will fix the problem.
>>
>> Thanks,
>> Cherry
>>
>> diff --git a/libgo/runtime/go-unwind.c b/libgo/runtime/go-unwind.c
>> index c44755f9..f4bbfb60 100644
>> --- a/libgo/runtime/go-unwind.c
>> +++ b/libgo/runtime/go-unwind.c
>> @@ -318,6 +318,8 @@ value_size (uint8_t encoding)
>>case DW_EH_PE_sdata8:
>>case DW_EH_PE_udata8:
>>  return 8;
>> +  case DW_EH_PE_absptr:
>> +return sizeof(uintptr);
>>default:
>>  break;
>>  }
> 
> 
> Thanks.
> 
> Committed to mainline.
> 
> Ian
> 
> 
> 
>> On Tue, Dec 11, 2018 at 7:03 PM Matthias Klose  wrote:
>>>
>>> On 11.12.18 22:01, Cherry Zhang wrote:
 On Tue, Dec 11, 2018 at 3:51 PM Ian Lance Taylor  wrote:

> On Tue, Dec 11, 2018 at 6:52 AM Matthias Klose  wrote:
>>
>> On 10.12.18 16:54, Cherry Zhang wrote:
>>> On Mon, Dec 10, 2018 at 1:41 AM Matthias Klose 
> wrote:
>>>
 On 06.12.18 00:09, Ian Lance Taylor wrote:
> This libgo patch by Cherry Zhang adds support for precise stack
> scanning to the Go runtime.  This uses per-function stack maps stored
> in the exception tables in the language-specific data area.  The
> compiler needs to generate these stack maps; currently this is only
> done by a version of LLVM, not by GCC.  Each safepoint in a function
> is associated with a (real or dummy) landing pad, and its "type info"
> in the exception table is a pointer to the stack map. When a stack is
> scanned, the stack map is found by the stack unwinding code.
>
> For precise stack scan we need to unwind the stack. There are three
 cases:
>
> - If a goroutine is scanning its own stack, it can unwind the stack
> and scan the frames.
>
> - If a goroutine is scanning another, stopped, goroutine, it cannot
> directly unwind the target stack. We handle this by switching
> (runtime.gogo) to the target g, letting it unwind and scan the stack,
> and switch back.
>
> - If we are scanning a goroutine that is blocked in a syscall, we
> send
> a signal to the target goroutine's thread, and let the signal handler
> unwind and scan the stack. Extra care is needed as this races with
> enter/exit syscall.
>
> Currently this is only implemented on GNU/Linux.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

 this broke the libgo build on ARM32:

 ../../../src/libgo/runtime/go-unwind.c: In function
 'scanstackwithmap_callback':
 ../../../src/libgo/runtime/go-unwind.c:754:18: error:
> '_URC_NORMAL_STOP'
 undeclared (first use in this function)
   754 |   return _URC_NORMAL_STOP;
   |  ^~~~
 ../../../src/libgo/runtime/go-unwind.c:754:18: note: each undeclared
 identifier
 is reported only once for each function i
 t appears in
 ../../../src/libgo/runtime/go-unwind.c: In function
 'probestackmaps_callback':
 ../../../src/libgo/runtime/go-unwind.c:802:10: error:
> '_URC_NORMAL_STOP'
 undeclared (first use in this function)
   802 |   return _URC_NORMAL_STOP;
   |  ^~~~
 ../../../src/libgo/runtime/go-unwind.c:803:1: warning: control
> reaches end
 of
 non-void function [-Wreturn-type]
   803 | }
   | ^
 make[6]: *** [Makefile:1474: runtime/go-unwind.lo] Error 1
 make[6]: Leaving directory
 '/<>/build/arm-linux-gnueabihf/libgo'


>>> Hell Matthias,
>>>
>>> Thank you for the report. And sorry about the breakage. Does
>>> https://go-review.googlesource.com/c/gofrontend/+/153417 (or the patch
>>> below) fix ARM32 build? I don't have an ARM32 machine at hand 

Re: V8 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-18 Thread Sandra Loosemore

On 12/18/18 2:12 PM, H.J. Lu wrote:


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ac2ee59d92c..47f2fc3f518 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -358,6 +358,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wuseless-cast  -Wvariadic-macros  -Wvector-operation-performance @gol
 -Wvla  -Wvla-larger-than=@var{byte-size}  -Wvolatile-register-var @gol
 -Wwrite-strings @gol
+-Waddress-of-packed-member @gol
 -Wzero-as-null-pointer-constant  -Whsa}
 
 @item C and Objective-C-only Warning Options


Minor documentation nit:  it looks like some effort has been made to 
alphabetize that list.  Can you please put -Waddress-of-packed member in 
the right place, and also fix the misplaced -Whsa at the end?


-Sandra



[PATCH] [aarch64] Revert support for ARMv8.2 in tsv110

2018-12-18 Thread Shaokun Zhang
For HiSilicon's tsv110 cpu core, it supports some v8_4A features, but
some mandatory features are not implemented. Revert to ARMv8.2 that
all mandatory features are supported.

---
 gcc/ChangeLog| 5 +
 gcc/config/aarch64/aarch64-cores.def | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e9f5baa6557c..842876b0ae90 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2018-12-19 Shaokun Zhang  
+
+* config/aarch64/aarch64-cores.def (tsv110) : Revert support for ARMv8.2
+   in tsv110.
+
 2018-12-18  Vladimir Makarov  
 
PR rtl-optimization/87759
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 74be5dbf2595..20f4924e084d 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -96,10 +96,10 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa72, 0x41, 0xd0b, -1)
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
-/* ARMv8.4-A Architecture Processors.  */
-
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, 
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | 
AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+
+/* ARMv8.4-A Architecture Processors.  */
 
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,8_4A,  
AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
-- 
2.7.4



Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Bin.Cheng
On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng  wrote:
>
> On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen  wrote:
> >
> > "bin.cheng"  writes:
> >
> > > Hi,
> > >
> > > Due to ICE and mal-functional bugs, indirect call value profile 
> > > transformation
> > > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  
> > > The
> > > main issue is AutoFDO should store cgraph_node's profile_id of callee 
> > > func in
> > > the first histogram value's counter, rather than pointer to callee's name 
> > > string
> > > as it is now.
> > > With the patch, some "Indirect call -> direct call" tests pass with 
> > > autofdo, while
> > > others are unstable.  I think the instability is caused by poor perf data 
> > > collected
> > > during regrets run, and can confirm these tests pass if good perf data 
> > > could be
> > > collected in manual experiments.
> >
> > Would be good to make the tests stable, otherwise we'll just have
> > regressions in the future again.
> >
> > The problem is that the tests don't run long enough and don't get enough 
> > samples?
> Yes, take g++.dg/tree-prof/morefunc.C as an example:
> -  int i;
> -  for (i = 0; i < 1000; i++)
> +  int i, j;
> +  for (i = 0; i < 100; i++)
> +for (j = 0; j < 50; j++)
>   g += tc->foo();
> if (g<100) g++;
>  }
> @@ -27,8 +28,9 @@ void test1 (A *tc)
>  static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
> -  int i;
> +  int i, j;
>for (i = 0; i < 100; i++)
> +for (j = 0; j < 50; j++)
>
> I have to increase loop count like this to get stable pass on my
> machine.  The original count (1000) is too small to be sampled.
>
> >
> > Could add some loop?
> > Or possibly increase the sampling frequency in perf (-F or -c)?
> Maybe, I will have a try.
Turned out all "Indirect call" test can be resolved by adding -c 100
to perf command line:
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
...
-exec perf record -e $E -b "$@"
+exec perf record -e $E -c 100 -b "$@"

Is 100 too small here?  Or is it fine for all scenarios?

Thanks,
bin

> > Or run them multiple times and use gcov_merge to merge the files?
> Without changing loop count or sampling frequency, this is not likely
> to be helpful, since perf doesn't hit the small loop in most cases.
>
> Thanks,
> bin
> >
> >
> > > FYI, an update about AutoFDO status:
> > > All AutoFDO ICEs in regtest are fixed, while several tests still failing 
> > > fall in below
> > > three categories:
> >
> > Great!
> >
> > Of course it still ICEs with LTO?
> >
> > Right now there is no test case for this I think. Probably one should be 
> > added.
> >
> > -Andi


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Bin.Cheng
On Wed, Dec 19, 2018 at 5:27 AM Andi Kleen  wrote:
>
> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > -  int i;
> > -  for (i = 0; i < 1000; i++)
> > +  int i, j;
> > +  for (i = 0; i < 100; i++)
> > +for (j = 0; j < 50; j++)
> >   g += tc->foo();
> > if (g<100) g++;
> >  }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> >  static __attribute__((always_inline))
> >  void test2 (B *tc)
> >  {
> > -  int i;
> > +  int i, j;
> >for (i = 0; i < 100; i++)
> > +for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine.  The original count (1000) is too small to be sampled.
>
> IIRC It was originally higher, but people running on slow simulators 
> complained,
> so it was reduced.  Perhaps we need some way to detect in the test suite
> that the test runs on a real CPU.
Is there concise way to do this, given gcc may be run on all kinds of
virtual scenarios?

>
> >
> > > > FYI, an update about AutoFDO status:
> > > > All AutoFDO ICEs in regtest are fixed, while several tests still 
> > > > failing fall in below
> > > > three categories:
> > >
> > > Great!
> > >
> > > Of course it still ICEs with LTO?
> > >
> > > Right now there is no test case for this I think. Probably one should be 
> > > added.
>
>
> Any comments on this?
We'd like to further investigate AutoFDO+LTO, may I ask what the
status is (or was)?  Any background elaboration about this would be
appreciated.

Thanks,
bin
>
> -Andi


Re: [PR86153] simplify more overflow tests in VRP

2018-12-18 Thread Jeff Law
On 12/18/18 3:58 AM, Alexandre Oliva wrote:
> Jeff, you mentioned you had changes to the VRP overflow test that would
> fix this, but I couldn't figure out whether or not you ever put them in
> and it regressed again later, or what.  Anyway, here's my take on it.
No, they're not on the trunk yet.  They're sitting here in my tester --
I lost the testcase I'd written to exercise them and hadn't gone back
and recreated it.

Mine catches fgt32, fge22, fge32, but misses the others in your
testcase.  I was generalizing the code in the same place, targeted
towards the 83239 testcase prior to Jon inserting the
___builtin_unreachable calls into the runtime.  They generalized things
so that instead of +-1 and a comparison against zero, we could have an
arbitrary constant and a relational between A or B and the constant.

I went back and recreated the testcase from 83239 prior to Jon's
patches.  Then verified it will issue a bogus warning on the trunk.
Then I applied your patch to the trunk and verified yours fixes the
warning.  So AFAICT your patch addresses the missed optimization in
83239 as well as the issues in 86153.  Please reference 83239 in your
your ChangeLog and close 83239 when you install  your patch.

I'm going to drop my changes related to 83239.  I don't think they have
much value once your patch is installed, except perhaps to slightly
simplify the code.





> 
> The reason we issued the warnings was that we failed to optimize out
> some parts of _M_fill_insert, used by the C++98 version of vector
> resize, although the call of _M_fill_insert was guarded by a test that
> could never pass: test testcase only calls resize when the vector size
> is >= 3, to decrement the size by two.  The limitation we hit in VRP
> was that the compared values could pass as an overflow test, if the
> vector size was 0 or 1 (we knew it wasn't), but even with dynamic
> ranges we failed to decide that the test result could be determined at
> compile time, even though after the test we introduced ASSERT_EXPRs
> that required a condition known to be false from earlier ones.
> 
> I pondered turning ASSERT_EXPRs that show impossible conditions into
> traps, to enable subsequent instructions to be optimized, but I ended
> up finding an earlier spot in which an overflow test that would have
> introduced the impossible ASSERT_EXPR can have its result deduced from
> earlier known ranges and resolved to the other path.
Right.  IMHO it's better to use the results of the ASSERT_EXPR to deduce
the tighter ranges and either prove a conditional is always true or
always false.


> 
> Although such overflow tests could be uniformly simplified to compares
> against a constant, the original code would only perform such
> simplifications when the test could be resolved to an equality test
> against zero.  I've thus avoided introducing compares against other
> constants, and instead added code that will only simplify overflow
> tests that weren't simplified before when the condition can be
> evaluated at compile time.That limitation was precisely what my (unsubmitted) 
> patch was trying to
address :-)



> 
> 
> Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
>   PR testsuite/86153
>   * vr-values.c
>   (vr_values::vrp_evaluate_conditional_warnv_with_ops): Extend
>   simplification of overflow tests to cover cases in which we
>   can determine the result of the comparison.
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR testsuite/86153
>   * gcc.dg/vrp-overflow-1.c: New.
> ---
>  gcc/testsuite/gcc.dg/vrp-overflow-1.c |  151 
> +
>  gcc/vr-values.c   |   32 +++
>  2 files changed, 183 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vrp-overflow-1.c
> 

> diff --git a/gcc/vr-values.c b/gcc/vr-values.c
> index cbc759a18e6a..25390ed6ef86 100644
> --- a/gcc/vr-values.c
> +++ b/gcc/vr-values.c
> @@ -2336,6 +2336,38 @@ vr_values::vrp_evaluate_conditional_warnv_with_ops 
> (enum tree_code code,
> op1 = wide_int_to_tree (TREE_TYPE (op0), 0);
> code = (code == GT_EXPR || code == GE_EXPR) ? EQ_EXPR : NE_EXPR;
>   }
> +  else
> + {
> +   value_range vro, vri;
> +   if (code == GT_EXPR || code == GE_EXPR)
> + {
> +   vro.set (VR_ANTI_RANGE, TYPE_MIN_VALUE (TREE_TYPE (op0)), x);
> +   vri.set (VR_RANGE, TYPE_MIN_VALUE (TREE_TYPE (op0)), x);
> + }
> +   else if (code == LT_EXPR || code == LE_EXPR)
> + {
> +   vro.set (VR_RANGE, TYPE_MIN_VALUE (TREE_TYPE (op0)), x);
> +   vri.set (VR_ANTI_RANGE, TYPE_MIN_VALUE (TREE_TYPE (op0)), x);
> + }
> +   else
> + gcc_unreachable ();
> +   value_range *vr0 = get_value_range (op0);
> +   /* If the range for OP0 to pass the overflow test, namely
> +  vro, has no intersection with the range for OP0, then the
> +  overflow test can't pass, so return 

Re: [C++ PATCH] Constexpr fold even some TREE_CONSTANT ctors (PR c++/87934)

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 05:40:03PM -0500, Jason Merrill wrote:
> On 12/18/18 3:45 PM, Jakub Jelinek wrote:
> > The following testcase FAILs, because parsing creates a TREE_CONSTANT
> > CONSTRUCTOR that contains CONST_DECL elts.  cp_fold_r can handle that,
> > but constexpr evaluation doesn't touch those CONSTRUCTORs.
> > 
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk?
> 
> OK.  I also wonder if store_init_value should use cp_fold_r rather than just
> cp_fully_fold.

I've been thinking about that already when working on the PR88410 bug.

Do you mean something like following completely untested patch?
Perhaps I could add a helper inline so that there is no code repetition
between cp_fully_fold and this new function.

Note, it doesn't fix this PR, as store_init_value is called only after we
emit the error, so the constexpr.c patch is needed too.

--- gcc/cp/cp-tree.h.jj 2018-12-12 23:43:57.211129676 +0100
+++ gcc/cp/cp-tree.h2018-12-19 00:12:59.795154220 +0100
@@ -7542,6 +7542,7 @@ extern bool cxx_omp_privatize_by_referen
 extern bool cxx_omp_disregard_value_expr   (tree, bool);
 extern void cp_fold_function   (tree);
 extern tree cp_fully_fold  (tree);
+extern tree cp_fully_fold_init (tree);
 extern void clear_fold_cache   (void);
 extern tree lookup_hotness_attribute   (tree);
 extern tree process_stmt_hotness_attribute (tree);
--- gcc/cp/typeck2.c.jj 2018-12-01 00:25:09.340988953 +0100
+++ gcc/cp/typeck2.c2018-12-19 00:14:19.306875071 +0100
@@ -750,7 +750,7 @@ split_nonconstant_init (tree dest, tree
 init = TARGET_EXPR_INITIAL (init);
   if (TREE_CODE (init) == CONSTRUCTOR)
 {
-  init = cp_fully_fold (init);
+  init = cp_fully_fold_init (init);
   code = push_stmt_list ();
   if (split_nonconstant_init_1 (dest, init))
init = NULL_TREE;
@@ -858,7 +858,7 @@ store_init_value (tree decl, tree init,
   if (!const_init)
value = oldval;
 }
-  value = cp_fully_fold (value);
+  value = cp_fully_fold_init (value);
 
   /* Handle aggregate NSDMI in non-constant initializers, too.  */
   value = replace_placeholders (value, decl);
--- gcc/cp/cp-gimplify.c.jj 2018-12-17 22:54:02.736416699 +0100
+++ gcc/cp/cp-gimplify.c2018-12-19 00:12:05.862021875 +0100
@@ -2171,6 +2171,32 @@ cp_fully_fold (tree x)
   return cp_fold_rvalue (x);
 }
 
+/* Likewise, but also fold recursively.  */
+
+tree
+cp_fully_fold_init (tree x)
+{
+  if (processing_template_decl)
+return x;
+  /* FIXME cp_fold ought to be a superset of maybe_constant_value so we don't
+ have to call both.  */
+  if (cxx_dialect >= cxx11)
+{
+  x = maybe_constant_value (x);
+  /* Sometimes we are given a CONSTRUCTOR but the call above wraps it into
+a TARGET_EXPR; undo that here.  */
+  if (TREE_CODE (x) == TARGET_EXPR)
+   x = TARGET_EXPR_INITIAL (x);
+  else if (TREE_CODE (x) == VIEW_CONVERT_EXPR
+  && TREE_CODE (TREE_OPERAND (x, 0)) == CONSTRUCTOR
+  && TREE_TYPE (TREE_OPERAND (x, 0)) == TREE_TYPE (x))
+   x = TREE_OPERAND (x, 0);
+}
+  hash_set pset;
+  cp_walk_tree (, cp_fold_r, , NULL);
+  return cp_fold_rvalue (x);
+}
+
 /* c-common interface to cp_fold.  If IN_INIT, this is in a static initializer
and certain changes are made to the folding done.  Or should be (FIXME).  We
never touch maybe_const, as it is only used for the C front-end


Jakub


Re: [C++ PATCH] Constexpr fold even some TREE_CONSTANT ctors (PR c++/87934)

2018-12-18 Thread Jason Merrill

On 12/18/18 3:45 PM, Jakub Jelinek wrote:

The following testcase FAILs, because parsing creates a TREE_CONSTANT
CONSTRUCTOR that contains CONST_DECL elts.  cp_fold_r can handle that,
but constexpr evaluation doesn't touch those CONSTRUCTORs.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.  I also wonder if store_init_value should use cp_fold_r rather than 
just cp_fully_fold.


Jason


Re: [C++ PATCH] Avoid GC during cp_parser_parenthesized_expression_list (PR c++/88180)

2018-12-18 Thread Jason Merrill

On 12/18/18 3:49 PM, Jakub Jelinek wrote:

cp_parser_parenthesized_expression_list creates expression_list in GC
memory; if it is called when current_function_decl is NULL, there might be
ggc_collect in the middle of the parsing and collect that vector.

Fixed by temporarily bumping function_depth.  Or should that be done in some
other function from this function down to the ggc_collect (the PR has full
backtrace when that happens)?



#2  0x00c56d6a in ggc_collect () at ../../gcc/ggc-page.c:2207
#3  0x00d30095 in cgraph_node::finalize_function (decl=, no_collect=false) at ../../gcc/cgraphunit.c:492
#4  0x00b18c8c in expand_or_defer_fn (fn=) at ../../gcc/cp/semantics.c:4300
#5  0x00a34edf in cp_parser_function_definition_after_declarator 
(parser=0x77ff6ab0, inline_p=true) at ../../gcc/cp/parser.c:27338
#6  0x00a375f6 in cp_parser_late_parsing_for_member (parser=0x77ff6ab0, 
member_function=)
at ../../gcc/cp/parser.c:28215
#7  0x00a2d75f in cp_parser_class_specifier_1 (parser=0x77ff6ab0) 
at ../../gcc/cp/parser.c:23240
#8  0x00a2d83c in cp_parser_class_specifier (parser=0x77ff6ab0) at 
../../gcc/cp/parser.c:23266


So, we end up calling ggc_collect because we're processing a member 
function in a context where defining a type is not allowed.  One 
solution would be to not do late parsing of members in such a context.


We don't have this problem with lambdas because cp_parser_lambda_body 
already increments function_depth to avoid GC in the middle of an 
expression.


Jason




Re: V8 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-18 Thread Jason Merrill

On 12/18/18 4:12 PM, H.J. Lu wrote:

On Tue, Dec 18, 2018 at 12:36 PM Jason Merrill  wrote:


On 12/18/18 9:10 AM, H.J. Lu wrote:

+  switch (TREE_CODE (rhs))
+{
+case ADDR_EXPR:
+  base = TREE_OPERAND (rhs, 0);
+  while (handled_component_p (base))
+ {
+   if (TREE_CODE (base) == COMPONENT_REF)
+ break;
+   base = TREE_OPERAND (base, 0);
+ }
+  if (TREE_CODE (base) != COMPONENT_REF)
+ return NULL_TREE;
+  object = TREE_OPERAND (base, 0);
+  field = TREE_OPERAND (base, 1);
+  break;
+case COMPONENT_REF:
+  object = TREE_OPERAND (rhs, 0);
+  field = TREE_OPERAND (rhs, 1);
+  break;
+default:
+  return NULL_TREE;
+}
+
+  tree context = check_alignment_of_packed_member (type, field);
+  if (context)
+return context;
+
+  /* Check alignment of the object.  */
+  while (TREE_CODE (object) == COMPONENT_REF)
+{
+  field = TREE_OPERAND (object, 1);
+  context = check_alignment_of_packed_member (type, field);
+  if (context)
+ return context;
+  object = TREE_OPERAND (object, 0);
+}
+


You can see interleaved COMPONENT_REF and ARRAY_REF that this still
doesn't look like it will handle, something like

struct A
{
int i;
};

struct B
{
char c;
__attribute ((packed)) A ar[4];
};

B b;

int *p = [1].i;

Rather than have a loop in the ADDR_EXPR case of the switch, you can
handle everything in the lower loop.  And not have a switch at all, just
strip any ADDR_EXPR before the loop.


I changed it to

  if (TREE_CODE (rhs) == ADDR_EXPR)
 rhs = TREE_OPERAND (rhs, 0);
   while (handled_component_p (rhs))
 {
   if (TREE_CODE (rhs) == COMPONENT_REF)
 break;
   rhs = TREE_OPERAND (rhs, 0);
 }

   if (TREE_CODE (rhs) != COMPONENT_REF)
 return NULL_TREE;

   object = TREE_OPERAND (rhs, 0);
   field = TREE_OPERAND (rhs, 1);


That still doesn't warn about my testcase above.


[hjl@gnu-cfl-1 pr51628-6]$ cat a.i
struct A
{
int i;
} __attribute ((packed));

struct B
{
char c;
struct A ar[4];
};

struct B b;

int *p = [1].i;


This testcase is importantly different because 'i' is packed, whereas in 
my testcase only the ar member of B is packed.


My suggestion was that this loop:


+  /* Check alignment of the object.  */
+  while (TREE_CODE (object) == COMPONENT_REF)
+{
+  field = TREE_OPERAND (object, 1);
+  context = check_alignment_of_packed_member (type, field);
+  if (context)
+   return context;
+  object = TREE_OPERAND (object, 0);
+}


could loop over all handled_component_p, but only call 
check_alignment_of_packed_member for COMPONENT_REF.



+  if (TREE_CODE (rhs) != COND_EXPR)
+{
+  while (TREE_CODE (rhs) == COMPOUND_EXPR)
+   rhs = TREE_OPERAND (rhs, 1);


What if you have a COND_EXPR inside a COMPOUND_EXPR?

Jason


Re: patch to fix PR87759

2018-12-18 Thread Vladimir Makarov

On 12/18/18 4:50 PM, Jakub Jelinek wrote:

On Tue, Dec 18, 2018 at 04:23:12PM -0500, Vladimir Makarov wrote:

   The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87759

   The patch was bootstrapped and tested on x86-64.

Committed as rev. 267244.

The test FAILs on i686-linux, fixed thusly, committed as obvious:

2018-12-18  Jakub Jelinek  

PR rtl-optimization/87759
* gcc.target/i386/pr87759.c: Require int128 effective target.

--- gcc/testsuite/gcc.target/i386/pr87759.c.jj  2018-12-18 22:45:00.968103777 
+0100
+++ gcc/testsuite/gcc.target/i386/pr87759.c 2018-12-18 22:47:29.879712883 
+0100
@@ -1,5 +1,5 @@
  /* PR rtl-optimization/87759 */
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
  /* { dg-options "-O2 -w -fschedule-insns -fselective-scheduling -ftrapv -fno-dce 
-fno-expensive-optimizations -fno-ipa-ra -fno-tree-dce -fno-tree-ter" } */
  
  int cc;





Thank you, Jakub.  I should have tested i686 target too.  Sorry about that.



Re: patch to fix PR87759

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 04:23:12PM -0500, Vladimir Makarov wrote:
>   The following patch fixes
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87759
> 
>   The patch was bootstrapped and tested on x86-64.
> 
> Committed as rev. 267244.

The test FAILs on i686-linux, fixed thusly, committed as obvious:

2018-12-18  Jakub Jelinek  

PR rtl-optimization/87759
* gcc.target/i386/pr87759.c: Require int128 effective target.

--- gcc/testsuite/gcc.target/i386/pr87759.c.jj  2018-12-18 22:45:00.968103777 
+0100
+++ gcc/testsuite/gcc.target/i386/pr87759.c 2018-12-18 22:47:29.879712883 
+0100
@@ -1,5 +1,5 @@
 /* PR rtl-optimization/87759 */
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-O2 -w -fschedule-insns -fselective-scheduling -ftrapv 
-fno-dce -fno-expensive-optimizations -fno-ipa-ra -fno-tree-dce -fno-tree-ter" 
} */
 
 int cc;

Jakub


Re: [PATCH] accept all C integer types in function parameters referenced by alloc_align (PR 88363)

2018-12-18 Thread Martin Sebor

On 12/11/18 4:19 PM, Jason Merrill wrote:

On 12/11/18 6:08 PM, Martin Sebor wrote:

On 12/11/18 3:52 PM, Marek Polacek wrote:

On Tue, Dec 11, 2018 at 03:46:37PM -0700, Martin Sebor wrote:

On 12/11/18 1:47 PM, Jakub Jelinek wrote:

On Tue, Dec 11, 2018 at 01:36:58PM -0700, Martin Sebor wrote:

Attached is an updated version of the patch that restores
the original behavior for the positional argument validation
(i.e., prior to r266195) for integral types except bool as
discussed.


I thought Jason wanted to also warn for scoped enums in C++.


I missed that.  It seems needlessly restrictive to me to reject
the preferred kind of an enum when ordinary enums are accepted.
Jason, can you confirm that you really want a warning for B
below when there is none for A (GCC 8 doesn't complain about
either, Clang complains about both, ICC about neither when
using alloc_size -- it doesn't understand alloc_align):

   enum A { /* ... */ };
   __attribute__ ((alloc_align (1))) void* f (A);

   enum class B { /* ... */ };
   __attribute__ ((alloc_align (1))) void* g (B);

The only use case I can think of for enums is in APIs that try
to restrict the available choices of alignment to those of
the enumerators.  In that use case, I would expect it to make
no difference whether the enum is ordinary or the scoped kind.


The reason was that C++ scoped enumerations don't implicitly convert to
integral types.


I'm not sure we're talking about the same thing.  There is no
conversion in the use case I described, the attribute argument
just refers to the function parameter, and the function is called
with an argument of the enumerated type of the parameter.  Like
this:

   enum class Alignment { a4 = 4, a8 = 8 };

   __attribute__ ((alloc_align (1))) void*
   aligned_alloc (Alignment, size_t);

   void *p = aligned_alloc (Alignment::a8, 32);

My question is: if we think it makes sense to accept this use
case with ordinary enums why would we not want to make it possible
with scoped enums?  People tend to think of the latter as preferable
over the former.


OK, I suppose it's reasonable to allow scoped enums as well.


Are there any other suggestions for changes or should I take
this as an approval to commit the updated patch?

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00740.html

Martin


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Andi Kleen
> Yes, take g++.dg/tree-prof/morefunc.C as an example:
> -  int i;
> -  for (i = 0; i < 1000; i++)
> +  int i, j;
> +  for (i = 0; i < 100; i++)
> +for (j = 0; j < 50; j++)
>   g += tc->foo();
> if (g<100) g++;
>  }
> @@ -27,8 +28,9 @@ void test1 (A *tc)
>  static __attribute__((always_inline))
>  void test2 (B *tc)
>  {
> -  int i;
> +  int i, j;
>for (i = 0; i < 100; i++)
> +for (j = 0; j < 50; j++)
> 
> I have to increase loop count like this to get stable pass on my
> machine.  The original count (1000) is too small to be sampled.

IIRC It was originally higher, but people running on slow simulators complained,
so it was reduced.  Perhaps we need some way to detect in the test suite
that the test runs on a real CPU.

> 
> > > FYI, an update about AutoFDO status:
> > > All AutoFDO ICEs in regtest are fixed, while several tests still failing 
> > > fall in below
> > > three categories:
> >
> > Great!
> >
> > Of course it still ICEs with LTO?
> >
> > Right now there is no test case for this I think. Probably one should be 
> > added.


Any comments on this?

-Andi


patch to fix PR87759

2018-12-18 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87759

  The patch was bootstrapped and tested on x86-64.

Committed as rev. 267244.

Index: ChangeLog
===
--- ChangeLog	(revision 267243)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2018-12-18  Vladimir Makarov  
+
+	PR rtl-optimization/87759
+	* lra-assigns.c (lra_split_hard_reg_for): Recalculate
+	non_reload_pseudos.
+
 2018-12-18  Jakub Jelinek  
 
 	PR target/88464
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 267243)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-12-18  Vladimir Makarov  
+
+	PR rtl-optimization/87759
+	* gcc.target/i386/pr87759.c: New.
+
 2018-12-18  Jakub Jelinek  
 
 	PR target/88464
Index: lra-assigns.c
===
--- lra-assigns.c	(revision 267135)
+++ lra-assigns.c	(working copy)
@@ -1758,6 +1758,10 @@ lra_split_hard_reg_for (void)
 	 "\n** Splitting a hard reg after assignment #%d: **\n\n",
 	 lra_assignment_iter);
   bitmap_initialize (_reload_pseudos, _obstack);
+  bitmap_initialize (_reload_pseudos, _obstack);
+  bitmap_ior (_reload_pseudos, _inheritance_pseudos, _split_regs);
+  bitmap_ior_into (_reload_pseudos, _subreg_reload_pseudos);
+  bitmap_ior_into (_reload_pseudos, _optional_reload_pseudos);
   for (i = lra_constraint_new_regno_start; i < max_regno; i++)
 if (reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
 	&& (rclass = lra_get_allocno_class (i)) != NO_REGS
@@ -1772,6 +1776,7 @@ lra_split_hard_reg_for (void)
 	  }
 	bitmap_set_bit (_reload_pseudos, i);
   }
+  bitmap_clear (_reload_pseudos);
   bitmap_initialize (_reload_insns, _obstack);
   EXECUTE_IF_SET_IN_BITMAP (_reload_pseudos, 0, u, bi)
 {
Index: testsuite/gcc.target/i386/pr87759.c
===
--- testsuite/gcc.target/i386/pr87759.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr87759.c	(working copy)
@@ -0,0 +1,39 @@
+/* PR rtl-optimization/87759 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w -fschedule-insns -fselective-scheduling -ftrapv -fno-dce -fno-expensive-optimizations -fno-ipa-ra -fno-tree-dce -fno-tree-ter" } */
+
+int cc;
+
+void
+rc (__int128 *oi)
+{
+  __int128 qz = (__int128)2 << cc;
+
+  if (qz != 0)
+{
+  if (cc != 0)
+{
+  __int128 zp = 1;
+
+  for (;;)
+{
+  unsigned __int128 *ar = 
+  int y5;
+
+  if (oi != 0)
+{
+ y3:
+  zp = *oi + *ar;
+}
+
+  y5 = (cc + 1) == ((*ar /= *oi) << ((zp >>= 128) / cc));
+  qz += !!y5 ? 1 : qz == (*ar ^ zp + 1);
+  ++*oi;
+}
+}
+  else
+++qz;
+}
+
+  goto y3;
+}


V8 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-18 Thread H.J. Lu
On Tue, Dec 18, 2018 at 12:36 PM Jason Merrill  wrote:
>
> On 12/18/18 9:10 AM, H.J. Lu wrote:
> > +  switch (TREE_CODE (rhs))
> > +{
> > +case ADDR_EXPR:
> > +  base = TREE_OPERAND (rhs, 0);
> > +  while (handled_component_p (base))
> > + {
> > +   if (TREE_CODE (base) == COMPONENT_REF)
> > + break;
> > +   base = TREE_OPERAND (base, 0);
> > + }
> > +  if (TREE_CODE (base) != COMPONENT_REF)
> > + return NULL_TREE;
> > +  object = TREE_OPERAND (base, 0);
> > +  field = TREE_OPERAND (base, 1);
> > +  break;
> > +case COMPONENT_REF:
> > +  object = TREE_OPERAND (rhs, 0);
> > +  field = TREE_OPERAND (rhs, 1);
> > +  break;
> > +default:
> > +  return NULL_TREE;
> > +}
> > +
> > +  tree context = check_alignment_of_packed_member (type, field);
> > +  if (context)
> > +return context;
> > +
> > +  /* Check alignment of the object.  */
> > +  while (TREE_CODE (object) == COMPONENT_REF)
> > +{
> > +  field = TREE_OPERAND (object, 1);
> > +  context = check_alignment_of_packed_member (type, field);
> > +  if (context)
> > + return context;
> > +  object = TREE_OPERAND (object, 0);
> > +}
> > +
>
> You can see interleaved COMPONENT_REF and ARRAY_REF that this still
> doesn't look like it will handle, something like
>
> struct A
> {
>int i;
> };
>
> struct B
> {
>char c;
>__attribute ((packed)) A ar[4];
> };
>
> B b;
>
> int *p = [1].i;
>
> Rather than have a loop in the ADDR_EXPR case of the switch, you can
> handle everything in the lower loop.  And not have a switch at all, just
> strip any ADDR_EXPR before the loop.

I changed it to

 if (TREE_CODE (rhs) == ADDR_EXPR)
rhs = TREE_OPERAND (rhs, 0);
  while (handled_component_p (rhs))
{
  if (TREE_CODE (rhs) == COMPONENT_REF)
break;
  rhs = TREE_OPERAND (rhs, 0);
}

  if (TREE_CODE (rhs) != COMPONENT_REF)
return NULL_TREE;

  object = TREE_OPERAND (rhs, 0);
  field = TREE_OPERAND (rhs, 1);

[hjl@gnu-cfl-1 pr51628-6]$ cat a.i
struct A
{
   int i;
} __attribute ((packed));

struct B
{
   char c;
   struct A ar[4];
};

struct B b;

int *p = [1].i;
[hjl@gnu-cfl-1 pr51628-6]$ make a.s
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-S a.i
a.i:14:10: warning: taking address of packed member of ‘struct A’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
   14 | int *p = [1].i;
  |  ^~
[hjl@gnu-cfl-1 pr51628-6]$

> > +check_and_warn_address_of_packed_member (tree type, tree rhs)
> > +{
> > +  if (TREE_CODE (rhs) != COND_EXPR)
> > +{
> > +  tree context = check_address_of_packed_member (type, rhs);
> > +  if (context)
> > + {
> > +   location_t loc = EXPR_LOC_OR_LOC (rhs, input_location);
> > +   warning_at (loc, OPT_Waddress_of_packed_member,
> > +   "taking address of packed member of %qT may result "
> > +   "in an unaligned pointer value",
> > +   context);
> > + }
> > +  return;
> > +}
> > +
> > +  /* Check the THEN path.  */
> > +  check_and_warn_address_of_packed_member (type, TREE_OPERAND (rhs, 1));
> > +
> > +  /* Check the ELSE path.  */
> > +  check_and_warn_address_of_packed_member (type, TREE_OPERAND (rhs, 2));
> > +}
>
> You probably also want to handle COMPOUND_EXPR.
>

Done.

[hjl@gnu-cfl-1 pr51628-5]$ cat c.i
struct A {
  int i;
} __attribute__ ((packed));

int*
foo3 (struct A *p1, int *q1, int *q2, struct A *p2)
{
  return q1 ? (*q1 = 1, >i) : (q2 ? (*q2 = 2, >i): q2);
}
[hjl@gnu-cfl-1 pr51628-5]$
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-S c.i
c.i: In function ‘foo3’:
c.i:8:25: warning: taking address of packed member of ‘struct A’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
8 |   return q1 ? (*q1 = 1, >i) : (q2 ? (*q2 = 2, >i): q2);
  | ^~
c.i:8:51: warning: taking address of packed member of ‘struct A’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
8 |   return q1 ? (*q1 = 1, >i) : (q2 ? (*q2 = 2, >i): q2);
  |   ^~
[hjl@gnu-cfl-1 pr51628-5]$

Here is the updated patch.  OK for trunk?

Thanks.


-- 
H.J.
From 22e60a16ca5fd9c591e6b44c0245cc51f12d5b6c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 12 Jan 2018 21:12:05 -0800
Subject: [PATCH] C/C++: Add -Waddress-of-packed-member
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When address of packed member of struct or union is taken, it may result
in an unaligned pointer value.  This patch adds -Waddress-of-packed-member
to check alignment at pointer assignment and warn unaligned address as
well as unaligned pointer:

$ cat x.i
struct pair_t
{
  

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2018-12-18 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 18 Dec 2018 23:06:38 +0800, Chung-Lin Tang  
wrote:
> this part includes some of the lookup_goacc_asyncqueue fixes we talked about.
> I am still thinking about how the queue lock problem should really be solved, 
> so regard
> this patch as just fixing some of the problems.

Sure, thanks.

Two comments, though:

> --- libgomp/oacc-async.c  (revision 267226)
> +++ libgomp/oacc-async.c  (working copy)

> +attribute_hidden struct goacc_asyncqueue *
> +lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
> +{
> +  /* The special value acc_async_noval (-1) maps to the thread-specific
> + default async stream.  */
> +  if (async == acc_async_noval)
> +async = thr->default_async;
> +
> +  if (async == acc_async_sync)
> +return NULL;
> +
> +  if (async < 0)
> +gomp_fatal ("bad async %d", async);
> +
> +  struct gomp_device_descr *dev = thr->dev;
> +
> +  gomp_mutex_lock (>openacc.async.lock);
> +
> +  if (!create
> +  && (async >= dev->openacc.async.nasyncqueue
> +   || !dev->openacc.async.asyncqueue[async]))
> +{
> +  gomp_mutex_unlock (>openacc.async.lock);
> +  return NULL;
> +}
> +
> +  if (async >= dev->openacc.async.nasyncqueue)
> +{
> +  int diff = async + 1 - dev->openacc.async.nasyncqueue;
> +  dev->openacc.async.asyncqueue
> + = gomp_realloc (dev->openacc.async.asyncqueue,
> + sizeof (goacc_aq) * (async + 1));
> +  memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
> +   0, sizeof (goacc_aq) * diff);
> +  dev->openacc.async.nasyncqueue = async + 1;
> +}
> +
> +  if (!dev->openacc.async.asyncqueue[async])
> +{
> +  dev->openacc.async.asyncqueue[async] = 
> dev->openacc.async.construct_func ();
> +
> +  if (!dev->openacc.async.asyncqueue[async])
> + {
> +   gomp_mutex_unlock (>openacc.async.lock);
> +   gomp_fatal ("async %d creation failed", async);
> + }

That will now always fail for host fallback, where
"host_openacc_async_construct" just always does "return NULL".

Actually, if the device doesn't support asyncqueues, this whole function
should turn into some kind of no-op, so that we don't again and again try
to create a new one for every call to "lookup_goacc_asyncqueue".

I'm attaching one possible solution.  I think it's fine to assume that
the majority of devices will support asyncqueues, and for those that
don't, this is just a one-time overhead per async-argument.  So, no
special handling required in "lookup_goacc_asyncqueue".

> +  /* Link new async queue into active list.  */
> +  goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
> +  n->aq = dev->openacc.async.asyncqueue[async];
> +  n->next = dev->openacc.async.active;
> +  dev->openacc.async.active = n;
> +}
> +  gomp_mutex_unlock (>openacc.async.lock);

You still need to keep "async" locked during...

> +  return dev->openacc.async.asyncqueue[async];

... this dereference.

> +}


Oh, and:

> --- libgomp/oacc-plugin.c (revision 267226)
> +++ libgomp/oacc-plugin.c (working copy)
> @@ -31,14 +31,10 @@
>  #include "oacc-int.h"
>  
>  void
> -GOMP_PLUGIN_async_unmap_vars (void *ptr, int async)
> +GOMP_PLUGIN_async_unmap_vars (void *ptr __attribute__((unused)),
> +   int async __attribute__((unused)))
>  {
> -  struct target_mem_desc *tgt = ptr;
> -  struct gomp_device_descr *devicep = tgt->device_descr;
> -
> -  devicep->openacc.async_set_async_func (async);
> -  gomp_unmap_vars (tgt, true);
> -  devicep->openacc.async_set_async_func (acc_async_sync);
> +  gomp_fatal ("invalid plugin function");
>  }

Please add a comment here, something like: "Obsolete entry point, no
longer used."


Grüße
 Thomas


>From 4cb99c3691f95b6b299e7cb2603af36f723f9e8e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 18 Dec 2018 21:58:41 +0100
Subject: [PATCH] into async re-work: adjust host_openacc_async_construct

---
 libgomp/oacc-host.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 727f8866f45c..cfd8a24f0674 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -212,7 +212,8 @@ host_openacc_async_queue_callback (struct goacc_asyncqueue *aq
 static struct goacc_asyncqueue *
 host_openacc_async_construct (void)
 {
-  return NULL;
+  /* We have to return non-NULL here, but it's OK to use a dummy.  */
+  return (struct goacc_asyncqueue *) -1;
 }
 
 static bool
-- 
2.17.1



[PATCH] Allow _mm256_clmulepi64_epi128 even for just -mvcplmulqdq -mavx (PR target/88541)

2018-12-18 Thread Jakub Jelinek
Hi!

As mentioned in the PR, there is a VEX encoded vpclmulqdq instruction
with ymm arguments that needs VPCLMULQDQ ISA, and then EVEX encoded
vpclmulqdq with zmm arguments that needs VPCLMULQDQ + AVX512F ISAs and
vpclmulqdq with xmm or ymm arguments that needs VPCLMULQDQ + AVX512VL ISAs.

So, _mm256_clmulepi64_epi128 can be done just with AVX (so that VEX encoded
instructions are handled) + VPCLMULQDQ ISAs.
The corresponding builtin matches this:
BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 
CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", 
IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-12-18  Jakub Jelinek  

PR target/88541
* config/i386/vpclmulqdqintrin.h (_mm256_clmulepi64_epi128): Enable
for -mavx -mvpclmulqdq rather than just for -mavx512vl -mvpclmulqdq.

* gcc.target/i386/avx-vpclmulqdq-1.c: New test.

--- gcc/config/i386/vpclmulqdqintrin.h.jj   2018-06-13 10:05:54.775128332 
+0200
+++ gcc/config/i386/vpclmulqdqintrin.h  2018-12-18 20:09:37.693666571 +0100
@@ -53,9 +53,9 @@ _mm512_clmulepi64_epi128 (__m512i __A, _
 #pragma GCC pop_options
 #endif /* __DISABLE_VPCLMULQDQF__ */
 
-#if !defined(__VPCLMULQDQ__) || !defined(__AVX512VL__)
+#if !defined(__VPCLMULQDQ__) || !defined(__AVX__)
 #pragma GCC push_options
-#pragma GCC target("vpclmulqdq,avx512vl")
+#pragma GCC target("vpclmulqdq,avx")
 #define __DISABLE_VPCLMULQDQ__
 #endif /* __VPCLMULQDQ__ */
 
@@ -78,6 +78,4 @@ _mm256_clmulepi64_epi128 (__m256i __A, _
 #pragma GCC pop_options
 #endif /* __DISABLE_VPCLMULQDQ__ */
 
-
 #endif /* _VPCLMULQDQINTRIN_H_INCLUDED */
-
--- gcc/testsuite/gcc.target/i386/avx-vpclmulqdq-1.c.jj 2018-12-18 
20:13:28.683960294 +0100
+++ gcc/testsuite/gcc.target/i386/avx-vpclmulqdq-1.c2018-12-18 
20:12:41.140723131 +0100
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mvpclmulqdq" } */
+
+#include 
+
+__m256i
+foo (__m256i x, __m256i y)
+{
+  return _mm256_clmulepi64_epi128 (x, y, 0);
+}

Jakub


[C++ PATCH] Avoid GC during cp_parser_parenthesized_expression_list (PR c++/88180)

2018-12-18 Thread Jakub Jelinek
Hi!

cp_parser_parenthesized_expression_list creates expression_list in GC
memory; if it is called when current_function_decl is NULL, there might be
ggc_collect in the middle of the parsing and collect that vector.

Fixed by temporarily bumping function_depth.  Or should that be done in some
other function from this function down to the ggc_collect (the PR has full
backtrace when that happens)?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-18  Jakub Jelinek  

PR c++/88180
* parser.c (cp_parser_parenthesized_expression_list): Temporarily
bump function_depth if current_function_decl is NULL.

* g++.dg/parse/pr88180.C: New test.

--- gcc/cp/parser.c.jj  2018-12-14 23:31:50.530979326 +0100
+++ gcc/cp/parser.c 2018-12-18 19:48:45.218893963 +0100
@@ -7794,6 +7794,11 @@ cp_parser_parenthesized_expression_list
 = parser->greater_than_is_operator_p;
   parser->greater_than_is_operator_p = true;
 
+  /* Avoid GC of the expression_list while parsing the expression
+ list.  */
+  if (!current_function_decl)
+++function_depth;
+
   cp_expr expr (NULL_TREE);
 
   /* Consume expressions until there are no more.  */
@@ -7899,6 +7904,10 @@ cp_parser_parenthesized_expression_list
{
  parser->greater_than_is_operator_p
= saved_greater_than_is_operator_p;
+
+ if (!current_function_decl)
+   --function_depth;
+
  return NULL;
}
 }
@@ -7906,6 +7915,9 @@ cp_parser_parenthesized_expression_list
   parser->greater_than_is_operator_p
 = saved_greater_than_is_operator_p;
 
+  if (!current_function_decl)
+--function_depth;
+
   if (identifier)
 vec_safe_insert (expression_list, 0, identifier);
 
--- gcc/testsuite/g++.dg/parse/pr88180.C.jj 2018-12-18 19:51:26.208290671 
+0100
+++ gcc/testsuite/g++.dg/parse/pr88180.C2018-12-18 19:49:55.242761535 
+0100
@@ -0,0 +1,12 @@
+// PR c++/88180
+// { dg-do compile }
+// { dg-options "--param ggc-min-heapsize=1024" }
+
+struct d {
+  static d *b;
+} * d::b(__builtin_offsetof(struct { // { dg-error "types may not be defined" }
+  int i;
+  struct a { // { dg-error "types may not be defined" }
+int c() { return .1f; }
+  };
+}, i));

Jakub


[C++ PATCH] Constexpr fold even some TREE_CONSTANT ctors (PR c++/87934)

2018-12-18 Thread Jakub Jelinek
Hi!

The following testcase FAILs, because parsing creates a TREE_CONSTANT
CONSTRUCTOR that contains CONST_DECL elts.  cp_fold_r can handle that,
but constexpr evaluation doesn't touch those CONSTRUCTORs.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-12-18  Jakub Jelinek  

PR c++/87934
* constexpr.c (cxx_eval_constant_expression) : Do
re-process TREE_CONSTANT CONSTRUCTORs if they aren't reduced constant
expressions.

* g++.dg/cpp0x/constexpr-87934.C: New test.

--- gcc/cp/constexpr.c.jj   2018-12-12 23:43:57.263128844 +0100
+++ gcc/cp/constexpr.c  2018-12-18 14:43:33.460553853 +0100
@@ -4681,7 +4681,7 @@ cxx_eval_constant_expression (const cons
   break;
 
 case CONSTRUCTOR:
-  if (TREE_CONSTANT (t))
+  if (TREE_CONSTANT (t) && reduced_constant_expression_p (t))
{
  /* Don't re-process a constant CONSTRUCTOR, but do fold it to
 VECTOR_CST if applicable.  */
--- gcc/testsuite/g++.dg/cpp0x/constexpr-87934.C.jj 2018-12-18 
15:05:56.318886878 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-87934.C2018-12-18 
15:02:10.652524999 +0100
@@ -0,0 +1,9 @@
+// PR c++/87934
+// { dg-do compile { target c++11 } }
+
+struct Foo
+{
+  enum { BAR } bar = BAR;
+};
+
+constexpr Foo foo{};

Jakub


Re: [PATCH 3/4] c/c++, asm: Use nicer error for const and restrict

2018-12-18 Thread Jason Merrill

On 12/10/18 5:47 PM, Segher Boessenkool wrote:

Not all qualifiers are asm qualifiers.  We can talk about that in a
nicer way than just giving a generic parser error.

This also adds two testcases for C++, that previously were for C only.


2018-12-10  Segher Boessenkool  

c/
* c-parser.c (c_parser_asm_statement) : Give
a more specific error message (instead of just falling through).

cp/
* parser.c (cp_parser_asm_definition) : Give
a more specific error message (instead of just falling through).


OK.

Jason



Re: [PATCH 4/4] c++, asm: Do not handle any asm-qualifiers in top-level asm

2018-12-18 Thread Jason Merrill

On 12/10/18 5:47 PM, Segher Boessenkool wrote:

Previously, "volatile" was allowed.  Changing this simplifies the code,
makes things more regular, and makes the C and C++ frontends handle
this the same way.


2018-12-10  Segher Boessenkool  

cp/
* parser.c (cp_parser_asm_definition): Do not allow any asm qualifiers
on top-level asm.


OK.

Jason



Re: [PATCH 1/4] c/c++, asm: Write the asm-qualifier loop without "done" boolean

2018-12-18 Thread Jason Merrill

On 12/10/18 5:47 PM, Segher Boessenkool wrote:

As suggested by Jason.


Segher


2018-12-10  Segher Boessenkool  

c/
* c-parser.c (c_parser_asm_statement): Rewrite the loop to work without
"done" boolean variable.

cp/
* parser.c (cp_parser_asm_definition): Rewrite the loop to work without
"done" boolean variable.


OK, thanks.

Jason



Re: [PATCH] v6: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-18 Thread Jason Merrill

On 12/18/18 4:22 PM, David Malcolm wrote:

On Mon, 2018-12-17 at 18:30 -0500, David Malcolm wrote:

On Mon, 2018-12-17 at 14:33 -0500, Jason Merrill wrote:

On 12/14/18 7:17 PM, David Malcolm wrote:

+  /* Since default args are effectively part of the function
type,
+strip location wrappers here, since otherwise the
location of
+one function's default arguments is arbitrarily chosen
for
+all functions with similar signature (due to
canonicalization
+of function types).  */


Hmm, looking at this again, why would this happen?  I see that
type_list_equal uses == to compare default arguments, so two
function
types with the same default argument but different location
wrappers
shouldn't be combined.

Jason


Thanks.

I did some digging into this.  I added this strip to fix
   g++.dg/template/defarg6.C
but it looks like I was overzealous (the comment is correct, but it's
papering over a problem).

It turns out that type_list_equal is doing more than just pointer
equality; it's hitting the simple_cst_equal part of the && at line
7071:

7063bool
7064type_list_equal (const_tree l1, const_tree l2)
7065{
7066  const_tree t1, t2;
7067
7068  for (t1 = l1, t2 = l2; t1 && t2; t1 = TREE_CHAIN (t1),
t2 = TREE_CHAIN (t2))
7069if (TREE_VALUE (t1) != TREE_VALUE (t2)
7070|| (TREE_PURPOSE (t1) != TREE_PURPOSE (t2)
7071&& ! (1 == simple_cst_equal (TREE_PURPOSE
(t1), TREE_PURPOSE (t2))
7072  && (TREE_TYPE (TREE_PURPOSE (t1))
7073  == TREE_TYPE (TREE_PURPOSE
(t2))
7074  return false;
7075
7076  return t1 == t2;
7077}

What's happening is that there are two different functions with
identical types apart from the locations of their (equal) default
arguments: both of the TREE_PURPOSEs are NON_LVALUE_EXPR wrappers
around a CONST_DECL enum value (at different source locations).

simple_cst_equal is stripping the location wrappers here:

7311  if (CONVERT_EXPR_CODE_P (code1) || code1 ==
NON_LVALUE_EXPR)
7312{
7313  if (CONVERT_EXPR_CODE_P (code2)
7314  || code2 == NON_LVALUE_EXPR)
7315return simple_cst_equal (TREE_OPERAND (t1, 0),
TREE_OPERAND (t2, 0));
7316  else
7317return simple_cst_equal (TREE_OPERAND (t1, 0),
t2);
7318}

and thus finds them to be equal; the iteration in type_list_equal
continues, and runs out of parameters with t1 == t2 == NULL, and thus
returns true, and thus the two function types hash to the same slot,
and the two function types get treated as being the same.

It's not clear to me yet what the best solution to this is:
- should simple_cst_equal regard different source locations as being
different?
- should function-type hashing use a custom version of
type_list_equal
when comparing params, and make different source locations of default
args be different?
- something else?

Dave


I tried both of the above approaches, and both work.

Here's v6 of the patch:

I removed the strip of wrappers in cp_parser_late_parsing_default_args
from earlier versions of the patch, in favor of fixing simple_cst_equal
so that it treats location wrappers with unequal source locations as
being unequal.  This ensures that function-types with default arguments
don't get merged when the default argument constants have different
spelling locations.  [I have an alternative patch which instead
introduces a different comparator for FUNCTION_TYPE's TYPE_ARG_TYPES
within type_cache_hasher::equal, almost identical to type_list_equal,
but adding the requirement that  location wrappers around default
arguments have equal source location for the params to be considered
equal; both patches pass bootstrap testing]

Doing so leads to the reported location for the bad default argument
within a template in g++.dg/template/defarg6.C moving to the argument
location.  Previously, the callsite of the instantiation was identified
due to the use of input_location in convert_like_real here:

6816  location_t loc = cp_expr_loc_or_loc (expr, input_location);

With a location wrapper, it uses the spelling location of the
default argument, but doesn't identify the location of the callsite
that's instantiating the template.

So I moved the note in tsubst_default_argument about which callsite
led to a diagnostic to after the check_default_argument call, so that
diagnostics within that receive notes, too.

As before, this was successfully bootstrapped & regrtested on
x86_64-pc-linux-gnu, in conjunction with the followup patch.

OK for trunk?


Ah, I hadn't seen this before my last email.  Let's go with this version.

Jason



Re: V7 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-18 Thread Jason Merrill

On 12/18/18 9:10 AM, H.J. Lu wrote:

+  switch (TREE_CODE (rhs))
+{
+case ADDR_EXPR:
+  base = TREE_OPERAND (rhs, 0);
+  while (handled_component_p (base))
+   {
+ if (TREE_CODE (base) == COMPONENT_REF)
+   break;
+ base = TREE_OPERAND (base, 0);
+   }
+  if (TREE_CODE (base) != COMPONENT_REF)
+   return NULL_TREE;
+  object = TREE_OPERAND (base, 0);
+  field = TREE_OPERAND (base, 1);
+  break;
+case COMPONENT_REF:
+  object = TREE_OPERAND (rhs, 0);
+  field = TREE_OPERAND (rhs, 1);
+  break;
+default:
+  return NULL_TREE;
+}
+
+  tree context = check_alignment_of_packed_member (type, field);
+  if (context)
+return context;
+
+  /* Check alignment of the object.  */
+  while (TREE_CODE (object) == COMPONENT_REF)
+{
+  field = TREE_OPERAND (object, 1);
+  context = check_alignment_of_packed_member (type, field);
+  if (context)
+   return context;
+  object = TREE_OPERAND (object, 0);
+}
+


You can see interleaved COMPONENT_REF and ARRAY_REF that this still 
doesn't look like it will handle, something like


struct A
{
  int i;
};

struct B
{
  char c;
  __attribute ((packed)) A ar[4];
};

B b;

int *p = [1].i;

Rather than have a loop in the ADDR_EXPR case of the switch, you can 
handle everything in the lower loop.  And not have a switch at all, just 
strip any ADDR_EXPR before the loop.



+check_and_warn_address_of_packed_member (tree type, tree rhs)
+{
+  if (TREE_CODE (rhs) != COND_EXPR)
+{
+  tree context = check_address_of_packed_member (type, rhs);
+  if (context)
+   {
+ location_t loc = EXPR_LOC_OR_LOC (rhs, input_location);
+ warning_at (loc, OPT_Waddress_of_packed_member,
+ "taking address of packed member of %qT may result "
+ "in an unaligned pointer value",
+ context);
+   }
+  return;
+}
+
+  /* Check the THEN path.  */
+  check_and_warn_address_of_packed_member (type, TREE_OPERAND (rhs, 1));
+
+  /* Check the ELSE path.  */
+  check_and_warn_address_of_packed_member (type, TREE_OPERAND (rhs, 2));
+}


You probably also want to handle COMPOUND_EXPR.

Jason


[PATCH] v6: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-18 Thread David Malcolm
On Mon, 2018-12-17 at 18:30 -0500, David Malcolm wrote:
> On Mon, 2018-12-17 at 14:33 -0500, Jason Merrill wrote:
> > On 12/14/18 7:17 PM, David Malcolm wrote:
> > > +  /* Since default args are effectively part of the function
> > > type,
> > > +  strip location wrappers here, since otherwise the
> > > location of
> > > +  one function's default arguments is arbitrarily chosen
> > > for
> > > +  all functions with similar signature (due to
> > > canonicalization
> > > +  of function types).  */
> > 
> > Hmm, looking at this again, why would this happen?  I see that 
> > type_list_equal uses == to compare default arguments, so two
> > function 
> > types with the same default argument but different location
> > wrappers 
> > shouldn't be combined.
> > 
> > Jason
> 
> Thanks.
> 
> I did some digging into this.  I added this strip to fix
>   g++.dg/template/defarg6.C
> but it looks like I was overzealous (the comment is correct, but it's
> papering over a problem).
> 
> It turns out that type_list_equal is doing more than just pointer
> equality; it's hitting the simple_cst_equal part of the && at line
> 7071:
> 
> 7063  bool
> 7064  type_list_equal (const_tree l1, const_tree l2)
> 7065  {
> 7066const_tree t1, t2;
> 7067  
> 7068for (t1 = l1, t2 = l2; t1 && t2; t1 = TREE_CHAIN (t1),
> t2 = TREE_CHAIN (t2))
> 7069  if (TREE_VALUE (t1) != TREE_VALUE (t2)
> 7070  || (TREE_PURPOSE (t1) != TREE_PURPOSE (t2)
> 7071  && ! (1 == simple_cst_equal (TREE_PURPOSE
> (t1), TREE_PURPOSE (t2))
> 7072&& (TREE_TYPE (TREE_PURPOSE (t1))
> 7073== TREE_TYPE (TREE_PURPOSE
> (t2))
> 7074return false;
> 7075  
> 7076return t1 == t2;
> 7077  }
> 
> What's happening is that there are two different functions with
> identical types apart from the locations of their (equal) default
> arguments: both of the TREE_PURPOSEs are NON_LVALUE_EXPR wrappers
> around a CONST_DECL enum value (at different source locations).
> 
> simple_cst_equal is stripping the location wrappers here:
> 
> 7311if (CONVERT_EXPR_CODE_P (code1) || code1 ==
> NON_LVALUE_EXPR)
> 7312  {
> 7313if (CONVERT_EXPR_CODE_P (code2)
> 7314|| code2 == NON_LVALUE_EXPR)
> 7315  return simple_cst_equal (TREE_OPERAND (t1, 0),
> TREE_OPERAND (t2, 0));
> 7316else
> 7317  return simple_cst_equal (TREE_OPERAND (t1, 0),
> t2);
> 7318  }
> 
> and thus finds them to be equal; the iteration in type_list_equal
> continues, and runs out of parameters with t1 == t2 == NULL, and thus
> returns true, and thus the two function types hash to the same slot,
> and the two function types get treated as being the same.
> 
> It's not clear to me yet what the best solution to this is:
> - should simple_cst_equal regard different source locations as being
> different?
> - should function-type hashing use a custom version of
> type_list_equal
> when comparing params, and make different source locations of default
> args be different?
> - something else?
> 
> Dave

I tried both of the above approaches, and both work.

Here's v6 of the patch:

I removed the strip of wrappers in cp_parser_late_parsing_default_args
from earlier versions of the patch, in favor of fixing simple_cst_equal
so that it treats location wrappers with unequal source locations as
being unequal.  This ensures that function-types with default arguments
don't get merged when the default argument constants have different
spelling locations.  [I have an alternative patch which instead
introduces a different comparator for FUNCTION_TYPE's TYPE_ARG_TYPES
within type_cache_hasher::equal, almost identical to type_list_equal,
but adding the requirement that  location wrappers around default
arguments have equal source location for the params to be considered
equal; both patches pass bootstrap testing]

Doing so leads to the reported location for the bad default argument
within a template in g++.dg/template/defarg6.C moving to the argument
location.  Previously, the callsite of the instantiation was identified
due to the use of input_location in convert_like_real here:

6816  location_t loc = cp_expr_loc_or_loc (expr, input_location);

With a location wrapper, it uses the spelling location of the
default argument, but doesn't identify the location of the callsite
that's instantiating the template.

So I moved the note in tsubst_default_argument about which callsite
led to a diagnostic to after the check_default_argument call, so that
diagnostics within that receive notes, too.

As before, this was successfully bootstrapped & regrtested on
x86_64-pc-linux-gnu, in conjunction with the followup patch.

OK for trunk?
Dave


Blurb from v1:

The C++ frontend gained various location wrapper nodes in r256448 (GCC 8).
That patch:
  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00799.html
added wrapper nodes around all nodes with !CAN_HAVE_LOCATION_P for:

* arguments at callsites, and for

  * typeid, 

Re: [PATCH] v5: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-18 Thread Jason Merrill

On 12/17/18 6:30 PM, David Malcolm wrote:

On Mon, 2018-12-17 at 14:33 -0500, Jason Merrill wrote:

On 12/14/18 7:17 PM, David Malcolm wrote:

+  /* Since default args are effectively part of the function
type,
+strip location wrappers here, since otherwise the
location of
+one function's default arguments is arbitrarily chosen
for
+all functions with similar signature (due to
canonicalization
+of function types).  */


Hmm, looking at this again, why would this happen?  I see that
type_list_equal uses == to compare default arguments, so two
function
types with the same default argument but different location wrappers
shouldn't be combined.

Jason


Thanks.

I did some digging into this.  I added this strip to fix
   g++.dg/template/defarg6.C
but it looks like I was overzealous (the comment is correct, but it's
papering over a problem).

It turns out that type_list_equal is doing more than just pointer
equality; it's hitting the simple_cst_equal part of the && at line
7071:

7063bool
7064type_list_equal (const_tree l1, const_tree l2)
7065{
7066  const_tree t1, t2;
7067
7068  for (t1 = l1, t2 = l2; t1 && t2; t1 = TREE_CHAIN (t1), t2 = 
TREE_CHAIN (t2))
7069if (TREE_VALUE (t1) != TREE_VALUE (t2)
7070|| (TREE_PURPOSE (t1) != TREE_PURPOSE (t2)
7071&& ! (1 == simple_cst_equal (TREE_PURPOSE (t1), 
TREE_PURPOSE (t2))
7072  && (TREE_TYPE (TREE_PURPOSE (t1))
7073  == TREE_TYPE (TREE_PURPOSE (t2))
7074  return false;
7075
7076  return t1 == t2;
7077}

What's happening is that there are two different functions with
identical types apart from the locations of their (equal) default
arguments: both of the TREE_PURPOSEs are NON_LVALUE_EXPR wrappers
around a CONST_DECL enum value (at different source locations).

simple_cst_equal is stripping the location wrappers here:

7311  if (CONVERT_EXPR_CODE_P (code1) || code1 == NON_LVALUE_EXPR)
7312{
7313  if (CONVERT_EXPR_CODE_P (code2)
7314  || code2 == NON_LVALUE_EXPR)
7315return simple_cst_equal (TREE_OPERAND (t1, 0), TREE_OPERAND 
(t2, 0));
7316  else
7317return simple_cst_equal (TREE_OPERAND (t1, 0), t2);
7318}

and thus finds them to be equal; the iteration in type_list_equal
continues, and runs out of parameters with t1 == t2 == NULL, and thus
returns true, and thus the two function types hash to the same slot,
and the two function types get treated as being the same.

It's not clear to me yet what the best solution to this is:
- should simple_cst_equal regard different source locations as being
different?
- should function-type hashing use a custom version of type_list_equal
when comparing params, and make different source locations of default
args be different?
- something else?


I'd experiment with removing the simple_cst_equal bit in 
type_list_equal.  But I think that can wait, and you can go ahead and 
commit the v5 patch.


Jason


Re: [PATCH] Enable scatter vectorization with 128-bit and 256-bit vectors with AVX512VL (PR target/88464)

2018-12-18 Thread Uros Bizjak
On Tue, Dec 18, 2018 at 3:57 PM Jakub Jelinek  wrote:
>
> Hi!
>
> We weren't vectorizing with unconditional or masked scatters when
> -mprefered-vector-width={128,256}.  While for DI index and DF/DI
> stores or SI index and SF/SI stores we even have the builtins,
> for the remaining combinations I had to add a few alt builtins (with spaces
> in names as in other cases).  I've also renamed the other alt builtin
> visible names so that they match the IX86_BUILTIN_* names, they were pretty
> confusing before.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-12-18  Jakub Jelinek  
>
> PR target/88464
> * config/i386/i386-builtin-types.def
> (VOID_FTYPE_PDOUBLE_QI_V8SI_V4DF_INT,
> VOID_FTYPE_PFLOAT_QI_V4DI_V8SF_INT,
> VOID_FTYPE_PLONGLONG_QI_V8SI_V4DI_INT,
> VOID_FTYPE_PINT_QI_V4DI_V8SI_INT,
> VOID_FTYPE_PDOUBLE_QI_V4SI_V2DF_INT,
> VOID_FTYPE_PFLOAT_QI_V2DI_V4SF_INT,
> VOID_FTYPE_PLONGLONG_QI_V4SI_V2DI_INT,
> VOID_FTYPE_PINT_QI_V2DI_V4SI_INT): New builtin types.
> * config/i386/i386.c (enum ix86_builtins): Add
> IX86_BUILTIN_SCATTERALTSIV4DF, IX86_BUILTIN_SCATTERALTDIV8SF,
> IX86_BUILTIN_SCATTERALTSIV4DI, IX86_BUILTIN_SCATTERALTDIV8SI,
> IX86_BUILTIN_SCATTERALTSIV2DF, IX86_BUILTIN_SCATTERALTDIV4SF,
> IX86_BUILTIN_SCATTERALTSIV2DI and IX86_BUILTIN_SCATTERALTDIV4SI.
> (ix86_init_mmx_sse_builtins): Fix up names of IX86_BUILTIN_GATHERALT*,
> IX86_BUILTIN_GATHER3ALT* and IX86_BUILTIN_SCATTERALT* builtins to
> match the IX86_BUILTIN codes.  BuildIX86_BUILTIN_SCATTERALTSIV4DF,
> IX86_BUILTIN_SCATTERALTDIV8SF, IX86_BUILTIN_SCATTERALTSIV4DI,
> IX86_BUILTIN_SCATTERALTDIV8SI, IX86_BUILTIN_SCATTERALTSIV2DF,
> IX86_BUILTIN_SCATTERALTDIV4SF, IX86_BUILTIN_SCATTERALTSIV2DI and
> IX86_BUILTIN_SCATTERALTDIV4SI decls.
> (ix86_vectorize_builtin_scatter): Expand those new builtins.
>
> * gcc.target/i386/avx512f-pr88464-5.c: New test.
> * gcc.target/i386/avx512f-pr88464-6.c: New test.
> * gcc.target/i386/avx512f-pr88464-7.c: New test.
> * gcc.target/i386/avx512f-pr88464-8.c: New test.
> * gcc.target/i386/avx512vl-pr88464-5.c: New test.
> * gcc.target/i386/avx512vl-pr88464-6.c: New test.
> * gcc.target/i386/avx512vl-pr88464-7.c: New test.
> * gcc.target/i386/avx512vl-pr88464-8.c: New test.
> * gcc.target/i386/avx512vl-pr88464-9.c: New test.
> * gcc.target/i386/avx512vl-pr88464-10.c: New test.
> * gcc.target/i386/avx512vl-pr88464-11.c: New test.
> * gcc.target/i386/avx512vl-pr88464-12.c: New test.
> * gcc.target/i386/avx512vl-pr88464-13.c: New test.
> * gcc.target/i386/avx512vl-pr88464-14.c: New test.
> * gcc.target/i386/avx512vl-pr88464-15.c: New test.
> * gcc.target/i386/avx512vl-pr88464-16.c: New test.

LGTM.

Thanks,
Uros.

> --- gcc/config/i386/i386-builtin-types.def.jj   2018-11-08 18:07:10.298826353 
> +0100
> +++ gcc/config/i386/i386-builtin-types.def  2018-12-18 11:22:07.965503704 
> +0100
> @@ -1068,7 +1068,14 @@ DEF_FUNCTION_TYPE (VOID, PFLOAT, HI, V8D
>  DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V16SI, V8DF, INT)
>  DEF_FUNCTION_TYPE (VOID, PINT, HI, V8DI, V16SI, INT)
>  DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V16SI, V8DI, INT)
> -
> +DEF_FUNCTION_TYPE (VOID, PFLOAT, QI, V4DI, V8SF, INT)
> +DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V8SI, V4DF, INT)
> +DEF_FUNCTION_TYPE (VOID, PINT, QI, V4DI, V8SI, INT)
> +DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8SI, V4DI, INT)
> +DEF_FUNCTION_TYPE (VOID, PFLOAT, QI, V2DI, V4SF, INT)
> +DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V4SI, V2DF, INT)
> +DEF_FUNCTION_TYPE (VOID, PINT, QI, V2DI, V4SI, INT)
> +DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V4SI, V2DI, INT)
>
>  DEF_FUNCTION_TYPE (V16SF, V16SF, PCVOID, V16SI, HI, INT)
>  DEF_FUNCTION_TYPE (V8DF, V8DF, PCVOID, V8SI, QI, INT)
> --- gcc/config/i386/i386.c.jj   2018-12-18 10:23:58.751164982 +0100
> +++ gcc/config/i386/i386.c  2018-12-18 11:58:18.813311983 +0100
> @@ -30072,6 +30072,14 @@ enum ix86_builtins
>IX86_BUILTIN_SCATTERALTDIV16SF,
>IX86_BUILTIN_SCATTERALTSIV8DI,
>IX86_BUILTIN_SCATTERALTDIV16SI,
> +  IX86_BUILTIN_SCATTERALTSIV4DF,
> +  IX86_BUILTIN_SCATTERALTDIV8SF,
> +  IX86_BUILTIN_SCATTERALTSIV4DI,
> +  IX86_BUILTIN_SCATTERALTDIV8SI,
> +  IX86_BUILTIN_SCATTERALTSIV2DF,
> +  IX86_BUILTIN_SCATTERALTDIV4SF,
> +  IX86_BUILTIN_SCATTERALTSIV2DI,
> +  IX86_BUILTIN_SCATTERALTDIV4SI,
>IX86_BUILTIN_SCATTERDIV16SF,
>IX86_BUILTIN_SCATTERDIV16SI,
>IX86_BUILTIN_SCATTERDIV8DF,
> @@ -30879,7 +30887,7 @@ ix86_init_mmx_sse_builtins (void)
> V4DF_FTYPE_V4DF_PCDOUBLE_V8SI_V4DF_INT,
> IX86_BUILTIN_GATHERALTSIV4DF);
>
> -  def_builtin_pure (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatheraltdiv4sf256 
> ",
> +  def_builtin_pure (OPTION_MASK_ISA_AVX2, 

[PATCH] LWG 3171: restore stream insertion for filesystem::directory_entry

2018-12-18 Thread Jonathan Wakely

* include/bits/fs_dir.h (operator<<): Overload for directory_entry,
as per LWG 3171.
* testsuite/27_io/filesystem/directory_entry/lwg3171.cc: New test.

Tested x86_64-linux, committed to trunk.


commit 0d24038c0b565dbcd5e7729423398da281245c41
Author: Jonathan Wakely 
Date:   Tue Dec 18 16:47:52 2018 +

LWG 3171: restore stream insertion for filesystem::directory_entry

* include/bits/fs_dir.h (operator<<): Overload for directory_entry,
as per LWG 3171.
* testsuite/27_io/filesystem/directory_entry/lwg3171.cc: New test.

diff --git a/libstdc++-v3/include/bits/fs_dir.h 
b/libstdc++-v3/include/bits/fs_dir.h
index 2f81a1709e4..90bdf7305f8 100644
--- a/libstdc++-v3/include/bits/fs_dir.h
+++ b/libstdc++-v3/include/bits/fs_dir.h
@@ -300,6 +300,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 friend class directory_iterator;
 friend class recursive_directory_iterator;
 
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3171. LWG 2989 breaks directory_entry stream insertion
+template
+  friend basic_ostream<_CharT, _Traits>&
+  operator<<(basic_ostream<_CharT, _Traits>& __os,
+const directory_entry& __d)
+  { return __os << __d.path(); }
+
 directory_entry(const filesystem::path& __p, file_type __t)
 : _M_path(__p), _M_type(__t)
 { }


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 04:33:48PM +, Uecker, Martin wrote:
> > Yes, something like this. If the trampolines are pre-allocated, this could
> > even avoid the need to clear the cache on archs where this is needed.
> 
> And if we can make the trampolines be all the same (and it somehow derived
> from the IP where it has to look for the static chain), we could map the
> same page of pre-allocated trampolines and not use memory on platforms
> with virtual memory.

Yeah, if it is e.g. a pair of executable page and data page right after it,
say for x86_64 page of:
pushq $0
jmp .L1
pushq $1
jmp .L1
...
push $NNN
jmp .L1
# Almost at the end of page
.L1:
decode the above pushed number
read + decrypt the data (both where to jump to and static chain)
set static chain reg to the static chain data
jmp *function pointer
it could just mmap both pages at once PROT_NONE, and then mmap one from the
file and fill in data in the other page.  Or perhaps one executable and two
data pages, depending on the exact sizes of needed data vs. code.

Jakub


Re: [PATCH] Micro-optimization to avoid creating temporary path

2018-12-18 Thread Jonathan Wakely

On 18/12/18 15:52 +, Jonathan Wakely wrote:

Now that path::operator/=(basic_string_view) works directly
from the string argument, instead of constructing a temporary path from
the string, it's potentially more efficient to do 'path(x) /= s' instead
of 'x / s'. This changes the only relevant place in the library.

* src/filesystem/std-dir.cc (filesystem::_Dir::advance): Append
string to lvalue to avoid creating temporary path.


This is only an optimization if it doesn't introduce a new copy! Fixed
by the attached patch.

Tested x86_64-linux, committed to trunk.


commit c234e8d8c9db216b96bffaba018f50ec7b75
Author: Jonathan Wakely 
Date:   Tue Dec 18 16:34:47 2018 +

Fix previous commit to move instead of copying

* src/filesystem/std-dir.cc (filesystem::_Dir::advance): Move new
path instead of copying.

diff --git a/libstdc++-v3/src/filesystem/std-dir.cc b/libstdc++-v3/src/filesystem/std-dir.cc
index 216182a2e56..b0f869fc8fd 100644
--- a/libstdc++-v3/src/filesystem/std-dir.cc
+++ b/libstdc++-v3/src/filesystem/std-dir.cc
@@ -63,7 +63,7 @@ struct fs::_Dir : _Dir_base
   {
 	auto name = path;
 	name /= entp->d_name;
-	entry = fs::directory_entry{name, get_file_type(*entp)};
+	entry = fs::directory_entry{std::move(name), get_file_type(*entp)};
 	return true;
   }
 else if (!ec)


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Uecker, Martin
Am Dienstag, den 18.12.2018, 17:29 +0100 schrieb Martin Uecker:
> Am Dienstag, den 18.12.2018, 17:24 +0100 schrieb Jakub Jelinek:
> > On Tue, Dec 18, 2018 at 09:03:41AM -0700, Jeff Law wrote:
> > > Right.  This is the classic example and highlights the ABI concerns.  If
> > > we use the low bit to distinguish between a normal function pointer and
> > > a pointer to a descriptor and qsort doesn't know about it, then we lose.
> > > 
> > > One way around this is to make *all* function pointers be some kind of
> > > descriptor and route all indirect calls through a resolver.  THen you
> > 
> > Either way, you are creating a new ABI for calling functions through
> > function pointers.  Because of how rarely GNU C nested functions are used
> > these days, if we want to do anything I'd think it might be better to use
> > trampolines, just don't place them on the stack, say have a mmaped page of
> > trampolines perhaps with some pointer encryption to where they jump to, so
> > it isn't a way to circumvent non-executable stack, and have some register
> > and unregister function you'd call to get or release the trampoline.
> > If more trampolines are needed than currently available, the library could
> > just mmap another such page.  A problem is how it should interact with
> > longjmp or similar APIs, because then we could leak some trampolines (no
> > "destructor" for the trampoline would be called.  The leaking could be
> > handled e.g. through remembering the thread and frame pointer for which it
> > has been allocated and if you ask for a new trampoline with a frame pointer
> > above the already allocated one, release those entries or reuse them,
> > instead of allocating a new one.  And somehow deal with thread exit.
> 
> Yes, something like this. If the trampolines are pre-allocated, this could
> even avoid the need to clear the cache on archs where this is needed.

And if we can make the trampolines be all the same (and it somehow derived
from the IP where it has to look for the static chain), we could map the
same page of pre-allocated trampolines and not use memory on platforms
with virtual memory.

Best,
Martin

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Uecker, Martin
Am Dienstag, den 18.12.2018, 17:24 +0100 schrieb Jakub Jelinek:
> On Tue, Dec 18, 2018 at 09:03:41AM -0700, Jeff Law wrote:
> > Right.  This is the classic example and highlights the ABI concerns.  If
> > we use the low bit to distinguish between a normal function pointer and
> > a pointer to a descriptor and qsort doesn't know about it, then we lose.
> > 
> > One way around this is to make *all* function pointers be some kind of
> > descriptor and route all indirect calls through a resolver.  THen you
> 
> Either way, you are creating a new ABI for calling functions through
> function pointers.  Because of how rarely GNU C nested functions are used
> these days, if we want to do anything I'd think it might be better to use
> trampolines, just don't place them on the stack, say have a mmaped page of
> trampolines perhaps with some pointer encryption to where they jump to, so
> it isn't a way to circumvent non-executable stack, and have some register
> and unregister function you'd call to get or release the trampoline.
> If more trampolines are needed than currently available, the library could
> just mmap another such page.  A problem is how it should interact with
> longjmp or similar APIs, because then we could leak some trampolines (no
> "destructor" for the trampoline would be called.  The leaking could be
> handled e.g. through remembering the thread and frame pointer for which it
> has been allocated and if you ask for a new trampoline with a frame pointer
> above the already allocated one, release those entries or reuse them,
> instead of allocating a new one.  And somehow deal with thread exit.

Yes, something like this. If the trampolines are pre-allocated, this could
even avoid the need to clear the cache on archs where this is needed.

Best,
Martin


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Uecker, Martin
Am Dienstag, den 18.12.2018, 09:03 -0700 schrieb Jeff Law:
> On 12/18/18 8:32 AM, Jakub Jelinek wrote:
> > On Tue, Dec 18, 2018 at 10:23:46AM -0500, Paul Koning wrote:
> > > 
> > > 
> > > > On Dec 17, 2018, at 2:23 PM, Szabolcs Nagy  
> > > > wrote:
> > > > 
> > > > On 17/12/2018 18:22, Uecker, Martin wrote:
> > > > > > 
> > > > > > ...
> > > > > 
> > > > > So a thread_local static variable for storing the static
> > > > > chain?
> > > > 
> > > > something like that, but the more i think about it the
> > > > harder it seems: the call site of the nested function
> > > > may not be under control of the nested function writer,
> > > > in particular the nested function may be called on a
> > > > different thread, and extern library apis are unlikely
> > > > to provide guarantees about this, so in general if a
> > > > nested function escapes into an extern library then
> > > > this cannot be relied on, which limits my original
> > > > idea again to cases where there is no escape (which i
> > > > think is not that useful).
> > > 
> > > I'm not sure I understand "escape" of a nested function pointer. 
> > > 
> > > Your description makes it sound like you're talking about a function 
> > > being called by someone
> > > who has been given the pointer, from outside the scope of the function.  
> > > That sounds like an
> > > illegal operation, exactly as it would be if you attempted to reference 
> > > an automatic variable
> > > via a pointer from outside the scope of that variable.
> > > 
> > > Did I misunderstand?
> > 
> > The most common case is when you pass a call to a nested function
> > to some function that has a function pointer argument, e.g. qsort.
> > This is well defined with GNU nested functions, but the function that calls
> > the callback (qsort in this case) doesn't know it is a call to a nested
> > function.
> 
> Right.  This is the classic example and highlights the ABI concerns.  If
> we use the low bit to distinguish between a normal function pointer and
> a pointer to a descriptor and qsort doesn't know about it, then we lose.
> 
> One way around this is to make *all* function pointers be some kind of
> descriptor and route all indirect calls through a resolver.  THen you
> need either linker hackery or special code to compare function pointers
> to preserve ISO C behavior.

If it has to work with existing code and without the restrictions
discussed above then you have to create a pointer to a new code address 
for each nested functions. One possibility which comes to mind would be a
shadow stack which is executable (and might contain pre-allocated
trampolines).


Best,
Martin


> Note that if you have a nested function and take its address, then go
> out of scope of the containing function, then that function pointer is
> no longer valid -- which makes perfect sense if you think about it.  THe
> trampoline was on the stack and if you go out of scope of the containing
> function, then that stack frame is invalid and you also don't have a
> suitable frame chain to pass to the nested function either.
> 
> 
> Jeff

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 09:03:41AM -0700, Jeff Law wrote:
> Right.  This is the classic example and highlights the ABI concerns.  If
> we use the low bit to distinguish between a normal function pointer and
> a pointer to a descriptor and qsort doesn't know about it, then we lose.
> 
> One way around this is to make *all* function pointers be some kind of
> descriptor and route all indirect calls through a resolver.  THen you

Either way, you are creating a new ABI for calling functions through
function pointers.  Because of how rarely GNU C nested functions are used
these days, if we want to do anything I'd think it might be better to use
trampolines, just don't place them on the stack, say have a mmaped page of
trampolines perhaps with some pointer encryption to where they jump to, so
it isn't a way to circumvent non-executable stack, and have some register
and unregister function you'd call to get or release the trampoline.
If more trampolines are needed than currently available, the library could
just mmap another such page.  A problem is how it should interact with
longjmp or similar APIs, because then we could leak some trampolines (no
"destructor" for the trampoline would be called.  The leaking could be
handled e.g. through remembering the thread and frame pointer for which it
has been allocated and if you ask for a new trampoline with a frame pointer
above the already allocated one, release those entries or reuse them,
instead of allocating a new one.  And somehow deal with thread exit.

Jakub


Re: [patch] Fix bootstrap powerpc*-*-freebsd* targets

2018-12-18 Thread Segher Boessenkool
On Tue, Dec 18, 2018 at 11:18:03PM +1030, Alan Modra wrote:
> On Tue, Dec 18, 2018 at 03:20:02AM -0600, Segher Boessenkool wrote:
> > Hi Alan,
> > 
> > On Tue, Dec 18, 2018 at 10:39:27AM +1030, Alan Modra wrote:
> > > On Mon, Dec 17, 2018 at 11:05:57AM -0600, Segher Boessenkool wrote:
> > > > On Mon, Dec 17, 2018 at 10:40:01AM +1030, Alan Modra wrote:
> > > > > Since I broke powerpc*-freebsd and the other non-linux powerpc
> > > > > targets, I guess I ought to fix them.  The following is a variation on
> > > > > your first patch, that results in -mcall-linux for powerpc-freebsd*
> > > > > providing the 32-bit powerpc-linux dynamic linker.
> > > > 
> > > > That, like the first patch, abuses that header file.  Please do it
> > > > somewhere sane instead, not in a random subtarget file?
> > > 
> > > Is there is a better place, currently?  sysv4.h contains a mess of OS
> > > related defines already, to support various -mcall options.  If those
> > > stay in sysv4.h I can't see a better place for the fall-back
> > > GNU_USER_DYNAMIC_LINKER define.
> > 
> > I was hoping you would untangle it a bit.  My dastardly plan failed,
> > apparently.  Drat.
> 
> Me untangling some of the linux bits was what caused the problem..
> 
> I think that -mcall-linux, -mcall-freebsd, -mcall-netbsd and
> -mcall-openbsd should be deprecated.  That would make it possible to
> put the OS specific defines where they belong.

Does anyone still use those options?  A patch to remove them is welcome.


Segher


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Jeff Law
On 12/18/18 8:32 AM, Jakub Jelinek wrote:
> On Tue, Dec 18, 2018 at 10:23:46AM -0500, Paul Koning wrote:
>>
>>
>>> On Dec 17, 2018, at 2:23 PM, Szabolcs Nagy  wrote:
>>>
>>> On 17/12/2018 18:22, Uecker, Martin wrote:
>
> ...

 So a thread_local static variable for storing the static
 chain?
>>>
>>> something like that, but the more i think about it the
>>> harder it seems: the call site of the nested function
>>> may not be under control of the nested function writer,
>>> in particular the nested function may be called on a
>>> different thread, and extern library apis are unlikely
>>> to provide guarantees about this, so in general if a
>>> nested function escapes into an extern library then
>>> this cannot be relied on, which limits my original
>>> idea again to cases where there is no escape (which i
>>> think is not that useful).
>>
>> I'm not sure I understand "escape" of a nested function pointer. 
>>
>> Your description makes it sound like you're talking about a function being 
>> called by someone who has been given the pointer, from outside the scope of 
>> the function.  That sounds like an illegal operation, exactly as it would be 
>> if you attempted to reference an automatic variable via a pointer from 
>> outside the scope of that variable.
>>
>> Did I misunderstand?
> 
> The most common case is when you pass a call to a nested function
> to some function that has a function pointer argument, e.g. qsort.
> This is well defined with GNU nested functions, but the function that calls
> the callback (qsort in this case) doesn't know it is a call to a nested
> function.
Right.  This is the classic example and highlights the ABI concerns.  If
we use the low bit to distinguish between a normal function pointer and
a pointer to a descriptor and qsort doesn't know about it, then we lose.

One way around this is to make *all* function pointers be some kind of
descriptor and route all indirect calls through a resolver.  THen you
need either linker hackery or special code to compare function pointers
to preserve ISO C behavior.

Note that if you have a nested function and take its address, then go
out of scope of the containing function, then that function pointer is
no longer valid -- which makes perfect sense if you think about it.  THe
trampoline was on the stack and if you go out of scope of the containing
function, then that stack frame is invalid and you also don't have a
suitable frame chain to pass to the nested function either.


Jeff


[PATCH] Micro-optimization to avoid creating temporary path

2018-12-18 Thread Jonathan Wakely

Now that path::operator/=(basic_string_view) works directly
from the string argument, instead of constructing a temporary path from
the string, it's potentially more efficient to do 'path(x) /= s' instead
of 'x / s'. This changes the only relevant place in the library.

* src/filesystem/std-dir.cc (filesystem::_Dir::advance): Append
string to lvalue to avoid creating temporary path.

Tested x86_64-linux, committed to trunk.

commit e48b1b71c76d12f999d9cf4a12239845281a26e4
Author: Jonathan Wakely 
Date:   Tue Dec 18 15:39:12 2018 +

Micro-optimization to avoid creating temporary path

Now that path::operator/=(basic_string_view) works directly
from the string argument, instead of constructing a temporary path from
the string, it's potentially more efficient to do 'path(x) /= s' instead
of 'x / s'. This changes the only relevant place in the library.

* src/filesystem/std-dir.cc (filesystem::_Dir::advance): Append
string to lvalue to avoid creating temporary path.

diff --git a/libstdc++-v3/src/filesystem/std-dir.cc 
b/libstdc++-v3/src/filesystem/std-dir.cc
index 038f635a712..216182a2e56 100644
--- a/libstdc++-v3/src/filesystem/std-dir.cc
+++ b/libstdc++-v3/src/filesystem/std-dir.cc
@@ -61,7 +61,9 @@ struct fs::_Dir : _Dir_base
   {
 if (const auto entp = _Dir_base::advance(skip_permission_denied, ec))
   {
-   entry = fs::directory_entry{path / entp->d_name, get_file_type(*entp)};
+   auto name = path;
+   name /= entp->d_name;
+   entry = fs::directory_entry{name, get_file_type(*entp)};
return true;
   }
 else if (!ec)


[PATCH] LWG 2936: update path::compare logic and optimize string comparisons

2018-12-18 Thread Jonathan Wakely

The resolution for LWG 2936 defines the comparison more precisely, which
this patch implements. The patch also defines comparisons with strings
to work without constructing a temporary path object (so avoids any
memory allocations).

* include/bits/fs_path.h (path::compare(const string_type&))
(path::compare(const value_type*)): Add noexcept and construct a
string view to compare to instead of a path.
(path::compare(basic_string_view)): Add noexcept. Remove
inline definition.
* src/filesystem/std-path.cc (path::_Parser): Track last type read
from input.
(path::_Parser::next()): Return a final empty component when the
input ends in a non-root directory separator.
(path::_M_append(basic_string_view)): Remove special cases
for trailing non-root directory separator.
(path::_M_concat(basic_string_view)): Likewise.
(path::compare(const path&)): Implement LWG 2936.
(path::compare(basic_string_view)): Define in terms of
components returned by parser, consistent with LWG 2936.
* testsuite/27_io/filesystem/path/compare/lwg2936.cc: New.
* testsuite/27_io/filesystem/path/compare/path.cc: Test more cases.
* testsuite/27_io/filesystem/path/compare/strings.cc: Likewise.

Tested x86_64-linux, committed to trunk.

commit 4e56e9d3bb31740fb65d7a46bfa9796809f971f0
Author: Jonathan Wakely 
Date:   Tue Dec 18 14:29:14 2018 +

LWG 2936: update path::compare logic and optimize string comparisons

The resolution for LWG 2936 defines the comparison more precisely, which
this patch implements. The patch also defines comparisons with strings
to work without constructing a temporary path object (so avoids any
memory allocations).

* include/bits/fs_path.h (path::compare(const string_type&))
(path::compare(const value_type*)): Add noexcept and construct a
string view to compare to instead of a path.
(path::compare(basic_string_view)): Add noexcept. Remove
inline definition.
* src/filesystem/std-path.cc (path::_Parser): Track last type read
from input.
(path::_Parser::next()): Return a final empty component when the
input ends in a non-root directory separator.
(path::_M_append(basic_string_view)): Remove special 
cases
for trailing non-root directory separator.
(path::_M_concat(basic_string_view)): Likewise.
(path::compare(const path&)): Implement LWG 2936.
(path::compare(basic_string_view)): Define in terms of
components returned by parser, consistent with LWG 2936.
* testsuite/27_io/filesystem/path/compare/lwg2936.cc: New.
* testsuite/27_io/filesystem/path/compare/path.cc: Test more cases.
* testsuite/27_io/filesystem/path/compare/strings.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index c69001bcc3c..b827d85965e 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -341,9 +341,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 // compare
 
 int compare(const path& __p) const noexcept;
-int compare(const string_type& __s) const;
-int compare(const value_type* __s) const;
-int compare(const basic_string_view __s) const;
+int compare(const string_type& __s) const noexcept;
+int compare(const value_type* __s) const noexcept;
+int compare(basic_string_view __s) const noexcept;
 
 // decomposition
 
@@ -1067,14 +1067,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { return generic_string(); }
 
   inline int
-  path::compare(const string_type& __s) const { return compare(path(__s)); }
+  path::compare(const string_type& __s) const noexcept
+  { return compare(basic_string_view(__s)); }
 
   inline int
-  path::compare(const value_type* __s) const { return compare(path(__s)); }
-
-  inline int
-  path::compare(basic_string_view __s) const
-  { return compare(path(__s)); }
+  path::compare(const value_type* __s) const noexcept
+  { return compare(basic_string_view(__s)); }
 
   inline path
   path::filename() const
diff --git a/libstdc++-v3/src/filesystem/std-path.cc 
b/libstdc++-v3/src/filesystem/std-path.cc
index b5ddbdad149..5b0318c1f58 100644
--- a/libstdc++-v3/src/filesystem/std-path.cc
+++ b/libstdc++-v3/src/filesystem/std-path.cc
@@ -62,6 +62,7 @@ struct path::_Parser
   string_view_type input;
   string_view_type::size_type pos = 0;
   size_t origin;
+  _Type last_type = _Type::_Multi;
 
   _Parser(string_view_type s, size_t o = 0) : input(s), origin(o) { }
 
@@ -129,6 +130,12 @@ struct path::_Parser
pos = input.find_first_not_of(L"/\\", 2);
   }
 #endif
+
+if (root.second.valid())
+  last_type = root.second.type;
+else
+  last_type = root.first.type;
+
 return root;
   }
 
@@ -140,15 +147,30 @@ struct 

[PATCH] LWG 3040: define starts_with/ends_with as proposed

2018-12-18 Thread Jonathan Wakely

* include/std/string_view [__cplusplus > 201703L]
(basic_string_view::starts_with(basic_string_view)): Implement
proposed resolution of LWG 3040 to avoid redundant length check.
(basic_string_view::starts_with(_CharT)): Implement proposed
resolution of LWG 3040 to check at most one character.
(basic_string_view::ends_with(_CharT)): Likewise.

Tested powerpc64le-linux, committed to trunk.

commit 6567fce9525ef70f9beaf03887213f594545c92e
Author: Jonathan Wakely 
Date:   Tue Dec 18 13:53:26 2018 +

LWG 3040 define starts_with/ends_with as proposed

* include/std/string_view [__cplusplus > 201703L]
(basic_string_view::starts_with(basic_string_view)): Implement
proposed resolution of LWG 3040 to avoid redundant length check.
(basic_string_view::starts_with(_CharT)): Implement proposed
resolution of LWG 3040 to check at most one character.
(basic_string_view::ends_with(_CharT)): Likewise.

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index 28d3fa46718..ac84b24314e 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -389,14 +389,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus > 201703L
   constexpr bool
   starts_with(basic_string_view __x) const noexcept
-  {
-   return this->size() >= __x.size()
-   && this->compare(0, __x.size(), __x) == 0;
-  }
+  { return this->substr(0, __x.size()) == __x; }
 
   constexpr bool
   starts_with(_CharT __x) const noexcept
-  { return this->starts_with(basic_string_view(&__x, 1)); }
+  { return !this->empty() && traits_type::eq(this->front(), __x); }
 
   constexpr bool
   starts_with(const _CharT* __x) const noexcept
@@ -411,7 +408,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr bool
   ends_with(_CharT __x) const noexcept
-  { return this->ends_with(basic_string_view(&__x, 1)); }
+  { return !this->empty() && traits_type::eq(this->back(), __x); }
 
   constexpr bool
   ends_with(const _CharT* __x) const noexcept


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 10:23:46AM -0500, Paul Koning wrote:
> 
> 
> > On Dec 17, 2018, at 2:23 PM, Szabolcs Nagy  wrote:
> > 
> > On 17/12/2018 18:22, Uecker, Martin wrote:
> >>> 
> >>> ...
> >> 
> >> So a thread_local static variable for storing the static
> >> chain?
> > 
> > something like that, but the more i think about it the
> > harder it seems: the call site of the nested function
> > may not be under control of the nested function writer,
> > in particular the nested function may be called on a
> > different thread, and extern library apis are unlikely
> > to provide guarantees about this, so in general if a
> > nested function escapes into an extern library then
> > this cannot be relied on, which limits my original
> > idea again to cases where there is no escape (which i
> > think is not that useful).
> 
> I'm not sure I understand "escape" of a nested function pointer. 
> 
> Your description makes it sound like you're talking about a function being 
> called by someone who has been given the pointer, from outside the scope of 
> the function.  That sounds like an illegal operation, exactly as it would be 
> if you attempted to reference an automatic variable via a pointer from 
> outside the scope of that variable.
> 
> Did I misunderstand?

The most common case is when you pass a call to a nested function
to some function that has a function pointer argument, e.g. qsort.
This is well defined with GNU nested functions, but the function that calls
the callback (qsort in this case) doesn't know it is a call to a nested
function.

#include 
#include 

int
main ()
{
  bool r = false;
  auto int cmp (const void *a, const void *b)
  {
const signed char *c = (const signed char *) a;
const signed char *d = (const signed char *) b;
return r ? *c - *d : *d - *c;
  }

  signed char l[8] = { 10, 2, 11, 21, 0, 7, 18, 12 };
  qsort (l, 8, 1, cmp);
}

Jakub


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-18 Thread Paul Koning



> On Dec 17, 2018, at 2:23 PM, Szabolcs Nagy  wrote:
> 
> On 17/12/2018 18:22, Uecker, Martin wrote:
>>> 
>>> ...
>> 
>> So a thread_local static variable for storing the static
>> chain?
> 
> something like that, but the more i think about it the
> harder it seems: the call site of the nested function
> may not be under control of the nested function writer,
> in particular the nested function may be called on a
> different thread, and extern library apis are unlikely
> to provide guarantees about this, so in general if a
> nested function escapes into an extern library then
> this cannot be relied on, which limits my original
> idea again to cases where there is no escape (which i
> think is not that useful).

I'm not sure I understand "escape" of a nested function pointer. 

Your description makes it sound like you're talking about a function being 
called by someone who has been given the pointer, from outside the scope of the 
function.  That sounds like an illegal operation, exactly as it would be if you 
attempted to reference an automatic variable via a pointer from outside the 
scope of that variable.

Did I misunderstand?

paul



Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-18 Thread Bernd Edlinger
On 12/18/18 3:16 PM, Bernd Edlinger wrote:
> Hi,
> 
> while I looked closely at the asm statement in the gdb,
> I realized that the SP clobber forces the function to use
> the frame pointer, and prevents the red zone.  That
> makes the push / pop sequence in the asm statement safe
> to use, as long as the stack is restored to the original
> value.  That can be a quite useful feature.  And that might
> have been the reason why the rsp clobber was chosen in the
> first place.
> 
> This seems to work for all targets, but it started to work
> this way with gcc-6, all versions before that do ignore
> this clobber stmt (as confirmed by godbolt).
> 
> The clobber stmt make the LRA register allocator switch
> frame_pointer_needed to 1, and therefore in all likelihood,
> all targets should use that consistently.
> 
> On 12/17/18 12:47 PM, Richard Sandiford wrote:
>> Dimitar Dimitrov  writes:
>>> On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
 Hi,

 if I understood that right, then clobbering sp is and has always been
 ignored.
>>
>> PR77904 was about the clobber not being ignored, so the behaviour
>> hasn't been consistent.
>>
> 
> I think 77904 was a fall-out from the change in the LRA register allocator.
> The patch referenced in the PR does simply honor frame_pointer_needed,
> which changed with gcc-6, and caused a regression on arm.
> 
>> I'm also not sure it was always ignored in recent sources.  The clobber
>> does get added to the associated rtl insn, and it'd be surprising if
>> that never had an effect.
>>
 If that is right, then I would much prefer a warning, that says exactly
 that, because that would also help to understand why removing that clobber
 statement is safe even for old gcc versions.
>>
>> If the asm does leave sp with a different value, then it's never been safe,
>> regardless of the gcc version.  That's why an error seems more appropriate.
>>
>>> Thank you. Looks like general consensus is to have a warning. See attached
>>> patch that switches the error to a warning.
>>
>> I don't think there's a good reason to treat this differently from the
>> preexisting PIC register error.  If the argument for making it a warning
>> rather than an error is that the asm might happen to work by accident,
>> then the same is true for the PIC register.
>>
> 
> In the light of my findings, I believe with a good warning message that
> explains that the SP needs to be restored to the previous value, that
> is a useful feature, that enables the asm statement to push temporary
> values on the stack which would not be safe otherwise.
> 
> Therefore I propose not to rip it out at this time.
> See my proposed patch.  What do you think?
> 
> Is it OK?
> 
> 

Oops, previous version missed the fix of the PR77904 test case, which is
currently broken too.


Bernd-
2018-12-18  Bernd Edlinger  

	* cfgexpand.c (asm_clobber_reg_is_valid): Emit only a warning together
	with an information message when the stack pointer is clobbered.

testsuite:
2018-12-18  Bernd Edlinger  

	* gcc.target/arm/pr77904.c: Adjust test.
	* gcc.target/i386/pr52813.c: Adjust test.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c	(revision 267164)
+++ gcc/cfgexpand.c	(working copy)
@@ -2854,6 +2854,7 @@ tree_conflicts_with_clobbers_p (tree t, HARD_REG_S
asm clobber operand.  Some HW registers cannot be
saved/restored, hence they should not be clobbered by
asm statements.  */
+
 static bool
 asm_clobber_reg_is_valid (int regno, int nregs, const char *regname)
 {
@@ -2872,11 +2873,23 @@ asm_clobber_reg_is_valid (int regno, int nregs, co
   error ("PIC register clobbered by %qs in %", regname);
   is_valid = false;
 }
-  /* Clobbering the STACK POINTER register is an error.  */
+  /* Clobbering the STACK POINTER register is likely an error.
+ However it is useful to force the use of frame pointer and prevent
+ the use of red zone.  Thus without this clobber, pushing temporary
+ values onto the stack might clobber the red zone or make stack based
+ memory references invalid.  */
   if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
 {
-  error ("Stack Pointer register clobbered by %qs in %", regname);
-  is_valid = false;
+  if (warning (0, "Stack Pointer register clobbered by %qs in %",
+		   regname))
+	{
+	  inform (input_location,
+		  "This does likely not do what you would expect."
+		  " The Stack Pointer register still needs to be restored to"
+		  " the previous value, however it is safe to push values onto"
+		  " the stack, when they are popped again from the stack"
+		  " before the asm statement terminates");
+	}
 }
 
   return is_valid;
Index: gcc/testsuite/gcc.target/arm/pr77904.c
===
--- gcc/testsuite/gcc.target/arm/pr77904.c	(revision 267164)
+++ gcc/testsuite/gcc.target/arm/pr77904.c	(working 

Re: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes (revised, v3)

2018-12-18 Thread Chung-Lin Tang

On 2018/12/11 9:50 PM, Chung-Lin Tang wrote:

On 2018/12/10 6:02 PM, Chung-Lin Tang wrote:

On 2018/12/7 04:57 AM, Thomas Schwinge wrote>> --- 
a/libgomp/plugin/plugin-nvptx.c

+++ b/libgomp/plugin/plugin-nvptx.c



+struct goacc_asyncqueue *
+GOMP_OFFLOAD_openacc_async_construct (void)
+{
+  struct goacc_asyncqueue *aq
+= GOMP_PLUGIN_malloc (sizeof (struct goacc_asyncqueue));
+  aq->cuda_stream = NULL;
+  CUDA_CALL_ASSERT (cuStreamCreate, >cuda_stream, CU_STREAM_DEFAULT);


Curiously (this was the same in the code before): does this have to be
"CU_STREAM_DEFAULT" instead of "CU_STREAM_NON_BLOCKING", because we want
to block anything from running in parallel with "acc_async_sync" GPU
kernels, that use the "NULL" stream?  (Not asking you to change this now,
but I wonder if this is overly strict?)


IIUC, this non-blocking only pertains to the "Legacy Default Stream" in CUDA, 
which we're pretty much ignoring; we should be using the newer per-thread default stream 
model. We could review this issue later.


+  if (aq->cuda_stream == NULL)
+GOMP_PLUGIN_fatal ("CUDA stream create NULL\n");


Can this actually happen, given the "CUDA_CALL_ASSERT" usage above?


Hmm, yeah I think this is superfluous too...


+  CUDA_CALL_ASSERT (cuStreamSynchronize, aq->cuda_stream);


Why is the synchronization needed here?


I don't remember, could likely be something added during debug.
I'll remove this and test if things are okay.


I have removed the above seemingly unneeded lines and re-tested, appears okay.
Also the formerly attached version seemed to for some reason had many conflicts
with older code, all resolved in this v2 nvptx part.


GOMP_OFFLOAD_openacc_async_construct is updated to return NULL for failure,
there's also some adjustments in oacc-async.c, coming next.

Chung-Lin



diff -ru trunk-orig/libgomp/plugin/plugin-nvptx.c 
trunk-work/libgomp/plugin/plugin-nvptx.c
--- trunk-orig/libgomp/plugin/plugin-nvptx.c2018-12-18 18:16:57.804871502 
+0800
+++ trunk-work/libgomp/plugin/plugin-nvptx.c2018-12-18 22:07:43.483068743 
+0800
@@ -1364,16 +1364,12 @@
 struct goacc_asyncqueue *
 GOMP_OFFLOAD_openacc_async_construct (void)
 {
+  CUstream stream = NULL;
+  CUDA_CALL_ERET (NULL, cuStreamCreate, , CU_STREAM_DEFAULT);
+
   struct goacc_asyncqueue *aq
 = GOMP_PLUGIN_malloc (sizeof (struct goacc_asyncqueue));
-  aq->cuda_stream = NULL;
-  CUDA_CALL_ASSERT (cuStreamCreate, >cuda_stream, CU_STREAM_DEFAULT);
-  if (aq->cuda_stream == NULL)
-GOMP_PLUGIN_fatal ("CUDA stream create NULL\n");
-
-  CUDA_CALL_ASSERT (cuStreamSynchronize, aq->cuda_stream);
-
-
+  aq->cuda_stream = stream;
   return aq;
 }
 
Index: libgomp/plugin/cuda/cuda.h
===
--- libgomp/plugin/cuda/cuda.h  (revision 267226)
+++ libgomp/plugin/cuda/cuda.h  (working copy)
@@ -54,7 +54,11 @@ typedef enum {
   CUDA_ERROR_INVALID_CONTEXT = 201,
   CUDA_ERROR_NOT_FOUND = 500,
   CUDA_ERROR_NOT_READY = 600,
-  CUDA_ERROR_LAUNCH_FAILED = 719
+  CUDA_ERROR_LAUNCH_FAILED = 719,
+  CUDA_ERROR_COOPERATIVE_LAUNCH_TOO_LARGE = 720,
+  CUDA_ERROR_NOT_PERMITTED = 800,
+  CUDA_ERROR_NOT_SUPPORTED = 801,
+  CUDA_ERROR_UNKNOWN = 999
 } CUresult;
 
 typedef enum {
@@ -173,6 +177,8 @@ CUresult cuModuleLoadData (CUmodule *, const void
 CUresult cuModuleUnload (CUmodule);
 CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
  CUoccupancyB2DSize, size_t, int);
+typedef void (*CUstreamCallback)(CUstream, CUresult, void *);
+CUresult cuStreamAddCallback(CUstream, CUstreamCallback, void *, unsigned int);
 CUresult cuStreamCreate (CUstream *, unsigned);
 #define cuStreamDestroy cuStreamDestroy_v2
 CUresult cuStreamDestroy (CUstream);
Index: libgomp/plugin/cuda-lib.def
===
--- libgomp/plugin/cuda-lib.def (revision 267226)
+++ libgomp/plugin/cuda-lib.def (working copy)
@@ -42,6 +42,7 @@ CUDA_ONE_CALL (cuModuleLoad)
 CUDA_ONE_CALL (cuModuleLoadData)
 CUDA_ONE_CALL (cuModuleUnload)
 CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
+CUDA_ONE_CALL (cuStreamAddCallback)
 CUDA_ONE_CALL (cuStreamCreate)
 CUDA_ONE_CALL (cuStreamDestroy)
 CUDA_ONE_CALL (cuStreamQuery)
Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c   (revision 267226)
+++ libgomp/plugin/plugin-nvptx.c   (working copy)
@@ -192,21 +192,18 @@ cuda_error (CUresult r)
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
-struct cuda_map
+/* NVPTX/CUDA specific definition of asynchronous queues.  */
+struct goacc_asyncqueue
 {
-  CUdeviceptr d;
-  size_t size;
-  bool active;
-  struct cuda_map *next;
+  CUstream cuda_stream;
 };
 
-struct ptx_stream
+struct nvptx_callback
 {
-  CUstream stream;
-  pthread_t host_thread;
-  bool multithreaded;
-  struct cuda_map 

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v2)

2018-12-18 Thread Chung-Lin Tang

On 2018/9/25 9:10 PM, Chung-Lin Tang wrote:

Hi Thomas,
These are the OpenACC specific changes, mostly the re-implementation of 
async-related acc_* runtime
library API functions to use the new backend plugin interfaces, in a non-target 
specific way.



Hi Thomas,
this part includes some of the lookup_goacc_asyncqueue fixes we talked about.
I am still thinking about how the queue lock problem should really be solved, 
so regard
this patch as just fixing some of the problems.


diff -ru trunk-orig/libgomp/oacc-async.c trunk-work/libgomp/oacc-async.c
--- trunk-orig/libgomp/oacc-async.c 2018-12-14 22:11:29.252251925 +0800
+++ trunk-work/libgomp/oacc-async.c 2018-12-18 22:19:51.923102938 +0800
@@ -70,12 +70,16 @@
 
   struct gomp_device_descr *dev = thr->dev;
 
+  gomp_mutex_lock (>openacc.async.lock);
+
   if (!create
   && (async >= dev->openacc.async.nasyncqueue
  || !dev->openacc.async.asyncqueue[async]))
-return NULL;
+{
+  gomp_mutex_unlock (>openacc.async.lock);
+  return NULL;
+}
 
-  gomp_mutex_lock (>openacc.async.lock);
   if (async >= dev->openacc.async.nasyncqueue)
 {
   int diff = async + 1 - dev->openacc.async.nasyncqueue;
@@ -91,6 +95,12 @@
 {
   dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func 
();
 
+  if (!dev->openacc.async.asyncqueue[async])
+   {
+ gomp_mutex_unlock (>openacc.async.lock);
+ gomp_fatal ("async %d creation failed", async);
+   }
+  
   /* Link new async queue into active list.  */
   goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
   n->aq = dev->openacc.async.asyncqueue[async];
diff -ru trunk-orig/libgomp/oacc-host.c trunk-work/libgomp/oacc-host.c
--- trunk-orig/libgomp/oacc-host.c  2018-12-14 18:31:07.487203770 +0800
+++ trunk-work/libgomp/oacc-host.c  2018-12-18 22:23:26.771807667 +0800
@@ -266,6 +266,9 @@
 
   .exec_func = host_openacc_exec,
 
+  .create_thread_data_func = host_openacc_create_thread_data,
+  .destroy_thread_data_func = host_openacc_destroy_thread_data,
+
   .async = {
.construct_func = host_openacc_async_construct,
.destruct_func = host_openacc_async_destruct,
@@ -278,9 +281,6 @@
.host2dev_func = host_openacc_async_host2dev,
   },
 
-  .create_thread_data_func = host_openacc_create_thread_data,
-  .destroy_thread_data_func = host_openacc_destroy_thread_data,
-
   .cuda = {
.get_current_device_func = NULL,
.get_current_context_func = NULL,
diff -ru trunk-orig/libgomp/oacc-plugin.c trunk-work/libgomp/oacc-plugin.c
--- trunk-orig/libgomp/oacc-plugin.c2018-12-14 18:31:07.491203745 +0800
+++ trunk-work/libgomp/oacc-plugin.c2018-12-18 22:27:46.047722004 +0800
@@ -30,6 +30,13 @@
 #include "oacc-plugin.h"
 #include "oacc-int.h"
 
+void
+GOMP_PLUGIN_async_unmap_vars (void *ptr __attribute__((unused)),
+ int async __attribute__((unused)))
+{
+  gomp_fatal ("invalid plugin function");
+}
+
 /* Return the target-specific part of the TLS data for the current thread.  */
 
 void *
diff -ru trunk-orig/libgomp/plugin/plugin-nvptx.c 
trunk-work/libgomp/plugin/plugin-nvptx.c
Index: libgomp/oacc-async.c
===
--- libgomp/oacc-async.c(revision 267226)
+++ libgomp/oacc-async.c(working copy)
@@ -27,10 +27,97 @@
.  */
 
 #include 
+#include 
 #include "openacc.h"
 #include "libgomp.h"
 #include "oacc-int.h"
 
+static struct goacc_thread *
+get_goacc_thread (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+gomp_fatal ("no device active");
+
+  return thr;
+}
+
+static struct gomp_device_descr *
+get_goacc_thread_device (void)
+{
+  struct goacc_thread *thr = goacc_thread ();
+
+  if (!thr || !thr->dev)
+gomp_fatal ("no device active");
+
+  return thr->dev;
+}
+
+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  /* The special value acc_async_noval (-1) maps to the thread-specific
+ default async stream.  */
+  if (async == acc_async_noval)
+async = thr->default_async;
+
+  if (async == acc_async_sync)
+return NULL;
+
+  if (async < 0)
+gomp_fatal ("bad async %d", async);
+
+  struct gomp_device_descr *dev = thr->dev;
+
+  gomp_mutex_lock (>openacc.async.lock);
+
+  if (!create
+  && (async >= dev->openacc.async.nasyncqueue
+ || !dev->openacc.async.asyncqueue[async]))
+{
+  gomp_mutex_unlock (>openacc.async.lock);
+  return NULL;
+}
+
+  if (async >= dev->openacc.async.nasyncqueue)
+{
+  int diff = async + 1 - dev->openacc.async.nasyncqueue;
+  dev->openacc.async.asyncqueue
+   = gomp_realloc (dev->openacc.async.asyncqueue,
+   sizeof (goacc_aq) * (async + 1));
+  memset (dev->openacc.async.asyncqueue + 

Re: [PATCH 1/6, OpenACC, libgomp] Async re-work, interfaces (revised, v2)

2018-12-18 Thread Chung-Lin Tang

On 2018/12/15 1:52 AM, Thomas Schwinge wrote:

As for the following changes, will you please make sure that there is one
common order for these, used in "libgomp/libgomp-plugin.h" function
prototypes, "libgomp/libgomp.h:acc_dispatch_t",
"libgomp/target.c:gomp_load_plugin_for_device", "libgomp/oacc-host.c"
function definitions as well as in "host_dispatch", and the
libgomp-plugin(s) themselves (that's all, I think?).


--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -93,22 +107,31 @@ extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void 
*, size_t);
  extern bool GOMP_OFFLOAD_can_run (void *);
  extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
  extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
+
  extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
-  void **, int, unsigned *, void *);
-extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);
-extern int GOMP_OFFLOAD_openacc_async_test (int);
-extern int GOMP_OFFLOAD_openacc_async_test_all (void);
-extern void GOMP_OFFLOAD_openacc_async_wait (int);
-extern void GOMP_OFFLOAD_openacc_async_wait_async (int, int);
-extern void GOMP_OFFLOAD_openacc_async_wait_all (void);
-extern void GOMP_OFFLOAD_openacc_async_wait_all_async (int);
-extern void GOMP_OFFLOAD_openacc_async_set_async (int);
+  void **, unsigned *, void *);
+extern void GOMP_OFFLOAD_openacc_async_exec (void (*) (void *), size_t, void 
**,
+void **, unsigned *, void *,
+struct goacc_asyncqueue *);
+extern struct goacc_asyncqueue *GOMP_OFFLOAD_openacc_async_construct (void);
+extern bool GOMP_OFFLOAD_openacc_async_destruct (struct goacc_asyncqueue *);
+extern int GOMP_OFFLOAD_openacc_async_test (struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_synchronize (struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_serialize (struct goacc_asyncqueue *,
+ struct goacc_asyncqueue *);
+extern void GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue 
*,
+  void (*)(void *), void 
*);
+extern bool GOMP_OFFLOAD_openacc_async_host2dev (int, void *, const void *, 
size_t,
+struct goacc_asyncqueue *);
+extern bool GOMP_OFFLOAD_openacc_async_dev2host (int, void *, const void *, 
size_t,
+struct goacc_asyncqueue *);


This patch revises the ordering of the above functions/hooks to be consistent
across libgomp, and un-deletes goacc_async_unmap_vars in libgomp.map.

Chung-Lin


Index: libgomp/libgomp.h
===
--- libgomp/libgomp.h   (revision 267226)
+++ libgomp/libgomp.h   (working copy)
@@ -949,25 +949,29 @@ typedef struct acc_dispatch_t
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
-  /* Async cleanup callback registration.  */
-  __typeof (GOMP_OFFLOAD_openacc_register_async_cleanup)
-*register_async_cleanup_func;
-
-  /* Asynchronous routines.  */
-  __typeof (GOMP_OFFLOAD_openacc_async_test) *async_test_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_test_all) *async_test_all_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait) *async_wait_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_async) *async_wait_async_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_all) *async_wait_all_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_wait_all_async)
-*async_wait_all_async_func;
-  __typeof (GOMP_OFFLOAD_openacc_async_set_async) *async_set_async_func;
-
   /* Create/destroy TLS data.  */
   __typeof (GOMP_OFFLOAD_openacc_create_thread_data) *create_thread_data_func;
   __typeof (GOMP_OFFLOAD_openacc_destroy_thread_data)
 *destroy_thread_data_func;
+  
+  struct {
+gomp_mutex_t lock;
+int nasyncqueue;
+struct goacc_asyncqueue **asyncqueue;
+struct goacc_asyncqueue_list *active;
 
+__typeof (GOMP_OFFLOAD_openacc_async_construct) *construct_func;
+__typeof (GOMP_OFFLOAD_openacc_async_destruct) *destruct_func;
+__typeof (GOMP_OFFLOAD_openacc_async_test) *test_func;
+__typeof (GOMP_OFFLOAD_openacc_async_synchronize) *synchronize_func;
+__typeof (GOMP_OFFLOAD_openacc_async_serialize) *serialize_func;
+__typeof (GOMP_OFFLOAD_openacc_async_queue_callback) *queue_callback_func;
+
+__typeof (GOMP_OFFLOAD_openacc_async_exec) *exec_func;
+__typeof (GOMP_OFFLOAD_openacc_async_dev2host) *dev2host_func;
+__typeof (GOMP_OFFLOAD_openacc_async_host2dev) *host2dev_func;
+  } async;
+
   /* NVIDIA target specific routines.  */
   struct {
 __typeof (GOMP_OFFLOAD_openacc_cuda_get_current_device)
@@ -1053,17 +1057,33 @@ enum gomp_map_vars_kind
   GOMP_MAP_VARS_ENTER_DATA
 };
 
-extern void 

[PATCH] Enable scatter vectorization with 128-bit and 256-bit vectors with AVX512VL (PR target/88464)

2018-12-18 Thread Jakub Jelinek
Hi!

We weren't vectorizing with unconditional or masked scatters when
-mprefered-vector-width={128,256}.  While for DI index and DF/DI
stores or SI index and SF/SI stores we even have the builtins,
for the remaining combinations I had to add a few alt builtins (with spaces
in names as in other cases).  I've also renamed the other alt builtin
visible names so that they match the IX86_BUILTIN_* names, they were pretty
confusing before.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-18  Jakub Jelinek  

PR target/88464
* config/i386/i386-builtin-types.def
(VOID_FTYPE_PDOUBLE_QI_V8SI_V4DF_INT,
VOID_FTYPE_PFLOAT_QI_V4DI_V8SF_INT,
VOID_FTYPE_PLONGLONG_QI_V8SI_V4DI_INT,
VOID_FTYPE_PINT_QI_V4DI_V8SI_INT,
VOID_FTYPE_PDOUBLE_QI_V4SI_V2DF_INT,
VOID_FTYPE_PFLOAT_QI_V2DI_V4SF_INT,
VOID_FTYPE_PLONGLONG_QI_V4SI_V2DI_INT,
VOID_FTYPE_PINT_QI_V2DI_V4SI_INT): New builtin types.
* config/i386/i386.c (enum ix86_builtins): Add
IX86_BUILTIN_SCATTERALTSIV4DF, IX86_BUILTIN_SCATTERALTDIV8SF,
IX86_BUILTIN_SCATTERALTSIV4DI, IX86_BUILTIN_SCATTERALTDIV8SI,
IX86_BUILTIN_SCATTERALTSIV2DF, IX86_BUILTIN_SCATTERALTDIV4SF,
IX86_BUILTIN_SCATTERALTSIV2DI and IX86_BUILTIN_SCATTERALTDIV4SI.
(ix86_init_mmx_sse_builtins): Fix up names of IX86_BUILTIN_GATHERALT*,
IX86_BUILTIN_GATHER3ALT* and IX86_BUILTIN_SCATTERALT* builtins to
match the IX86_BUILTIN codes.  BuildIX86_BUILTIN_SCATTERALTSIV4DF,
IX86_BUILTIN_SCATTERALTDIV8SF, IX86_BUILTIN_SCATTERALTSIV4DI,
IX86_BUILTIN_SCATTERALTDIV8SI, IX86_BUILTIN_SCATTERALTSIV2DF,
IX86_BUILTIN_SCATTERALTDIV4SF, IX86_BUILTIN_SCATTERALTSIV2DI and
IX86_BUILTIN_SCATTERALTDIV4SI decls.
(ix86_vectorize_builtin_scatter): Expand those new builtins.

* gcc.target/i386/avx512f-pr88464-5.c: New test.
* gcc.target/i386/avx512f-pr88464-6.c: New test.
* gcc.target/i386/avx512f-pr88464-7.c: New test.
* gcc.target/i386/avx512f-pr88464-8.c: New test.
* gcc.target/i386/avx512vl-pr88464-5.c: New test.
* gcc.target/i386/avx512vl-pr88464-6.c: New test.
* gcc.target/i386/avx512vl-pr88464-7.c: New test.
* gcc.target/i386/avx512vl-pr88464-8.c: New test.
* gcc.target/i386/avx512vl-pr88464-9.c: New test.
* gcc.target/i386/avx512vl-pr88464-10.c: New test.
* gcc.target/i386/avx512vl-pr88464-11.c: New test.
* gcc.target/i386/avx512vl-pr88464-12.c: New test.
* gcc.target/i386/avx512vl-pr88464-13.c: New test.
* gcc.target/i386/avx512vl-pr88464-14.c: New test.
* gcc.target/i386/avx512vl-pr88464-15.c: New test.
* gcc.target/i386/avx512vl-pr88464-16.c: New test.

--- gcc/config/i386/i386-builtin-types.def.jj   2018-11-08 18:07:10.298826353 
+0100
+++ gcc/config/i386/i386-builtin-types.def  2018-12-18 11:22:07.965503704 
+0100
@@ -1068,7 +1068,14 @@ DEF_FUNCTION_TYPE (VOID, PFLOAT, HI, V8D
 DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V16SI, V8DF, INT)
 DEF_FUNCTION_TYPE (VOID, PINT, HI, V8DI, V16SI, INT)
 DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V16SI, V8DI, INT)
-
+DEF_FUNCTION_TYPE (VOID, PFLOAT, QI, V4DI, V8SF, INT)
+DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V8SI, V4DF, INT)
+DEF_FUNCTION_TYPE (VOID, PINT, QI, V4DI, V8SI, INT)
+DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V8SI, V4DI, INT)
+DEF_FUNCTION_TYPE (VOID, PFLOAT, QI, V2DI, V4SF, INT)
+DEF_FUNCTION_TYPE (VOID, PDOUBLE, QI, V4SI, V2DF, INT)
+DEF_FUNCTION_TYPE (VOID, PINT, QI, V2DI, V4SI, INT)
+DEF_FUNCTION_TYPE (VOID, PLONGLONG, QI, V4SI, V2DI, INT)
 
 DEF_FUNCTION_TYPE (V16SF, V16SF, PCVOID, V16SI, HI, INT)
 DEF_FUNCTION_TYPE (V8DF, V8DF, PCVOID, V8SI, QI, INT)
--- gcc/config/i386/i386.c.jj   2018-12-18 10:23:58.751164982 +0100
+++ gcc/config/i386/i386.c  2018-12-18 11:58:18.813311983 +0100
@@ -30072,6 +30072,14 @@ enum ix86_builtins
   IX86_BUILTIN_SCATTERALTDIV16SF,
   IX86_BUILTIN_SCATTERALTSIV8DI,
   IX86_BUILTIN_SCATTERALTDIV16SI,
+  IX86_BUILTIN_SCATTERALTSIV4DF,
+  IX86_BUILTIN_SCATTERALTDIV8SF,
+  IX86_BUILTIN_SCATTERALTSIV4DI,
+  IX86_BUILTIN_SCATTERALTDIV8SI,
+  IX86_BUILTIN_SCATTERALTSIV2DF,
+  IX86_BUILTIN_SCATTERALTDIV4SF,
+  IX86_BUILTIN_SCATTERALTSIV2DI,
+  IX86_BUILTIN_SCATTERALTDIV4SI,
   IX86_BUILTIN_SCATTERDIV16SF,
   IX86_BUILTIN_SCATTERDIV16SI,
   IX86_BUILTIN_SCATTERDIV8DF,
@@ -30879,7 +30887,7 @@ ix86_init_mmx_sse_builtins (void)
V4DF_FTYPE_V4DF_PCDOUBLE_V8SI_V4DF_INT,
IX86_BUILTIN_GATHERALTSIV4DF);
 
-  def_builtin_pure (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatheraltdiv4sf256 ",
+  def_builtin_pure (OPTION_MASK_ISA_AVX2, "__builtin_ia32_gatheraltdiv8sf ",
V8SF_FTYPE_V8SF_PCFLOAT_V4DI_V8SF_INT,
IX86_BUILTIN_GATHERALTDIV8SF);
 
@@ -30887,7 +30895,7 @@ ix86_init_mmx_sse_builtins (void)
V4DI_FTYPE_V4DI_PCINT64_V8SI_V4DI_INT,
 

Re: [PATCH 2/3] Factor out duplicate code in gimplify_scan_omp_clauses

2018-12-18 Thread Jakub Jelinek
On Sat, Nov 10, 2018 at 09:11:19AM -0800, Julian Brown wrote:
> This patch, created while trying to figure out the open-coded linked-list
> handling in gimplify_scan_omp_clauses, factors out four somewhat
> repetitive portions of that function into two new outlined functions.
> This was done largely mechanically; the actual lines of executed code are
> more-or-less the same.  That means the interfaces to the new functions
> is somewhat eccentric though, and could no doubt be improved.  I've tried
> to add commentary to the best of my understanding, but suggestions for
> improvements are welcome!
> 
> As a bonus, one apparent bug introduced during an earlier refactoring
> to use the polynomial types has been fixed (I think!): "known_eq (o1,
> 2)" should have been "known_eq (o1, o2)".
> 
> Tested alongside other patches in this series and the async patches. OK?
> 
> ChangeLog
> 
>   gcc/
>   * gimplify.c (insert_struct_component_mapping)
>   (check_base_and_compare_lt): New.

I think
* gimplify.c (insert_struct_component_mapping,
check_base_and_compare_lt): New.
is what is used far more often than the above syntax.

> +
> +static tree
> +insert_struct_component_mapping (enum tree_code code, tree c, tree 
> struct_node,
> +  tree prev_node, tree *scp)

Please use a shorter name, like insert_struct_comp_mapping or even
insert_struct_comp_map, to avoid formatting glitches.

> +{
> +  enum gomp_map_kind mkind = (code == OMP_TARGET_EXIT_DATA
> +   || code == OACC_EXIT_DATA)
> +  ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC;

Please use
  enum gomp_map_kind mkind
= ((code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA)
   ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC);
instead.

> +   int base_eq_orig_base
> + = check_base_and_compare_lt (OMP_CLAUSE_DECL (c),
> + _base, decl, , );

Incorrect formatting, _base needs to be below OMP_CLAUSE_DECL.  So:
  int base_eq_orig_base
= check_base_and_compare_lt (OMP_CLAUSE_DECL (c),
 _base, decl, ,
 );

> + int same_decl_offset_lt
> +   = check_base_and_compare_lt (
> +   OMP_CLAUSE_DECL (*sc), NULL, decl,
> +   , );
> + if (same_decl_offset_lt == -1)

Again, wrong formatting.  If even the first argument doesn't fit, just use
a temporary.
tree sc_decl = OMP_CLAUSE_DECL (*sc);
int same_decl_offset_lt
  = check_base_and_compare_lt (sc_decl, NULL, decl,
   , );

> +   tree cl
> + = insert_struct_component_mapping (code, c, NULL,
> + *prev_list_p, scp);

Also wrong formatting, should be:

  tree cl
= insert_struct_component_mapping (code, c, NULL,
   *prev_list_p,
   scp);

or if the name is shorter, you can fit more.

> if (sc == prev_list_p)
>   {
> *sc = cl;

Otherwise LGTM, but I admit I haven't verified every single statement.

Jakub


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-18 Thread Bernd Edlinger
Hi,

while I looked closely at the asm statement in the gdb,
I realized that the SP clobber forces the function to use
the frame pointer, and prevents the red zone.  That
makes the push / pop sequence in the asm statement safe
to use, as long as the stack is restored to the original
value.  That can be a quite useful feature.  And that might
have been the reason why the rsp clobber was chosen in the
first place.

This seems to work for all targets, but it started to work
this way with gcc-6, all versions before that do ignore
this clobber stmt (as confirmed by godbolt).

The clobber stmt make the LRA register allocator switch
frame_pointer_needed to 1, and therefore in all likelihood,
all targets should use that consistently.

On 12/17/18 12:47 PM, Richard Sandiford wrote:
> Dimitar Dimitrov  writes:
>> On Sun, Dec 16 2018 at 14:36:26 EET Bernd Edlinger wrote:
>>> Hi,
>>>
>>> if I understood that right, then clobbering sp is and has always been
>>> ignored.
> 
> PR77904 was about the clobber not being ignored, so the behaviour
> hasn't been consistent.
> 

I think 77904 was a fall-out from the change in the LRA register allocator.
The patch referenced in the PR does simply honor frame_pointer_needed,
which changed with gcc-6, and caused a regression on arm.

> I'm also not sure it was always ignored in recent sources.  The clobber
> does get added to the associated rtl insn, and it'd be surprising if
> that never had an effect.
> 
>>> If that is right, then I would much prefer a warning, that says exactly
>>> that, because that would also help to understand why removing that clobber
>>> statement is safe even for old gcc versions.
> 
> If the asm does leave sp with a different value, then it's never been safe,
> regardless of the gcc version.  That's why an error seems more appropriate.
> 
>> Thank you. Looks like general consensus is to have a warning. See attached
>> patch that switches the error to a warning.
> 
> I don't think there's a good reason to treat this differently from the
> preexisting PIC register error.  If the argument for making it a warning
> rather than an error is that the asm might happen to work by accident,
> then the same is true for the PIC register.
> 

In the light of my findings, I believe with a good warning message that
explains that the SP needs to be restored to the previous value, that
is a useful feature, that enables the asm statement to push temporary
values on the stack which would not be safe otherwise.

Therefore I propose not to rip it out at this time.
See my proposed patch.  What do you think?

Is it OK?


Thanks
Bernd.
2018-12-18  Bernd Edlinger  

	* cfgexpand.c (asm_clobber_reg_is_valid): Emit only a warning together
	with an information message when the stack pointer is clobbered.

testsuite:
2018-12-18  Bernd Edlinger  

	* gcc.target/i386/pr52813.c: Adjust test.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c	(revision 267164)
+++ gcc/cfgexpand.c	(working copy)
@@ -2854,6 +2854,7 @@ tree_conflicts_with_clobbers_p (tree t, HARD_REG_S
asm clobber operand.  Some HW registers cannot be
saved/restored, hence they should not be clobbered by
asm statements.  */
+
 static bool
 asm_clobber_reg_is_valid (int regno, int nregs, const char *regname)
 {
@@ -2872,11 +2873,23 @@ asm_clobber_reg_is_valid (int regno, int nregs, co
   error ("PIC register clobbered by %qs in %", regname);
   is_valid = false;
 }
-  /* Clobbering the STACK POINTER register is an error.  */
+  /* Clobbering the STACK POINTER register is likely an error.
+ However it is useful to force the use of frame pointer and prevent
+ the use of red zone.  Thus without this clobber, pushing temporary
+ values onto the stack might clobber the red zone or make stack based
+ memory references invalid.  */
   if (overlaps_hard_reg_set_p (regset, Pmode, STACK_POINTER_REGNUM))
 {
-  error ("Stack Pointer register clobbered by %qs in %", regname);
-  is_valid = false;
+  if (warning (0, "Stack Pointer register clobbered by %qs in %",
+		   regname))
+	{
+	  inform (input_location,
+		  "This does likely not do what you would expect."
+		  " The Stack Pointer register still needs to be restored to"
+		  " the previous value, however it is safe to push values onto"
+		  " the stack, when they are popped again from the stack"
+		  " before the asm statement terminates");
+	}
 }
 
   return is_valid;
Index: gcc/testsuite/gcc.target/i386/pr52813.c
===
--- gcc/testsuite/gcc.target/i386/pr52813.c	(revision 267164)
+++ gcc/testsuite/gcc.target/i386/pr52813.c	(working copy)
@@ -1,9 +1,10 @@
 /* Ensure that stack pointer cannot be an asm clobber.  */
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O3 -fomit-frame-pointer" } */
 
 void
 test1 (void)
 {
-  asm volatile ("" : : : "%esp"); /* { 

Re: [PATCH 2/3] Factor out duplicate code in gimplify_scan_omp_clauses

2018-12-18 Thread Julian Brown
On Sat, 10 Nov 2018 09:11:19 -0800
Julian Brown  wrote:

> This patch, created while trying to figure out the open-coded
> linked-list handling in gimplify_scan_omp_clauses, factors out four
> somewhat repetitive portions of that function into two new outlined
> functions. This was done largely mechanically; the actual lines of
> executed code are more-or-less the same.  That means the interfaces
> to the new functions is somewhat eccentric though, and could no doubt
> be improved.  I've tried to add commentary to the best of my
> understanding, but suggestions for improvements are welcome!
> 
> As a bonus, one apparent bug introduced during an earlier refactoring
> to use the polynomial types has been fixed (I think!): "known_eq (o1,
> 2)" should have been "known_eq (o1, o2)".
> 
> Tested alongside other patches in this series and the async patches.
> OK?

Now the main part of the attach/detach support has been conditionally
accepted pending Thomas's approval (thanks!), is this prerequisite part
OK too?

Thanks,

Julian


V7 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-18 Thread H.J. Lu
On Mon, Dec 17, 2018 at 08:53:32AM -0500, Jason Merrill wrote:
> On 12/17/18 7:42 AM, H.J. Lu wrote:
> > On Mon, Dec 17, 2018 at 1:39 AM Richard Biener
> >  wrote:
> > > 
> > > On Fri, Dec 14, 2018 at 11:48 PM H.J. Lu  wrote:
> > > > 
> > > > On Fri, Dec 14, 2018 at 2:10 PM Jason Merrill  wrote:
> > > > > 
> > > > > On 12/13/18 6:56 PM, H.J. Lu wrote:
> > > > > > On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  
> > > > > > wrote:
> > > > > > > 
> > > > > > > On 9/25/18 11:46 AM, H.J. Lu wrote:
> > > > > > > > On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill 
> > > > > > > >  wrote:
> > > > > > > > > On 07/23/2018 05:24 PM, H.J. Lu wrote:
> > > > > > > > > > 
> > > > > > > > > > On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
> > > > > > > > > > 
> > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, 18 Jun 2018, Jason Merrill wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
> > > > > > > > > > > > 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Mon, 18 Jun 2018, Jason Merrill wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > > +  if (TREE_CODE (rhs) == COND_EXPR)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +  /* Check the THEN path first.  */
> > > > > > > > > > > > > > > +  tree op1 = TREE_OPERAND (rhs, 1);
> > > > > > > > > > > > > > > +  context = check_address_of_packed_member 
> > > > > > > > > > > > > > > (type, op1);
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This should handle the GNU extension of re-using 
> > > > > > > > > > > > > > operand 0 if operand
> > > > > > > > > > > > > > 1 is omitted.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Doesn't that just use a SAVE_EXPR?
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Hmm, I suppose it does, but many places in the compiler 
> > > > > > > > > > > > seem to expect
> > > > > > > > > > > > that it produces a COND_EXPR with TREE_OPERAND 1 as 
> > > > > > > > > > > > NULL_TREE.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Maybe that's used somewhere inside the C++ front end.  
> > > > > > > > > > > For C a SAVE_EXPR
> > > > > > > > > > > is produced directly.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Here is the updated patch.  Changes from the last one:
> > > > > > > > > > 
> > > > > > > > > > 1. Handle COMPOUND_EXPR.
> > > > > > > > > > 2. Fixed typos in comments.
> > > > > > > > > > 3. Combined warn_for_pointer_of_packed_member and
> > > > > > > > > > warn_for_address_of_packed_member into
> > > > > > > > > > warn_for_address_or_pointer_of_packed_member.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > c.i:4:33: warning: converting a packed ‘struct C *’ pointer 
> > > > > > > > > > increases the
> > > > > > > > > > alignment of ‘long int *’ pointer from 1 to 8 
> > > > > > > > > > [-Waddress-of-packed-member]
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I think this would read better as
> > > > > > > > > 
> > > > > > > > > c.i:4:33: warning: converting a packed ‘struct C *’ pointer 
> > > > > > > > > (alignment 1) to
> > > > > > > > > ‘long int *’ (alignment 8) may result in an unaligned pointer 
> > > > > > > > > value
> > > > > > > > > [-Waddress-of-packed-member]
> > > > > > > > 
> > > > > > > > Fixed.
> > > > > > > > 
> > > > > > > > > > +  while (TREE_CODE (base) == ARRAY_REF)
> > > > > > > > > > +   base = TREE_OPERAND (base, 0);
> > > > > > > > > > +  if (TREE_CODE (base) != COMPONENT_REF)
> > > > > > > > > > +   return NULL_TREE;
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Are you deliberately not handling the other 
> > > > > > > > > handled_component_p cases? If
> > > > > > > > > so, there should be a comment.
> > > > > > > > 
> > > > > > > > I changed it to
> > > > > > > > 
> > > > > > > > while (handled_component_p (base))
> > > > > > > >{
> > > > > > > >  enum tree_code code = TREE_CODE (base);
> > > > > > > >  if (code == COMPONENT_REF)
> > > > > > > >break;
> > > > > > > >  switch (code)
> > > > > > > >{
> > > > > > > >case ARRAY_REF:
> > > > > > > >  base = TREE_OPERAND (base, 0);
> > > > > > > >  break;
> > > > > > > >default:
> > > > > > > >  /* FIXME: Can it ever happen?  */
> > > > > > > >  gcc_unreachable ();
> > > > > > > >  break;
> > > > > > > >}
> > > > > > > >}
> > > > > > > > 
> > > > > > > > Is there a testcase to trigger this ICE? I couldn't find one.
> > > > > > > 
> > > > > > > You can take the address of an element of complex:
> > > > > > > 
> > > > > > >  __complex int i;
> > > > > > >  int *p = &__real(i);
> > > > > > > 
> > > > > 

Re: [PATCH 1/6, OpenACC, libgomp] Async re-work, interfaces

2018-12-18 Thread Chung-Lin Tang

On 2018/12/18 8:36 PM, Jakub Jelinek wrote:

On Fri, Dec 14, 2018 at 06:52:20PM +0100, Thomas Schwinge wrote:

--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h



@@ -199,7 +200,7 @@ enum gomp_map_kind
  /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
 should be incremented whenever an ABI-incompatible change is introduced
 to the plugin interface defined in libgomp/libgomp.h.  */
-#define GOMP_VERSION   1
+#define GOMP_VERSION   2
  #define GOMP_VERSION_NVIDIA_PTX 1
  #define GOMP_VERSION_INTEL_MIC 0
  #define GOMP_VERSION_HSA 0


OK, I think -- but I'm never quite sure whether we do need to increment
"GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,
which don't affect the user/GCC side?

GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls
synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,
/* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks
in "GOMP_offload_register_ver", so that we don't try to load offloading
code with an _old_ libgomp that has been compiled with/for the _new_
version.  (Right?)


To me it looks wrong to tie two different things in the same version number.
Just because we are changing something in the libgomp vs. the corresponding
plugin APIs doesn't mean we need to rebuild all binaries and libraries that
have offloading code in it.


The GOMP_offload_register_ver test is for "> GOMP_VERSION", so a wrt 
GOMP_VERSION's value
a libgomp can be too old, but never too new. It should not require a rebuild of 
programs
with offloading just because of this.


So, IMHO GOMP_VERSION should be bumped only if we do a change that requires
the offloading data to be changed, and either have an additional internal
version to make sure that the plugin are kept in sync with libgomp, or just
figure that out because dlsym will fail on some of the new symbols in the
plugin.


We can of course create a new symbol version number specifically for the 
libgomp/plugin
interface.

I'll update this.


--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {
GOMP_PLUGIN_debug;
GOMP_PLUGIN_error;
GOMP_PLUGIN_fatal;
-   GOMP_PLUGIN_async_unmap_vars;
GOMP_PLUGIN_acc_thread;
  };


I think that's fine, but highlighting this again for Jakub, in case
there's an issue with removing a symbol from the libgomp-plugin
interface.


I'd prefer not to remove symbols from libgomp.so.*.  You can
do a gomp_fatal in it.


Okay, then.


--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h



+/* Opaque type to represent plugin-dependent implementation of an
+   OpenACC asynchronous queue.  */
+struct goacc_asyncqueue;
+
+/* Used to keep a list of active asynchronous queues.  */
+struct goacc_asyncqueue_list
+{
+  struct goacc_asyncqueue *aq;
+  struct goacc_asyncqueue_list *next;
+};
+
+typedef struct goacc_asyncqueue *goacc_aq;
+typedef struct goacc_asyncqueue_list *goacc_aq_list;


I'm not too fond of such "syntactic sugar" typedefs, but if that's fine
for Jakub to have in libgomp, then I won't object.


If it helps with making sure the formatting of the code isn't too ugly,
yes, otherwise no.


Thanks, formatting was exactly my intention.

Chung-Lin


I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"


Please avoid *_t, that is reserved in POSIX.

Jakub



Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Bill Schmidt
On 12/18/18 7:43 AM, Ulrich Weigand wrote:

> Bill Schmidt wrote:
>
>> +@item
>> +When using @code{vector} in keyword-and-predefine mode; for example,
>> +
>> +@smallexample
>> +typedef signed short int16;
>> +vector int16 data;
>> +@end smallexample
>> +
>> +Note that keyword-and-predefine mode is enabled by disabling GNU
>> +extensions (e.g., by using @code{-std=c11}) and including
>> +@code{}.
>> +@end itemize
> This looks correct to me, and I've just verified that the example
> does indeed build with -std=c11 and #include  and fails
> to build without either of these.
>
> Bye,
> Ulrich
>
Thanks!  Committed as r267232.

Bill



Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Ulrich Weigand
Bill Schmidt wrote:

> +@item
> +When using @code{vector} in keyword-and-predefine mode; for example,
> +
> +@smallexample
> +typedef signed short int16;
> +vector int16 data;
> +@end smallexample
> +
> +Note that keyword-and-predefine mode is enabled by disabling GNU
> +extensions (e.g., by using @code{-std=c11}) and including
> +@code{}.
> +@end itemize

This looks correct to me, and I've just verified that the example
does indeed build with -std=c11 and #include  and fails
to build without either of these.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH][GCC][AArch64] Have empty HWCAPs string ignored during native feature detection

2018-12-18 Thread Tamar Christina
Hi All,

This patch makes the feature detection code for AArch64 GCC not add features
automatically when the feature had no hwcaps string to match against.

This means that -mcpu=native no longer adds feature flags such as +profile.
The behavior wasn't noticed before because at the time +profile was added a bug
was preventing any feature bits from being added by native detections.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-12-18  Tamar Christina  

PR target/88530
* config/aarch64/aarch64-option-extensions.def: Document it.
* config/aarch64/driver-aarch64.c (host_detect_local_cpu): Skip feature
if empty hwcaps.

gcc/testsuite/ChangeLog:

2018-12-18  Tamar Christina  

PR target/88530
* gcc.target/aarch64/options_set_10.c: New test.

-- 
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index cdf04e2d5fcccb8b9a32af8f83501ce23212bbab..323e642af2e87c2d463681c3a3efbaeff2ede018 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -43,7 +43,8 @@
the extension (for example, the 'crypto' extension depends on four
entries: aes, pmull, sha1, sha2 being present).  In that case this field
should contain a space (" ") separated list of the strings in 'Features'
-   that are required.  Their order is not important.  */
+   that are required.  Their order is not important.  An empty string means
+   do not detect this feature during auto detection.  */
 
 /* Enabling "fp" just enables "fp".
Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2",
diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
index 98f9d7959506338bd6a8524500a168cc22ef5396..4f386dbf5fc29cc54ff85e062d0b9cd146fa00e8 100644
--- a/gcc/config/aarch64/driver-aarch64.c
+++ b/gcc/config/aarch64/driver-aarch64.c
@@ -253,6 +253,12 @@ host_detect_local_cpu (int argc, const char **argv)
 	  char *p = NULL;
 	  char *feat_string
 		= concat (aarch64_extensions[i].feat_string, NULL);
+
+	  /* If the feature contains no HWCAPS string then ignore it for the
+		 auto detection.  */
+	  if (strlen (feat_string) == 0)
+		continue;
+
 	  bool enabled = true;
 
 	  /* This may be a multi-token feature string.  We need
diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_10.c b/gcc/testsuite/gcc.target/aarch64/options_set_10.c
new file mode 100644
index ..5ffe83c199165dd4129814674297056bdf27cd83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/options_set_10.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target "aarch64*-*-linux*" } } */
+/* { dg-additional-options "-mcpu=native" } */
+
+int main ()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not {\.arch .+\+profile.*} } } */
+
+ /* Check that an empty feature string is not detected during mcpu=native.  */



[committed][MSP430] Define TARGET_VTABLE_ENTRY_ALIGN

2018-12-18 Thread Jozef Lawrynowicz
TARGET_VTABLE_ENTRY_ALIGN defaults to POINTER_SIZE, which is 20 for
msp430-elf -mlarge.

g++.dg/torture/pr41257.C ICEs after the invalid alignment of 20 is set.

> during GIMPLE pass: slp
> gcc/testsuite/g++.dg/torture/pr41257.C: In function 'void bar()':
> gcc/testsuite/g++.dg/torture/pr41257.C:17:6: internal compiler error: in 
> dr_analyze_innermost, at tree-data-ref.c:911
>17 | void bar()
>   |  ^~~
> 0x1427a98 dr_analyze_innermost(innermost_loop_behavior*, tree_node*, loop*, 
> gimple const*)
> gcc/build/../gcc/tree-data-ref.c:910
> 0x14285ab create_data_ref(edge_def*, loop*, tree_node*, gimple*, bool, bool)
> gcc/build/../gcc/tree-data-ref.c:1241
> 0x1428965 find_data_references_in_stmt(loop*, gimple*, vec va_heap, vl_ptr>*)
> gcc/build/../gcc/tree-data-ref.c:5089
> 0x1440d5d vect_find_stmt_data_reference(loop*, gimple*, vec va_heap, vl_ptr>*)
> gcc/build/../gcc/tree-vect-data-refs.c:3949
> 0x1034f0e vect_slp_bb(basic_block_def*)
> gcc/build/../gcc/tree-vect-slp.c:3020
> 0x10381ea execute
> gcc/build/../gcc/tree-vectorizer.c:1295

Pointer alignment is always 16 for MSP430, the attached patch (committed to
trunk) sets TARGET_VTABLE_ENTRY_ALIGN to 16 for MSP430. This fixes the above
ICE.
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 267228)
+++ gcc/ChangeLog	(revision 267229)
@@ -1,3 +1,7 @@
+2018-12-18  Jozef Lawrynowicz  
+
+	* config/msp430/msp430.h: Define TARGET_VTABLE_ENTRY_ALIGN.
+
 2018-12-18  Jakub Jelinek  
 
 	PR target/88513
Index: gcc/config/msp430/msp430.h
===
--- gcc/config/msp430/msp430.h	(revision 267228)
+++ gcc/config/msp430/msp430.h	(revision 267229)
@@ -159,6 +159,11 @@
 #define PTR_SIZE			(TARGET_LARGE ? 4 : 2)
 #define	POINTERS_EXTEND_UNSIGNED	1
 
+/* TARGET_VTABLE_ENTRY_ALIGN defaults to POINTER_SIZE, which is 20 for
+   TARGET_LARGE.  Pointer alignment is always 16 for MSP430, so set explicitly
+   here.  */
+#define TARGET_VTABLE_ENTRY_ALIGN 16
+
 #define ADDR_SPACE_NEAR	1
 #define ADDR_SPACE_FAR	2
 


Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Bill Schmidt
On 12/18/18 4:33 AM, Ulrich Weigand wrote:

> Bill Schmidt wrote:
>
>> +@item
>> +When using vector in keyword-and-predefine mode; for example,
>> +
>> +@smallexample
>> +/* With -maltivec only: */
> This is a bit confusing (at least to me).  What does "with -maltivec only"
> mean here?  Just adding -maltivec will *not* switch to keyword-and-
> predefine mode, as far as I can tell.  Rather, to switch to that mode
> you'll have to disable GNU extensions, e.g. via -std=c11, and then
> include  to get the predefine.
>
> Bye,
> Ulrich
>
Sorry about that.  Here's another try, also verified on powerpc64le-linux-gnu.
Is this ok?

Thanks,
Bill


2018-12-18  Bill Schmidt  

* doc/extend.texi (PowerPC Altivec/VSX Built-in Functions):
Describe when a typedef name can be used as the type specifier for
a vector type, and when it cannot.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 265974)
+++ gcc/doc/extend.texi (working copy)
@@ -16229,9 +16229,32 @@ disabled.  To use them, you must include @code{}.
+@end itemize
+
+@item
 For C, overloaded functions are implemented with macros so the following
 does not work:
 



Re: [patch] various OpenACC reduction enhancements - FE changes

2018-12-18 Thread Jakub Jelinek
On Thu, Dec 13, 2018 at 02:11:31PM +, Julian Brown wrote:
> > Any reason for the above (ditto in C), rather than just adding
> > && ort != C_ORT_ACC to the while loop condition for CPP_OPEN_SQUARE?
> > (, . or * after id-expression is like any other unhandled
> > characters...
> 
> I think the reason was that 'decl' ('t' in the C version) is not set to
> error_mark_node if the while loop is skipped, and then the gimplifier
> gets confused. I've tried to tackle this in another way, by checking
> there aren't any stray characters before the next comma or
> close-parenthesis.
> 
> I'm not sure if you were objecting to the error message too -- with the
> current patch, the user will just get e.g.:
> 
> error: expected ')' before '.' token
> 
> if they try to use an unsupported type of construct as a reduction
> target.

> @@ -12004,7 +12005,8 @@ c_parser_omp_variable_list (c_parser *parser,
>   case OMP_CLAUSE_REDUCTION:
>   case OMP_CLAUSE_IN_REDUCTION:
>   case OMP_CLAUSE_TASK_REDUCTION:
> -   while (c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
> +   while (ort != C_ORT_ACC
> +  && c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
>   {
> tree low_bound = NULL_TREE, length = NULL_TREE;
>  
> @@ -12074,6 +12076,10 @@ c_parser_omp_variable_list (c_parser *parser,
>   }
>   }
>   }
> +   if (ort == C_ORT_ACC
> +   && c_parser_next_token_is_not (parser, CPP_COMMA)
> +   && c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN))
> + t = error_mark_node;
> break;
>   default:
> break;

I still don't understand this at all, sorry.
So, t is guaranteed to be non-error_mark_node before entering this spot.
If you have reduction (decl[0]) etc. vs. reduction (decl), why do you care 
whether
it is added to the returned list or not for error recovery?  If it is something
that causes ICE in the gimplifier, then user could have written just
reduction (decl) or reduction (decl, ) and have it added to the list anyway,
so the bug would be that it isn't diagnosed as something incorrect in
c_finish_omp_clauses (or whatever the problem with it is).
If there is any kind of garbage after the decl, it will just return to the
caller at that point and the caller should do the error recovery, the same
for reduction (decl[0]) as well as for reduction (decl, [0]).

Jakub


Re: [PATCH, arm][PR88167] Fix __builtin_return_address returns invalid address

2018-12-18 Thread Mihail Ionescu



On 12/18/2018 09:32 AM, Mihail Ionescu wrote:

Hi All,

In Thumb mode when the function prologue gets expanded, in case of a 
multiple register push, additional mov instructions are generated to 
save the high registers which result in lr getting overwritten before 
it's value can be used to retrieve the return address.


The fix consists of detecting if lr is alive after the prologue, in 
which case, the lr register won't be used as a scratch.


Regression tested on arm-none-eabi.

gcc/ChangeLog:
2018-11-23  Mihail Ionescu  

 PR target/88167
 * config/arm/arm.c: Add lr liveness check.

gcc/testsuite/ChangeLog
2018-11-23  Mihail Ionescu  

 PR target/88167
 * gcc.target/arm/pr88167.c: New test.

If everything is ok for trunk, could someone commit it on my behalf?

Best regards,
    Mihail


Hi All,

Sorry, I forgot to attach the diff.

Regards,
 Mihail
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
8393f0b87f34c04c9dcc89c63d2e9bbd042c969c..b5c5942791530bc83f54ec96ed3c9c3838080e0f
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25186,7 +25186,10 @@ thumb1_expand_prologue (void)
 even if they can be pushed.  This is to avoid using them to stash the 
high
 registers.  Such kind of stash may clobber the use of arguments.  */
   pushable_regs = l_mask & (~arg_regs_mask);
-  if (lr_needs_saving)
+  bool lr_alive = REGNO_REG_SET_P (df_get_live_out (
+   ENTRY_BLOCK_PTR_FOR_FN (cfun)), 
LR_REGNUM);
+
+  if (lr_needs_saving || lr_alive)
pushable_regs &= ~(1 << LR_REGNUM);
 
   if (pushable_regs == 0)
diff --git a/gcc/testsuite/gcc.target/arm/pr88167.c 
b/gcc/testsuite/gcc.target/arm/pr88167.c
new file mode 100644
index 
..e0023716e0010ef3f09878fd6fa1a70f727228b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr88167.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-options "-mcpu=cortex-m0 -O2" }  */
+
+__attribute__ ((used))
+void *retaddr;
+
+__attribute__ ((noinline))
+void foo (void) {
+  retaddr = __builtin_return_address (0);
+
+  /* Used for enforcing registers stacking.  */
+  asm volatile ("" : : : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+"r8", "r9", "r10", "r11", "r12");
+}
+
+/* { dg-final { scan-assembler-not "mov\tlr," } } */


Re: [PATCH, OpenACC, 4/8] Multi-dimensional dynamic array support for OpenACC data clauses, omp-low: dynamic array descriptor creation

2018-12-18 Thread Jakub Jelinek
On Thu, Dec 13, 2018 at 10:52:32PM +0800, Chung-Lin Tang wrote:
> --- gcc/omp-low.c (revision 267050)
> +++ gcc/omp-low.c (working copy)
> @@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "hsa-common.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "tree-hash-traits.h"
>  
>  /* Lowering of OMP parallel and workshare constructs proceeds in two
> phases.  The first phase scans the function looking for OMP statements
> @@ -133,6 +134,9 @@ struct omp_context
>  
>/* True if this construct can be cancelled.  */
>bool cancellable;
> +
> +  /* Hash map of dynamic arrays in this context.  */
> +  hash_map *dynamic_arrays;

You still call it dynamic arrays.  Call it array descriptors or something
similar.  In the comment too.

>  
> +/* Helper function for create_dynamic_array_descr_type(), to append a new 
> field

Here too and many other spots.

> +  tree da_descr_type, name, x;

Even here.

> +  append_field_to_record_type (da_descr_type, get_identifier ("$dim_num"),
> +sizetype);

Why the $s in the identifiers?  Use . or __ if it shouldn't be user
accessible.  Think whether you want it to be in debuginfo or not, if not,
it should be DECL_IGNORED_P.

Jakub


Re: [patch] Fix bootstrap powerpc*-*-freebsd* targets

2018-12-18 Thread Alan Modra
On Tue, Dec 18, 2018 at 03:20:02AM -0600, Segher Boessenkool wrote:
> Hi Alan,
> 
> On Tue, Dec 18, 2018 at 10:39:27AM +1030, Alan Modra wrote:
> > On Mon, Dec 17, 2018 at 11:05:57AM -0600, Segher Boessenkool wrote:
> > > On Mon, Dec 17, 2018 at 10:40:01AM +1030, Alan Modra wrote:
> > > > Since I broke powerpc*-freebsd and the other non-linux powerpc
> > > > targets, I guess I ought to fix them.  The following is a variation on
> > > > your first patch, that results in -mcall-linux for powerpc-freebsd*
> > > > providing the 32-bit powerpc-linux dynamic linker.
> > > 
> > > That, like the first patch, abuses that header file.  Please do it
> > > somewhere sane instead, not in a random subtarget file?
> > 
> > Is there is a better place, currently?  sysv4.h contains a mess of OS
> > related defines already, to support various -mcall options.  If those
> > stay in sysv4.h I can't see a better place for the fall-back
> > GNU_USER_DYNAMIC_LINKER define.
> 
> I was hoping you would untangle it a bit.  My dastardly plan failed,
> apparently.  Drat.

Me untangling some of the linux bits was what caused the problem..

I think that -mcall-linux, -mcall-freebsd, -mcall-netbsd and
-mcall-openbsd should be deprecated.  That would make it possible to
put the OS specific defines where they belong.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-18 Thread Jakub Jelinek
On Thu, Dec 13, 2018 at 03:44:25PM +, Julian Brown wrote:
> +static tree
> +convert_to_firstprivate_int (tree var, gimple_seq *gs)
> +{
> +  tree type = TREE_TYPE (var), new_type = NULL_TREE;
> +  tree tmp = NULL_TREE;
> +
> +  if (omp_is_reference (var))
> +type = TREE_TYPE (type);
> +
> +  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
> +{
> +  if (omp_is_reference (var))
> + {
> +   tmp = create_tmp_var (type);
> +   gimplify_assign (tmp, build_simple_mem_ref (var), gs);
> +   var = tmp;
> + }
> +
> +  return fold_convert (pointer_sized_int_node, var);
> +}
> +
> +  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
> +
> +  new_type = lang_hooks.types.type_for_size (tree_to_uhwi (TYPE_SIZE (type)),
> +  true);
> +
> +  if (omp_is_reference (var))
> +{
> +  tmp = create_tmp_var (type);
> +  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
> +  var = tmp;
> +}

Why are you duplicating this if?  Can't you just do it before the
  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
test once, even better in the same if as you do type = TREE_TYPE (type); ?

Otherwise ok from me, but please check with Thomas if he is ok with it too.

Jakub


Re: [PATCH][testsuite] Enable vect_usad_char effective target for non-SVE aarch64

2018-12-18 Thread Richard Biener
On Tue, Dec 18, 2018 at 12:40 PM Kyrill Tkachov
 wrote:
>
> Hi all,
>
> In GCC 9 the aarch64 port learned how to do V16QImode SAD operations on 
> signed and unsigned chars.
> But I had missed enabling the effective target for that.
> This patch enables that target for non-SVE aarch64.
> Two new tests now PASS on aarch64:
> gcc.dg/vect/slp-reduc-sad.c
> gcc.dg/vect/vect-reduc-sad.c
>
> Ok for trunk?

OK.

> Thanks,
> Kyrill
>
> P.S. I notice that powerpc and mips also implement the necessary optabs.
> Perhaps the maintainers would like to check that the tests above pass on 
> their appropriate (sub)targets and
> enable this effective target appropriately.
>
> 2018-12-18  Kyrylo Tkachov  
>
>  * lib/target-supports.exp (check_effective_target_vect_usad_char):
>  Add non-SVE aarch64 to supported list.


Re: [PATCH 1/6, OpenACC, libgomp] Async re-work, interfaces

2018-12-18 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 06:52:20PM +0100, Thomas Schwinge wrote:
> > --- a/include/gomp-constants.h
> > +++ b/include/gomp-constants.h
> 
> > @@ -199,7 +200,7 @@ enum gomp_map_kind
> >  /* Versions of libgomp and device-specific plugins.  GOMP_VERSION
> > should be incremented whenever an ABI-incompatible change is introduced
> > to the plugin interface defined in libgomp/libgomp.h.  */
> > -#define GOMP_VERSION   1
> > +#define GOMP_VERSION   2
> >  #define GOMP_VERSION_NVIDIA_PTX 1
> >  #define GOMP_VERSION_INTEL_MIC 0
> >  #define GOMP_VERSION_HSA 0
> 
> OK, I think -- but I'm never quite sure whether we do need to increment
> "GOMP_VERSION" when only doing libgomp-internal libgomp-plugin changes,
> which don't affect the user/GCC side?
> 
> GCC encodes "GOMP_VERSION" in "GOMP_offload_register_ver" calls
> synthesized by "mkoffload": "GOMP_VERSION_PACK (/* LIB */ GOMP_VERSION,
> /* DEV */ GOMP_VERSION_NVIDIA_PTX)", and then at run time libgomp checks
> in "GOMP_offload_register_ver", so that we don't try to load offloading
> code with an _old_ libgomp that has been compiled with/for the _new_
> version.  (Right?)

To me it looks wrong to tie two different things in the same version number.
Just because we are changing something in the libgomp vs. the corresponding
plugin APIs doesn't mean we need to rebuild all binaries and libraries that
have offloading code in it.
So, IMHO GOMP_VERSION should be bumped only if we do a change that requires
the offloading data to be changed, and either have an additional internal
version to make sure that the plugin are kept in sync with libgomp, or just
figure that out because dlsym will fail on some of the new symbols in the
plugin.

> > --- a/libgomp/libgomp.map
> > +++ b/libgomp/libgomp.map
> > @@ -458,7 +462,6 @@ GOMP_PLUGIN_1.0 {
> > GOMP_PLUGIN_debug;
> > GOMP_PLUGIN_error;
> > GOMP_PLUGIN_fatal;
> > -   GOMP_PLUGIN_async_unmap_vars;
> > GOMP_PLUGIN_acc_thread;
> >  };
> 
> I think that's fine, but highlighting this again for Jakub, in case
> there's an issue with removing a symbol from the libgomp-plugin
> interface.

I'd prefer not to remove symbols from libgomp.so.*.  You can
do a gomp_fatal in it.
> 
> 
> > --- a/libgomp/libgomp-plugin.h
> > +++ b/libgomp/libgomp-plugin.h
> 
> > +/* Opaque type to represent plugin-dependent implementation of an
> > +   OpenACC asynchronous queue.  */
> > +struct goacc_asyncqueue;
> > +
> > +/* Used to keep a list of active asynchronous queues.  */
> > +struct goacc_asyncqueue_list
> > +{
> > +  struct goacc_asyncqueue *aq;
> > +  struct goacc_asyncqueue_list *next;
> > +};
> > +
> > +typedef struct goacc_asyncqueue *goacc_aq;
> > +typedef struct goacc_asyncqueue_list *goacc_aq_list;
> 
> I'm not too fond of such "syntactic sugar" typedefs, but if that's fine
> for Jakub to have in libgomp, then I won't object.

If it helps with making sure the formatting of the code isn't too ugly,
yes, otherwise no.

> I'd be in favor then of "typedef struct N *N" or "typedef struct N *N_t"

Please avoid *_t, that is reserved in POSIX.

Jakub


Re: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach)

2018-12-18 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 07:00:30PM +, Julian Brown wrote:
> OpenACC 2.6 manual deep copy support (attach/detach)
> 
>   gcc/c-family/
>   * c-pragma.h (pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_ATTACH,
>   PRAGMA_OACC_CLAUSE_DETACH.

and instead of , ?
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1176,6 +1176,7 @@ extern void c_omp_split_clauses (location_t, enum 
> tree_code, omp_clause_mask,
>  extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
>  extern void c_omp_declare_simd_clauses_to_decls (tree, tree);
>  extern enum omp_clause_default_kind c_omp_predetermined_sharing (tree);
> +extern const char * c_omp_map_clause_name (tree, bool);

No space after * in this case.

> +const char *
> +c_omp_map_clause_name (tree clause, bool oacc)
> +{
> +  if (oacc && OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP)
> +switch (OMP_CLAUSE_MAP_KIND (clause))
> +{
> +case GOMP_MAP_FORCE_ALLOC:
> +case GOMP_MAP_ALLOC: return "create";
> +case GOMP_MAP_FORCE_TO:
> +case GOMP_MAP_TO: return "copyin";
> +case GOMP_MAP_FORCE_FROM:
> +case GOMP_MAP_FROM: return "copyout";
> +case GOMP_MAP_FORCE_TOFROM:
> +case GOMP_MAP_TOFROM: return "copy";
> +case GOMP_MAP_RELEASE: return "delete";
> +case GOMP_MAP_FORCE_PRESENT: return "present";
> +case GOMP_MAP_ATTACH: return "attach";
> +case GOMP_MAP_FORCE_DETACH:
> +case GOMP_MAP_DETACH: return "detach";
> +case GOMP_MAP_DEVICE_RESIDENT: return "device_resident";
> +case GOMP_MAP_LINK: return "link";
> +case GOMP_MAP_FORCE_DEVICEPTR: return "deviceptr";
> +default:;

Please use default: break; instead.

>for (i = 0; i < tgt->list_count; i++)
>  {
>splay_tree_key k = tgt->list[i].key;
> +
>if (k == NULL)
>   continue;

Why the blank change?

Otherwise LGTM, if Thomas is ok with it.

Jakub


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-18 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 18 Dec 2018 18:02:54 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/17 10:32 PM, Thomas Schwinge wrote:
> >> The reason there are deadlocks from inside the plugin on 
> >> GOMP_PLUGIN_fatal() is when we hold the
> >> struct gomp_device_descr's*device*  lock, which is also acquired when we 
> >> execute atexit device shutdown handlers, hence the deadlock.
> >>
> >> I don't think this is the case for the OpenACC entry points that grab at 
> >> the openacc.async.* hooks,
> > Ah, OK, I see.  (But I thought that method of deadlock had been fixed by
> > some structural changes, to have plugin functions call the
> > non-terminating "GOMP_PLUGIN_error" and return some error, instead of
> > calling "GOMP_PLUGIN_fatal"?  I may be misremembering.  Or, that's
> > another TODO item for later, separately...  Or, if that's actually the
> > case, that this has been fixed in the way I described, then should these
> > functions also be changed accordingly: instead of "GOMP_PLUGIN_fatal"
> > call "GOMP_PLUGIN_error", and then return an error code?)
> 
> You remembered correctly, although...
> 
> >> though I can audit them again if deemed so.
> > My understanding had been that deadlock may happen if we're inside some
> > of these async/wait/serialize/synchronize functions, with "async" locked,
> > then run into an error, then libgomp prepares to abort, and at that time
> > shuts down the device, which will shut down the asyncqueues
> > ("goacc_fini_asyncqueues"), which will again try to lock "async" -- which
> > it actually doesn't.  My misunderstanding, I guess?
> 
> ...at least now, you can see that goacc_fini_asyncqueues() does not attempt to
> acquire devicep->openacc.async.lock when doing cleanup.
> 
> Come to think of it, that might be a bug there. :P

Heh, I wondered about that, too.  ;-)

An asyncqueue as returned by "lookup_goacc_asyncqueue" itself is not
locked (and I suppose it shouldn't be, because that would be "too
much"?), so it may -- at least (only?) in a specially constructed test
case -- happen that an asyncqueue gets destructed
("goacc_fini_asyncqueues") while it's still in use?  (Don't know how the
CUDA Driver library thinks of that, for example.  Though, probably, that
scenario can only happen if the device used by a certain host thread is
shut down while an "async" operation is still running.

But, can we easily avoid that issue by calling
"openacc.async.synchronize_func" before "openacc.async.destruct_func"
(or, have the latter do that internally)?  Just have to make sure that
any such synchonization then doesn't raise (potentially) nested
"GOMP_PLUGIN_fatal" calls.  Hence the TODO comment I added in my "into
async re-work: locking concerns" commit, before the
"openacc.async.destruct_func" call: "Can/should/must we "synchronize"
here (how?), so as to make sure that no other operation on this
asyncqueue is going on while/after we've destructed it here?"

Probably an error-ignoring "cuStreamSynchronize" call before
"cuStreamDestroy" would be reasonable?


Oh, and don't we have another problem...  Consider an "acc_shutdown" run
by host thread 1, while another host thread 2 continues to use the
device-wide queues.  That "acc_shutdown" will call "gomp_fini_device",
which will call "goacc_fini_asyncqueues", which will happily destruct the
whole device-wide "async" data.  Just taking the "async" lock before
doing that won't solve the problem, as host thread 2 is supposed to
continue using the existing queues.  Reference counting required?

Anyway: I'm not asking you to fix that now.  "Fortunately", we're not at
all properly implementing OpenACC usage in context of multiple host
threads (such as created by OpenMP or the pthreads interface), so I'm
just noting that issue now, to be resolved later (as part of our internal
tracker issues CSTS-110 or CSTS-115).


Anyway:

> >> "If there are two or more host threads executing and sharing the same 
> >> accelerator device,
> >> two asynchronous operations with the same async-value will be enqueued on 
> >> the same activity queue"
> > Right, but then, in the very next sentence, it goes on to state: "If the
> > threads are not synchronized with respect to each other, the operations
> > may be enqueued in either order and therefore may execute on the device
> > in either order".  So this, and given that:
> 
> I actually didn't care much about that next sentence, since it's just stating 
> the obvious :)

;-)

> It also seem to imply that the multiple host threads are enqueuing operations 
> to the same async queue, hence further
> corroborating that queues are device-wide, not thread.

OK, that's your (certainly valid) interpretation; mine was to make our
life simpler:

> >> That said, I recall most (if not all) of the synchronization operations 
> >> and behavior are all
> >> defined to be with respect to operations of the local host thread only, so 
> >> the spec mentioning interaction with
> >> other host threads here may be moot, as there's no 

[PATCH][testsuite] Enable vect_usad_char effective target for non-SVE aarch64

2018-12-18 Thread Kyrill Tkachov

Hi all,

In GCC 9 the aarch64 port learned how to do V16QImode SAD operations on signed 
and unsigned chars.
But I had missed enabling the effective target for that.
This patch enables that target for non-SVE aarch64.
Two new tests now PASS on aarch64:
gcc.dg/vect/slp-reduc-sad.c
gcc.dg/vect/vect-reduc-sad.c

Ok for trunk?

Thanks,
Kyrill

P.S. I notice that powerpc and mips also implement the necessary optabs.
Perhaps the maintainers would like to check that the tests above pass on their 
appropriate (sub)targets and
enable this effective target appropriately.

2018-12-18  Kyrylo Tkachov  

* lib/target-supports.exp (check_effective_target_vect_usad_char):
Add non-SVE aarch64 to supported list.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5026c5906cd..76779ff7d70 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5925,7 +5925,10 @@ proc check_effective_target_vect_udot_hi { } {
 
 proc check_effective_target_vect_usad_char { } {
 return [check_cached_effective_target_indexed vect_usad_char {
-  expr { [istarget i?86-*-*] || [istarget x86_64-*-*] }}]
+  expr { [istarget i?86-*-*]
+	  || [istarget x86_64-*-*]
+	  || ([istarget aarch64*-*-*]
+		  && ![check_effective_target_aarch64_sve])}}]
 }
 
 # Return 1 if the target plus current options supports both signed


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-18 Thread Bin.Cheng
On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen  wrote:
>
> "bin.cheng"  writes:
>
> > Hi,
> >
> > Due to ICE and mal-functional bugs, indirect call value profile 
> > transformation
> > is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> > main issue is AutoFDO should store cgraph_node's profile_id of callee func 
> > in
> > the first histogram value's counter, rather than pointer to callee's name 
> > string
> > as it is now.
> > With the patch, some "Indirect call -> direct call" tests pass with 
> > autofdo, while
> > others are unstable.  I think the instability is caused by poor perf data 
> > collected
> > during regrets run, and can confirm these tests pass if good perf data 
> > could be
> > collected in manual experiments.
>
> Would be good to make the tests stable, otherwise we'll just have
> regressions in the future again.
>
> The problem is that the tests don't run long enough and don't get enough 
> samples?
Yes, take g++.dg/tree-prof/morefunc.C as an example:
-  int i;
-  for (i = 0; i < 1000; i++)
+  int i, j;
+  for (i = 0; i < 100; i++)
+for (j = 0; j < 50; j++)
  g += tc->foo();
if (g<100) g++;
 }
@@ -27,8 +28,9 @@ void test1 (A *tc)
 static __attribute__((always_inline))
 void test2 (B *tc)
 {
-  int i;
+  int i, j;
   for (i = 0; i < 100; i++)
+for (j = 0; j < 50; j++)

I have to increase loop count like this to get stable pass on my
machine.  The original count (1000) is too small to be sampled.

>
> Could add some loop?
> Or possibly increase the sampling frequency in perf (-F or -c)?
Maybe, I will have a try.
> Or run them multiple times and use gcov_merge to merge the files?
Without changing loop count or sampling frequency, this is not likely
to be helpful, since perf doesn't hit the small loop in most cases.

Thanks,
bin
>
>
> > FYI, an update about AutoFDO status:
> > All AutoFDO ICEs in regtest are fixed, while several tests still failing 
> > fall in below
> > three categories:
>
> Great!
>
> Of course it still ICEs with LTO?
>
> Right now there is no test case for this I think. Probably one should be 
> added.
>
> -Andi


Re: [PATCH] Fix AVX512VL gather ICEs (PR target/88513, PR target/88514)

2018-12-18 Thread Richard Biener
On Mon, 17 Dec 2018, Jakub Jelinek wrote:

> Hi!
> 
> Some of the following testcases ICE, because I was assuming that
> VEC_UNPACK_{LO,HI}_EXPR and VEC_PACK_TRUNC_EXPR just work on the
> VECTOR_BOOLEAN_TYPE_P mask types that AVX512* has (with scalar modes),
> but they really only work if the wider mode is different from the narrower
> one, so e.g. one can extract the lo or hi half of a nunits 16 
> VECTOR_BOOLEAN_TYPE_P
> type with VEC_UNPACK_*_EXPR to have a HImode -> QImode expander, or
> combine two nunits 8 halves into one 16 nunits one, i.e. QImode + QImode ->
> HImode with VEC_PACK_TRUNC_EXPR, but for bitmasks with fewer bits it is
> ambigious - either we need to extract lo/hi half of nunits 8
> VECTOR_BOOLEAN_TYPE_P or lo/hi half of nunits 4 VECTOR_BOOLEAN_TYPE_P, both
> would be QImode -> QImode operation and from the mode one can't figure out
> which one is which.  For VEC_PACK_TRUNC_EXPR it is even more complicated,
> because we use the operand mode as the name in the optab, so
> vec_pack_trunc_qi is already used the the 8 + 8 -> 16 nunits one which gives
> a HImode result and there is nothing left for the 4 + 4 -> 8 and 2 + 2 -> 4.
> 
> When not assuming it works, e.g. if I just used
> supportable_widening_operation in the gather and scatter INTEGRAL_TYPE_P
> masktype handling code, it would lead just to missed optimizations, because
> the vectorizer just punted in those cases (note, it doesn't affect
> -mprefer-vector-width=512 case, because even for DF/DImode we use 8 bits
> already, but the cases when AVX512VL is used with
> -mprefer-vector-width={128,256}.
> 
> The following patch introduces 3 new optabs which are like
> vec_pack_trunc_optab resp. vec_unpacks_{lo,hi}_optab, except that their
> expanders take another argument - CONST_INT representing the number of units
> in the wider of the two bitmask (VECTOR_BOOLEAN_TYPE_P) types and is meant
> to be used for the cases where both modes have different
> TYPE_VECTOR_SUBPARTS, but the same TYPE_MODE.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2018-12-17  Jakub Jelinek  
> 
>   PR target/88513
>   PR target/88514
>   * optabs.def (vec_pack_sbool_trunc_optab, vec_unpacks_sbool_hi_optab,
>   vec_unpacks_sbool_lo_optab): New optabs.
>   * optabs.c (expand_widen_pattern_expr): Use vec_unpacks_sbool_*_optab
>   and pass additional argument if both input and target have the same
>   scalar mode of VECTOR_BOOLEAN_TYPE_P vectors.
>   * expr.c (expand_expr_real_2) : Handle
>   VECTOR_BOOLEAN_TYPE_P pack where result has the same scalar mode
>   as the operands using vec_pack_sbool_trunc_optab.
>   * tree-vect-stmts.c (supportable_widening_operation): Use
>   vec_unpacks_sbool_{lo,hi}_optab for VECTOR_BOOLEAN_TYPE_P conversions
>   where both wider_vectype and vectype have the same scalar mode.
>   (supportable_narrowing_operation): Similarly use
>   vec_pack_sbool_trunc_optab if narrow_vectype and vectype have the same
>   scalar mode.
>   * config/i386/i386.c (ix86_get_builtin)
>   : Check for non-VECTOR_MODE_P
>   rather than VOIDmode.
>   * config/i386/sse.md (vec_pack_trunc_qi, vec_pack_trunc_):
>   Remove useless ()s around "register_operand", formatting fixes.
>   (vec_pack_sbool_trunc_qi, vec_unpacks_sbool_lo_qi,
>   vec_unpacks_sbool_hi_qi): New expanders.
>   * doc/md.texi (vec_pack_sbool_trunc_M, vec_unpacks_sbool_hi_M,
>   vec_unpacks_sbool_lo_M): Document.
> 
>   * gcc.target/i386/avx512f-pr88513-1.c: New test.
>   * gcc.target/i386/avx512f-pr88513-2.c: New test.
>   * gcc.target/i386/avx512vl-pr88464-1.c: New test.
>   * gcc.target/i386/avx512vl-pr88464-2.c: New test.
>   * gcc.target/i386/avx512vl-pr88464-3.c: New test.
>   * gcc.target/i386/avx512vl-pr88464-4.c: New test.
>   * gcc.target/i386/avx512vl-pr88513-1.c: New test.
>   * gcc.target/i386/avx512vl-pr88513-2.c: New test.
>   * gcc.target/i386/avx512vl-pr88513-3.c: New test.
>   * gcc.target/i386/avx512vl-pr88513-4.c: New test.
>   * gcc.target/i386/avx512vl-pr88514-1.c: New test.
>   * gcc.target/i386/avx512vl-pr88514-2.c: New test.
>   * gcc.target/i386/avx512vl-pr88514-3.c: New test.
> 
> --- gcc/optabs.def.jj 2018-12-14 20:35:37.883126125 +0100
> +++ gcc/optabs.def2018-12-17 15:18:27.817103784 +0100
> @@ -335,6 +335,7 @@ OPTAB_D (vec_pack_sfix_trunc_optab, "vec
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
>  OPTAB_D (vec_pack_ufix_trunc_optab, "vec_pack_ufix_trunc_$a")
> +OPTAB_D (vec_pack_sbool_trunc_optab, "vec_pack_sbool_trunc_$a")
>  OPTAB_D (vec_pack_usat_optab, "vec_pack_usat_$a")
>  OPTAB_D (vec_packs_float_optab, "vec_packs_float_$a")
>  OPTAB_D (vec_packu_float_optab, "vec_packu_float_$a")
> @@ -350,6 +351,8 @@ OPTAB_D (vec_unpacks_float_hi_optab, "ve
>  OPTAB_D (vec_unpacks_float_lo_optab, 

[PR86153] simplify more overflow tests in VRP

2018-12-18 Thread Alexandre Oliva
Jeff, you mentioned you had changes to the VRP overflow test that would
fix this, but I couldn't figure out whether or not you ever put them in
and it regressed again later, or what.  Anyway, here's my take on it.


PR 86153 was originally filed when changes to the C++11's
implementation of vector resize(size_type) limited inlining that were
required for testsuite/g++.dg/pr83239.C to verify that we did not
issue an undesired warning.

That was worked by increasing the limit for inlining, but that in turn
caused the C++98 implementation of vector resize, that is
significantly different, to also be fully inlined, and that happened
to issue the very warnings the test was meant to verify we did NOT
issue.

The reason we issued the warnings was that we failed to optimize out
some parts of _M_fill_insert, used by the C++98 version of vector
resize, although the call of _M_fill_insert was guarded by a test that
could never pass: test testcase only calls resize when the vector size
is >= 3, to decrement the size by two.  The limitation we hit in VRP
was that the compared values could pass as an overflow test, if the
vector size was 0 or 1 (we knew it wasn't), but even with dynamic
ranges we failed to decide that the test result could be determined at
compile time, even though after the test we introduced ASSERT_EXPRs
that required a condition known to be false from earlier ones.

I pondered turning ASSERT_EXPRs that show impossible conditions into
traps, to enable subsequent instructions to be optimized, but I ended
up finding an earlier spot in which an overflow test that would have
introduced the impossible ASSERT_EXPR can have its result deduced from
earlier known ranges and resolved to the other path.

Although such overflow tests could be uniformly simplified to compares
against a constant, the original code would only perform such
simplifications when the test could be resolved to an equality test
against zero.  I've thus avoided introducing compares against other
constants, and instead added code that will only simplify overflow
tests that weren't simplified before when the condition can be
evaluated at compile time.


Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR testsuite/86153
* vr-values.c
(vr_values::vrp_evaluate_conditional_warnv_with_ops): Extend
simplification of overflow tests to cover cases in which we
can determine the result of the comparison.

for  gcc/testsuite/ChangeLog

PR testsuite/86153
* gcc.dg/vrp-overflow-1.c: New.
---
 gcc/testsuite/gcc.dg/vrp-overflow-1.c |  151 +
 gcc/vr-values.c   |   32 +++
 2 files changed, 183 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vrp-overflow-1.c

diff --git a/gcc/testsuite/gcc.dg/vrp-overflow-1.c 
b/gcc/testsuite/gcc.dg/vrp-overflow-1.c
new file mode 100644
index ..8e5794c77b6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vrp-overflow-1.c
@@ -0,0 +1,151 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-forwprop" } */
+
+extern void __attribute__((noreturn)) unreachable (void);
+
+int fle22 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i - 2;
+
+  if (j == 7) /* A dynamic range excludes a value from j for the rest of f1.  
*/
+return -1;
+
+  if (i <= 2) /* This dynamic range cannot be combined or compared with that 
of j.  */
+return 0;
+
+  if (i <= j) /* And so we couldn't compute this result.  */
+unreachable ();
+
+  return 1;
+}
+
+int fle32 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i - 3;
+
+  if (j == 7) /* A dynamic range excludes a value from j for the rest of f1.  
*/
+return -1;
+
+  if (i <= 2) /* This dynamic range cannot be combined or compared with that 
of j.  */
+return 0;
+
+  if (i <= j) /* And so we couldn't compute this result.  */
+unreachable ();
+
+  return 1;
+}
+
+int flt22 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i - 2;
+
+  if (j == 7)
+return -1;
+
+  if (i <= 2)
+return 0;
+
+  if (i < j)
+unreachable ();
+
+  return 1;
+}
+
+int flt32 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i - 3;
+
+  if (j == 7)
+return -1;
+
+  if (i <= 2)
+return 0;
+
+  if (i < j)
+unreachable ();
+
+  return 1;
+}
+
+int fgt22 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i + 2;
+
+  if (j == -7)
+return -1;
+
+  if (i >= -3)
+return 0;
+
+  if (i > j)
+unreachable ();
+
+  return 1;
+}
+
+int fgt32 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i + 3;
+
+  if (j == -7)
+return -1;
+
+  if (i >= -3)
+return 0;
+
+  if (i > j)
+unreachable ();
+
+  return 1;
+}
+
+int fge22 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i + 2;
+
+  if (j == -7)
+return -1;
+
+  if (i >= -3)
+return 0;
+
+  if (i >= j)
+unreachable ();
+
+  return 1;
+}
+
+int fge32 (int a)
+{
+  unsigned i = a / 4;
+  unsigned j = i + 3;
+
+  if (j == -7)
+return -1;
+
+  if (i >= -3)
+  

Re: [PATCH] [MSP430] Fix gcc.dg/pr85180.c and gcc.dg/pr87985.c timeouts for msp430-elf -mlarge

2018-12-18 Thread Jozef Lawrynowicz
On Tue, 18 Dec 2018 03:08:51 -0600
Segher Boessenkool  wrote:

> Hi!
> 
> On Fri, Dec 14, 2018 at 03:22:13PM +, Jozef Lawrynowicz wrote:
> > 2018-12-14  Jozef Lawrynowicz  
> > 
> > gcc/ChangeLog:
> > * combine.c (update_rsp_from_reg_equal): Only look for the nonzero bits
> > of src in nonzero_bits_mode if the mode of src is MODE_INT and
> > HWI_COMPUTABLE.
> > (reg_nonzero_bits_for_combine): Add clarification to comment.  
> 
> Is there some PR this fixes?

No not for this one, I just spotted the timeouts in the GCC testsuite.

> > 
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 7e61139..c93aaed 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -1698,9 +1698,13 @@ update_rsp_from_reg_equal (reg_stat_type *rsp, 
> > rtx_insn *insn, const_rtx set,
> >/* Don't call nonzero_bits if it cannot change anything.  */
> >if (rsp->nonzero_bits != HOST_WIDE_INT_M1U)
> >  {
> > -  bits = nonzero_bits (src, nonzero_bits_mode);
> > +  machine_mode mode = GET_MODE (x);
> > +  if (GET_MODE_CLASS (mode) == MODE_INT
> > + && HWI_COMPUTABLE_MODE_P (mode))
> > +   mode = nonzero_bits_mode;
> > +  bits = nonzero_bits (src, mode);
> >if (reg_equal && bits)
> > -   bits &= nonzero_bits (reg_equal, nonzero_bits_mode);
> > +   bits &= nonzero_bits (reg_equal, mode);
> >rsp->nonzero_bits |= bits;
> >  }
> >  
> > @@ -10224,6 +10228,7 @@ simplify_and_const_int (rtx x, scalar_int_mode 
> > mode, rtx varop,
> >  
> >  /* Given a REG X of mode XMODE, compute which bits in X can be nonzero.
> > We don't care about bits outside of those defined in MODE.
> > +   We DO care about all the bits in MODE, even if XMODE is smaller than 
> > MODE.
> >  
> > For most X this is simply GET_MODE_MASK (GET_MODE (MODE)), but if X is
> > a shift, AND, or zero_extract, we can do better.  */  
> 
> I think this is okay for trunk, and for backports after waiting a week
> or so for fallout.  Thanks!

Thanks, applied to trunk.

Jozef


Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Ulrich Weigand
Bill Schmidt wrote:

> +@item
> +When using vector in keyword-and-predefine mode; for example,
> +
> +@smallexample
> +/* With -maltivec only: */

This is a bit confusing (at least to me).  What does "with -maltivec only"
mean here?  Just adding -maltivec will *not* switch to keyword-and-
predefine mode, as far as I can tell.  Rather, to switch to that mode
you'll have to disable GNU extensions, e.g. via -std=c11, and then
include  to get the predefine.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] cleanup libgomp's coalesce chunk data structures

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 10:59:20AM +0100, Thomas Schwinge wrote:
> OK for trunk?
> 
> commit 20d3cbd6e27b10ae1cd352cc177d7697a4a57db0
> Author: Thomas Schwinge 
> Date:   Mon Dec 17 18:26:29 2018 +0100
> 
> Cleanup libgomp's coalesce chunk data structures
> 
> libgomp/
> * target.c (struct gomp_coalesce_chunk): New structure.
> (struct gomp_coalesce_buf): Update the chunks member to use that
> type.  Adjust all users.

Ok, thanks.

Jakub


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-18 Thread Chung-Lin Tang

On 2018/12/17 10:32 PM, Thomas Schwinge wrote:

The reason there are deadlocks from inside the plugin on GOMP_PLUGIN_fatal() is 
when we hold the
struct gomp_device_descr's*device*  lock, which is also acquired when we 
execute atexit device shutdown handlers, hence the deadlock.

I don't think this is the case for the OpenACC entry points that grab at the 
openacc.async.* hooks,

Ah, OK, I see.  (But I thought that method of deadlock had been fixed by
some structural changes, to have plugin functions call the
non-terminating "GOMP_PLUGIN_error" and return some error, instead of
calling "GOMP_PLUGIN_fatal"?  I may be misremembering.  Or, that's
another TODO item for later, separately...  Or, if that's actually the
case, that this has been fixed in the way I described, then should these
functions also be changed accordingly: instead of "GOMP_PLUGIN_fatal"
call "GOMP_PLUGIN_error", and then return an error code?)


You remembered correctly, although...


though I can audit them again if deemed so.

My understanding had been that deadlock may happen if we're inside some
of these async/wait/serialize/synchronize functions, with "async" locked,
then run into an error, then libgomp prepares to abort, and at that time
shuts down the device, which will shut down the asyncqueues
("goacc_fini_asyncqueues"), which will again try to lock "async" -- which
it actually doesn't.  My misunderstanding, I guess?


...at least now, you can see that goacc_fini_asyncqueues() does not attempt to
acquire devicep->openacc.async.lock when doing cleanup.

Come to think of it, that might be a bug there. :P


"If there are two or more host threads executing and sharing the same 
accelerator device,
two asynchronous operations with the same async-value will be enqueued on the same 
activity queue"

Right, but then, in the very next sentence, it goes on to state: "If the
threads are not synchronized with respect to each other, the operations
may be enqueued in either order and therefore may execute on the device
in either order".  So this, and given that:


I actually didn't care much about that next sentence, since it's just stating 
the obvious :)

It also seem to imply that the multiple host threads are enqueuing operations 
to the same async queue, hence further
corroborating that queues are device-wide, not thread.


That said, I recall most (if not all) of the synchronization operations and 
behavior are all
defined to be with respect to operations of the local host thread only, so the 
spec mentioning interaction with
other host threads here may be moot, as there's no way meaningful way to 
synchronize with
the OpenACC activity of other host threads (though correct me if I forgot some 
API)

..., I concluded something must be wrong in the OpenACC 2.6,
2.16.1. "async clause" text, and no such (host-side) inter-thread
synchronization can be expected from OpenACC "async"/"wait".  I've also
already got that on my list of things to clarify with the OpenACC
technical committee, later on.


I just remembered, there does seem to be one area where device vs. thread-wide 
interpretation will be visible:
when using acc_get/set_cuda_stream(). Supposedly, given the specification's 
device-wide queue/stream model,
different host-threads should access the same CUDA stream when using 
acc_get/set_cuda_stream().
This will break if we made async queues to be thread-local.


Also, CUDA streams do not seem to support local-thread-operation-only 
synchronization.
I remember this was an issue in the old implementation inside the nvptx plugin 
as well, and we
had hacks to somewhat alleviate it (i.e. calling streams "single" or "multi" 
threaded)

Right.


Well, another issue that we might want to bring up to the OpenACC committee:)
I agree that if async queues spaces were simply thread-local then things would 
be much simpler.

OK, so you agree with that, good.

And, no problem foreseeable about simply moving the asyncqueues into
"goacc_thread" -- and removing the "async" lock?


I think we should still try to solve the potential deadlock problems, and stay 
close to the current
implementation just for now. We can ask the committee for further guidance 
later.

Chung-Lin


Re: [PATCH] cleanup libgomp's coalesce chunk data structures

2018-12-18 Thread Thomas Schwinge
Hi Jakub!

Julian had a look at this, and now I too (and just fixed some
formatting):

On Wed, 2 May 2018 13:02:09 -0700, Cesar Philippidis  
wrote:
> Libgomp's usage of struct gomp_coalesce_buf is a little confusing. The
> member chunks is an array where the even elements represent the starting
> address of the chunk cache line and the odd elements represent the
> corresponding ending addresses. This patch clarifies the usage of the
> chunks member by introducing a new gomp_coalesce_chunk structure with
> explicit start and end members. Beyond that, there's no functional
> changes to this patch.

Thanks!

> Is it OK for trunk? I tested it against x86_64-linux with nvptx
> acceleration.

OK for trunk?

commit 20d3cbd6e27b10ae1cd352cc177d7697a4a57db0
Author: Thomas Schwinge 
Date:   Mon Dec 17 18:26:29 2018 +0100

Cleanup libgomp's coalesce chunk data structures

libgomp/
* target.c (struct gomp_coalesce_chunk): New structure.
(struct gomp_coalesce_buf): Update the chunks member to use that
type.  Adjust all users.
---
 libgomp/target.c | 52 +++-
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git libgomp/target.c libgomp/target.c
index a62ae2c3e4b3..0b4e0107f75d 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -180,16 +180,22 @@ gomp_device_copy (struct gomp_device_descr *devicep,
 /* Infrastructure for coalescing adjacent or nearly adjacent (in device 
addresses)
host to device memory transfers.  */
 
+struct gomp_coalesce_chunk
+{
+  /* The starting and ending point of a coalesced chunk of memory.  */
+  size_t start, end;
+};
+
 struct gomp_coalesce_buf
 {
   /* Buffer into which gomp_copy_host2dev will memcpy data and from which
  it will be copied to the device.  */
   void *buf;
   struct target_mem_desc *tgt;
-  /* Array with offsets, chunks[2 * i] is the starting offset and
- chunks[2 * i + 1] ending offset relative to tgt->tgt_start device address
+  /* Array with offsets, chunks[i].start is the starting offset and
+ chunks[i].end ending offset relative to tgt->tgt_start device address
  of chunks which are to be copied to buf and later copied to device.  */
-  size_t *chunks;
+  struct gomp_coalesce_chunk *chunks;
   /* Number of chunks in chunks array, or -1 if coalesce buffering should not
  be performed.  */
   long chunk_cnt;
@@ -222,14 +228,14 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, 
size_t start, size_t len)
 {
   if (cbuf->chunk_cnt < 0)
return;
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (start < cbuf->chunks[cbuf->chunk_cnt - 1].end)
{
  cbuf->chunk_cnt = -1;
  return;
}
-  if (start < cbuf->chunks[2 * cbuf->chunk_cnt - 1] + MAX_COALESCE_BUF_GAP)
+  if (start < cbuf->chunks[cbuf->chunk_cnt - 1].end + MAX_COALESCE_BUF_GAP)
{
- cbuf->chunks[2 * cbuf->chunk_cnt - 1] = start + len;
+ cbuf->chunks[cbuf->chunk_cnt - 1].end = start + len;
  cbuf->use_cnt++;
  return;
}
@@ -239,8 +245,8 @@ gomp_coalesce_buf_add (struct gomp_coalesce_buf *cbuf, 
size_t start, size_t len)
   if (cbuf->use_cnt == 1)
cbuf->chunk_cnt--;
 }
-  cbuf->chunks[2 * cbuf->chunk_cnt] = start;
-  cbuf->chunks[2 * cbuf->chunk_cnt + 1] = start + len;
+  cbuf->chunks[cbuf->chunk_cnt].start = start;
+  cbuf->chunks[cbuf->chunk_cnt].end = start + len;
   cbuf->chunk_cnt++;
   cbuf->use_cnt = 1;
 }
@@ -271,20 +277,20 @@ gomp_copy_host2dev (struct gomp_device_descr *devicep,
   if (cbuf)
 {
   uintptr_t doff = (uintptr_t) d - cbuf->tgt->tgt_start;
-  if (doff < cbuf->chunks[2 * cbuf->chunk_cnt - 1])
+  if (doff < cbuf->chunks[cbuf->chunk_cnt - 1].end)
{
  long first = 0;
  long last = cbuf->chunk_cnt - 1;
  while (first <= last)
{
  long middle = (first + last) >> 1;
- if (cbuf->chunks[2 * middle + 1] <= doff)
+ if (cbuf->chunks[middle].end <= doff)
first = middle + 1;
- else if (cbuf->chunks[2 * middle] <= doff)
+ else if (cbuf->chunks[middle].start <= doff)
{
- if (doff + sz > cbuf->chunks[2 * middle + 1])
+ if (doff + sz > cbuf->chunks[middle].end)
gomp_fatal ("internal libgomp cbuf error");
- memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0]),
+ memcpy ((char *) cbuf->buf + (doff - cbuf->chunks[0].start),
  h, sz);
  return;
}
@@ -510,8 +516,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t 
mapnum,
   cbuf.buf = NULL;
   if (mapnum > 1 || pragma_kind == GOMP_MAP_VARS_TARGET)
 {
-  cbuf.chunks
-   = (size_t *) gomp_alloca ((2 * mapnum + 2) * sizeof (size_t));
+  size_t chunks_size = (mapnum + 1) * sizeof (struct 

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-18 Thread Chung-Lin Tang

On 2018/12/17 9:52 PM, Thomas Schwinge wrote:

Hi Chung-Lin!

On Fri, 14 Dec 2018 22:52:44 +0800, Chung-Lin Tang  
wrote:

On 2018/12/14 10:17 PM, Thomas Schwinge wrote:

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang  
wrote:

--- a/libgomp/oacc-async.c
+++ b/libgomp/oacc-async.c



+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  /* The special value acc_async_noval (-1) maps to the thread-specific
+ default async stream.  */
+  if (async == acc_async_noval)
+async = thr->default_async;
+
+  if (async == acc_async_sync)
+return NULL;
+
+  if (async < 0)
+gomp_fatal ("bad async %d", async);


To make this "resolve" part more obvious, that is, the translation from
the "async" argument to an "asyncqueue" array index:


+  if (!create
+  && (async >= dev->openacc.async.nasyncqueue
+ || !dev->openacc.async.asyncqueue[async]))
+return NULL;
+[...]


..., I propose adding a "async2id" function for that, and then rename all
"asyncqueue[async]" to "asyncqueue[id]".


I don't think this is needed. This is the only place in the entire runtime that
does asyncqueue indexing, adding more conceptual layers of re-directed indexing
seems unneeded.


It makes the code better understandable?  Or, curious, why do you think
that the translation from an OpenACC async-argument to an internal
asyncqueue ID should not be a separate function?


Because the index is (1) not used elsewhere; nor supposed to really, 
lookup_goacc_asyncqueue()
is intended to be the centralized place for looking up async queues.
and (2) the special async number case handling here is really short, creating 
another
conceptual index-redirecting in the code feels like over-engineering.


I do think the more descriptive comments are nice though.




And, this also restores the current trunk behavior, so that
"acc_async_noval" gets its own, separate "asyncqueue".


Is there a reason we need to restore that behavior right now?


Because otherwise that's a functional change ("regression") from the
current GCC trunk behavior, which I wouldn't expect in a re-work.


Okay, but do take note that the acc_get/set_default_async is part of the 
upstreaming too.
The behavior change is due to that new 2.5 functionality, not really because I 
arbitrarily changed things.

Thanks,
Chung-Lin



[PATCH, arm][PR88167] Fix __builtin_return_address returns invalid address

2018-12-18 Thread Mihail Ionescu

Hi All,

In Thumb mode when the function prologue gets expanded, in case of a 
multiple register push, additional mov instructions are generated to 
save the high registers which result in lr getting overwritten before 
it's value can be used to retrieve the return address.


The fix consists of detecting if lr is alive after the prologue, in 
which case, the lr register won't be used as a scratch.


Regression tested on arm-none-eabi.

gcc/ChangeLog:
2018-11-23  Mihail Ionescu  

PR target/88167
* config/arm/arm.c: Add lr liveness check.

gcc/testsuite/ChangeLog
2018-11-23  Mihail Ionescu  

PR target/88167
* gcc.target/arm/pr88167.c: New test.

If everything is ok for trunk, could someone commit it on my behalf?

Best regards,
   Mihail


Re: [patch] Fix bootstrap powerpc*-*-freebsd* targets

2018-12-18 Thread Segher Boessenkool
Hi Alan,

On Tue, Dec 18, 2018 at 10:39:27AM +1030, Alan Modra wrote:
> On Mon, Dec 17, 2018 at 11:05:57AM -0600, Segher Boessenkool wrote:
> > On Mon, Dec 17, 2018 at 10:40:01AM +1030, Alan Modra wrote:
> > > Since I broke powerpc*-freebsd and the other non-linux powerpc
> > > targets, I guess I ought to fix them.  The following is a variation on
> > > your first patch, that results in -mcall-linux for powerpc-freebsd*
> > > providing the 32-bit powerpc-linux dynamic linker.
> > 
> > That, like the first patch, abuses that header file.  Please do it
> > somewhere sane instead, not in a random subtarget file?
> 
> Is there is a better place, currently?  sysv4.h contains a mess of OS
> related defines already, to support various -mcall options.  If those
> stay in sysv4.h I can't see a better place for the fall-back
> GNU_USER_DYNAMIC_LINKER define.

I was hoping you would untangle it a bit.  My dastardly plan failed,
apparently.  Drat.

Should anything use GNU_USER_DYNAMIC_LINKER if it isn't defined?  Maybe
it is better if the use in sysv4.h had an #ifdef around it?  For all the
other uses it should be always defined.

Or maybe we should have a linux32.h as well?


Segher


Re: [PATCH, rs6000] Clarify when typedef names can be used with AltiVec vector types

2018-12-18 Thread Segher Boessenkool
Hi Bill,

On Mon, Dec 17, 2018 at 03:54:23PM -0600, Bill Schmidt wrote:
> We recently discovered some incorrect documentation about this topic and 
> agreed it should be changed.
> This is my attempt to clarify it.  Built and verified on 
> powerpc64le-linux-gnu.  Is this ok for trunk?

Okay for trunk and all backports you want.  Thanks!


Segher


> 2018-12-17  Bill Schmidt  
> 
>   * doc/extend.texi (PowerPC Altivec/VSX Built-in Functions):
>   Describe when a typedef name can be used as the type specifier for
>   a vector type, and when it cannot.


Re: [PATCH] [MSP430] Fix gcc.dg/pr85180.c and gcc.dg/pr87985.c timeouts for msp430-elf -mlarge

2018-12-18 Thread Segher Boessenkool
Hi!

On Fri, Dec 14, 2018 at 03:22:13PM +, Jozef Lawrynowicz wrote:
> 2018-12-14  Jozef Lawrynowicz  
> 
>   gcc/ChangeLog:
>   * combine.c (update_rsp_from_reg_equal): Only look for the nonzero bits
>   of src in nonzero_bits_mode if the mode of src is MODE_INT and
>   HWI_COMPUTABLE.
>   (reg_nonzero_bits_for_combine): Add clarification to comment.

Is there some PR this fixes?

> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 7e61139..c93aaed 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -1698,9 +1698,13 @@ update_rsp_from_reg_equal (reg_stat_type *rsp, 
> rtx_insn *insn, const_rtx set,
>/* Don't call nonzero_bits if it cannot change anything.  */
>if (rsp->nonzero_bits != HOST_WIDE_INT_M1U)
>  {
> -  bits = nonzero_bits (src, nonzero_bits_mode);
> +  machine_mode mode = GET_MODE (x);
> +  if (GET_MODE_CLASS (mode) == MODE_INT
> +   && HWI_COMPUTABLE_MODE_P (mode))
> + mode = nonzero_bits_mode;
> +  bits = nonzero_bits (src, mode);
>if (reg_equal && bits)
> - bits &= nonzero_bits (reg_equal, nonzero_bits_mode);
> + bits &= nonzero_bits (reg_equal, mode);
>rsp->nonzero_bits |= bits;
>  }
>  
> @@ -10224,6 +10228,7 @@ simplify_and_const_int (rtx x, scalar_int_mode mode, 
> rtx varop,
>  
>  /* Given a REG X of mode XMODE, compute which bits in X can be nonzero.
> We don't care about bits outside of those defined in MODE.
> +   We DO care about all the bits in MODE, even if XMODE is smaller than MODE.
>  
> For most X this is simply GET_MODE_MASK (GET_MODE (MODE)), but if X is
> a shift, AND, or zero_extract, we can do better.  */

I think this is okay for trunk, and for backports after waiting a week
or so for fallout.  Thanks!


Segher


Re: [PATCH] Fix AVX512VL gather ICEs (PR target/88513, PR target/88514)

2018-12-18 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 08:25:37AM +0100, Uros Bizjak wrote:
> > : Check for non-VECTOR_MODE_P
> > rather than VOIDmode.
> 
> This entry doesn't match the change, you are checking for
> VECTOR_MODE_P. On a related note, should similar

Ok, I'll write: Check for VECTOR_MODE_P rather than non-VOIDmode.

> IX86_BUILTIN_GATHER3ALTDIV16{SF,SI} builtins be changed in the same
> way?

No, because IX86_BUILTIN_GATHER3ALTDIV16S{F,I} has always the HImode
for the mask argument (sometimes CONST_INT with VOIDmode) and wants QImode.

The reason we need the change in this patch is that we have the same case
handling:
case IX86_BUILTIN_GATHERALTDIV8SF:
case IX86_BUILTIN_GATHERALTDIV8SI:
which have V8S[IF]mode for the mask argument and we want to do what is
inside of the if where the patch changes the guard, and
case IX86_BUILTIN_GATHER3ALTDIV8SF:
case IX86_BUILTIN_GATHER3ALTDIV8SI:
which have QImode mask argument (sometimes CONST_INT with VOIDmode) and
wants QImode (just cares about the low 4 bits of it rather than all 8 bits
it was passed in).  We could use a kand, but the instruction ignores the
argument and we'd need to load the mask immediate 15 into a register, copy
it into a mask register and then kand.

Jakub